1 | <html xmlns="http://www.w3.org/1999/xhtml"> |
---|
2 | <head> |
---|
3 | <title>Foolscap Failure Reporting</title> |
---|
4 | <style src="stylesheet-unprocessed.css"></style> |
---|
5 | </head> |
---|
6 | |
---|
7 | <body> |
---|
8 | <h1>Foolscap Failure Reporting</h1> |
---|
9 | |
---|
10 | <h2>Signalling Remote Exceptions</h2> |
---|
11 | |
---|
12 | <p>The <code>remote_</code>-prefixed methods which Foolscap invokes, just |
---|
13 | like their local counterparts, can either return a value or raise an |
---|
14 | exception. Foolscap callers can use the normal Twisted conventions for |
---|
15 | handling asynchronous failures: <code>callRemote</code> returns a Deferred |
---|
16 | object, which will eventually either fire its callback function (if the |
---|
17 | remote method returned a normal value), or its errback function (if the |
---|
18 | remote method raised an exception).</p> |
---|
19 | |
---|
20 | <p>There are several other reasons that the Deferred returned |
---|
21 | by <code>callRemote</code> might fire its errback:</p> |
---|
22 | |
---|
23 | <ul> |
---|
24 | <li>local outbound schema violation: the outbound method arguments did not |
---|
25 | match the <code>RemoteInterface</code> that is in force. This is an |
---|
26 | optional form of typechecking for remote calls, and is activated when |
---|
27 | the remote object describes itself as conforming to a named |
---|
28 | <code>RemoteInterface</code> which is also declared in a local class. |
---|
29 | The local constraints are checked before the message is transmitted over |
---|
30 | the wire. A constraint violation is indicated by |
---|
31 | raising <code>foolscap.schema.Violation</code>, which is delivered |
---|
32 | through the Deferred's errback.</li> |
---|
33 | <li>network partition: if the underlying TCP connection is lost before the |
---|
34 | response has been received, the Deferred will errback with |
---|
35 | a <code>foolscap.ipb.DeadReferenceError</code> exception. Several things |
---|
36 | can cause this: the remote process shutting down (intentionally or |
---|
37 | otherwise), a network partition or timeout, or the local process |
---|
38 | shutting down (<code>Tub.stopService</code> will terminate all |
---|
39 | outstanding remote messages before shutdown).</li> |
---|
40 | <li>remote inbound schema violation: as the serialized method arguments were |
---|
41 | unpacked by the remote process, one of them violated that process's |
---|
42 | inbound <code>RemoteInterface</code>. This check serves to protect each |
---|
43 | process from incorrect types which might either confuse the subsequent |
---|
44 | code or consume a lot of memory. These constraints are enforced as the |
---|
45 | tokens are read off the wire, and are signalled with the |
---|
46 | same <code>Violation</code> exception as above.</li> |
---|
47 | <li>remote method exception: if the <code>remote_</code> method raises an |
---|
48 | exception, or returns a Deferred which subsequently fires its errback, |
---|
49 | the caller will see an errback which attempts to replicate the remote |
---|
50 | exception. This errback will receive a <code>CopiedFailure</code> |
---|
51 | instance, described below.</li> |
---|
52 | <li>remote outbound schema violation: as the remote method's return value is |
---|
53 | serialized and put on the wire, the values are compared against the |
---|
54 | return-value constraint (if a <code>RemoteInterface</code> is in |
---|
55 | effect). If it does not match the constraint, a Violation will be |
---|
56 | raised.</li> |
---|
57 | <li>local inbound schema violation: when the serialized return value arrives |
---|
58 | on the original caller's side of the wire, the return-value constraint |
---|
59 | of any effective <code>RemoteInterface</code> will be applied. This |
---|
60 | protects the caller's response code from unexpected values. Any |
---|
61 | mismatches will be signalled with a Violation exception.</li> |
---|
62 | </ul> |
---|
63 | |
---|
64 | |
---|
65 | <h2>CopiedFailures</h2> |
---|
66 | |
---|
67 | <p>Twisted uses the <code>twisted.python.failure.Failure</code> class to |
---|
68 | encapsulate Python exceptions in an instance which can be passed around, |
---|
69 | tested, and examined in an asynchronous fashion. It does this by copying much |
---|
70 | of the information out of the original exception context (including a stack |
---|
71 | trace and the exception instance itself) into the <code>Failure</code> |
---|
72 | instance. When an exception is raised during a Deferred callback function, it |
---|
73 | is converted into a Failure instance and passed to the next errback handler |
---|
74 | in the chain.</p> |
---|
75 | |
---|
76 | <p><code>RemoteReference.callRemote</code> uses the same convention: any |
---|
77 | exceptions that occur during the remote method call are delivered to the |
---|
78 | errback handler. However, several exceptions can occur on the remote process, |
---|
79 | and Failure objects contain references to local state which cannot be |
---|
80 | precisely replicated on a different system (stack frames and exception |
---|
81 | classes). So, when an exception happens on the remote side of |
---|
82 | a <code>callRemote</code> invocation, the errback handler will receive |
---|
83 | a <code>CopiedFailure</code> instance instead.</p> |
---|
84 | |
---|
85 | <p><code>CopiedFailure</code> is designed to behave very much like a |
---|
86 | regular <code>Failure</code> object. The <code>check</code> |
---|
87 | and <code>trap</code> methods work on <code>CopiedFailure</code>s just like |
---|
88 | they do on <code>Failure</code>s.</p> |
---|
89 | |
---|
90 | <p>However, all of the Failure's attributes must be converted into strings |
---|
91 | for serialization. As a result, the original <code>.value</code> attribute |
---|
92 | (which contains the exception instance, which might contain additional |
---|
93 | information about the problem) is replaced by a stringified representation. |
---|
94 | The frames of the original stack trace are also replaced with a string, so |
---|
95 | they can be printed but not examined. The exception class is also passed as a |
---|
96 | string (using Twisted's <code>reflect.qual</code> fully-qualified-name |
---|
97 | utility), but <code>check</code> and <code>trap</code> both compare by string |
---|
98 | name instead of object equality, so most applications won't notice the |
---|
99 | difference.</p> |
---|
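<p>The string-based comparison described above can be sketched as follows.
This is a minimal illustration, not Foolscap's implementation: the
<code>StringFailure</code> class and the <code>qual</code> helper are
hypothetical stand-ins that only mimic the idea of matching exception
classes by fully-qualified name.</p>

```python
def qual(cls):
    """Return a fully-qualified class name (a stand-in for reflect.qual)."""
    return "%s.%s" % (cls.__module__, cls.__name__)


class StringFailure(object):
    """Hypothetical failure whose class information survives only as strings."""

    def __init__(self, exc):
        # Keep the exception type, its value, and every base class as strings.
        self.type = qual(type(exc))
        self.value = str(exc)
        self.parents = [qual(c) for c in type(exc).__mro__]

    def check(self, *error_types):
        """Return the first error type whose name matches, else None."""
        for error in error_types:
            # Accept either a class or an already-qualified name string.
            name = qual(error) if isinstance(error, type) else error
            if name in self.parents:
                return error
        return None


f = StringFailure(KeyError("missing"))
matched_parent = f.check(ValueError, LookupError)  # LookupError is a base class
matched_none = f.check(ValueError)                 # unrelated class
```

<p>Because the match is made against the list of parent class names, asking
for a base class such as <code>LookupError</code> still succeeds even though
no exception object crossed the wire.</p>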
100 | |
---|
101 | <p>The default behavior of CopiedFailure is to include a string copy of the |
---|
102 | stack trace, generated with <code>printTraceback()</code>, which will include |
---|
103 | lines of source code when available. To reduce the amount of information sent |
---|
104 | over the wire, stack trace strings larger than about 2000 bytes are truncated |
---|
105 | in a fashion that tries to preserve the top and bottom of the stack.</p> |
---|
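<p>A truncation that keeps both ends of a long traceback might look like the
sketch below. The function name, the marker text, and the exact
<code>head</code> and <code>tail</code> sizes are assumptions chosen for
illustration; only the rough 2000-byte threshold comes from the text
above.</p>

```python
def truncate_traceback(text, limit=2000, head=700, tail=1200,
                       marker="\n\n-- TRACEBACK ELIDED --\n\n"):
    """Shorten an over-long traceback string, keeping its top and bottom.

    All sizes are illustrative assumptions and are counted in characters
    here, not in encoded bytes.
    """
    if len(text) <= limit:
        return text  # short enough: send unchanged
    # Keep the outermost frames (top) and the frames nearest the error (bottom).
    return text[:head] + marker + text[-tail:]


short_tb = "Traceback (most recent call last):\n  ...\nKeyError: 'bob'\n"
long_tb = "a" * 1000 + "b" * 3000 + "c" * 1500
truncated = truncate_traceback(long_tb)
```

<p>Keeping the tail larger than the head reflects the fact that the frames
closest to the raise site are usually the most useful when debugging.</p>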
106 | |
---|
107 | <h3>unsafeTracebacks</h3> |
---|
108 | |
---|
109 | <p>Applications which consider their lines of source code or their |
---|
110 | exceptions' list of (filename, line number) tuples to be sensitive |
---|
111 | information can set the "unsafeTracebacks" flag in their Tub to False; the |
---|
112 | server will then remove stack information from the CopiedFailure objects it |
---|
113 | sends to other systems.</p> |
---|
114 | |
---|
115 | <pre class="python"> |
---|
116 | t = Tub() |
---|
117 | t.unsafeTracebacks = False |
---|
118 | </pre> |
---|
119 | |
---|
120 | <p>When unsafeTracebacks is False, the <code>CopiedFailure</code> will only |
---|
121 | contain the stringified exception type, value, and parent class names.</p> |
---|
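<p>To make the effect of the flag concrete, the sketch below builds the kind
of string-only summary described above, with and without stack information.
It does not use Foolscap: <code>summarize_exception</code>, the dictionary
keys, and the placeholder text are hypothetical names chosen for this
illustration.</p>

```python
import traceback

PLACEHOLDER = "Traceback unavailable\n"  # assumed placeholder text


def qual(cls):
    """Return a fully-qualified class name (a stand-in for reflect.qual)."""
    return "%s.%s" % (cls.__module__, cls.__name__)


def summarize_exception(exc, unsafe_tracebacks=True):
    """Build a string-only summary of an exception for sending elsewhere."""
    state = {
        "type": qual(type(exc)),
        "value": str(exc),
        "parents": [qual(c) for c in type(exc).__mro__],
    }
    if unsafe_tracebacks:
        # Include file names, line numbers and source lines.
        state["traceback"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))
    else:
        # Withhold stack details; only type, value and parents remain.
        state["traceback"] = PLACEHOLDER
    return state


try:
    {}["bob"]
except KeyError as e:
    full = summarize_exception(e, unsafe_tracebacks=True)
    safe = summarize_exception(e, unsafe_tracebacks=False)
```

<p>In both cases the receiver can still match on the exception type and its
parent classes; only the stack details differ.</p>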
122 | |
---|
123 | <h2>Distinguishing Remote Exceptions</h2> |
---|
124 | |
---|
125 | <p>The original caller can tell the difference between exceptions that |
---|
126 | occurred locally and ones that occurred on the remote end. The most common |
---|
127 | use for this is to re-raise exceptions that resulted from programming errors |
---|
128 | in the local code, while cleanly handling or ignoring errors that were caused |
---|
129 | by the code at the remote end. The general idea is that remote code may be |
---|
130 | maliciously trying to confuse or subvert your program's control flow by |
---|
131 | returning unexpected exceptions, but that exceptions which occur locally (and |
---|
132 | are not otherwise caught and handled) are probably bugs which need to be made |
---|
133 | visible. The philosophy of how to best handle errors is beyond the scope of |
---|
134 | this document, but Foolscap tries to provide the tools to allow programmers |
---|
135 | to implement whatever approach they choose.</p> |
---|
136 | |
---|
137 | <p>It is useful to distinguish a remote exception from a local one, |
---|
138 | especially when the code involves multiple processing steps (some local, some |
---|
139 | remote). For example, the following snippet performs a local processing step, |
---|
140 | then asks a remote server for information, then adds that information into a |
---|
141 | local database. All three steps are asynchronous.</p> |
---|
142 | |
---|
143 | <pre class="python"> |
---|
144 | def get_and_store_record(name): |
---|
145 |     d = local_db.getIDNumber(name)  # step 1: local lookup |
---|
146 |     d.addCallback(lambda idnum: rref.callRemote("get_record", idnum))  # step 2: remote call |
---|
147 |     d.addCallback(lambda record: local_db.storeRecord(name, record))  # step 3: local store |
---|
148 |     return d |
---|
149 | </pre> |
---|
150 | |
---|
151 | <p>The caller of <code>get_and_store_record</code> might like to distinguish |
---|
152 | a problem that occurred in <code>getIDNumber</code> from one that |
---|
153 | occurred during the remote call to <code>remote_get_record</code>.</p> |
---|
154 | |
---|
155 | <p>For each Foolscap event that can raise a remote exception described above |
---|
156 | (i.e. remote inbound schema Violation, remote method exception, remote |
---|
157 | outbound schema Violation), the original caller will receive |
---|
158 | a <code>CopiedFailure</code> instance. For Foolscap events that raise |
---|
159 | exceptions locally (local outbound schema Violation, local inbound schema |
---|
160 | Violation), the caller will receive a regular <code>Failure</code> instance. |
---|
161 | Any non-Foolscap exception events (e.g. the <code>getIDNumber</code> |
---|
162 | and <code>storeRecord</code> calls in the example above) will also get |
---|
163 | a regular <code>Failure</code>.</p> |
---|
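<p>The classification above can be summarized as a small lookup table. This
is only a restatement of the text in code form: the event names are labels
taken from the list earlier in this document, and the helper functions are
hypothetical, not part of Foolscap.</p>

```python
REMOTE, LOCAL = "remote", "local"

# Where each kind of exception is raised, per the description above.
ORIGIN_BY_EVENT = {
    "local outbound schema violation": LOCAL,
    "remote inbound schema violation": REMOTE,
    "remote method exception": REMOTE,
    "remote outbound schema violation": REMOTE,
    "local inbound schema violation": LOCAL,
    "non-Foolscap local exception": LOCAL,
}


def expected_failure_class(event):
    """Name of the class the caller's errback should receive for this event."""
    return "CopiedFailure" if ORIGIN_BY_EVENT[event] == REMOTE else "Failure"


def expected_is_remote(event):
    """What a local-versus-remote test should report for this event."""
    return ORIGIN_BY_EVENT[event] == REMOTE


remote_events = sorted(e for e in ORIGIN_BY_EVENT if expected_is_remote(e))
```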
164 | |
---|
165 | <p>Application code should use <code>foolscap.ipb.failure_is_remote()</code> |
---|
166 | to distinguish between local and remote failures. This returns True |
---|
167 | for <code>CopiedFailure</code> instances and False for |
---|
168 | regular <code>Failure</code>s. A future version of Foolscap may change the |
---|
169 | way <code>CopiedFailure</code> is used (ideally Failure and CopiedFailure |
---|
170 | should be the same class), but <code>failure_is_remote</code> will continue |
---|
171 | to work correctly.</p> |
---|
172 | |
---|
173 | <pre class="python"> |
---|
174 | d = get_and_store_record("bob") |
---|
175 | def handle_remote_exception(f): |
---|
176 |     if not failure_is_remote(f):  # from foolscap.ipb |
---|
177 |         return f  # local failure: pass it along to the next errback |
---|
178 |     print("Remote call failed: %s" % (f,)) |
---|
179 |     print("no record stored") |
---|
180 |     return None  # remote failure handled here |
---|
181 | d.addErrback(handle_remote_exception) |
---|
182 | </pre> |
---|
183 | |
---|