<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Foolscap Failure Reporting</title>
<style src="stylesheet-unprocessed.css"></style>
</head>

<body>
<h1>Foolscap Failure Reporting</h1>

<h2>Signalling Remote Exceptions</h2>

<p>The <code>remote_</code>-prefixed methods which Foolscap invokes, just
like their local counterparts, can either return a value or raise an
exception. Foolscap callers can use the normal Twisted conventions for
handling asynchronous failures: <code>callRemote</code> returns a Deferred
object, which will eventually fire either its callback function (if the
remote method returned a normal value) or its errback function (if the
remote method raised an exception).</p>

<p>The Deferred returned by <code>callRemote</code> can fire its errback for
several reasons:</p>

<ul>
<li>local outbound schema violation: the outbound method arguments did not
match the <code>RemoteInterface</code> that is in force. This is an
optional form of typechecking for remote calls, and is activated when
the remote object describes itself as conforming to a named
<code>RemoteInterface</code> which is also declared in a local class.
The local constraints are checked before the message is transmitted over
the wire. A constraint violation is indicated by
raising <code>foolscap.schema.Violation</code>, which is delivered
through the Deferred's errback.</li>
<li>network partition: if the underlying TCP connection is lost before the
response has been received, the Deferred will errback with
a <code>foolscap.ipb.DeadReferenceError</code> exception. Several things
can cause this: the remote process shutting down (intentionally or
otherwise), a network partition or timeout, or the local process
shutting down (<code>Tub.stopService</code> will terminate all
outstanding remote messages before shutdown).</li>
<li>remote inbound schema violation: as the serialized method arguments are
unpacked by the remote process, one of them violates that process's
inbound <code>RemoteInterface</code>. This check protects each
process from incorrectly typed arguments which might either confuse the
subsequent code or consume a lot of memory. These constraints are enforced
as the tokens are read off the wire, and are signalled with the
same <code>Violation</code> exception as above.</li>
<li>remote method exception: if the <code>remote_</code> method raises an
exception, or returns a Deferred which subsequently fires its errback,
the caller will see an errback which attempts to replicate the remote
exception. This errback will receive a <code>CopiedFailure</code>
instance, described below.</li>
<li>remote outbound schema violation: as the remote method's return value is
serialized and put on the wire, the values are compared against the
return-value constraint (if a <code>RemoteInterface</code> is in
effect). If it does not match the constraint, a Violation will be
raised.</li>
<li>local inbound schema violation: when the serialized return value arrives
on the original caller's side of the wire, the return-value constraint
of any effective <code>RemoteInterface</code> will be applied. This
protects the caller's response code from unexpected values. Any
mismatches will be signalled with a Violation exception.</li>
</ul>

<h2>CopiedFailures</h2>

<p>Twisted uses the <code>twisted.python.failure.Failure</code> class to
encapsulate Python exceptions in an instance which can be passed around,
tested, and examined in an asynchronous fashion. It does this by copying much
of the information out of the original exception context (including a stack
trace and the exception instance itself) into the <code>Failure</code>
instance. When an exception is raised during a Deferred callback function, it
is converted into a Failure instance and passed to the next errback handler
in the chain.</p>

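<p>The chaining rule described above (a normal result goes to the next
callback, while an exception raised by a handler becomes a Failure that is
passed to the next errback) can be illustrated with a much-simplified,
synchronous model. This is only a sketch: <code>SimpleFailure</code>
and <code>run_chain</code> are hypothetical names invented for this example,
and real Twisted Deferreds are asynchronous and do considerably more.</p>

<pre class="python">
class SimpleFailure:
    """Hypothetical, much-simplified stand-in for Twisted's Failure."""

    def __init__(self, exc):
        self.value = exc          # the original exception instance
        self.type = type(exc)     # its class


def run_chain(result, steps):
    """Run (callback, errback) pairs roughly the way a Deferred chain would.

    A normal result is passed to the next callback; a SimpleFailure is passed
    to the next errback. A handler of None passes its input through unchanged.
    Any exception raised by a handler is wrapped in a SimpleFailure.
    """
    for callback, errback in steps:
        handler = errback if isinstance(result, SimpleFailure) else callback
        if handler is None:
            continue
        try:
            result = handler(result)
        except Exception as e:
            result = SimpleFailure(e)
    return result


steps = [
    (lambda x: x / 0, None),     # raises ZeroDivisionError
    (lambda x: x + 1, None),     # skipped: the chain is now carrying a failure
    (None, lambda f: "recovered from " + f.type.__name__),
]
print(run_chain(10, steps))      # prints "recovered from ZeroDivisionError"
</pre>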
<p><code>RemoteReference.callRemote</code> uses the same convention: any
exceptions that occur during the remote method call are delivered to the
errback handler. However, some of these exceptions occur in the remote
process, and Failure objects contain references to local state (stack frames
and exception classes) which cannot be precisely replicated on a different
system. So, when an exception happens on the remote side of
a <code>callRemote</code> invocation, the errback handler will receive
a <code>CopiedFailure</code> instance instead.</p>

<p><code>CopiedFailure</code> is designed to behave very much like a
regular <code>Failure</code> object. The <code>check</code>
and <code>trap</code> methods work on <code>CopiedFailure</code>s just like
they do on <code>Failure</code>s.</p>

<p>However, all of the Failure's attributes must be converted into strings
for serialization. As a result, the original <code>.value</code> attribute
(the exception instance, which might carry additional information about the
problem) is replaced by a stringified representation.
The frames of the original stack trace are also replaced with a string, so
they can be printed but not examined. The exception class is also passed as a
string (using Twisted's <code>reflect.qual</code> fully-qualified-name
utility), but <code>check</code> and <code>trap</code> both compare by string
name instead of object equality, so most applications won't notice the
difference.</p>

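<p>The following sketch illustrates why comparing by fully-qualified name is
usually enough. <code>StringOnlyFailure</code> and <code>qual</code> are
hypothetical names invented for this example (the helper only approximates
Twisted's <code>reflect.qual</code>). The <code>check</code> logic is modeled
on Twisted's <code>Failure.check</code>, which matches against the names of
the exception class and its parent classes, but this is not Foolscap's
actual <code>CopiedFailure</code> code.</p>

<pre class="python">
import inspect


def qual(cls):
    """Return a 'module.Name' string, approximating Twisted's reflect.qual."""
    return cls.__module__ + "." + cls.__qualname__


class StringOnlyFailure:
    """Hypothetical CopiedFailure-like object that keeps only strings."""

    def __init__(self, exc):
        cls = type(exc)
        self.type = qual(cls)      # exception class, as a string
        self.value = str(exc)      # exception instance, stringified
        # Names of the exception class and all of its ancestors.
        self.parents = [qual(c) for c in inspect.getmro(cls)]

    def check(self, *errorTypes):
        """Return the first errorType whose name appears in self.parents."""
        for error in errorTypes:
            # Classes are converted to names; strings are compared directly.
            name = qual(error) if inspect.isclass(error) else error
            if name in self.parents:
                return error
        return None


f = StringOnlyFailure(KeyError("missing"))
print(f.type)                              # builtins.KeyError
print(f.check(ValueError, LookupError))    # the LookupError class (a parent)
print(f.check("builtins.KeyError"))        # builtins.KeyError
print(f.check(ValueError))                 # None
</pre>

<p>Because only names are compared, the local side never needs the remote
exception class object itself; a class with the same module and name is
enough for a match.</p>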
<p>The default behavior of CopiedFailure is to include a string copy of the
stack trace, generated with <code>printTraceback()</code>, which will include
lines of source code when available. To reduce the amount of information sent
over the wire, stack trace strings larger than about 2000 bytes are truncated
in a fashion that tries to preserve the top and bottom of the stack.</p>

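<p>A truncation step of this kind might look like the sketch below. Only the
roughly-2000 limit comes from the description above; the function
name <code>truncate_traceback</code>, the even head/tail split, and the
marker text are assumptions for illustration, and the sketch counts
characters rather than bytes.</p>

<pre class="python">
def truncate_traceback(tb, limit=2000,
                       marker="\n ...(traceback truncated)...\n"):
    """Shorten tb to at most `limit` characters, keeping its top and bottom.

    Hypothetical sketch: not Foolscap's actual truncation code.
    """
    if len(marker) > limit:
        raise ValueError("limit is too small to hold the marker")
    if not len(tb) > limit:
        return tb                      # short enough already
    keep = limit - len(marker)         # characters of traceback we can keep
    head = keep // 2                   # from the top (outermost frames)
    tail = keep - head                 # from the bottom (where it was raised)
    # Slice the tail with an explicit start index so tail == 0 also works.
    return tb[:head] + marker + tb[len(tb) - tail:]


long_tb = "A" * 3000 + "B" * 3000
short = truncate_traceback(long_tb)
print(len(short))                      # 2000
</pre>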
<h3>unsafeTracebacks</h3>

<p>Applications which consider their lines of source code or their
exceptions' list of (filename, line number) tuples to be sensitive
information can set the "unsafeTracebacks" flag in their Tub to False; the
server will then remove stack information from the CopiedFailure objects it
sends to other systems.</p>

<pre class="python">
t = Tub()
t.unsafeTracebacks = False  # omit stack information from CopiedFailures this Tub sends
</pre>

<p>When unsafeTracebacks is False, the <code>CopiedFailure</code> will only
contain the stringified exception type, value, and parent class names.</p>

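<p>As an illustration of the difference the flag makes, the sketch below
builds a string-only snapshot of an exception and includes the formatted
traceback only when a hypothetical <code>unsafe_tracebacks</code> argument is
true. The function name <code>failure_state</code> and its dictionary keys
are assumptions made for this example; they are not Foolscap's wire
format.</p>

<pre class="python">
import traceback


def qual(cls):
    """Return a 'module.Name' string for a class."""
    return cls.__module__ + "." + cls.__qualname__


def failure_state(exc, unsafe_tracebacks):
    """Return a dict of strings describing exc (hypothetical sketch)."""
    cls = type(exc)
    state = {
        "type": qual(cls),                           # exception type
        "value": str(exc),                           # stringified value
        "parents": [qual(c) for c in cls.__mro__],   # parent class names
    }
    if unsafe_tracebacks:
        # Formatted stack trace, with source lines when they are available.
        state["traceback"] = "".join(
            traceback.format_exception(cls, exc, exc.__traceback__))
    return state


try:
    {}["missing"]
except KeyError as e:
    safe = failure_state(e, unsafe_tracebacks=False)
    full = failure_state(e, unsafe_tracebacks=True)

print(sorted(safe))    # ['parents', 'type', 'value']
print(sorted(full))    # ['parents', 'traceback', 'type', 'value']
</pre>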
<h2>Distinguishing Remote Exceptions</h2>

<p>The original caller can tell the difference between exceptions that
occurred locally and ones that occurred on the remote end. The most common
use for this is to re-raise exceptions that resulted from programming errors
in the local code, while cleanly handling or ignoring errors that were caused
by the code at the remote end. The general idea is that remote code may be
maliciously trying to confuse or subvert your program's control flow by
returning unexpected exceptions, but that exceptions which occur locally (and
are not otherwise caught and handled) are probably bugs which need to be made
visible. The philosophy of how to best handle errors is beyond the scope of
this document, but Foolscap tries to provide the tools to allow programmers
to implement whatever approach they choose.</p>

<p>It is useful to distinguish a remote exception from a local one,
especially when the code involves multiple processing steps (some local, some
remote). For example, the following snippet performs a local processing step,
then asks a remote server for information, then adds that information into a
local database. All three steps are asynchronous.</p>

<pre class="python">
def get_and_store_record(name):
    # local_db and rref (a RemoteReference) are assumed to exist already
    d = local_db.getIDNumber(name)                                      # local
    d.addCallback(lambda idnum: rref.callRemote("get_record", idnum))  # remote
    d.addCallback(lambda record: local_db.storeRecord(name, record))   # local
    return d
</pre>

<p>The caller of <code>get_and_store_record</code> might like to distinguish
a problem that occurred in <code>getIDNumber</code> from one that
occurred during the remote call to <code>remote_get_record</code>.</p>

<p>For each Foolscap event described above that can raise a remote exception
(i.e. remote inbound schema Violation, remote method exception, remote
outbound schema Violation), the original caller will receive
a <code>CopiedFailure</code> instance. For Foolscap events that raise
exceptions locally (local outbound schema Violation, local inbound schema
Violation), the caller will receive a regular <code>Failure</code> instance.
Any non-Foolscap exception events (such as the <code>getIDNumber</code>
and <code>storeRecord</code> calls in the example above) also occur locally,
so they too will produce a regular <code>Failure</code>.</p>

<p>Application code should use <code>foolscap.ipb.failure_is_remote()</code>
to distinguish between local and remote failures. This returns True
for <code>CopiedFailure</code> instances and False for
regular <code>Failure</code>s. A future version of Foolscap may change the
way <code>CopiedFailure</code> is used (ideally Failure and CopiedFailure
should be the same class), but <code>failure_is_remote</code> will continue
to work correctly.</p>

<pre class="python">
from foolscap.ipb import failure_is_remote

d = get_and_store_record("bob")
def handle_remote_exception(f):
    if not failure_is_remote(f):
        return f  # local failure: pass it further down the errback chain
    print("Remote call failed: %s" % (f,))
    print("no record stored")
    return None  # remote failure handled; the chain continues with None
d.addErrback(handle_remote_exception)
</pre>
