Opened 18 years ago
Last modified 16 years ago
#101 new defect
traceback from remote host doesn't come with identifying information about which remote host
| Reported by: | Zooko | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | undecided |
| Component: | error-handling | Version: | 0.3.0 |
| Keywords: | | Cc: | Zooko, zooko@… |
Description
ax5avvay#1693941923 21:24:12.428: error during query: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/foolscap/eventual.py", line 26, in _turn
    cb(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/foolscap/broker.py", line 536, in doNextCall
    d.addCallback(lambda res: self._doCall(delivery))
  File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 191, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 317, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/usr/lib/python2.5/site-packages/foolscap/broker.py", line 536, in <lambda>
    d.addCallback(lambda res: self._doCall(delivery))
  File "/usr/lib/python2.5/site-packages/foolscap/broker.py", line 571, in _doCall
    return obj.doRemoteCall(delivery.methodname, args, kwargs)
  File "/usr/lib/python2.5/site-packages/foolscap/referenceable.py", line 59, in doRemoteCall
    res = meth(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/allmydata/storage.py", line 1202, in remote_slot_readv
    msf = MutableShareFile(filename, self)
  File "/usr/lib/python2.5/site-packages/allmydata/storage.py", line 383, in __init__
    data = f.read(self.HEADER_SIZE)
exceptions.IOError: [Errno 5] Input/output error
] [Errno 5] Input/output error
I would really like to know which storage server is having I/O errors. Too bad this stack trace doesn't tell me. Any identifying information would work -- tubid or IP address would be fine.
Change History (2)
comment:1 Changed 18 years ago by
comment:2 Changed 16 years ago by
| Component: | logging → error-handling |
|---|

So, for background, the following sequence of events occurred:
- callRemote to some node Bob
- addErrback handler, which uses foolscap.logging.log.msg to record the error, with a bit of code that looks something like:

    def _query_failed(self, f, peerid):
        ...
        self.log(format="error during query: %(f_value)s",
                 f_value=str(f.value), failure=f,
                 level=level, umid="IHXuQg")

My usual practice is to throw enough information into the log.msg() arguments that we can figure out the context of the message later on, but obviously I failed to do that here. The sort of thing that I would add would either be the peerid that we sent the query to, or a parent= which referenced an earlier message which had that same information (and which would mention the storageindex we were querying, etc). Using parent= would mean less extra context that needed to be passed to the errback handler, but also a higher risk that the parent message might be pruned (if it were at a lower severity level). So something like this:

    def _query_failed(self, f, peerid):
        ...
        self.log(format="error during query to %(peerid)s: %(f_value)s",
                 peerid=idlib.nodeid_b2a(peerid),
                 f_value=str(f.value), failure=f,
                 level=level, umid="IHXuQg")

But clearly, a more general solution would be to see if we could get this information embedded in the CopiedFailure. Now, since processes can have multiple Tubs, it wouldn't be appropriate to put this in the original Failure, but the act of serializing the Failure involves a specific Tub, so that first serialization would be a good place to stuff in the tubid. Failures are serialized into bits, those bits are deserialized into CopiedFailures. The CopiedFailure can be shipped around after the fact (CopiedFailure to bits to a new CopiedFailure), so it's just the initial serialization from Failure to bits where this should be performed.

Also, we need a way to get the nodeid into the str() or repr() of the CopiedFailure. At the moment we're just subclassing Failure, and not overriding the stringification code at all, so that would need to be changed.
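
To make the idea concrete, here is a rough sketch in plain Python. It does not use foolscap's real classes or wire format: serialize_failure, CopiedFailureSketch, the "origin_tubid" key, and the "example-tubid" string are all hypothetical names invented for illustration. The sketch stamps the tubid only during the first serialization (exception to bits), carries it unchanged through any later re-serialization, and includes it in str().

```python
import traceback


def serialize_failure(exc, tubid):
    """First serialization: a live exception becomes plain "bits" (a dict).

    Hypothetical helper. This is the only step where the serializing Tub
    is known, so this is where the tubid gets stamped in.
    """
    tb_text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    return {
        "type": "%s.%s" % (type(exc).__module__, type(exc).__name__),
        "value": str(exc),
        "traceback": tb_text,
        "origin_tubid": tubid,
    }


class CopiedFailureSketch(object):
    """Hypothetical stand-in for a deserialized remote failure."""

    def __init__(self, state):
        self.type = state["type"]
        self.value = state["value"]
        self.traceback = state["traceback"]
        # Older senders might not include the key, so tolerate its absence.
        self.origin_tubid = state.get("origin_tubid")

    def to_state(self):
        """Re-serialization: copy the original tubid, never overwrite it."""
        return {
            "type": self.type,
            "value": self.value,
            "traceback": self.traceback,
            "origin_tubid": self.origin_tubid,
        }

    def __str__(self):
        # Overridden stringification that names the originating Tub.
        origin = self.origin_tubid or "<unknown>"
        return ("[CopiedFailure instance: Traceback from remote host %s -- %s] %s"
                % (origin, self.traceback, self.value))


# Usage: fail on one "Tub", then relay the copy through a second hop.
try:
    raise IOError(5, "Input/output error")
except IOError as e:
    bits = serialize_failure(e, tubid="example-tubid")

first_copy = CopiedFailureSketch(bits)
relayed = CopiedFailureSketch(first_copy.to_state())
print(relayed)
```

With something along these lines, the log message would identify the failing server even if the errback handler forgot to pass the peerid, and a relayed copy would still point at the Tub where the error originally happened rather than at the intermediary.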