Opened 12 years ago

Last modified 12 years ago

#186 new defect

Remote tracebacks often elide important lines

Reported by: davidsarah Owned by:
Priority: major Milestone: undecided
Component: error-handling Version: 0.6.1
Keywords: traceback elided Cc:

Description

Remote exception tracebacks often have -- TRACEBACK ELIDED -- replacing the lines that would be most useful for debugging. I think this is a consequence of the default stringification of Twisted Failures. In foolscap/tokens.py@b9bc4a4#L91, we have

    def __str__(self):
        return "<RemoteException around '%s'>" % str(self.failure)

which should probably instead be using the traceback module to get a more complete traceback from self.failure.value.

Change History (4)

comment:1 Changed 12 years ago by davidsarah

Actually it's more complicated than the description says, the case I'm interested in occurs when the RemoteException is around a CopiedFailure. I think that whatever sets the 'traceback' element of CopiedFailure's state is at fault; still investigating.

Here's an example of a mutilated error report (edited for long lines):

<class 'allmydata.mutable.common.NotEnoughServersError'>:
('Publish ran out of good servers, last failure was:
[Failure instance: Traceback (failure with no frames):
<class \'foolscap.tokens.RemoteException\'>: <@RemoteException around
\'[@CopiedFailure instance: @Traceback from remote host --
Traceback (most recent call last):\n
 File "/usr/local/lib/python2.6/dist-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/tcp.py", line 277, in connectionLost\n protocol.connectionLost(reason)\n
 File "/usr/local/lib/python2.6/dist-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/web/client.py", line 191, in connectionLost\n
 self.factory._disconnectedDeferred.callback(None)\n
 File "/usr/local/lib/python2.6/dist-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 368, in callback\n
 self._startRunCallbacks(result)\n
 File "/usr/local/lib/python2.6/dist-packages/Twisted-12.0.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 464, in _startRunC\n
\n-- TRACEBACK ELIDED --\n\n7 6f 61 76 6a 7a 6b 76 66 2f</StringToSignBytes>
[AWS credentials redacted by David-Sarah]</Error>\')\n]\'>\n]', None)

(The '@' characters are where I edited the local foolscap to identify which parts were coming from that code.)

comment:2 Changed 12 years ago by davidsarah

Oh, it's just this code in FailureSlicer.getStateToCopy:

        if len(state['traceback']) > 1900:
            state['traceback'] = (state['traceback'][:700] +
                                  "\n\n-- TRACEBACK ELIDED --\n\n"
                                  + state['traceback'][-1200:])

I commented that out changed that to:

        if len(state['traceback']) > 10000:
            state['traceback'] = (state['traceback'][:5000] +
                                  "\n\n-- TRACEBACK ELIDED --\n\n"
                                  + state['traceback'][-5000:])

on the remote foolscap, and changed the constraint to

('traceback', ByteStringConstraint(10100))

on both the remote and local, and got a traceback that was sufficiently unmutilated ;-)

I think 1900 bytes is way too short.

Last edited 12 years ago by davidsarah (previous) (diff)

comment:3 Changed 12 years ago by Brian Warner

Seems reasonable. The 2k limitation was 1: to limit DoS sorts of attacks (and in #127 I decided to give up on resource-consumption-attack defenses), and 2: to keep the important stuff visible (to the eventual log-reading human trying to debug the problem) despite tracebacks that include overly-deep stacks (recursion gone wrong and running up against sys.setrecursionlimit. I also remember having problems with printable forms that include large (usually binary) strings, like tahoe shares or blocks, where the actual contents weren't interesting but were obscuring the useful metadata.

I was expecting the traceback to only contain filenames, line numbers, and code snippets. The value of the exception might contain larger strings, as created by a __repr__ on the Exception class. Perhaps there's a bug/misdesign such that the value is being delivered as part of the traceback instead of as a separate attribute? If we fixed that, and delivered repr(f.value) in its own separately-size-limited field, then would a 1900-byte stack trace be sufficient?

comment:4 in reply to:  3 Changed 12 years ago by davidsarah

Replying to warner:

I was expecting the traceback to only contain filenames, line numbers, and code snippets. The value of the exception might contain larger strings, as created by a __repr__ on the Exception class.

In the specific case that prompted me to file this ticket, the __repr__ was a few thousand bytes/characters, and I needed both the elided stack trace and the elided part of the __repr__. I've also run into this a couple of times before, and my memory of those cases isn't sufficient to recall whether the exception __repr__ was unusually large.

A conversation about one of them was pasted at http://codepad.org/eXchc6Xz:

<davidsarah> josipl: unfortunately, the -- TRACEBACK ELIDED -- in the original traceback
(http://gist.github.com/411241) would have been the interesting bit
<davidsarah> that's what we're trying to get at

Looking at that gist, the interesting part of the stack trace was a slightly deeper than usual but not especially so, and the Exception __repr__ was small.

Perhaps there's a bug/misdesign such that the value is being delivered as part of the traceback instead of as a separate attribute? If we fixed that, and delivered repr(f.value) in its own separately-size-limited field, then would a 1900-byte stack trace be sufficient?

Judging by the second example above, no.

Note: See TracTickets for help on using tickets.