Opened 15 years ago

Closed 12 years ago

#106 closed defect (fixed)

KeyError from self.myReferenceByCLID[clid]

Reported by: Zooko Owned by:
Priority: major Milestone: 0.6.3
Component: unknown Version: 0.3.0
Keywords: KeyError Cc: davidsarah

Description (last modified by Brian Warner)

I don't know if this is normal or is indicative of a problem. This is from running some unit tests in tahoe-lafs.

local#4813 09:57:13.711: deep-check done
local#4814 09:57:13.725: shutting down SystemTest services
local#4815 09:57:13.726: Node.stopService
local#4816 09:57:13.728: Node._really_stopService
local#4817 09:57:13.774: Node.stopService
local#4818 09:57:13.777: Node._really_stopService
local#4819 09:57:13.825: Node.stopService
local#4820 09:57:13.828: Node._really_stopService
local#4821 09:57:13.871: Node.stopService
local#4822 09:57:13.873: Node._really_stopService
local#4823 09:57:13.916: Node.stopService
local#4824 09:57:13.918: Node._really_stopService
local#4825 09:57:13.969: Node.stopService
local#4826 09:57:13.972: Node._really_stopService
local#4827 09:57:14.053: an inbound callRemote that we [bduf] executed (on behalf of someone else, TubID ncie) failed
local#4828 09:57:14.053:  reqID=248, rref=< object at 0x39653b0>, methname=RIBroker.decref
local#4829 09:57:14.053:  args=[]
local#4830 09:57:14.053:  kwargs={'count': 1, 'clid': 29}
local#4831 09:57:14.053:  the LOCAL failure was:
 [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/", line 26, in _turn
     cb(*args, **kwargs)
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/", line 536, in doNextCall
     d.addCallback(lambda res: self._doCall(delivery))
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/Twisted-8.1.0_r25700-py2.5-macosx-10.3-i386.egg/twisted/internet/", line 195, in 
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/pytho
 /twisted/internet/", line 186, in addCallbacks
 --- <exception caught here> ---
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/Twisted-8.1.0_r25700-py2.5-macosx-10.3-i386.egg/twisted/internet/", line 328, in 
     self.result = callback(self.result, *args, **kw)
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/", line 536, in <lambda>
     d.addCallback(lambda res: self._doCall(delivery))
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/", line 571, in _doCall
     return obj.doRemoteCall(delivery.methodname, args, kwargs)
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/", line 59, in doRemoteCall
     res = meth(*args, **kwargs)
   File "/usr/local/stow/python-release25-maint-2008-12-23/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/", line 421, in remote_decref
     tracker = self.myReferenceByCLID[clid]
 exceptions.KeyError: 29

Change History (4)

comment:1 Changed 13 years ago by davidsarah

Keywords: KeyError added

I also saw this in the twistd.log of a storage server running on one of LAE's EC2 instances:

2011-11-20 04:12:11+0000 [Negotiation,1,] Banana.reportReceiveError: an error occured during receive
2011-11-20 04:12:11+0000 [Negotiation,1,] Unhandled Error
        Traceback (most recent call last):
          File "/usr/local/lib/python2.6/dist-packages/Twisted-11.1.0-py2.6-linux-i686.egg/twisted/internet/", line 199, in doRead
            rval = self.protocol.dataReceived(data)
          File "/usr/local/lib/python2.6/dist-packages/Twisted-11.1.0-py2.6-linux-i686.egg/twisted/protocols/", line 413, in dataReceived
          File "/usr/local/lib/python2.6/dist-packages/Twisted-11.1.0-py2.6-linux-i686.egg/twisted/protocols/", line 383, in _flushReceiveBIO
            ProtocolWrapper.dataReceived(self, bytes)
          File "/usr/local/lib/python2.6/dist-packages/Twisted-11.1.0-py2.6-linux-i686.egg/twisted/protocols/", line 118, in dataReceived
        --- <exception caught here> ---
          File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.2-py2.6.egg/foolscap/", line 639, in dataReceived
          File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.2-py2.6.egg/foolscap/", line 993, in handleData
          File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.2-py2.6.egg/foolscap/", line 1050, in handleToken
            top.receiveChild(token, ready_deferred)
          File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.2-py2.6.egg/foolscap/", line 474, in receiveChild
            self.obj =
          File "/usr/local/lib/python2.6/dist-packages/foolscap-0.6.2-py2.6.egg/foolscap/", line 413, in getMyReferenceByCLID
            return self.myReferenceByCLID[clid].obj
        exceptions.KeyError: 1

Unfortunately I don't have much context about what might have caused it, since I only noticed these log entries much later.

comment:2 Changed 13 years ago by davidsarah

Cc: davidsarah added

comment:3 Changed 12 years ago by Brian Warner

Description: modified (diff)

The first exception is pretty normal. When two Tubs in the same process are shut down (such as at the end of a unit test), they drop references, which triggers the cross-Tub GC messages, which may be delivered after the other end has forgotten about the object being decref'ed. It's harmless, although it'd be a good idea for Foolscap to tolerate them silently: fixed in [8d1ca85].

The second one is probably similar. It indicates that a message was sent to a local object which had gone out-of-scope by the time the message was processed. The object in question was the first one allocated (clid=1, since 0 is used for the Broker itself, for remote_decref), which (if this is a storage server) is probably the RIStorageServer instance (whatever object was the target of the first inbound getReference() call).

That sort of problem *ought* to be handled by returning an error message to the remote caller, not with log.err(). The requestID is the very first token in the CALL sequence, so we know who to send the error to. I think it's a bug/oversight in which allows that error to escape, and worth fixing.

comment:4 Changed 12 years ago by Brian Warner

Milestone: undecided0.6.3
Resolution: fixed
Status: newclosed

Ok, the second problem is fixed in [7616c61].

Note: See TracTickets for help on using tickets.