Opened 16 years ago

Closed 16 years ago

#55 closed defect (fixed)

log_gatherer.furl can be used too early

Reported by: Brian Warner Owned by:
Priority: major Milestone: 0.3.0
Component: logging Version: 0.2.4
Keywords: Cc:

Description

I observed a traceback in Tahoe as it started up.. it looks like it managed to connect to the log gatherer (running on the same host) extremely quickly, before the Tahoe node had managed to do a Tub.setLocation. As a result, the getLogPortFURL call failed, and the node never gave its logport to the introducer:

  File "/usr/lib/python2.5/site-packages/foolscap/reconnector.py", line 95, in _connected
    cb(rref, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/foolscap/pb.py", line 376, in _log_gatherer_connected
    ignored = self.getLogPortFURL()
  File "/usr/lib/python2.5/site-packages/foolscap/pb.py", line 410, in getLogPortFURL
    furlFile=furlfile)
  File "/usr/lib/python2.5/site-packages/foolscap/pb.py", line 660, in registerReference
    raise RuntimeError("you must setLocation() before "
exceptions.RuntimeError: you must setLocation() before you can registerReference()

One event later, Tub.setLocation was called.

I think this is partly the result of Tahoe's setup sequence: in Node.init it creates the Tub, setServiceParent's it to the node, then does the setOption that triggers the use of log_gatherer.furl . This results in the gatherer code doing Tub.connectTo before Tub.startService is even called.

Once startService happens, the connection occurs quickly. Here are a few lines of the trace:

# 18:59:05 0.7869 [0]: UNUSUAL Tub.connectTo((log-gatherer.furl)) queued until Tub.startService called
# 18:59:06 0.4523 [8]: Node.startService
# 18:59:06 0.4527 [9]: Node._startService
# 18:59:06 0.4645 [12]: TubConnector created from ... to log-gatherer..
# [52]: duplicate connection closed
# [64]: negotiationComplete
# [67]: the exception shown above
# [68]: Tub location set

(some events are missing, this trace was gathered half an hour afterwards using flogtool tail --catch-up).

I need to examine the sequencing and figure out how the connection managed to complete before the setLocation was called. It might be a good idea to stall the gatherer connection until.. what, after setLocation happens? If it ever happens?

Change History (2)

comment:1 Changed 16 years ago by Brian Warner

rob hit a similar problem: same traceback. I'm not sure what his startup code looked like, though. He told me that he used setLocationAutomatically, but waited until its Deferred had fired before calling the setOption that established the log-gatherer furl.

comment:2 Changed 16 years ago by Brian Warner

Milestone: undecided0.3.0
Resolution: fixed
Status: newclosed

Found it. The problem can be fixed by deferring the connection to the log gatherer until after setLocation has been called. I had avoided this because I didn't want to make setLocation obligatory, but the code implicitly enforces this restriction already. So the fix is to make the restriction explicit, and then wait on the log gatherer connectTo call. Fixed by [a715b3012953df8a7c04b35385406f80a3886b7d].

Note: See TracTickets for help on using tickets.