Opened 17 years ago

Closed 17 years ago

#2 closed defect (fixed)

failure in gift establishment is not handled correctly

Reported by: Brian Warner Owned by: Brian Warner
Priority: major Milestone: 0.1.5
Component: introduction Version: 0.1.4
Keywords: Cc:

Description

Alice has a connection to Bob and Carol. Alice sends a message to Bob containing a reference to Carol. Bob's tub initiates a connection to Carol and waits for it to complete before the message can be delivered to Bob. If this Bob-to-Carol connection fails, or times out, Alice's message is supposed to fail with an error.

robk discovered that there is at least one of the following two problems in 0.1.4:

  1. establishing a connection to a new Tub, when the network is silently dropping all packets to that Tub, does not cause a connection establishment timeout. getReference() never returns.
  2. or, if the getReference() performed inside the introduction handler signals an errback, that error is not reported back to the original sender

He had a FURL pointing at a host+port target which was supposed to be port-forwarded to a Tub, but it turns out that the firewall was not configured as expected and the packets were instead being silently discarded. The Alice-to-Bob message never finished (neither with success nor failure), even after about 10 minutes of waiting.

The intention is that there should be a timer that starts the moment getReference() is called, and if the reference has not been established or abandoned by the time the timer expires, getReference() should errback.
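
A minimal sketch of that timer behaviour, written with the standard-library asyncio instead of the Twisted Deferreds that Foolscap uses. The names get_reference_with_timeout, ConnectionTimeout, and black_hole are hypothetical and are not part of the Foolscap API; black_hole stands in for a network that silently drops every packet.

```python
import asyncio


class ConnectionTimeout(Exception):
    """Hypothetical error for an establishment attempt that took too long."""


async def get_reference_with_timeout(connect, timeout):
    # `connect` is a hypothetical coroutine function that resolves to a
    # remote reference. The timer starts as soon as this call begins.
    try:
        return await asyncio.wait_for(connect(), timeout)
    except asyncio.TimeoutError:
        # Turn "never answered" into an explicit failure for the caller.
        raise ConnectionTimeout(
            f"no connection established within {timeout} seconds"
        ) from None


async def black_hole():
    # Simulates packets being silently discarded: this never completes.
    await asyncio.Event().wait()


def demo():
    try:
        asyncio.run(get_reference_with_timeout(black_hole, 0.05))
    except ConnectionTimeout as e:
        return str(e)
    return "connected"
```

Without the wait_for wrapper, demo() would hang forever, which is the symptom described above.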

Another intention is that any errors during the gift-acquisition phase should be reported in the same way as errors during deserialization and during delivery of the message: all should get reported as an errback back to the original caller.
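
As a rough illustration of that uniform reporting, the sketch below runs the three phases in order and converts a failure in any of them into the same kind of error response. The function deliver and its callback parameters are hypothetical stand-ins, not Foolscap code, and a synchronous try/except is used here in place of a Deferred errback chain.

```python
def deliver(message, resolve_gifts, deserialize, invoke):
    """Run all three phases; return ("ok", result) or ("error", text)."""
    phase = "gift-acquisition"
    try:
        gifts = resolve_gifts(message)       # e.g. connect to Carol
        phase = "deserialization"
        args = deserialize(message, gifts)
        phase = "delivery"
        return ("ok", invoke(args))
    except Exception as e:
        # Every phase reports through the same path, so the original
        # caller always hears about the failure.
        return ("error", f"{phase}: {e}")


def unreachable(message):
    raise ConnectionError("could not reach Carol")


failed = deliver({}, unreachable, lambda m, g: m, lambda a: a)
# failed == ("error", "gift-acquisition: could not reach Carol")
```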

As a side note, the sort of logging that would have helped track this down faster would be to make it easy to see messages being sent, received, acknowledged, and retired. Bonus points for being able to see the set of messages that are currently outstanding, since application programmers have a good idea about how long messages should take to be processed, and seeing an 'introduce' message sit in the pending list for more than a moment is grounds for suspicion.
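
One way such an outstanding-message view could look is sketched below. PendingMessages and its methods are hypothetical and do not correspond to any existing Foolscap logging facility; the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class PendingMessages:
    """Track messages that have been sent but not yet retired."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sent = {}  # request id -> (method name, send time)

    def sent(self, req_id, method):
        self._sent[req_id] = (method, self._clock())

    def retired(self, req_id):
        # Called when the answer (success or failure) has come back.
        self._sent.pop(req_id, None)

    def older_than(self, seconds):
        """Return (req_id, method, age) for messages pending too long."""
        now = self._clock()
        return [
            (req_id, method, now - t)
            for req_id, (method, t) in self._sent.items()
            if now - t > seconds
        ]
```

An application could poll older_than() with whatever threshold it considers suspicious and log the result.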

Change History (2)

comment:1 Changed 17 years ago by Brian Warner

In addition to the timeout, if the getReference() performed by foolscap.referenceable.TheirReferenceUnslicer.receiveClose fires an errback, the error falls off the end of the Deferred chain and disappears.

The ready_deferred should receive both successes and failures of the getReference.
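
The idea of passing both outcomes along can be sketched with the standard library's concurrent.futures.Future as a stand-in for a Twisted Deferred. forward_outcome, get_ref, and ready are hypothetical names for illustration; this is not the code from the actual fix.

```python
from concurrent.futures import Future


def forward_outcome(source, target):
    """Copy the result OR the exception of `source` onto `target`."""

    def _done(fut):
        # Cancellation is not handled in this sketch.
        exc = fut.exception()
        if exc is not None:
            # The failure path: without this branch the error would
            # simply vanish and `target` would never complete.
            target.set_exception(exc)
        else:
            target.set_result(fut.result())

    source.add_done_callback(_done)


get_ref = Future()   # stands in for the getReference() result
ready = Future()     # stands in for ready_deferred
forward_outcome(get_ref, ready)
get_ref.set_exception(ConnectionError("connection to Carol timed out"))
# ready is now done and carries the same ConnectionError.
```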

comment:2 Changed 17 years ago by Brian Warner

Resolution: fixed
Status: new → closed

I've fixed this in [83b35c8882e6943c3a0814ba53a3dca8c3fd0547]. The timeout is there and has been all along: it defaults to 60 seconds and is started in TubConnector.connect() (which means as soon as you do a getReference() for a new target Tub).

There is a separate per-connection timeout, used only on inbound connections (that is, on a Listener), which also defaults to 60 seconds. It exists to get rid of callers who aren't speaking foolscap, such as someone who points their SMTP or POP client at a foolscap port, leaving both ends waiting for the other to speak.

There are four new unit tests to make sure that various kinds of gift-resolution failures are reported to the original caller. Some of these error messages are more useful than others; we could probably use some improvement here.
