Opened 11 years ago

Last modified 6 years ago

#207 new enhancement

deprecate notifyOnDisconnect()

Reported by: Zooko
Owned by:
Priority: major
Milestone: 0.14.0
Component: usability
Version: 0.6.4
Keywords:
Cc:

Description

In my opinion, people should not try to write code that uses disconnection-notification. Tahoe-LAFS has one bit of code left that tries to use notifyOnDisconnect, and there is an open ticket to remove that last usage from Tahoe-LAFS: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1975

Please see also my complaint in https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1768#comment:3, which says that foolscap's current notifyOnDisconnect sometimes fires 0 times (#140) and sometimes fires more than once when it was supposed to fire once.
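For reference, the pattern being complained about looks roughly like this minimal sketch (the bookkeeping dict and callback are hypothetical, not Tahoe's actual code): if the notification fires zero times the stale entry is never cleaned up, and if it fires more than once the cleanup runs repeatedly.

{{{#!python
# Minimal sketch of the disconnection-notification pattern in question
# (hypothetical bookkeeping, not Tahoe's actual code).  `rref` is a
# foolscap RemoteReference handed to us by a Tub/Reconnector callback.
connected_servers = {}

def _lost(server_id):
    # If notifyOnDisconnect never fires (#140), this entry leaks forever;
    # if it fires more than once, we "forget" the same server repeatedly.
    connected_servers.pop(server_id, None)

def got_connection(rref, server_id):
    connected_servers[server_id] = rref
    rref.notifyOnDisconnect(_lost, server_id)
}}}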

Deprecating notifyOnDisconnect would solve #140 and #42, would reduce long-term maintenance work for the foolscap maintainers, would spur Tahoe-LAFS (probably the only current user of notifyOnDisconnect) to hurry up and move off of it, and would deter other projects from starting to rely on it.

Change History (7)

comment:1 Changed 10 years ago by Brian Warner

Component: unknown → usability
Milestone: undecided → 0.8.0

comment:2 Changed 9 years ago by Brian Warner

Milestone: 0.9.0 → 0.9.1

comment:3 Changed 8 years ago by Brian Warner

Milestone: 0.10.0 → 0.11.0

comment:4 Changed 8 years ago by Brian Warner

Milestone: 0.11.0 → 0.12.0

Milestone renamed

comment:5 Changed 8 years ago by Brian Warner

Ok, ok, I'll deprecate this. As usual, after a couple of years, I've come around to Zooko's position.

Foolscap is my attempt to bring E (specifically CapTP/VatTP) to Python+Twisted. E's model is very connection-oriented: a Far Reference has a distinct lifetime, starting as an unresolved Promise, graduating to a Far Reference, then becoming Broken. All the state you construct on the live reference goes away when it becomes Broken, so that last fatal transition is important to track. The E equivalent of notifyOnDisconnect() is whenBroken, or maybe the inverse (target-side) reactToLostClient. I should ask the E folks how important/reliable whenBroken is, and whether there are patterns that simply can't be written correctly without it.

(meanwhile, non-connection-oriented models are looking more promising, like Waterken, where message sends are try-forever-then-die, and there's no such thing as a "live" reference)

The one bit of resistance I had was Tahoe's uploader, where I wanted connections to be established ahead of time (which can be viewed as front-loading the connection-establishment phase, so uploads can start faster), but where I also wanted to "correctly" assign shares to servers before making the initial will-you-hold-my-share request.

My thinking was that servers will probably say "yes" if they're still around, so we'll get accurate placement (share0 on the first available server, share1 on the next, etc.) as long as the combination of automatic-reconnect and notifyOnDisconnect gives us a mostly-recent view of connectability.

The remaining inaccuracy would be the result of:

  • a connection that has gone away but for which notifyOnDisconnect hasn't fired yet (either because it's buggy, or because we just haven't noticed yet that the server is gone)
  • servers rejecting our request (because they're full)

But: the consequences of mis-placing shares are not really a big deal (non-primary shares going to the first available server), and I've decided to let it go. We could always fix it by adding a second pass: the first pass asks "will you hold *a* share for me?", and we hold off assigning the specific share *number* until the second pass (or just before we start uploading data).
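A rough sketch of that second-pass idea, as pure bookkeeping with invented names (ask_will_hold stands in for a hypothetical "will you hold *a* share?" query; this is not Tahoe's actual uploader API):

{{{#!python
def two_pass_placement(candidate_servers, num_shares, ask_will_hold):
    # Pass 1: ask each server "will you hold *a* share for me?" without
    # committing to a share number yet.
    willing = [s for s in candidate_servers if ask_will_hold(s)]
    if not willing:
        return {}
    # Pass 2: only now pin specific share numbers to the servers that said
    # yes, wrapping around if there are more shares than willing servers.
    return {shnum: willing[shnum % len(willing)]
            for shnum in range(num_shares)}
}}}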

Without notifyOnDisconnect, the Reconnector will give Tahoe's StorageFarmBroker an rref for each known server that's in one of four states:

  • 1: None, because we haven't established the first connection to them yet
  • 2: A RemoteReference that might work (but might not)
  • 3: A RemoteReference that is dead, but we don't realize it (because we no longer use notifyOnDisconnect). Our next attempt to use it will fail. The Reconnector will be working to replace this.
  • 4: A RemoteReference that we should know is dead (we've already experienced a DeadReferenceError on it), but which we keep around because we're too lazy to prune it.

In the old approach (using notifyOnDisconnect), the StorageFarmBroker cleared out the known-dead rrefs (3+4). In the simplest new approach, we'll keep them around forever. We can improve the accuracy by being less lazy: have the BucketWriter or something notice the DeadReferenceError and inform the StorageFarmBroker about it, which can then clear it (until the Reconnector successfully re-establishes a connection). Servers will hang out in the "3: dead but we don't realize it" state until we try to use them, but we'll avoid the "4: we should know it's dead" state.
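A minimal sketch of that "less lazy" option, using a hypothetical tracker class standing in for the StorageFarmBroker (the class and method names are invented; callRemote and DeadReferenceError are the only foolscap pieces assumed here): the errback prunes the stale rref, so the server drops back to the "no connection" state until its Reconnector succeeds again.

{{{#!python
from foolscap.api import DeadReferenceError

class ServerTracker:
    """Hypothetical stand-in for the StorageFarmBroker's rref bookkeeping."""
    def __init__(self):
        self._rrefs = {}  # server_id -> RemoteReference (possibly stale)

    def add_connection(self, server_id, rref):
        # Called from the Reconnector's connected callback.
        self._rrefs[server_id] = rref

    def mark_dead(self, server_id):
        # Forget the rref until the Reconnector re-establishes it.
        self._rrefs.pop(server_id, None)

    def call(self, server_id, methname, *args):
        rref = self._rrefs[server_id]
        d = rref.callRemote(methname, *args)
        def _dead(f):
            # Prune on DeadReferenceError instead of lingering in state 4
            # ("we should know it's dead"); other errors pass through.
            f.trap(DeadReferenceError)
            self.mark_dead(server_id)
            return f
        d.addErrback(_dead)
        return d
}}}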

In the new approach, we'll pre-assign a share-number to everything in state 2/3/4, then discover the state=3+4 cases when we send the please-store message. At that point, we'll re-assign those shnums to other servers that appear to be functional (some of which may fail, etc). If we eventually succeed in placing all shares, the shnums may be pretty jumbled up. But that's probably going to be ok.
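A sketch of the reassignment step implied by that new approach (pure bookkeeping with invented names, not Tahoe's real uploader): shares whose servers fail the please-store message get handed to the remaining apparently-functional servers, which is exactly how the shnums end up jumbled.

{{{#!python
def reassign_failed_shares(assignment, failed_servers, spare_servers):
    # assignment: {shnum: server_id} produced by the optimistic pre-pass.
    # failed_servers: servers that raised DeadReferenceError or said no.
    spares = [s for s in spare_servers if s not in failed_servers]
    for shnum in sorted(assignment):
        if assignment[shnum] in failed_servers:
            if spares:
                assignment[shnum] = spares.pop(0)
            else:
                del assignment[shnum]  # nowhere left to put this share
    # Surviving entries point at servers believed functional, even if the
    # shnum->server mapping ends up "pretty jumbled up".
    return assignment
}}}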

comment:6 Changed 8 years ago by Brian Warner

Milestone: 0.12.0 → 0.13.0

Moving these tickets from 0.12 to 0.13, since we need 0.12 out quickly to resolve #263

comment:7 Changed 6 years ago by Brian Warner

Milestone: 0.13.0 → 0.14.0