id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
25	slow remote_foo methods stall all further deliveries	Brian Warner		"Much to my embarrassment, foolscap-0.1.6 has a major bug in the call-delivery
code. This bug effectively breaks any application which takes a long time to
execute a remote_ method (i.e. it returns a Deferred instead of completing
synchronously).

The problem is in {{{Broker.doNextCall}}}, where we have a queue of
InboundDelivery objects, one per remote call. This queue exists to make sure
that messages which take a long time to deserialize (because they contain
gifts that must be claimed) do not race ahead of other messages which were
sent later. If the top-most message is not yet ready (meaning that we've
received all of the tokens for it, but we haven't finished deserializing if
because of the presence of not-yet-claimed gifts), then we stall the queue
until it becomes ready.

The specific bug is that this queue is also stalling inbound messages until
the previous call has finished *running*. Each remote_foo() method is allowed
to either return synchronously, or to return a Deferred that will fire with
the message's result at some later point in the future. The caller (the
remote side which runs {{{rref.callRemote}}}) always gets their own deferred,
and that one doesn't fire until the message has been delivered and the
response comes back.

But with the current queue logic, the remote_foo's Deferred also prevents
subsequent inbound messages from being delivered. Allmydata had a method that
failed to complete (or stalled for several hours, we're not sure which), and
that stall caused significant secondary damage, since no further messages
could be delivered. This also makes it a lot easier to deadlock, if two Tubs
are sending slow messages at each other.

The fix will be to change the queueing logic, using the queue for the
ready_deferred, but once that fires, dispatch each message+response onto its
own chain. And of course the first step is always to write a unit test which
reproduces the problem, in test_call.py

"	defect	closed	critical	0.1.7	unknown	0.1.6	fixed