﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
149	O(n**2) CPU/malloc during receive of large strings	Brian Warner		"While tracking down some performance problems in Tahoe, I found a smoking gun
in Foolscap's receive-side handling of large tokens (in particular large
strings, more than about 10MB).

Specifically, the inbound token parser, which maintains a single linear
buffer of all unhandled inbound data, performs a copy for each dataReceived
chunk. For large strings, each copy handles more bytes than the previous,
until the whole token has been received. In addition, the start of the parser
loop performs a copy of the post-typebyte body, which again grows in size for
large strings.
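
The pattern can be illustrated with a minimal sketch (hypothetical classes, not Foolscap's actual parser code): a receiver that keeps one linear byte buffer re-copies the whole buffer on every {{{dataReceived}}} call, so receiving an N-byte token in fixed-size chunks costs O(N**2), while accumulating chunks in a list and joining only when a full token is available keeps the per-byte copy count constant:

```python
class QuadraticReceiver:
    """Keeps all unhandled bytes in one string; every chunk forces a
    copy of the whole buffer, so cost grows with the buffer size."""
    def __init__(self, token_size):
        self.token_size = token_size
        self.buffer = b""
        self.tokens = []

    def dataReceived(self, chunk):
        self.buffer += chunk  # copies len(buffer)+len(chunk) bytes
        while len(self.buffer) >= self.token_size:
            self.tokens.append(self.buffer[:self.token_size])
            # slicing off the consumed prefix copies the tail again
            self.buffer = self.buffer[self.token_size:]

class LinearReceiver:
    """Accumulates chunks in a list and joins only once a full token
    has arrived, so each byte is copied a bounded number of times."""
    def __init__(self, token_size):
        self.token_size = token_size
        self.chunks = []
        self.size = 0
        self.tokens = []

    def dataReceived(self, chunk):
        self.chunks.append(chunk)
        self.size += len(chunk)
        if self.size >= self.token_size:
            data = b"".join(self.chunks)  # one join per completed token batch
            while len(data) >= self.token_size:
                self.tokens.append(data[:self.token_size])
                data = data[self.token_size:]
            self.chunks = [data] if data else []
            self.size = len(data)
```

Both produce the same token stream; only the copy count differs.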

Some quick performance tests (sending a large string to a receiver which
merely discards it) show the following. ""pack"" is the time for the sender to
serialize the data, ""unpack"" is the time for the recipient to deserialize it
(and also includes network transit time, but this was on a LAN), and the
right-most number is the increase in size of the recipient's ""{{{VmPeak}}}"" value,
giving a rough estimate of the impact on memory footprint:

{{{
# size     sender            recipient
#  40kB: 742us pack,  40ms unpack,       0B
# 0.5MB: 2.4ms pack,  56ms unpack,   1.44MB
#   1MB: 4.2ms pack, 198ms unpack,   2.96MB
#   2MB: 7.3ms pack, 726ms unpack,   5.96MB
#   4MB:  14ms pack,  2.8s unpack,  11.95MB
#   8MB:  26ms pack,  9.6s unpack,  23.96MB
#  16MB:  52ms pack,   36s unpack,  47.96MB
#  32MB: 103ms pack,  150s unpack,  95.95MB
#  64MB: 206ms pack,  593s unpack, 191.96MB
}}}

Clearly the sender's time is linear, the recipient's memory is linear (but
about 3x the payload size, much larger than I'd like), and the unpacking time is quadratic.

I have some plans to fix this; more in a follow-up comment.
"	defect	closed	major	0.5.1	performance	0.5.0	fixed		
