id,summary,reporter,owner,description,type,status,priority,milestone,component,version,resolution,keywords,cc
149,O(n**2) CPU/malloc during receive of large strings,Brian Warner,,"While tracking down some performance problems in Tahoe, I found a smoking gun in Foolscap's receive-side handling of large tokens (in particular large strings, more than about 10MB). Specifically, the inbound token parser, which maintains a single linear buffer of all unhandled inbound data, performs a copy for each dataReceived chunk. For large strings, each copy handles more bytes than the previous, until the whole token has been received. In addition, the start of the parser loop performs a copy of the post-typebyte body, which again grows in size for large strings. Some quick performance tests (sending a large string to a receiver which merely discards it) show the following. ""pack"" is the time for the sender to serialize the data, ""unpack"" is the time for the recipient to deserialize it (and also includes network transit time, but this was on a LAN), and the right-most number is the increase in size of the recipient's ""{{{VmPeak}}}"" value, giving a rough estimate of the impact on memory footprint:
{{{
#  size   sender        recipient
# 40kB:   742us pack,   40ms unpack,   0B
# 0.5MB:  2.4ms pack,   56ms unpack,   1.44MB
# 1MB:    4.2ms pack,  198ms unpack,   2.96MB
# 2MB:    7.3ms pack,  726ms unpack,   5.96MB
# 4MB:     14ms pack,   2.8s unpack,  11.95MB
# 8MB:     26ms pack,   9.6s unpack,  23.96MB
# 16MB:    52ms pack,    36s unpack,  47.96MB
# 32MB:   103ms pack,   150s unpack,  95.95MB
# 64MB:   206ms pack,   593s unpack, 191.96MB
}}}
Clearly the sender's time is linear, the recipient's memory is linear (but about 3x, much larger than I'd like), and the unpacking time is quadratic. I have some plans to fix this... more in a followup comment. ",defect,closed,major,0.5.1,performance,0.5.0,fixed,,
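The quadratic behavior described in the ticket can be illustrated with a minimal sketch. This is not Foolscap's actual parser code; the function names are hypothetical. It contrasts the re-copy-the-whole-buffer-per-chunk pattern (each append copies the entire accumulated buffer, so total work is O(n**2) in the token size) with accumulating chunks in a list and joining once (each byte copied a constant number of times, O(n)):

```python
def receive_quadratic(chunks):
    """O(n**2) pattern: one linear buffer, re-copied on every chunk.

    Each `buf + chunk` allocates a new bytes object and copies
    len(buf) + len(chunk) bytes, so the copies grow with each chunk.
    """
    buf = b""
    for chunk in chunks:
        buf = buf + chunk  # copies the whole accumulated buffer
    return buf


def receive_linear(chunks):
    """O(n) alternative: collect chunks, copy each byte once at the end."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk)  # no copying of earlier data
    return b"".join(pieces)   # single pass over all bytes


# 64 chunks of 1 KiB, standing in for dataReceived deliveries
chunks = [b"x" * 1024 for _ in range(64)]
assert receive_quadratic(chunks) == receive_linear(chunks)
```

With realistic chunk counts (a 64MB string arriving in ~16KB TCP segments is several thousand chunks), the difference between the two shapes matches the quadratic unpack times in the table above.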