Opened 16 years ago
#77 new enhancement
automatic VOCAB-compression of common strings
Reported by: | Brian Warner | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | banana | Version: | 0.2.5 |
Keywords: | | Cc: | |
Description
At some point I need to finally implement a long-standing TODO item: automatic VOCABization of strings that are sent over the wire many times. A VOCAB token carries a small integer that stands in for a longer string, effectively compressing the wire protocol. The first 128 such strings are represented by two bytes on the wire, and the next 16k or so are represented by three bytes.
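To make those byte counts concrete, here is a small sketch that computes token sizes. It assumes, consistent with the two-byte and three-byte figures above, that a token is a base-128 header (the VOCAB index, or a string's byte length) followed by a one-byte type tag, and that a plain string token also carries its body. The function names are hypothetical and are not Foolscap's API.

```python
def header_length(value: int) -> int:
    """Number of base-128 digits needed to encode a non-negative integer."""
    if value < 0:
        raise ValueError("header values must be non-negative")
    length = 1
    while value >= 128:
        value >>= 7
        length += 1
    return length


def vocab_token_size(index: int) -> int:
    # Assumed layout: header (the table index) + 1 type byte.
    return header_length(index) + 1


def string_token_size(body: bytes) -> int:
    # Assumed layout: header (the body length) + 1 type byte + the body.
    return header_length(len(body)) + 1 + len(body)


def bytes_saved_per_use(body: bytes, index: int) -> int:
    """Bytes saved each time `body` is sent as VOCAB `index` instead."""
    return string_token_size(body) - vocab_token_size(index)


print(vocab_token_size(127), vocab_token_size(128), vocab_token_size(16383))
# prints: 2 3 3
print(bytes_saved_per_use(b"my-reference", 5))
# prints: 12
```

Under these assumptions, "my-reference" sent as a plain string costs 14 bytes versus 2 for a low VOCAB index, and index 16384 would be the first to need four bytes.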
Foolscap pre-populates this table with the strings that Foolscap itself uses, such as "my-reference", "call", and "unicode". There is a mechanism (the ADDVOCAB token, or maybe it's the add-vocab sequence, I forget) to add new items to the receiver's table.
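Both ends have to agree on what each index means. One way to picture this is a sender-side map from string to index and a receiver-side map from index to string, pre-populated identically and extended in lockstep whenever an add-vocab message is processed. The class names, the starting indices, and the `(index, string)` tuple standing in for the add-vocab message are illustrative assumptions; the actual wire form is whatever the ADDVOCAB token or add-vocab sequence turns out to be.

```python
from typing import Dict, Tuple

# Illustrative starting entries taken from the ticket; Foolscap's real
# table is larger and its index order may differ.
INITIAL_VOCAB = ("my-reference", "call", "unicode")


class OutgoingVocab:
    """Sender side (hypothetical): maps strings to VOCAB indices."""

    def __init__(self, initial=INITIAL_VOCAB):
        self.index_of: Dict[str, int] = {s: i for i, s in enumerate(initial)}

    def add(self, s: str) -> Tuple[int, str]:
        """Assign the next free index to `s` and return the (index, string)
        pair that stands in for the add-vocab message sent to the peer."""
        if s in self.index_of:
            raise ValueError(f"{s!r} already has index {self.index_of[s]}")
        index = len(self.index_of)  # indices stay contiguous from 0
        self.index_of[s] = index
        return (index, s)


class IncomingVocab:
    """Receiver side (hypothetical): maps VOCAB indices back to strings."""

    def __init__(self, initial=INITIAL_VOCAB):
        self.string_at: Dict[int, str] = dict(enumerate(initial))

    def apply_add_vocab(self, message: Tuple[int, str]) -> None:
        index, s = message
        self.string_at[index] = s


sender, receiver = OutgoingVocab(), IncomingVocab()
message = sender.add("get_status")  # hypothetical remote method name
receiver.apply_add_vocab(message)
assert receiver.string_at[sender.index_of["get_status"]] == "get_status"
```

Because the connection delivers bytes in order, a receiver that processes the add-vocab message before the next token will already know the new index by the time the first VOCAB token using it arrives.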
So what's missing is code in the sender that watches outgoing strings for ones that are repeated frequently; remote method names are the most likely candidates. If this code decides that assigning a VOCAB number to a string would save space, it will send the add-vocab sequence and then use that VOCAB token every time it sees the string.
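A minimal sketch of such a sender-side watcher, assuming a plain dictionary of counts and a fixed promotion threshold (the value 3, and the rule of only counting strings that missed the existing table, come from the next paragraph). `VocabWatcher`, `encode`, and the tuple-shaped tokens it returns are hypothetical; real code would hook into Banana's string-sending path rather than return lists.

```python
from typing import Dict, List, Tuple


class VocabWatcher:
    """Counts strings that are not yet in the VOCAB table and promotes any
    string once it has been seen more than `threshold` times."""

    def __init__(self, vocab: Dict[str, int], threshold: int = 3):
        self.vocab = vocab  # string -> VOCAB index, shared with the encoder
        self.threshold = threshold
        self.counts: Dict[str, int] = {}

    def encode(self, s: str) -> List[Tuple[str, object]]:
        """Return the (hypothetical) tokens to emit for one occurrence of s."""
        if s in self.vocab:
            # Already in the table: never counted, just sent as VOCAB.
            return [("VOCAB", self.vocab[s])]
        count = self.counts.get(s, 0) + 1
        if count > self.threshold:
            # Promote: stop counting, announce the entry, then use it at once.
            self.counts.pop(s, None)
            index = len(self.vocab)  # assumes contiguous indices from 0
            self.vocab[s] = index
            return [("ADD-VOCAB", (index, s)), ("VOCAB", index)]
        self.counts[s] = count
        return [("STRING", s)]
```

With a threshold of 3, a new method name goes out as a plain string three times, the fourth occurrence emits the add-vocab entry plus its VOCAB token, and every later occurrence is a bare VOCAB token.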
An outstanding question is how to do this efficiently. Nominally, this new code would manage a dictionary that maps strings to counts, tracking how many times it has seen each string go by. When the count goes above, say, 3, it should add the vocab entry and then remove the string from the dict. (This code should look at strings only after they have passed through the VOCAB table lookup, so it won't see strings that are already in the table.) But this could get expensive for applications that send a lot of strings: every outgoing string costs a dict lookup, and the dict grows with each distinct string that hasn't been promoted yet. Maybe Bloom filters could help.
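One bounded-memory option in the spirit of the Bloom-filter idea is a count-min sketch: a fixed grid of counters indexed by several hashes which, like a Bloom filter, can only err in one direction (here, by overestimating a count). This is a swapped-in technique, not something Foolscap implements, and the width, depth, and per-row salted `hashlib.blake2b` hashing are assumptions chosen for the sketch. An overestimate only means a string is promoted a little early, which costs a table entry but never corrupts the stream.

```python
import hashlib
from typing import Iterator, List, Tuple


class CountMinSketch:
    """Approximate per-string counts using a fixed number of counters.

    Estimates can be too high (when strings share counters) but never too low.
    """

    def __init__(self, width: int = 1024, depth: int = 4):
        if width < 1 or depth < 1:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _cells(self, s: str) -> Iterator[Tuple[int, int]]:
        data = s.encode("utf-8")
        for row in range(self.depth):
            # A distinct 16-byte salt per row gives each row its own hash.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, s: str) -> int:
        """Record one occurrence of `s` and return its estimated count."""
        estimates = []
        for row, col in self._cells(s):
            self.rows[row][col] += 1
            estimates.append(self.rows[row][col])
        return min(estimates)
```

A sender would promote `s` once `sketch.add(s)` exceeds the threshold. Since these counters only ever grow, a long-lived connection would presumably need to reset or age them periodically; otherwise every string would eventually look frequent.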
Note that this is a backwards-compatible change. The add-vocab sequence has been supported by Foolscap since forever, so even old recipients will be able to handle it.