Opened 15 years ago

Last modified 14 years ago

#127 new task

give up on resource-consumption defenses

Reported by: Brian Warner Owned by:
Priority: major Milestone: eventually
Component: schemas Version: 0.4.1
Keywords: Cc:

Description

One of the original motivations for building Foolscap (relative to its Perspective Broker ancestry) was to create something that would defend your code against resource-consumption attacks. A PB server can be trivially damaged by sending it a long series of OPEN tokens: the receiver will blow the Python recursion limit, and stack-overflow errors typically cause a lot of collateral damage (unrelated code can find itself running in a low-stack environment, causing errors in weird places). Likewise, sending a dense tree of lists can consume the receiver's memory faster than one byte of memory per byte of network traffic, eventually exhausting it entirely (and a MemoryError is even more damaging than a stack overflow).
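
To make the first attack concrete, here is a minimal sketch (plain Python 3, not Foolscap's actual banana parser) of a naive recursive deserializer; each "[" plays the role of an OPEN token, costing one byte of traffic but one stack frame and one list object on the receiver:

    import sys

    def naive_parse(data, pos=0):
        # Toy bracket language: '[' opens a nested list, ']' closes it.
        result = []
        while pos < len(data):
            ch = data[pos]
            if ch == "[":
                # One stack frame per OPEN: this is where the attacker wins.
                child, pos = naive_parse(data, pos + 1)
                result.append(child)
            elif ch == "]":
                return result, pos + 1
            else:
                pos += 1
        return result, pos

    attack = "[" * 100000   # 100kB of traffic asks for 100000 stack frames
    try:
        naive_parse(attack)
    except RecursionError:
        print("blew the recursion limit at depth", sys.getrecursionlimit())

The memory variant is the same trick sideways: "[]" costs two bytes on the wire but dozens of bytes of heap for the empty list it creates.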

So when Glyph and I first started talking about "PB2" (which turned into Foolscap), one of my desires was to have code which would protect you from this sort of thing. Since we were also talking about having optional explicit typechecking (which is rather non-Pythonic for local code, but we felt it would make remotely-reachable code safer), I figured we could use the same information to do fine-grained size checking of the inbound token stream, letting us hang up on any caller who sent more data than the enclosing RemoteInterface said they should.

The Approach

One way to think about the receive-side deserialization code in Foolscap is to pretend that it is holding a parse tree, something which has one first-level child node for each Referenceable object which is remotely accessible (indexed by the CLID: the connection-local ID). Each such object then has a child node for each method that could be invoked on it: everything declared in that object's RemoteInterface. These method nodes then have child nodes for each argument which they accept, which are either simple atoms (foo= takes a string, bar= takes an int) with maximum sizes, or composite types (ListOf(DictOf(int,str))) with upper bounds on number of elements, etc.
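
As a sketch of what such a schema looks like (the interface and argument names here are invented, and import paths vary between Foolscap versions), each constraint below becomes one node in that parse tree:

    from foolscap.remoteinterface import RemoteInterface
    from foolscap.schema import DictOf, ListOf, StringConstraint

    class RIExample(RemoteInterface):
        # simple atoms with maximum sizes
        def set_label(foo=StringConstraint(100), bar=int):
            return bool
        # a composite type with upper bounds at every level
        def set_table(table=ListOf(DictOf(int, StringConstraint(1000)),
                                   maxLength=30)):
            return None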

The receive code then knows, at any given moment, where in this parse tree we're currently sitting. It knows which low-level tokens are allowed to come next: if we just received the CLID integer, then the next token must be the method name, so we require a string with a max length of the longest declared methodname. The arguments stage requires a dictionary (i.e. an OPEN token followed by the string "dict"), which itself is limited in the number and length of its keys. And so on.
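
A hypothetical rendering of that bookkeeping (the real code spreads it across a stack of Unslicers; the names and attributes here are invented):

    class CallState:
        # Tracks where we are in the parse tree for one inbound method call.
        def __init__(self, remote_interface):
            self.ri = remote_interface
            self.stage = "clid"   # clid -> methodname -> arguments

        def next_token_constraint(self):
            if self.stage == "clid":
                return ("INT", None)         # the connection-local object ID
            if self.stage == "methodname":
                longest = max(len(n) for n in self.ri.method_names)
                return ("STRING", longest)   # no declared name is longer
            return ("OPEN dict", self.ri.arg_constraints)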

The tokens themselves have a 65-byte limit on the header plus type byte, at which point we know what the token type is and how long the body claims to be, so we can test it against the current constraint. If we're accepting a STRING token but the length is limited to 13 bytes (because that's the longest method name that's been declared), we can reject a 4-billion-byte string before reading any of its body.
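
In sketch form (the token-type value and exception class here are placeholders, not the real Banana constants):

    STRING = 0x82   # placeholder type byte

    class Violation(Exception):
        pass

    def check_token(header_len, type_byte, body_len, max_string_len):
        # Runs as soon as the header is complete, before any body
        # bytes have been buffered.
        if header_len > 65:
            raise Violation("token header too long")
        if type_byte == STRING and body_len > max_string_len:
            # e.g. max_string_len=13, the longest declared method name:
            # a 4-billion-byte STRING dies right here
            raise Violation("%d-byte STRING exceeds %d-byte limit"
                            % (body_len, max_string_len))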

One additional use of this approach is basic bug-detection. It's always a good idea to push the explosion closer to the trigger (well, unless you're a demolitionist). If something gets out of sync on the wire and starts sending strings where it's supposed to be sending integers, it's nice to detect it early. Doing this on the inbound token stream is hypothetically better than doing it after deserialization.

The Reality

This approach has two main problems. The first is that, to make it work, there must be no holes. To start with, all objects that are ever published or passed over the wire must have a RemoteInterface, and their schemas must not use the "Any" wildcard (which turns out to be hard to avoid in the face of forward references). Every size limit that you impose in these schemas affects the overall resource-consumption-attack exposure, but actually measuring that exposure (which is equivalent to calculating the deepest path in the parse tree) is a hassle. There is some half-written code to look at your RemoteInterface and tell you the maximum depth (or throw an UnboundedSchema exception if it's infinite), but it was never really completed.

Moreover, every node in the parse tree must be bounded, even the ones that aren't explicitly tied to a RemoteInterface. The 3rd-party reference handoff "gifting" mechanism (one of the biggest features that Foolscap adds over PB) relies upon sending the reference's FURL to the gift-recipient, generally in an argument that they merely describe as expecting a reference of some sort (not necessarily a 3rd-party one). So the parse-tree node for this sort of slot must be able to accept a "their-reference" sequence. What constraints should it impose?

Since I had resource-consumption attacks and constraints in mind as I wrote the code, I tried to impose limits everywhere (since leaving them unconstrained would expose the whole thing to attack). But the limits I imposed (like the 200-byte limit on "their-reference" FURLs) were arbitrary, and based upon my (narrow) expectations of how people would use the system. I didn't consider that multi-homed Solaris users would have 5 IP addresses in their FURLs and blow that 200-byte limit, or that people would want to send 100MB strings and be surprised by the default 2kB limit or the 30-key limit on dictionaries. Foolscap-0.1.7 removed the default sizelimits from user-controllable constraints, but all the internal ones were left in place (since otherwise it would be impossible to create something that was actually bounded).

So it's easy to have an unbounded schema, and it's hard to really plug all the holes. I strongly suspect that, even with perfectly bounded schemas, there are at least a few unconstrained points in the deserialization code.

The second problem is that these limits all surprise the programmer. Hidden limits always do. You write your code, you run your tests, everything works fine (because your datasets are small), you deploy it, things are still fine (because the datasets are medium-sized), and it takes years before someone uses the thing enough to run into the hidden limitation. Even when the original programmer set the constraints to match their use of it, some other programmer down the line can make a different set of assumptions, change the use case without knowing to update the constraints, and then run into limits that they never knew existed. A distributed system makes it worse, because now some of the servers (but not all) are enforcing some of the constraints too, depending upon which versions they're running.

The Advice

When I first described Foolscap to Mark Miller, Mark Stiegler, and Tyler Close, in perhaps 2005 or 2006, they discouraged me from going down the count-every-byte route, for all of the reasons described above. Their recommendation was instead to use a per-user proxy. The idea is to funnel all of Bob's traffic through a single process, upon which you could impose coarse OS-based resource limits (like setting a ulimit on memory size, or on CPU time spent). The proxy receives the messages, deserializes them (thus exposing itself to any attacks), then packages them back up and sends them to the real server. The outside world never gets access to the real server, only to a proxy-builder.

If done right, a resource-consumption attack will kill the proxy, but not the real server, and since Bob is the only one using Bob's proxy, he's only hurting himself, and can't deny service to Alice or Carol (who have their own proxies).
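
A sketch of the coarse OS-based limits such a proxy could place on itself at startup (Unix-only, using the stdlib resource module; the specific numbers are invented):

    import resource

    def limit_self(max_bytes=256 * 1024 * 1024, max_cpu_seconds=60):
        # Cap the address space: a memory-exhaustion attack now kills
        # the proxy process, not the real server behind it.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
        # Cap total CPU time: a deserialization spin gets SIGXCPU
        # instead of starving the other users' proxies.
        resource.setrlimit(resource.RLIMIT_CPU,
                           (max_cpu_seconds, max_cpu_seconds))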

But this felt too hard to implement (it still does). Should the proxies be per-remote-reference? (eek, one process per remote object?). I expected performance to be a problem, since every incoming message gets an extra deserialize+reserialize pass (and of course I committed the usual mistake of letting hypothetical performance impacts drive design decisions without actually implementing and measuring anything first). Keeping the CLIDs in sync between the proxy and the real server sounded hard, and not doing that requires the proxy to also behave as a Membrane. Building the proxy-generator sounded hard. And how would this work on a platform that makes it difficult or expensive to spawn new processes, like Windows?

Plus, I had already sunk a lot of time into the token-checking scheme, so I was reluctant to give it up. Also a common mistake. So I stuck with it.

The Present

As of May 2009, every programmer on Tahoe to date (well, Brian, Zooko, and Rob) has run into a mysterious failure that ultimately resulted from a Foolscap constraint being violated. (I've hit fewer of them, of course, since I tend to program under the same assumptions that I baked into Foolscap in the first place.) In many cases, these errors were mysterious because the Violations were not reported very well, or because they resembled remote errors that Tahoe normally tries to work around by asking alternate servers ("no shares found": every server we asked gave us an error, and all of those errors were constraint Violations; you had to dig a lot to discover the triply-wrapped Failure, but of course "no shares" makes you assume you've got the filecap wrong, so you investigate that for a while first).

The maxKeys=30 limit on dictionaries hit us once (I think, in the introducer, when we grew to more than 30 servers). The 2MB sizelimit that I declared for Tahoe shares bit us when we raised the max_segment_size for mutable files. We increased it, but then it bit us again when we used older servers that had not yet been upgraded, and the mismatch between the two sides caused confusion for a release or two (was max size 2MB or 2MiB? oops). The internal 200-byte limit on FURL sizes bit two separate end users who happened to have a lot of IP addresses.

The fact that schemas no longer have default sizelimits on their constraints means that to get a bounded schema, you must set a sizelimit on absolutely everything. The fact that nobody ever finished the depth-checking code means that it's hard to discover where the holes are in your schemas. The fact that forward references require the use of Any (#18) makes it less likely that you can even try for a bounded schema. The internal limitations (like the FURL size in Gifts) cause surprises even if the RemoteInterface author didn't add any. And the likely remaining unbounded holes in the deserialization code mean that all of this work is probably moot anyway.

The Future

So, I'm going to remove deserialization-based resource-consumption-attack mitigation from Foolscap. RemoteInterface will still be there, and any typelimits and sizelimits you impose will still be honored, but they'll be applied to the deserialized arguments, rather than to the incoming token stream. Your foo=StringConstraint(1999) argument will still be guaranteed to never see a 2kB string (or a list or int), but an attacker will be able to crash your process by sending a 2TB string (or an infinite list, etc). Removing the token-type checkers from the deserialization code should simplify things, and maybe help performance a little bit.
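
The new shape of the check, in sketch form (the delivery scaffolding and the exact checkObject signature are assumptions, not Foolscap's literal code):

    from foolscap.schema import StringConstraint

    foo_constraint = StringConstraint(1999)

    def deliver_call(remote_method, deserialized_args):
        # By this point the arguments are fully deserialized: an
        # attacker's 2TB string has already been buffered, so this
        # check protects the method, not the process.
        foo_constraint.checkObject(deserialized_args["foo"], True)
        return remote_method(**deserialized_args)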

I'll also open up a ticket about implementing the proxy-based approach. It might be feasible, and would be nice to have. It's certainly more likely to solve the problem than the deserialization-based approach.

(note: Tahoe#690 is an expression of the 200-byte gift-FURL limit problem)

Change History (2)

comment:1 Changed 15 years ago by Brian Warner

The 200-byte FURL limitation has been removed, in [b86bf9e2afd7d31a7d17ac82c95e4402bbe7df53].

comment:2 Changed 14 years ago by Brian Warner

My list of things to remove:

  • move BananaError-raising checkToken/openerCheckToken tests into Unslicer.receiveChild
  • remove checkToken/openerCheckToken calls
  • remove 'raise Violation' from receiveChild, make sure corresponding checks are applied to args before delivery to remote_methods
  • remove Banana.handleViolation
  • remove Banana's ABORT handler, since no sending code emits it
  • remove Banana's rejected/discardCount code