-*- outline -*-

Non-independent things left to do on newpb. These require deeper magic or
cannot otherwise be done casually. Many of these involve fundamental protocol
issues, and therefore need to be decided sooner rather than later.

* summary
** protocol issues
*** negotiation
*** VOCABADD/DEL/SET sequences
*** remove 'copy' prefix from RemoteCopy type sequences?
*** smaller scope for OPEN-counter reference numbers?
** implementation issues
*** cred
*** oldbanana compatibility
*** Copyable/RemoteCopy default to __getstate__ or self.__dict__ ?
*** RIFoo['bar'] vs RIFoo.bar (should RemoteInterface inherit from Interface?)
*** constrain ReferenceUnslicer
*** serialize target.remote_foo usefully

* decide whether to accept positional args in non-constrained methods

DEFERRED until after 2.0
 warner: that would be awesome but let's do it _later_

This is really a backwards-source-compatibility issue. In newpb, the
preferred way of invoking callRemote() is with kwargs exclusively: glyph felt
that positional arguments are more fragile. If the client has a
RemoteInterface, then it can convert any positional arguments into keyword
arguments before sending the request.

The question is what to do when the client is not using a RemoteInterface.
Until recently, callRemote("bar") would try to find a matching RI. I changed
that so callRemote("bar") never uses an RI; instead you would use
callRemote(RIFoo['bar']) to indicate that you want argument-checking. That
makes positional arguments problematic in more situations than they were
before.

The decision to be made is whether the OPEN(call) sequence should provide a
way to convey positional args to the server (probably with numeric "names"
in the (argname, argvalue) tuples). If we do this, the server (which always
has the RemoteInterface) can do the positional-to-keyword mapping. But
putting this in the protocol will oblige other implementations to handle
them too.
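The client-side positional-to-keyword mapping described above can be
sketched in a few lines. This is a hedged illustration, not newpb's actual
API: 'argnames' stands in for the ordered argument list a RemoteInterface
method schema would supply.

```python
def to_kwargs(argnames, args, kwargs):
    """Fold positional args into keyword args using schema-declared names.

    A client holding a RemoteInterface could run this before serializing
    the call, so the wire only ever carries (argname, argvalue) pairs.
    """
    out = dict(kwargs)
    if len(args) > len(argnames):
        raise TypeError("too many positional arguments")
    for name, value in zip(argnames, args):
        if name in out:
            raise TypeError("got two values for argument %r" % name)
        out[name] = value
    return out
```

A client without any RemoteInterface has no argnames to map with, which is
exactly why the protocol question above arises.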
* change the method-call syntax to include an interfacename DONE Scope the method name to the interface. This implies (I think) one of two things: callRemote() must take a RemoteInterface argument each RemoteReference handles just a single Interface Probably the latter, maybe have the RR keep both default RI and a list of all implemented ones, then adapting the RR to a new RI can be a simple copy (and change of the default one) if the Referenceable knows about the RI. Otherwise something on the local side will need to adapt one RI to another. Need to handle reference-counting/DECREF properly for these shared RRs. From glyph: callRemote(methname, **args) # searches RIs callRemoteInterface(remoteinterface, methname, **args) # single RI getRemoteURL(url, *interfaces) URL-RRefs should turn into the original Referenceable (in args/results) (map through the factory's table upon receipt) URL-RRefs will not survive round trips. leave reference exchange for later. (like def remote_foo(): return GlobalReference(self) ) move method-invocation code into pb.Referenceable (or IReferenceable adapter). Continue using remote_ prefix for now, but make it a property of that code so it can change easily. ok, for today I'm just going to stick with remote_foo() as a low-budget decorator, so the current restrictions are 1: subclass pb.Referenceable, 2: implements() a RemoteInterface with method named "foo", 3: implement a remote_foo method and #1 will probably go away within a week or two, to be replaced by #1a: subclass pb.Referenceable OR #1b: register an IReferenceable adapter try serializing with ISliceable first, then try IReferenceable. The IReferenceable adapter must implements() some RemoteInterfaces and gets serialized with a MyReferenceSlicer. http://svn.twistedmatrix.com/cvs/trunk/pynfo/admin.py?view=markup&rev=44&root=pynfo ** use the methods of the RemoteInterface as the "method name" DONE (provisional), using RIFoo['add'] rr.callRemote(RIFoo.add, **args) Nice and concise. 
However, #twisted doesn't like it, adding/using arbitrary attributes of Interfaces is not clean (think about IFoo.implements colliding with RIFoo.something). rr.callRemote(RIFoo['add'], **args) RIFoo(rr).callRemote('add', **args) adaptation, or narrowing? glyph: I'm adding callRemote(RIFoo.bar, **args) to newpb right now wow. seemed like a simpler interface than callRemoteInterface("RIFoo", "bar", **args) warner: Does this mean that IPerspective can be parameterized now? warner: bad idea warner: Zope hates you! warner: zope interfaces don't support that syntax zi does support multi-adapter syntax but i don't really know what that is warner: callRemote(RIFoo.getDescriptionFor("bar"), *a, **k) glyph: yeah, I fake it. In RemoteInterfaceClass, I remove those attributes, call InterfaceClass, and then put them all back in warner: don't add 'em as attributes warner: just fix the result of __getitem__ to add a slot actually refer back to the interface radix: the problem is that IFoo['bar'] doesn't point back to IFoo warner: even better, make them callable :-) glyph: IFoo['bar'].interface == 'IFoo' RIFoo['bar']('hello') glyph: I was thinking of doing that in a later version of RemoteInterface exarkun: >>> type(IFoo['bar'].interface) right 'IFoo' Just look through all the defined interfaces for ones with matching names exarkun: ... e.g. *NOT* __main__.IFoo exarkun: AAAA you die hee hee * warner struggles to keep up with his thoughts and those of people around him * glyph realizes he has been given the power to whine glyph: ok, so with RemoteInterface.__getitem__, you could still do rr.callRemote(RIFoo.bar, **kw), right? was your objection to the interface or to the implementation? I really don't think you should add attributes to the interface ok I need to stash a table of method schemas somewhere just make __getitem__ return better type of object and ideally if this is generic we can get it into upstream Is there a reason Method.interface isn't a fully qualified name? 
not necessarily I have commit access to zope.interface if you have any
features you want added, post to interface-dev@zope.org mailing list and if
Jim Fulton is ok with them I can add them for you hmm does using RIFoo.bar
to designate a remote method seem reasonable? I could always adapt it to
something inside callRemote something PB-specific, that is but that adapter
would have to be able to pull a few attributes off the method (name, schema,
reference to the enclosing RemoteInterface) and we're really talking about
__getattr__ here, not __getitem__, right? for x.y yes no, I don't think
that's a good idea interfaces have all kinds of methods on them already, for
introspection purposes namespace clashes are the suck unless RIFoo isn't
really an Interface hm how about if it were a wrapper around a regular
Interface? yeah, RemoteInterfaces are kind of a special case RIFoo(IFoo,
publishedMethods=['doThis', 'doThat']) s/RIFoo/RIFoo = RemoteInterface(/ I'm
confused. Why should you have to specify which methods are published?
SECURITY! not actually necessary though, no and may be overkill the only
reason I have it derive from Interface is so that we can do neat adapter
tricks in the future that's not contradictory RIFoo(x) would still be able
to do magic you wouldn't be able to check if an object provides RIFoo,
though which kinda sucks but in any case I am against RIFoo.bar pity, it
makes the callRemote syntax very clean hm So how come it's a RemoteInterface
and not an Interface, anyway? I mean, how come that needs to be done
explicitly. Can't you just write a serializer for Interface itself? * warner
goes to figure out where the RemoteInterface discussion went after he got
distracted maybe I should make RemoteInterface a totally separate class and
just implement a couple of Interface-like methods cause
rr.callRemote(IFoo.bar, a=1) just feels so clean warner: why not
IFoo(rr).bar(a=1) ?
hmm, also a possibility well IFoo(rr).callRemote('bar') or RIFoo, or
whatever hold on, what does rr inherit from? RemoteReference it's a
RemoteReference then why not IFoo(rr) / I'm keeping a strong distinction
between local interfaces and remote ones ah, okay warner: right, you can
still do RIFoo ILocal(a).meth(args) is an immediate function call in that
case, I prefer rr.callRemote(IFoo.bar, a=1) .meth( is definitely bad, we
need callRemote rr.callRemote("meth", args) returns a deferred radix: I
don't like from foo import IFoo, RIFoo you probably wouldn't have both an
IFoo and an RIFoo warner: well, look at it this way: IFoo(rr).callRemote('foo')
still makes it obvious that IFoo isn't local warner: you could implement
RemoteReference.__conform__ to implement it radix: I'm thinking of providing
some kind of other class that would allow .meth() to work (without the
callRemote), but it wouldn't be the default plus, IFoo(rr) is how you use
interfaces normally, and callRemote is how you make remote calls normally,
so it seems that's the best way to do interfaces + PB hmm in that case the
object returned by IFoo(rr) is just rr with a tag that sets the "default
interface name" right and callRemote(methname) looks in that default
interface before looking anywhere else for some reason I want to get rid of
the stringyness of the method name and the original syntax
(callRemoteInterface('RIFoo', 'methname', args)) felt too verbose warner:
well, isn't that what your optional .meth thing is for? yes, I don't like
that either using callRemote(RIFoo.bar, args) means I can just switch on the
_name= argument being either a string or a (whatever) that's contained in a
RemoteInterface a lot of it comes down to how adapters would be most useful
when dealing with remote objects and to what extent remote interfaces should
be interchangeable with local ones good point.
I have never had a use case where I wanted to adapt a remote object, I don't
think however, I have had use cases to send interfaces across the wire e.g.
having a parameterized portal.login() interface that'll be different, just
callRemote('foo', RIFoo) yeah. the current issue is whether to pass them by
reference or by value eugh Can you explain it without using those words? :)
hmm Do you mean, Referenceable style vs Copyable style? at the moment, when
you send a Referenceable across the wire, the id-number is accompanied with
a list of strings that designate which RemoteInterfaces the original claims
to provide the receiving end looks up each string in a local table, and
populates the RemoteReference with a list of RemoteInterface classes the
table is populated by metaclass magic that runs when a 'class
RIFoo(RemoteInterface)' definition is complete ok so a RemoteInterface is
simply serialized as its qual(), right? so as long as both sides include the
same RIFoo definition, they'll wind up with compatible remote interfaces,
defining the same method names, same method schemas, etc effectively you
can't just send a RemoteInterface across the wire right now, but it would be
easy to add the places where they are used (sending a Referenceable across
the wire) all special case them ok, and you're considering actually writing
a serializer for them that sends all the information to totally reconstruct
it on the other side without having the definition yes or having some kind
of debug method which gives you that I'd say, do it the way you're doing it
now until someone comes up with a use case for actually sending it... right
the only case I can come up with is some sort of generic object browser
debug tool everything else turns into a form of version negotiation which is
better handled elsewhere hmm so RIFoo(rr).callRemote('bar', **kw) I guess
that's not too ugly That's my vote.
:) one thing it lacks is the ability to cleanly state that if 'bar' doesn't
exist in RIFoo then it should signal an error whereas callRemote(RIFoo.bar,
**kw) would give you an AttributeError before callRemote ever got called
i.e. "make it impossible to express the incorrect usage" mmmh warner: but
you _can_ check it immediately when it's called in the direction I was
heading, callRemote(str) would just send the method request and let the far
end deal with it, no schema-checking involved warner: which, 99% of the
time, is effectively the same time as IFoo.bar would happen whereas
callRemote(RIFoo.bar) would indicate that you want schema checking yeah,
true hm. (that last feature is what allowed callRemote and
callRemoteInterface to be merged) or, I could say that the normal
RemoteReference is "untyped" and does not do schema checking but adapting
one to a RemoteInterface results in a TypedRemoteReference which does do
schema checking and which refuses to be invoked with method names that are
not in the schema warner: we-ell warner: doing method existence checking is
cool warner: but I think tying any further "schema checking" to adaptation
is a bad idea yeah, that's my hunch too which is why I'd rather not use
adapters to express the scope of the method name (which RemoteInterface it
is supposed to be a part of) warner: well, I don't think tying it to
callRemote(RIFoo.methName) would be a good idea just the same hm so that
leaves rr.callRemote(RIFoo['add']) and rr.callRemoteInterface(RIFoo, 'add')
OTOH, I'm inclined to think schema checking should happen by default It's
just a matter of where it's parameterized yeah, it's just that the "default"
case (rr.callRemote('name')) needs to work when there aren't any
RemoteInterfaces declared warner: oh but if we want to encourage people to
use the schemas, then we need to make that case simple and concise * radix
goes over the issue in his head again Yes, I think I still have the same
position. which one?
:) IFoo(rr).callRemote("foo"); which would do schema checking because schema checking is on by default when it's possible using an adaptation-like construct to declare a scope of the method name that comes later well, it _is_ adaptation, I think. Adaptation always has plugged in behavior, we're just adding a bit more :) heh it is a narrowing of capability hmm, how do you mean? rr.callRemote("foo") will do the same thing but rr.callRemote("foo") can be used without the remote interfaces I think I lost you. if rr has any RIs defined, it will try to use them (and therefore complain if "foo" does not exist in any of them, or if the schema is violated) Oh. That's strange. So it's really quite different from how interfaces regularly work... yeah except that if you were feeling clever you could use them the normal way Well, my inclination is to make them work as similarly as possible. "I have a remote reference to something that implements RIFoo, but I want to use it in some other way" s/possible/practical/ then IBar(rr) or RIBar(rr) would wrap rr in something that knows how to translate Bar methods into RIFoo remote methods Maybe it's not practical to make them very similar. I see. rr.callRemote(RIFoo.add, **kw) rr.callRemote(RIFoo['add'], **kw) RIFoo(rr).callRemote('add', **kw) I like the second one. Normal Interfaces behave like a dict, so IFoo['add'] gets you the method-describing object (z.i.i.Method). My RemoteInterfaces don't do that right now (because I remove the attributes before handing the RI to z.i), but I could probably fix that. I could either add attributes to the Method or hook __getitem__ to return something other than a Method (maybe a RemoteMethodSchema). Those Method objects have a .getSignatureInfo() which provides almost everything I need to construct the RemoteMethodSchema. Perhaps I should post-process Methods rather than pre-process the RemoteInterface. 
I can't tell how to use the return value trick, and it looks like the
function may be discarded entirely once the Method is created, so this
approach may not work.

On the server side (Referenceable), subclassing Interface is nice because it
provides adapters and implements() queries. On the client side
(RemoteReference), subclassing Interface is a hassle: I don't think adapters
are as useful, but getting at a method (as an attribute of the RI) is
important. We have to bypass most of Interface to parse the method
definitions differently.

* create UnslicerRegistry, registerUnslicer

DONE (PROVISIONAL), flat registry (therefore problematic for len(opentype)>1)

Consider adopting the existing collection API (getChild, putChild) for this,
or maybe allow registerUnslicer() to take a callable which behaves kind of
like a twisted.web isLeaf=1 resource (stop walking the tree, give all index
tokens to the isLeaf=1 node). Also some APIs to get a list of everything in
the registry.

* use metaclass to auto-register RemoteCopy classes

DONE

** use metaclass to auto-register Unslicer classes

DONE

** and maybe Slicer classes too

DONE with name 'slices', perhaps change to 'slicerForClasses'?

 class FailureSlicer(slicer.BaseSlicer):
     classname = "twisted.python.failure.Failure"
     slicerForClasses = (failure.Failure,) # triggers auto-register

** various registry approaches

DONE

There are currently three kinds of registries used in banana/newpb:

 RemoteInterface <-> interface name
 class/type -> Slicer (-> opentype) -> Unslicer (-> class/type)
 Copyable subclass -> copyable-opentype -> RemoteCopy subclass

There are two basic approaches to representing the mappings that these
registries implement. The first is implicit, where the local objects are
subclassed from Sliceable or Copyable or RemoteInterface and have attributes
to define the wire-side strings that represent them.
On the receiving side, we make extensive use of metaclasses to perform
automatic registration (taking names from class attributes and mapping them
to the factory or RemoteInterface used to create the remote version).

The second approach is explicit, where pb.registerRemoteInterface,
pb.registerRemoteCopy, and pb.registerUnslicer are used to establish the
receiving-side mapping. There isn't a clean way to do it explicitly on the
sending side, since we already have instances whose classes can give us
whatever information we want.

The advantage of implicit is simplicity: no more questions about why my
pb.RemoteCopy is giving "not unserializable" errors. The mere act of
importing a module is enough to let PB create instances of its classes. The
advantage of doing it explicitly is to remind the user about the existence
of those maps, because the set of factory classes in the receiving map is
precisely the user's exposure (from a security point of view). See the E
paper on secure serialization for some useful concepts. A disadvantage of
implicit is that you can't quite be sure what, exactly, you're exposed to:
the registrations take place all over the place.

To make explicit not so painful, we can use quotient's .wsv files
(whitespace-separated values) which map from class to string and back again.
The file could list fully-qualified classname, wire-side string, and
receiving factory class on each line. The Broker (or rather the RootSlicer
and RootUnslicer) would be given a set of .wsv files to define their
mapping. It would get all the registrations at once (instead of having them
scattered about). They could also demand-load the receive-side factory
classes.

For now, go implicit. Put off the decision until we have some more
experience with using newpb.

* move from VocabSlicer sequence to ADDVOCAB/DELVOCAB tokens

Requires a .wantVocabString flag in the parser, which is kind of icky but
fixes the annoying asymmetry between set (vocab sequence) and get (VOCAB
token).
Might want a CLEARVOCAB token too.

On second thought, this won't work. There isn't room for both a vocab number
and a variable-length string in a single token. It must be an open sequence.
However, it could be an add/del/set-vocab sequence, allowing the vocab to be
modified incrementally.

** VOCABize interface/method names

One possibility is to make a list of all strings used by all known
RemoteInterfaces and all their methods, then send it at broker connection
time as the initial vocab map. A better one (maybe) is to somehow track what
we send and add a word to the vocab once we've sent it more than three
times. Maybe vocabize the pairs, as "ri/name1", "ri/name2", etc, or maybe do
them separately. Should do some handwaving math to figure out which is
better.

* nail down some useful schema syntaxes

This has two parts: parsing something like a __schema__ class attribute (see
the sketches in schema.xhtml) into a tree of FooConstraint objects, and
deciding how to retrieve schemas at runtime from things like the object
being serialized or the object being called from afar. To be most useful,
the syntax needs to mesh nicely with (read "be identical to") things like
formless and (maybe?) atop or whatever has replaced the high-density
highly-structured save-to-disk scheme that twisted.world used to do.

Some lingering questions in this area:

When an object has a remotely-invokable method, where does the appropriate
MethodConstraint come from? Some possibilities:

 an attribute of the method itself: obj.method.__schema__
 from inside a __schema__ attribute of the object's class
 from inside a __schema__ attribute of an Interface (which?) that the
 object implements

Likewise, when a caller holding a RemoteReference invokes a method on it, it
would be nice to enforce a schema on the arguments they are sending to the
far end ("be conservative in what you send"). Where should this schema come
from? It is likely that the sender only knows an Interface for their
RemoteReference.
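To make the "tree of FooConstraint objects" idea concrete, here is a minimal
hedged sketch. The class names echo the style of a constraint module but are
illustrative only; newpb's real schema classes differ.

```python
class IntConstraint:
    """Accept only ints."""
    def check(self, obj):
        if not isinstance(obj, int):
            raise TypeError("expected int, got %r" % (obj,))

class StringConstraint:
    """Accept only strings up to maxLength."""
    def __init__(self, maxLength=1000):
        self.maxLength = maxLength
    def check(self, obj):
        if not isinstance(obj, str) or len(obj) > self.maxLength:
            raise TypeError("expected short string, got %r" % (obj,))

class TupleConstraint:
    """Accept a tuple whose elements match the child constraints in order."""
    def __init__(self, *elems):
        self.elems = elems
    def check(self, obj):
        if not isinstance(obj, tuple) or len(obj) != len(self.elems):
            raise TypeError("tuple shape mismatch: %r" % (obj,))
        for constraint, element in zip(self.elems, obj):
            constraint.check(element)

# a parsed __schema__ might produce a tree like this one
schema = TupleConstraint(IntConstraint(), StringConstraint(10))
```

A MethodConstraint would then be a dict of argname -> constraint tree,
checked either before sending (client) or before invocation (server).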
When PB determines that an object wants to be copied by value instead of by reference (pb.Copyable subclass, Copyable(obj), schema says so), where should it find a schema to define what exactly gets copied over? A class attribute of the object's class would make sense: most objects would do this, some could override jellyFor to get more control, and others could override something else to push a new Slicer on the stack and do streaming serialization. Whatever the approach, it needs to be paralleled by the receiving side's unjellyableRegistry. * RemoteInterface instances should have an "RI-" prefix instead of "I-" DONE * merge my RemoteInterface syntax with zope.interface's I hacked up a syntax for how method definitions are parsed in RemoteInterface objects. That syntax isn't compatible with the one zope.interface uses for local methods, so I just delete them from the attribute dictionary to avoid causing z.i indigestion. It would be nice if they were compatible so I didn't have to do that. This basically translates into identifying the nifty extra flags (like priority classes, no-response) that we want on these methods and finding a z.i-compatible way to implement them. It also means thinking of SOAP/XML-RPC schemas and having a syntax that can represent everything at once. * use adapters to enable pass-by-reference or pass-by-value It should be possible to pass a reference with variable forms: rr.callRemote("foo", 1, Reference(obj)) rr.callRemote("bar", 2, Copy(obj)) This should probably adapt the object to IReferenceable or ICopyable, which are like ISliceable except they can pass the object by reference or by value. The slicing process should be: look up the type() in a table: this handles all basic types else adapt the object to ISliceable, use the result else raise an Unsliceable exception (and point the user to the docs on how to fix it) The adapter returned by IReferenceable or ICopyable should implement ISliceable, so no further adaptation will be done. 
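The pass-by-reference / pass-by-value markers and the slicing lookup order
described above can be sketched as follows. Reference() and Copy() just tag
the object; the serializer looks up type() in a table first, then falls back
to the markers, else raises. All names here are assumptions standing in for
the ISliceable/IReferenceable/ICopyable adapter machinery, not newpb's
actual API.

```python
class Reference:
    """Marker: pass the wrapped object by reference."""
    def __init__(self, obj):
        self.obj = obj

class Copy:
    """Marker: pass the wrapped object by value."""
    def __init__(self, obj):
        self.obj = obj

# basic types are handled by a direct type() lookup, no adaptation needed
BASIC_TYPES = {int: "int", str: "string", list: "list"}

def slicerForObject(obj):
    if isinstance(obj, Reference):
        return ("reference", id(obj.obj))   # stand-in for a ref-id
    if isinstance(obj, Copy):
        return ("copy", vars(obj.obj))      # stand-in for getStateToCopy
    opentype = BASIC_TYPES.get(type(obj))
    if opentype is not None:
        return (opentype, obj)
    raise TypeError("Unsliceable: %r (see the docs on ISliceable)" % (obj,))
```

In the real design the two markers would adapt the object to IReferenceable
or ICopyable, and the returned adapter would itself implement ISliceable so
no further adaptation is done.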
* remove 'copy' prefix from remotecopy banana type names? warner: did we ever finish our conversation on the usefulness of the (copy foo blah) namespace rather than just (foo blah)? glyph: no, I don't think we did warner: do you still have (copy foo blah)? glyph: yup so far, it seems to make some things easier glyph: the sender can subclass pb.Copyable and not write any new code, while the receiver can write an Unslicer and do a registerRemoteCopy glyph: instead of the sender writing a whole slicer and the receiver registering at the top-level warner: aah glyph: although the fact that it's easier that way may be an artifact of my sucky registration scheme warner: so the advantage is in avoiding registration of each new unslicer token? warner: yes. I'm thinking that a metaclass will handily remove the need for extra junk in the protocol ;) well, the real reason is my phobia about namespace purity, of course warner: That's what the dots are for but ease of dispatch is also important warner: I'm concerned about it because I consider my use of the same idiom in the first version of PB to be a serious wart * warner nods I will put together a list of my reasoning warner: I think it's likely that PB implementors in other languages are going to want to introduce new standard "builtin" types; our "builtins" shouldn't be limited to python's provided data structures glyph: wait ok glyph: are you talking of banana types glyph: or really PB in which case (copy blah blah) is a non-builtin type, while (type-foo) is a builtin type warner: plus, our namespaces are already quite well separated, I can tell you I will never be declaring new types outside of quotient.* and twisted.* :) moshez: this is mostly banana (or what used to be jelly, really) warner: my inclination is to standardize by convention warner: *.* is a non-builtin type, [~.] is a builtin glyph: ? 
sorry [^.]* my regular expressions and shell globs are totally confused but you know what I mean moshez: yes glyph: hrm glyph: you're making crazy anime faces glyph: why do we need any non-Python builtin types moshez: because I want to destroy SOAP, and doing that means working with people I don't like moshez: outside of python glyph: I meant, "what specific types" I'd appreciate a blog on that * have Copyable/RemoteCopy default to __getstate__/__setstate__? At the moment, the default implementations of getStateToCopy() and setCopyableState() get and set __dict__ directly. Should the default instead be to call __getstate__() or __setstate__()? * make slicer/unslicers for pb.RemoteInterfaces exarkun's use case requires these Interfaces to be passable by reference (i.e. by name). It would also be interesting to let them be passed (and requested!) by value, so you can ask a remote peer exactly what their objects will respond to (the method names, the argument values, the return value). This also requires that constraints be serializable. do this, should be referenceable (round-trip should return the same object), should use the same registration lookup that RemoteReference(interfacelist) uses * investigate decref/Referenceable race Any object that includes some state when it is first sent across the wire needs more thought. The far end could drop the last reference (at time t=1) while a method is still pending that wants to send back the same object. If the method finishes at time t=2 but the decref isn't received until t=3, the object will be sent across the wire without the state, and the far end will receive it for the "first" time without that associated state. This kind of conserve-bandwidth optimization may be a bad idea. Or there might be a reasonable way to deal with it (maybe request the state if it wasn't sent and the recipient needs it, and delay delivery of the object until the state arrives). 
DONE, the RemoteReference is held until the decref has been acked. As long as the methods are executed in-order, this will prevent the race. TODO: third-party references (and other things that can cause out-of-order execution) could mess this up. * sketch out how to implement glyph's crazy non-compressed sexpr encoding * consider a smaller scope for OPEN-counter reference numbers For newpb, we moved to implicit reference numbers (counting OPEN tags instead of putting a number in the OPEN tag) because we didn't want to burn so much bandwidth: it isn't feasible to predict whether your object will need to be referenced in the future, so you always have to be prepared to reference it, so we always burn the memory to keep track of them (generally in a ScopedSlicer subclass). If we used explicit refids then we'd have to burn the bandwidth too. The sorta-problem is that these numbers will grow without bound as long as the connection remains open. After a few hours of sending 100-byte objects over a 100MB connection, you'll hit 1G-references and will have to start sending them as LONGINT tokens, which is annoying and slightly verbose (say 3 or 4 bytes of number instead of 1 or 2). You never keep track of that many actual objects, because the references do not outlive their parent ScopedSlicer. The fact that the references themselves are scoped to the ScopedSlicer suggests that the reference numbers could be too. Each ScopedSlicer would track the number of OPEN tokens emitted (actually the number of slicerForObject calls made, except you'd want to use a different method to make sure that children who return a Slicer themselves don't corrupt the OPEN count). This requires careful synchronization between the ScopedSlicers on one end and the ScopedUnslicers on the other. I suspect it would be slightly fragile. 
One sorta-benefit would be that a somewhat human-readable sexpr-based
encoding would be even more human-readable if the reference numbers stayed
small (you could visually correlate objects and references more easily). The
ScopedSlicer's open-parenthesis could be represented with a curly brace or
something; then the refNN number would refer to the NN'th left-paren since
the last left-brace. It would also make it clear that the recipient will not
care about objects outside that scope.

* implement the FDSlicer

Over a unix socket, you can pass fds. exarkun had a presentation at PyCon04
describing the use of this to implement live application upgrade. I think
that we could make a simple FDSlicer to hide the complexity of the
out-of-band part of the communication. (This sketch assumes struct, socket,
twisted.python.log, and twisted.internet.unix are imported, and that
sendmsg/recvmsg/SCM_RIGHTS come from a platform extension module.)

 class Server(unix.Server):
     def sendFileDescriptors(self, fileno, data="Filler"):
         """
         @param fileno: An iterable of the file descriptors to pass.
         """
         payload = struct.pack("%di" % len(fileno), *fileno)
         r = sendmsg(self.fileno(), data, 0,
                     (socket.SOL_SOCKET, SCM_RIGHTS, payload))
         return r

 class Client(unix.Client):
     def doRead(self):
         if not self.connected:
             return
         try:
             msg, flags, ancillary = recvmsg(self.fileno())
         except:
             log.msg('recvmsg():')
             log.err()
         else:
             buf = ancillary[0][2]
             fds = []
             while buf:
                 fd, buf = buf[:4], buf[4:]
                 fds.append(struct.unpack("i", fd)[0])
             try:
                 self.protocol.fileDescriptorsReceived(fds)
             except:
                 log.msg('protocol.fileDescriptorsReceived')
                 log.err()
         return unix.Client.doRead(self)

* implement AsyncDeferred returns

dash wanted to implement a TransferrableReference object with a scheme that
would require creating a new connection (to a third-party Broker) during
ReferenceUnslicer.receiveClose. This would cause the object deserialization
to be asynchronous. At the moment, Unslicers can return a Deferred from
their receiveClose method. This is used by immutable containers (like
tuples) to indicate that their object cannot be created yet.
Other containers know to watch for these Deferreds and add a callback which
will update their own entries appropriately. The implicit requirement is
that all these Deferreds fire before the top-level parent object (usually a
CallUnslicer) finishes. This allows circular references involving immutable
containers to be resolved into the final object graph before the target
method is invoked.

To accommodate Deferreds which will fire at arbitrary points in the future,
it would be useful to create a marker subclass named AsyncDeferred. If an
unslicer returns such an object, the container parent starts by treating it
like a regular Deferred, but it also knows that its object is not
"complete", and therefore returns an AsyncDeferred of its own. When the
child completes, the parent can complete, etc. The difference between the
two types: Deferred means that the object will be complete before the
top-level parent is finished; AsyncDeferred makes no claims about when the
object will be finished. CallUnslicer would know that if any of its
arguments are Deferreds or AsyncDeferreds then it needs to hold off on the
broker.doCall until all those Deferreds have fired. Top-level objects are
not required to differentiate between the two types, because they do not
return an object to an enclosing parent (the CallUnslicer is a child of the
RootUnslicer, but it always returns None).

Other issues: we'll need a schema to let you say whether you'll accept these
late-bound objects or not (because if you do accept them, you won't be able
to impose the same sorts of type-checks as you would on immediate objects).
Also this will impact the in-order-invocation promises of PB method calls,
so we may need to implement the "it is ok to run this asynchronously" flag
first, then require that TransferrableReference objects are only passed to
methods with the flag set.
Also, it may not be necessary to have a marker subclass of Deferred:
perhaps _any_ Deferred which arrives from a child is an indication that the
object will not be available until an unknown time in the future, and
obligates the parent to return another Deferred upwards (even though their
object could be created synchronously). Or, it might be better to implement
this some other way, perhaps separating "here is my object" from "here is a
Deferred that will fire when my object is complete", like a call to
parent.addDependency(self.deferred) or something.

DONE, needs testing

* TransferrableReference

 class MyThing(pb.Referenceable):
     pass
 r1 = MyThing()
 r2 = Facet(r1)
 g1 = Global(r1)

 class MyGlobalThing(pb.GloballyReferenceable):
     pass
 g2 = MyGlobalThing()
 g3 = Facet(g2)

 broker.setLocation("pb://hostname.com:8044")
 rem.callRemote("m1", r1)          # limited to just this connection
 rem.callRemote("m2", Global(r1))  # can be published
 g3 = Global(r1)
 rem.callRemote("m3", g1)  # can also be published..
 g1.revoke()    # but since we remember it, it can be revoked too
 g1.restrict()  # and, as a Facet, we can revoke some functionality
                # but not all
 rem.callRemote("m1", g2)  # can be published

E tarball: jsrc/net/captp/tables/NearGiftTable

issues:

1: when A sends a reference on B to C, C's messages to the object
   referenced must arrive after any messages A sent before the reference
   forks. In particular, if A does:
     B.callRemote("1", hugestring)
     B.callRemote("2_makeYourSelfSecure", args)
     C.callRemote("3_transfer", B)
   and C does
     B.callRemote("4_breakIntoYou")
   as soon as it gets the reference, then the A->B queue looks like (1, 2),
   and the A->C queue looks like (3). The transfer message can be fast, and
   the resulting 4 message could be delivered to B before the A->B queue
   manages to deliver 2.
2: an object which gets passed through multiple external brokers and
   eventually comes home must be recognized as a local object

3: Copyables that contain RemoteReferences must be passable between hosts

E cannot do all three of these at once:
http://www.erights.org/elib/distrib/captp/WormholeOp.html

I think that it's ok to tell people who want this guarantee to explicitly
serialize it like this:

 B.callRemote("1", hugestring)
 d = B.callRemote("2_makeYourSelfSecure", args)
 d.addCallback(lambda res: C.callRemote("3_transfer", B))

Note that E might not require that method calls even have a return value,
so they might not have had a convenient way to express this enforced
serialization.

** more thoughts

To enforce the partial-ordering, you could do the equivalent of:

 A:
   B.callRemote("1", hugestring)
   B.callRemote("2_makeYourSelfSecure", args)
   nonce = makeNonce()
   B.callRemote("makeYourSelfAvailableAs", nonce)
   C.callRemote("3_transfer", (nonce, B.name))
 C:
   B.callRemote("4_breakIntoYou")

C uses the nonce when it connects to B. It knows the name of the reference,
so it can compare it against some other reference to the same thing, but it
can't actually use that name alone to get access. When the connection
request arrives at B, it sees B.name (which is also unguessable), so that
gives it reason to believe that it should queue C's request (that it isn't
just a DoS attack). It queues the request until it sees A's
makeYourSelfAvailableAs with the matching nonce. Once that happens, it can
provide the reference back to C. This implies that C won't be able to send
*any* messages to B until that handshake has completed. It might be
desirable to avoid the extra round-trip this would require.
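The queue-until-nonce behavior at B can be modeled in a few lines (toy
names throughout; the real handshake rides on PB connections):

```python
class BrokerB:
    """B's side of the nonce handshake: C's request for the gifted
    reference is parked until A's makeYourSelfAvailableAs arrives with a
    matching nonce."""
    def __init__(self):
        self.available = {}  # nonce -> reference
        self.pending = {}    # nonce -> callbacks waiting for the reference
    def remote_makeYourSelfAvailableAs(self, nonce, ref):
        # A's message: publish the reference and release queued requests
        self.available[nonce] = ref
        for cb in self.pending.pop(nonce, []):
            cb(ref)
    def request_reference(self, nonce, cb):
        # C's connection request: answer now if possible, else queue it
        if nonce in self.available:
            cb(self.available[nonce])
        else:
            self.pending.setdefault(nonce, []).append(cb)
```

This makes the partial-ordering cost visible: C is completely stalled until
A's message arrives, which is exactly the extra round-trip mentioned above.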
** more thoughts

 url = PBServerFactory.registerReference(ref, name=None)

creates human-readable URLs or random identifiers. The factory keeps a
bidirectional mapping of names and Referenceables. When a Referenceable
gets serialized, if the factory's table doesn't have a name for it, the
factory creates a random one. This entry in the table is kept alive by two
things:

 a live reference by one of the factory's Brokers
 an entry in a Broker's "gift table"

When a RemoteReference gets serialized (and it doesn't point back to the
receiving Broker, and thus get turned into a your-reference sequence):

 A->C: "I'm going to send somebody a reference to you, incref your gift
       table"
 C->A: "roger that, here's a gift nonce"
 A->B: "here's Carol's reference: URL plus nonce"
 B->C: "I want a liveref to your 'Carol' object, here's my ticket (nonce)"
 C->B: "ok, ticket redeemed, here's your liveref"

once more, without nonces:

 A->C: "I'm going to send somebody a reference to you, incref your gift
       table"
 C->A: "roger that"
 A->B: "here's Carol's reference: URL"
 B->C: "I want a liveref to your 'Carol' object"
 C->B: "ok, here's your liveref"

really:

 on A: c.vat.callRemote("giftYourReference", c).addCallback(step2)
       # c is serialized as (your-reference, clid)
 on C: vat.remote_giftYourReference(which):
           self.table[which] += 1; return
 on A: step2: b.introduce(c)
       # c is serialized as (their-reference, url)
 on B: deserialization sees their-reference
       newvat = makeConnection(URL)
       newvat.callRemote("redeemGift", URL).addCallback(step3)
 on C: vat.remote_redeemGift(URL):
           ref = self.urls[URL]; self.table[ref] -= 1; return ref
       # ref is serialized as (my-reference, clid)
 on B: step3(c): b.remote_introduce(c)

problem: if alice sends a thousand copies, that means these 5 messages are
each sent a thousand times. The makeConnection is cached, but the rest are
not. We don't remember that we've already made this gift before, and that
the other end probably still has it.
Hm, but we also don't know that they didn't lose it already.

** ok, a plan:

concern 1: objects must be kept alive as long as there is a
RemoteReference to them.

concern 2: we should be able to tell when an object is being sent for the
first time, to add metadata (interface list, public URL) that would be
expensive to add to every occurrence.

Each (my-reference) sent over the wire increases the broker's refcount on
both ends. The receiving Broker retains a weakref to the RemoteReference,
and retains a copy of the metadata necessary to create it in the clid
table (basically the entire contents of the RemoteReference). When the
weakref expires, it marks the clid entry as "pending-free", and sends a
decref(clid,N) to the other Broker. The decref is actually sent with
broker.callRemote("decref", clid, N), so it can be acked.

The sending broker gets the decref and reduces its count by N. If another
reference was sent recently, this count may not drop all the way to zero,
indicating there is a reference "in flight" and the far end should be
ready to deal with it (by making a new RemoteReference with the same
properties as the old one). If the resulting count is non-zero, it returns
False to indicate that this was not the last decref message for the clid.
If the count is now zero, it returns True, since this is the last decref,
and removes the entry from its table. Once remote_decref returns True, the
clid is retired.

The receiving broker receives the ack from the decref. If the ack says
last==True, the clid table entry is freed. If it says last==False, then
there should have been another (my-reference) received before the ack, so
the refcount should be non-zero.
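The decref bookkeeping above can be modeled with two toy tables (the class
and method names here are illustrative, not the real Broker API; the "ack"
is simulated as a synchronous return value):

```python
class SendingBroker:
    def __init__(self):
        self.myrefs = {}  # clid -> refcount
    def send_reference(self, clid):
        # returns True on first send, i.e. "include the metadata"
        first_time = clid not in self.myrefs
        self.myrefs[clid] = self.myrefs.get(clid, 0) + 1
        return first_time
    def remote_decref(self, clid, n):
        self.myrefs[clid] -= n
        if self.myrefs[clid] == 0:
            del self.myrefs[clid]
            return True   # last decref: the clid is retired
        return False      # a reference is still in flight

class ReceivingBroker:
    def __init__(self, sender):
        self.sender = sender
        self.yourrefs = {}  # clid -> refcount
    def receive_reference(self, clid):
        self.yourrefs[clid] = self.yourrefs.get(clid, 0) + 1
    def reference_lost(self, clid):
        # weakref expired: send the accumulated count, zero it locally
        n, self.yourrefs[clid] = self.yourrefs[clid], 0
        return self.sender.remote_decref(clid, n)
    def checkref(self, last, clid):
        # callback fired by the decref ack
        if last:
            assert self.yourrefs[clid] == 0
            del self.yourrefs[clid]
        else:
            assert self.yourrefs[clid] != 0  # a my-ref raced the decref
```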
message sequence:

 A-> : (my-reference clid metadata) [A.myrefs[clid].refcount++ = 1]
 A-> : (my-reference clid)          [A.myrefs[clid].refcount++ = 2]
 ->B : receives my-ref, creates RR, B.yourrefs[clid].refcount++ = 1
 ->B : receives my-ref,             B.yourrefs[clid].refcount++ = 2
     : time passes, B sees the reference go away
 <-B : d=brokerA.callRemote("decref", clid, B.yourrefs[clid].refcount)
       B.yourrefs[clid].refcount = 0; d.addCallback(B.checkref, clid)
 A-> : (my-reference clid)          [A.myrefs[clid].refcount++ = 3]
 A<- : receives decref, A.myrefs[clid].refcount -= 2, now =1,
       returns False
 ->B : receives my-ref, re-creates RR, B.yourrefs[clid].refcount++ = 1
 ->B : receives ack(False), B.checkref asserts refcount != 0
     : time passes, B sees the reference go away again
 <-B : d=brokerA.callRemote("decref", clid, B.yourrefs[clid].refcount)
       B.yourrefs[clid].refcount = 0; d.addCallback(B.checkref, clid)
 A<- : receives decref, A.myrefs[clid].refcount -= 1, now =0,
       returns True; del A.myrefs[clid]
 ->B : receives ack(True), B.checkref asserts refcount == 0
       del B.yourrefs[clid]

B retains the RemoteReference data until it receives confirmation from A.
Therefore whenever A sends a reference that doesn't already exist in the
clid table, it is sending it to a B that doesn't know about that
reference, so it needs to send the metadata.

concern 3: in the three-party exchange, Carol must be kept alive until Bob
has established a reference to her, even if Alice drops her
carol-reference immediately after sending the introduction to Bob.

 (my-reference, clid, [interfaces, public URL])
 (your-reference, clid)
 (their-reference, URL)

Serializing a their-reference causes an entry to be placed in the Broker's
.theirrefs[URL] table. Each time a their-reference is sent, the entry's
refcount is incremented. Receiving a their-reference may initiate a PB
connection to the target, followed by a getNamedReference request.
When this completes (or if the reference was already available), the
recipient sends a decgift message to the sender. This message includes a
count, so multiple instances of the same gift can be acked as a group. The
.theirrefs entry retains a reference to the sender's RemoteReference, so
it cannot go away until the gift is acked.

DONE, gifts are implemented, we punted on partial-ordering

*** security, DoS

Bob can force Alice to hold on to a reference to Carol, as long as both
connections are open, by never acknowledging the gift. Alice can cause Bob
to open up TCP connections to arbitrary hosts and ports, by sending
third-party references to him, although the only protocol those
connections will speak is PB. Using yURLs and StartTLS should be enough to
secure and authenticate the connections.

*** partial-ordering

If we need it, the gift (their-reference message) can include a nonce:
Alice sends a makeYourSelfAvailableAs message to Carol with the nonce, and
Bob must do a new getReference with the nonce.

Kragen came up with a good use-case for partial-ordering:

 A:
   B.callRemote("updateDocument", bigDocument)
   C.callRemote("pleaseReviewLatest", B)
 C:
   B.callRemote("getLatestDocument")

* PBService / Tub

Really, PB wants to be a Service, since third-party references mean it
will need to make connections to arbitrary targets, and it may want to
re-use those connections.

 s = pb.PBService()
 s.listenOn(strport)  # provides URL base
 swissURL = s.registerReference(ref)           # creates unguessable name
 publicURL = s.registerReference(ref, "name")  # human-readable name
 s.unregister(URL)           # also revokes all clids
 s.unregisterReference(ref)
 d = s.getReference(URL)  # Deferred which fires with the RemoteReference
 d = s.shutdown()         # close all servers and client connections

DONE, this makes things quite clean

* promise pipelining

Even without third-party references, we can do E-style promise
pipelining. hmm.
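One way to model the pipelined-call bookkeeping (toy names, not the real
newpb classes; the broker just logs what would go on the wire):

```python
class Broker:
    """Toy broker that logs the wire-level call sequence as
    (call-number, target, methname) tuples."""
    def __init__(self):
        self.next_call = 0
        self.log = []
    def send_call(self, target, methname):
        self.next_call += 1
        self.log.append((self.next_call, target, methname))
        return Promise(self, self.next_call)

class RemoteReference:
    def __init__(self, broker, clid):
        self.broker, self.clid = broker, clid
    def callRemote(self, methname, **kw):
        return self.broker.send_call(("clid", self.clid), methname)

class Promise:
    """Remembers which Broker its answer will eventually come from; a call
    on the Promise is pipelined as a call whose target is 'the answer to
    call #N' rather than a normal clid."""
    def __init__(self, broker, n):
        self.broker, self.n = broker, n
    def callRemote(self, methname, **kw):
        return self.broker.send_call(("answer", self.n), methname)
```

Both calls go out immediately; the remote end holds the second one until
answer #N exists, which is the behavior the transcript below works through.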
subclass of Deferred that represents a Promise, can be serialized if it's
being sent to the same broker as the RemoteReference it was generated for

 warner: hmmm. how's that help us? oh, pipelining? maybe a flag on the
  callRemote to say that "yeah, I want a DeferredPromise out of you, but
  I'm only going to include it as an argument to another method call I'm
  sending you, so don't bother sending *me* the result"
 aah yeah that sounds like a reasonable approach that would actually work
 dash: do you know if E makes any attempt to handle >2 vats in their
  pipelining implementation? seems to me it could turn into a large
  network optimization problem pretty quickly
 warner: Mmm hmm I do not think you have to

so you have:

 t1 = a.callRemote("foo", args1)
 t2 = t1.callRemote("bar", args2)

where callRemote returns a Promise, which is a special kind of Deferred
that remembers the Broker its answer will eventually come from. If args2
consists entirely of immediate things (no Promises), or of Promises that
are coming from the same broker as t1 uses, then the "bar" call is
eligible for pipelining and gets sent to the remote broker. In the
resulting newpb banana sequence, the clid of the target method is replaced
by another kind of clid, which means "the answer you're going to send to
method call #N", where N comes from t1.

Using that new I-can't-unserialize-this-yet hook we added, the second call
sequence doesn't finish unserializing until the first call finishes and
sends the answer. Sending answer #N fires the hook's deferred.
That triggers the invocation of the second method. yay.

Hm, of course that totally blows away the idea of using a Constraint on
the arguments of the second method, because you don't even know what the
object is until after the arguments have arrived. But the first method has
a schema, which includes a return constraint, so you *can* assert that,
whatever the object will be, it obeys that constraint. You can't fail
synchronously, but you can return a failure like everybody else. And since
the constraint specifies an Interface, the Interface plus method name is
enough to come up with an argument constraint, so you can still enforce
one. This is kind of cool.

The big advantage of pipelining is that you can have a lot of composable
primitives on your remote interfaces, rather than having to smush them
together into things that are efficient to call remotely.

Hm, yeah, as long as all the arguments are either immediate or reference
something on the recipient. As soon as a third party enters the equation,
you have to decide whether to wait for the arguments to resolve locally,
or if it might be faster to throw them at someone else. That's where the
network-optimization thing mentioned before comes into play: you send
messages to A and to B, and once you get both results you want to send the
pair to C to do something with them. If all three are close to each other,
and you're far from all of them, it makes more sense to tell C about A and
B (how _does_ E handle that?), or maybe tell A and B about C, tell them
"when you get done, send your results to C, who will be waiting for them".

 dash: yeah, i think that the right thing to do is to wait for them to
  resolve locally. assuming that C can talk to A and B is bad
 warner: no it isn't
 dash: well, depends on whether you live in this world or not :) if you
  want other behaviour then you should have to set it up explicitly, i
  think

I'm not even sure how you would describe that sort of thing.
It'd be like routing protocols: you assign a cost to each link and hope
some magical omniscient entity can pick an optimal solution.

** revealing intentions

Now suppose I say "B.your_fired(C.revoke_his_rights())", or such.

 A->C: sell all my stock
 A->B: declare bankruptcy

If B has access to C, and the promises are pipelined, then B has a window
during which they know something's about to happen, and they still have
full access to C, so they can do evil. Zooko tried to explain the concern
to MarkM years ago, but didn't have a clear example of the problem.

The thing is, B can do evil all the time; you're just trying to revoke
their capability *before* they get wind of your intentions. Keeping
intentions secret is hard, much harder than limiting someone's
capabilities. It's kind of the trailing edge of the capability, as opposed
to the leading edge. Zooko feels the language needs clear support for
expressing how the synchronization needs to take place, and which domain
it needs to happen in.

* web-calculus integration

Tyler pointed out that it is vital for a node to be able to grant limited
access to some held object. Specifically, Alice may want to give Bob a
reference not to Carol as a whole, but to just a specific Carol.remote_foo
method (and not to any other methods that Alice might be allowed to
invoke). I had been thinking of using RemoteInterfaces to indicate method
subsets, something like this:

 bob.callRemote("introduce", Facet(self, RIMinimal))

but Tyler thinks that this is too coarse-grained and not likely to
encourage the right kinds of security decisions. In his web-calculus,
recipients can grant third parties access to individual bound methods:

 bob.callRemote("introduce", carol.getMethod("howdy"))

If I understand it correctly, his approach makes Referenceables into a
copy-by-value object that is represented by a dictionary which maps method
names to these RemoteMethod objects, so there is no actual
callRemote(methname) method.
Instead you do something like:

 rr = tub.getReference(url)
 d = rr['introduce'].call(args)

These RemoteMethod objects are top-level, so unguessable URLs must be
generated for them when they are sent, and they must be reference-counted.
It must not be possible to get from the bound method to the (unrestricted)
referenced object.

TODO: how does the web-calculus maintain reference counts for these? It
feels like there would be an awful lot of messages being thrown around.

To implement this, we'll need:

 banana sequences for bound methods:
  ('my-method', clid, url)
  ('your-method', clid)
  ('their-method', url, RI+methname?)

 syntax to carve a single method out of a local Referenceable:
  A: self.doFoo (only if we get rid of remote_)
  B: self.remote_doFoo
  C: self.getMethod("doFoo")
  D: self.getMethod(RIFoo['doFoo'])
  leaning towards C or D

 syntax to carve a single method out of a RemoteReference:
  A: rr.doFoo
  B: rr.getMethod('doFoo')
  C: rr.getMethod(RIFoo['doFoo'])
  D: rr['doFoo']
  E: rr[RIFoo['doFoo']]
  leaning towards B or C

 decide whether to do getMethod early or late:
  early means ('my-reference') includes a big dict of my-method values,
   and a whole bunch of DECREFs when that dict goes away
  late means there is a remote_tub.getMethod(your-ref, methname) call,
   and an extra round-trip to retrieve them
  dash thinks late is better

We could say that the 'my-reference' sequence for any
RemoteInterface-enabled Referenceable will include a dictionary of bound
methods. The receiving end will just stash the whole thing.

* do implicit "doFoo" -> RIFoo["doFoo"] conversion

I want rr.callRemote("doFoo", args) to take advantage of a
RemoteInterface, if one is available. RemoteInterfaces aren't supposed to
be overlapping (at least not among RemoteInterfaces that are shared by a
single Referenceable), so there shouldn't be any ambiguity. If there is,
we can raise an error.

* accept Deferreds as arguments?
 bob.callRemote("introduce", target=self.tub.getReference(pburl))
or
 bob.callRemote("introduce", carol.getMethod("doFoo"))
instead of
 carol.getMethod("doFoo").addCallback(
     lambda r: bob.callRemote("introduce", r))

If one of the top-level arguments to callRemote is a Deferred, don't send
the method request until all the arguments resolve. If any of the
arguments errback, the callRemote will fail with some new exception (that
can contain a reference to the argument's exception). However, this would
mean the method would be invoked out-of-order w.r.t. an
immediately-following bob.callRemote, so put this off until we get some
actual experience.

* batch decrefs?

If we implement the copy-by-value Referenceable idea, then a single gc may
result in dozens of simultaneous decrefs. It would be nice to reduce the
traffic generated by that.

* promise pipelining

 Promise(Deferred).__getattr__
 DoS prevention techniques in CapIDL (MarkM)
 pb://key@ip,host,[ipv6],localhost,[/unix]/swissnumber
 tubs for lifetime management
  separate listener object, share tubs between listeners
  distinguish by key number

Actually, why bother with separate keys? Why allow the outside world to
distinguish between these sub-Tubs? Use them purely for lifetime
management, not security properties. That means a name->published-object
table for each SubTub, maybe a hierarchy of them, and the parent-most Tub
gets the Listeners. Incoming getReferenceByURL requests require a lookup
in all Tubs that descend from the one attached to that listener.

So one decision is whether implicitly-published objects should have a name
that lasts forever (well, until the Tub is destroyed), or if they should
be reference-counted. If they are reference-counted, then outstanding
Gifts need to maintain a reference, and the gift must be turned into a
live RemoteReference right away. This has bearing on how/if we implement
SturdyRefs, so I need to read more about them in the E docs.

Hrm, and creating new Tubs from within a remote_foo method..
To make that useful, you'd need a way to ask for the Tub through which you
were being invoked. hrm.

* creating new Tubs

Tyler suggests using Tubs for namespace management. Tubs can share TCP
listening ports, but MarkS recommends giving them all separate keys (which
means separate SSL sessions, so separate TCP connections). Bill Frantz
discourages using a hierarchy of Tubs, says it's not the sort of thing you
want to be locked into. That means I'll need a separate Listener object,
where the rule is that the last Tub to be stopped makes the Listener stop
too.. probably abuse the Service interface in some wacky way to pull this
off.

Creating a new Tub.. how do we conveniently create it with the same
Listeners as the current one? If the method that's creating the Tub is
receiving a reference, the Tub can be an attribute of the inbound
RemoteReference. If not, that's trickier.. the _tub= argument may still be
a useful way to go. Once you've got a source tub, then tub.newTub() should
create a new one with the same Listeners as the source (but otherwise
unassociated with it). Once you have the new Tub, registering an object in
it should return something that can be directly serialized into a gift.

 class Target(pb.Referenceable):
     def remote_startGame(self, player_black, player_white):
         tub = player_black.tub.newTub()
         game = self.createGame()
         gameref = tub.register(game)
         game.setPlayer("black", tub.something(player_black))
         game.setPlayer("white", tub.something(player_white))
         return gameref

Hmm. So, create a SturdyRef class, which remembers the tubid (key), list
of location hints, and object name. These have a url() method that renders
out a URL string, and a comparison method which compares the tubid and
object name but ignores the location hints. Serializing a SturdyRef
creates a their-reference sequence. Tub.register takes an object (and
maybe a name) and returns a SturdyRef. Tub.getReference takes either a URL
or a SturdyRef.
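A SturdyRef along those lines might look like this (a sketch, not the real
class; the pb:// URL layout is an assumption based on the examples above):

```python
class SturdyRef:
    """Remembers the tubid (key), location hints, and object name.
    Identity is (tubid, name); location hints are advisory and ignored
    when comparing."""
    def __init__(self, tubid, hints, name):
        self.tubid = tubid  # the tub's key fingerprint
        self.hints = hints  # list of "host:port" location hints
        self.name = name    # object name (swissnumber or public name)
    def url(self):
        # render the reference as a URL string
        return "pb://%s@%s/%s" % (self.tubid, ",".join(self.hints),
                                  self.name)
    def __eq__(self, other):
        return (self.tubid, self.name) == (other.tubid, other.name)
    def __hash__(self):
        return hash((self.tubid, self.name))
```

Making __eq__/__hash__ ignore the hints means two SturdyRefs for the same
object stay equal even when their connection hints have drifted apart.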
RemoteReferences should have a .getSturdyRef method. Actually, I think
SturdyRefs should be serialized as Copyables, and create SturdyRefs on the
other side. The new-tub sequence should be:

 create a new tub, using the Listener from an existing tub
 register the objects in the new tub, obtaining a SturdyRef
 send/return SendLiveRef(sturdyref) to the far side

SendLiveRef is a wrapper that causes a their-reference sequence to be
sent. The alternative is to obtain an actual live reference (via
player_black.tub.getReference(sturdyref) first), then send that, but it's
kind of a waste if you don't actually want to use the liveref yourself.

Note that it becomes necessary to provide for local references here: ones
in different Tubs which happen to share a Listener. These can use real TCP
connections (unless the Listener hint is only valid from the outside
world). It might be possible to use some tricks to cut out some of the
network overhead, but I suspect there are reasons why you wouldn't
actually want to do that.