root/doc/jobs.txt

Revision 203:0329dd32082b, 27.5 kB (checked in by "Brian Warner <warner@lothar.com>", 1 year ago)

docs: rename some files

Line 
1 -*- outline -*-
2
3 Reasonably independent newpb sub-tasks that need doing. Most important come
4 first.
5
6 * decide on a version negotiation scheme
7
8 Should be able to telnet into a PB server and find out that it is a PB
9 server. Pointing a PB client at an HTTP server (or an HTTP client at a PB
10 server) should result in an error, not a timeout. Implement in
11 banana.Banana.connectionMade().
12
13 desiderata:
14
15  negotiation should take place with regular banana sequences: don't invent a
16  new protocol that is only used at the start of the connection
17
18  Banana should be useable one-way, for storage or high-latency RPC (the mnet
19  folks want to create a method call, serialize it to a string, then encrypt
20  and forward it on to other nodes, sometimes storing it in relays along the
21  way if a node is offline for a few days). It should be easy for the layer
22  above Banana to feed it the results of what its negotiation would have been
23  (if it had actually used an interactive connection to its peer). Feeding the
24  same results to both sides should have them proceed as if they'd agreed to
25  those results.
26
27  negotiation should be flexible enough to be extended but still allow old
28  code to talk with new code. Magically predict every conceivable extension
29  and provide for it from the very first release :).
30
31 There are many levels to banana, all of which could be useful targets of
32 negotiation:
33
34  which basic tokens are in use? Is there a BOOLEAN token? a NONE token? Can
35  it accept a LONGINT token or is the target limited to 32-bit integers?
36
37  are there any variations in the basic Banana protocol being used? Could the
38  smaller-scope OPEN-counter decision be deferred until after the first
39  release and handled later with a compatibility negotiation flag?
40
41  What "base" OPEN sequences are known? 'unicode'? 'boolean'? 'dict'? This is
42  an overlap between expressing the capabilities of the host language, the
43  Banana implementation, and the needs of the application. How about
44  'instance', probably only used for StorageBanana?
45
46  What "top-level" OPEN sequences are known? PB stuff (like 'call', and
47  'your-reference')? Are there any variations or versions that need to be
48  known? We may add new functionality in the future, it might be useful for
49  one end to know whether this functionality is available or not. (the PB
50  'call' sequence could some day take numeric argument names to convey
51  positional parameters, a 'reference' sequence could take a string to
52  indicate globally-visible PB URLs, it could become possible to pass
53  target.remote_foo directly to a peer and have a callable RemoteMethod object
54  pop out the other side).
55
56  What "application-level" sequences are available? (Which RemoteInterface
57  classes are known and valid in 'call' sequences? Which RemoteCopy names are
58  valid for targets of the 'copy' sequence?). This is not necessarily within
59  the realm of Banana negotiation, but applications may need to negotiate this
60  sort of thing, and any disagreements will be manifested when Banana starts
61  raising Violations, so it may be useful to include it in the Banana-level
62  negotiation.
63
64 On the other hand, negotiation is only useful if one side is prepared to
65 accomodate a peer which cannot do some of the things it would prefer to use,
66 or if it wants to know about the incapabilities so it can report a useful
67 failure rather than have an obscure protocol-level error message pop up an
68 hour later. So negotiation isn't the only goal: simple capability awareness
69 is a useful lesser goal.
70
71 It kind of makes sense for the first object of a stream to be a negotiation
72 blob. We could make a new 'version' opentype, and declare that the contents
73 will be something simple and forever-after-parseable (like a dict, with heavy
74 constraints on the keys and values, all strings emitted in full).
75
76 DONE, at least the framework is in place. Uses HTTP-style header-block
77 exchange instead of banana sequences, with client-sends-first and
78 server-decides. This correctly handles PB-vs-HTTP, but requires a timeout to
79 detect oldpb clients vs newpb servers. No actual feature negotiation is
80 performed yet, because we still only have the one version of the code.
81
82 * connection initiation
83
84 ** define PB URLs
85
86 [newcred is the most important part of this, the URL stuff can wait]
87
88 A URL defines an endpoint: a pb.Referenceable, with methods. Somewhere along
89 the way it defines a transport (tcp+host+port, or unix+path) and an object
90 reference (pathname). It might also define a RemoteInterface, or that might
91 be put off until we actually invoke a method.
92
93  URL = f("pb:", host, port, pathname)
94  d = pb.callRemoteURL(URL, ifacename, methodname, args)
95
96 probably give an actual RemoteInterface instead of just its name
97
98 a pb.RemoteReference claims to provide access to zero-or-more
99 RemoteInterfaces. You may choose which one you want to use when invoking
100 callRemote.
101
102 TODO: decide upon a syntax for URLs that refer to non-TCP transports
103  pb+foo://stuff, pby://stuff (for yURL-style self-authenticating names)
104
105 TODO: write the URL parser, implementing pb.getRemoteURL and pb.callRemoteURL
106  DONE: use a Tub/PBService instead
107
108 TODO: decide upon a calling convention for callRemote when specifying which
109 RemoteInterface is being used.
110
111
112 DONE, PB-URL is the way to go.
113 ** more URLs
114
115 relative URLs (those without a host part) refer to objects on the same
116 Broker. Absolute URLs (those with a host part) refer to objects on other
117 Brokers.
118
119 SKIP, interesting but not really useful
120
121 ** build/port pb.login: newcred for newpb
122
123 Leave cred work for Glyph.
124
125 <thomasvs> has some enhanced PB cred stuff (challenge/response, pb.Copyable
126 credentials, etc).
127
128 URL = pb.parseURL("pb://lothar.com:8789/users/warner/services/petmail",
129                   IAuthorization)
130 URL = doFullLogin(URL, "warner", "x8yzzy")
131 URL.callRemote(methodname, args)
132
133 NOTDONE
134
135 * constrain ReferenceUnslicer properly
136
137 The schema can use a ReferenceConstraint to indicate that the object must be
138 a RemoteReference, and can also require that the remote object be capable of
139 handling a particular Interface.
140
141 This needs to be implemented. slicer.ReferenceUnslicer must somehow actually
142 ask the constraint about the incoming tokens.
143
144 An outstanding question is "what counts". The general idea is that
145 RemoteReferences come over the wire as a connection-scoped ID number and an
146 optional list of Interface names (strings and version numbers). In this case
147 it is the far end which asserts that its object can implement any given
148 Interface, and the receiving end just checks to see if the schema-imposed
149 required Interface is in the list.
150
151 This becomes more interesting when applied to local objects, or if a
152 constraint is created which asserts that its object is *something* (maybe a
153 RemoteReference, maybe a RemoteCopy) which implements a given Interface. In
154 this case, the incoming object could be an actual instance, but the class
155 name must be looked up in the unjellyableRegistry (and the class located, and
156 the __implements__ list consulted) before any of the object's tokens are
157 accepted.
158
159 * security TODOs:
160
161 ** size constraints on the set-vocab sequence
162
163 * implement schema.maxSize()
164
165 In newpb, schemas serve two purposes:
166
167  a) make programs safer by reducing the surprises that can appear in their
168     arguments (i.e. factoring out argument-checking in a useful way)
169
170  b) remove memory-consumption DoS attacks by putting an upper bound on the
171     memory consumed by any particular message.
172
173 Each schema has a pair of methods named maxSize() and maxDepth() which
174 provide this upper bound. While the schema is in effect (say, during the
175 receipt of a particular named argument to a remotely-invokable method), at
176 most X bytes and Y slicer frames will be in use before either the object is
177 accepted and processed or the schema notes the violation and the object is
178 rejected (whereupon the temporary storage is released and all further bytes
179 in the rejected object are simply discarded). Strictly speaking, the number
180 returned by maxSize() is the largest string on the wire which has not yet
181 been rejected as violating the constraint, but it is also a reasonable
182 metric to describe how much internal storage must be used while processing
183 it. (To achieve greater accuracy would involve knowing exactly how large
184 each Python type is; not a sensible thing to attempt).
185
186 The idea is that someone who is worried about an attacker throwing a really
187 long string or an infinitely-nested list at them can ask the schema just what
188 exactly their current exposure is. The tradeoff between flexibility ("accept
189 any object whatsoever here") and exposure to DoS attack is then user-visible
190 and thus user-selectable.
191
192 To implement maxSize() for a basic schema (like a string), you simply need
193 to look at banana.xhtml and see how basic tokens are encoded (you will also
194 need to look at banana.py and see how deserialization is actually
195 implemented). For a schema.StringConstraint(32) (which accepts strings <= 32
196 characters in length), the largest serialized form that has not yet been
197 either accepted or rejected is:
198
199   64 bytes (header indicating 0x000000..0020 with lots of leading zeros)
200  + 1 byte (STRING token)
201  + 32 bytes (string contents)
202  = 97
203
204 If the header indicates a conforming length (<=32) then just after the 32nd
205 byte is received, the string object is created and handed to up the stack, so
206 the temporary storage tops out at 97. If someone is trying to spam us with a
207 million-character string, the serialized form would look like:
208
209   64 bytes (header indicating 1-million in hex, with leading zeros)
210 +  1 byte (STRING token)
211 = 65
212
213 at which point the receive parser would check the constraint, decide that
214 1000000 > 32, and reject the remainder of the object.
215
216 So (with the exception of pass/fail maxSize values, see below), the following
217 should hold true:
218
219  schema.StringConstraint(32).maxSize() == 97
220
221 Now, schemas which represent containers have size limits that are the sum of
222 their contents, plus some overhead (and a stack level) for the container
223 itself. For example, a list of two small integers is represented in newbanana
224 as:
225
226  OPEN(list)
227   INT
228   INT
229  CLOSE()
230
231 which really looks like:
232
233  opencount-OPEN
234   len-STRING-"list"
235   value-INT
236   value-INT
237  opencount-CLOSE
238
239 This sequence takes at most:
240
241  opencount-OPEN: 64+1
242  len-STRING-"list": 64+1+1000  (opentypes are confined to be <= 1k long)
243  value-INT: 64+1
244  value-INT: 64+1
245  opencount-CLOSE: 64+1
246
247 or 5*(64+1)+1000 = 1325, or rather:
248
249   3*(64+1)+1000 + N*(IntConstraint().maxSize())
250
251 So ListConstraint.maxSize is computed by doing some math involving the
252 .maxSize value of the objects that go into it (the ListConstraint.constraint
253 attribute). This suggests a recursive algorithm. If any constraint is
254 unbounded (say a ListConstraint with no limit on the length of the list),
255 then maxSize() raises UnboundedSchema to indicate that there is no limit on
256 the size of a conforming string. Clearly, if any constraint is found to
257 include itself, UnboundedSchema must also be raised.
258
259 This is a loose upper bound. For example, one non-conforming input string
260 would be:
261
262  opencount-OPEN: 64+1
263  len-STRING-"x"*1000: 64+1+1000
264
265 The entire string would be accepted before checking to see which opentypes
266 were valid: the ListConstraint only accepts the "list" opentype and would
267 reject this string immediately after the 1000th "x" was received. So a
268 tighter upper bound would be 2*65+1000 = 1130.
269
270 In general, the bound is computed by walking through the deserialization
271 process and identifying the largest string that could make it past the
272 validity checks. There may be later checks that will reject the string, but
273 if it has not yet been rejected, then it still represents exposure for a
274 memory consumption DoS.
275
276 ** pass/fail sizes
277
278 I started to think that it was necessary to have each constraint provide two
279 maxSize numbers: one of the largest sequence that could possibly be accepted
280 as valid, and a second which was the largest sequence that could be still
281 undecided. This would provide a more accurate upper bound because most
282 containers will respond to an invalid object by abandoning the rest of the
283 container: i.e. if the current active constraint is:
284
285  ListConstraint(StringConstraint(32), maxLength=30)
286
287 then the first thing that doesn't match the string constraint (say an
288 instance, or a number, or a 33-character string) will cause the ListUnslicer
289 to go into discard-everything mode. This makes a significant difference when
290 the per-item constraint allows opentypes, because the OPEN type (a string) is
291 constrained to 1k bytes. The item constraint probably imposes a much smaller
292 limit on the set of actual strings that would be accepted, so no
293 kilobyte-long opentype will possibly make it past that constraint. That means
294 there can only be one outstanding invalid object. So the worst case (maximal
295 length) string that has not yet been rejected would be something like:
296
297   OPEN(list)
298    validthing [0]
299    validthing [1]
300     ...
301    validthing [n-1]
302    long-invalid-thing
303
304 because if the long-invalid thing had been received earlier, the entire list
305 would have been abandoned.
306
307 This suggests that the calculation for ListConstraint.maxSize() really needs
308 to be like
309   overhead
310   +(len-1)*itemConstraint.maxSize(valid)
311   +(1)*itemConstraint.maxSize(invalid)
312
313 I'm still not sure about this. I think it provides a significantly tighter
314 upper bound. The deserialization process itself does not try to achieve the
315 absolute minimal exposure (i.e., the opentype checker could take the set of
316 all known-valid open types, compute the maximum length, and then impose a
317 StringConstraint with that length instead of 1000), because it is, in
318 general, a inefficient hassle. There is a tradeoff between computational
319 efficiency and removing the slack in the maxSize bound, both in the
320 deserialization process (where the memory is actually consumed) and in
321 maxSize (where we estimate how much memory could be consumed).
322
323 Anyway, maxSize() and maxDepth() (which is easier: containers add 1 to the
324 maximum of the maxDepth values of their possible children) need to be
325 implemented for all the Constraint classes. There are some tests (disabled)
326 in test_schema.py for this code: those tests assert specific values for
327 maxSize. Those values are probably wrong, so they must be updated to match
328 however maxSize actually works.
329
330 * decide upon what the "Shared" constraint should mean
331
332 The idea of this one was to avoid some vulnerabilities by rejecting arbitrary
333 object graphs. Fundamentally Banana can represent most anything (just like
334 pickle), including objects that refer to each other in exciting loops and
335 whorls. There are two problems with this: it is hard to enforce a schema that
336 allows cycles in the object graph (indeed it is tricky to even describe one),
337 and the shared references could be used to temporarily violate a schema.
338
339 I think these might be fixable (the sample case is where one tuple is
340 referenced in two different places, each with a different constraint, but the
341 tuple is incomplete until some higher-level node in the graph has become
342 referenceable, so [maybe] the schema can't be enforced until somewhat after
343 the object has actually finished arriving).
344
345 However, Banana is aimed at two different use-cases. One is kind of a
346 replacement for pickle, where the goal is to allow arbitrary object graphs to
347 be serialized but have more control over the process (in particular we still
348 have an unjellyableRegistry to prevent arbitrary constructors from being
349 executed during deserialization). In this mode, a larger set of Unslicers are
350 available (for modules, bound methods, etc), and schemas may still be useful
351 but are not enforced by default.
352
353 PB will use the other mode, where the set of conveyable objects is much
354 smaller, and security is the primary goal (including putting limits on
355 resource consumption). Schemas are enforced by default, and all constraints
356 default to sensible size limits (strings to 1k, lists to [currently] 30
357 items). Because complex object graphs are not commonly transported across
358 process boundaries, the default is to not allow any Copyable object to be
359 referenced multiple times in the same serialization stream. The default is to
360 reject both cycles and shared references in the object graph, allowing only
361 strict trees, making life easier (and safer) for the remote methods which are
362 being given this object tree.
363
364 The "Shared" constraint is intended as a way to turn off this default
365 strictness and allow the object to be referenced multiple times. The
366 outstanding question is what this should really mean: must it be marked as
367 such on all places where it could be referenced, what is the scope of the
368 multiple-reference region (per- method-call, per-connection?), and finally
369 what should be done when the limit is violated. Currently Unslicers see an
370 Error object which they can respond to any way they please: the default
371 containers abandon the rest of their contents and hand an Error to their
372 parent, the MethodCallUnslicer returns an exception to the caller, etc. With
373 shared references, the first recipient sees a valid object, while the second
374 and later recipient sees an error.
375
376
377 * figure out Deferred errors for immutable containers
378
379 Somewhat related to the previous one. The now-classic example of an immutable
380 container which cannot be created right away is the object created by this
381 sequence:
382
383         t = ([],)
384         t[0].append((t,))
385
386 This serializes into (with implicit reference numbers on the left):
387
388 [0] OPEN(tuple)
389 [1]  OPEN(list)
390 [2]   OPEN(tuple)
391 [3]    OPEN(reference #0)
392       CLOSE
393      CLOSE
394     CLOSE
395
396 In newbanana, the second TupleUnslicer cannot return a fully-formed tuple to
397 its parent (the ListUnslicer), because that tuple cannot be created until the
398 contents are all referenceable, and that cannot happen until the first
399 TupleUnslicer has completed. So the second TupleUnslicer returns a Deferred
400 instead of a tuple, and the ListUnslicer adds a callback which updates the
401 list's item when the tuple is complete.
402
403 The problem here is that of error handling. In general, if an exception is
404 raised (perhaps a protocol error, perhaps a schema violation) while an
405 Unslicer is active, that Unslicer is abandoned (all its remaining tokens are
406 discarded) and the parent gets an Error object. (the parent may give up too..
407 the basic Unslicers all behave this way, so any exception will cause
408 everything up to the RootUnslicer to go boom, and the RootUnslicer has the
409 option of dropping the connection altogether). When the error is noticed, the
410 Unslicer stack is queried to figure out what path was taken from the root of
411 the object graph to the site that had an error. This is really useful when
412 trying to figure out which exact object cause a SchemaViolation: rather than
413 being told a call trace or a description of the *object* which had a problem,
414 you get a description of the path to that object (the same series of
415 dereferences you'd use to print the object: obj.children[12].peer.foo.bar).
416
417 When references are allowed, these exceptions could occur after the original
418 object has been received, when that Deferred fires. There are two problems:
419 one is that the error path is now misleading, the other is that it might not
420 have been possible to enforce a schema because the object was incomplete.
421
422 The most important thing is to make sure that an exception that occurs while
423 the Deferred is being fired is caught properly and flunks the object just as
424 if the problem were caught synchronously. This may involve discarding an
425 otherwise complete object graph and blaming the problem on a node much closer
426 to the root than the one which really caused the failure.
427
428 * adaptive VOCAB compression
429
430 We want to let banana figure out a good set of strings to compress on its
431 own. In Banana.sendToken, keep a list of the last N strings that had to be
432 sent in full (i.e. they weren't in the table). If the string being sent
433 appears more than M times in that table, before we send the token, emit an
434 ADDVOCAB sequence, add a vocab entry for it, then send a numeric VOCAB token
435 instead of the string.
436
437 Make sure the vocab mapping is not used until the ADDVOCAB sequence has been
438 queued. Sending it inline should take care of this, but if for some reason we
439 need to push it on the top-level object queue, we need to make sure the vocab
440 table is not updated until it gets serialized. Queuing a VocabUpdate object,
441 which updates the table when it gets serialized, would take care of this. The
442 advantage of doing it inline is that later strings in the same object graph
443 would benefit from the mapping. The disadvantage is that the receiving
444 Unslicers must be prepared to deal with ADDVOCAB sequences at any time (so
445 really they have to be stripped out). This disadvantage goes away if ADDVOCAB
446 is a token instead of a sequence.
447
448 Reasonable starting values for N and M might be 30 and 3.
449
450 * write oldbanana compatibility code?
451
452 An oldbanana peer can be detected because the server side sends its dialect
453 list from connectionMade, and oldbanana lists are sent with OLDLIST tokens
454 (the explicit-length kind).
455
456
457 * add .describe methods to all Slicers
458
459 This involves setting an attribute between each yield call, to indicate what
460 part is about to be serialized.
461
462
463 * serialize remotely-callable methods?
464
465 It might be useful be able to do something like:
466
467  class Watcher(pb.Referenceable):
468      def remote_foo(self, args): blah
469
470  w = Watcher()
471  ref.callRemote("subscribe", w.remote_foo)
472
473 That would involve looking up the method and its parent object, reversing
474 the remote_*->* transformation, then sending a sequence which contained both
475 the object's RemoteReference and the appropriate method name.
476
477 It might also be useful to generalize this: passing a lambda expression to
478 the remote end could stash the callable in a local table and send a Callable
479 Reference to the other side. I can smell a good general-purpose object
480 classification framework here, but I haven't quite been able to nail it down
481 exactly.
482
483 * testing
484
485 ** finish testing of LONGINT/LONGNEG
486
487 test_banana.InboundByteStream.testConstrainedInt needs implementation
488
489 ** thoroughly test failure-handling at all points of in/out serialization
490
491 places where BananaError or Violation might be raised
492
493 sending side:
494  Slicer creation (schema pre-validation? no): no no
495   pre-validation is done before sending the object, Broker.callFinished,
496   RemoteReference.doCall
497   slicer creation is done in newSlicerFor
498
499  .slice (called in pushSlicer) ?
500  .slice.next raising Violation
501  .slice.next returning Deferrable when streaming isn't allowed
502  .sendToken (non-primitive token, can't happen)
503  .newSlicerFor (no ISlicer adapter)
504  top.childAborted
505
506 receiving side:
507  long header (>64 bytes)
508  checkToken (top.openerCheckToken)
509  checkToken (top.checkToken)
510  typebyte == LIST (oldbanana)
511  bad VOCAB key
512  too-long vocab key
513  bad FLOAT encoding
514  top.receiveClose
515  top.finish
516  top.reportViolation
517  oldtop.finish (in from handleViolation)
518  top.doOpen
519  top.start
520 plus all of these when discardCount != 0
521 OPENOPEN
522
523 send-side uses:
524  f = top.reportViolation(f)
525 receive-side should use it too (instead of f.raiseException)
526
527 ** test failure-handing during callRemote argument serialization
528
529 ** implement/test some streaming Slicers
530
531 ** test producer Banana
532
533 * profiling/optimization
534
535 Several areas where I suspect performance issues but am unwilling to fix
536 them before having proof that there is a problem:
537
538 ** Banana.produce
539
540 This is the main loop which creates outbound tokens. It is called once at
541 connectionMade() (after version negotiation) and thereafter is fired as the
542 result of a Deferred whose callback is triggered by a new item being pushed
543 on the output queue. It runs until the output queue is empty, or the
544 production process is paused (by a consumer who is full), or streaming is
545 enabled and one of the Slicers wants to pause.
546
547 Each pass through the loop either pushes a single token into the transport,
548 resulting in a number of short writes. We can do better than this by telling
549 the transport to buffer the individual writes and calling a flush() method
550 when we leave the loop. I think Itamar's new cprotocol work provides this
551 sort of hook, but it would be nice if there were a generalized Transport
552 interface so that Protocols could promise their transports that they will
553 use flush() when they've stopped writing for a little while.
554
555 Also, I want to be able to move produce() into C code. This means defining a
556 CSlicer in addition to the cprotocol stuff before. The goal is to be able to
557 slice a large tree of basic objects (lists, tuples, dicts, strings) without
558 surfacing into Python code at all, only coming "up for air" when we hit an
559 object type that we don't recognize as having a CSlicer available.
560
561 ** Banana.handleData
562
563 The receive-tokenization process wants to be moved into C code. It's
564 definitely on the critical path, but it's ugly because it has to keep
565 calling into python code to handle each extracted token. Maybe there is a
566 way to have fast C code peek through the incoming buffers for token
567 boundaries, then give a list of offsets and lengths to the python code. The
568 b128 conversion should also happen in C. The data shouldn't be pulled out of
569 the input buffer until we've decided to accept it (i.e. the
570 memory-consumption guarantees that the schemas provide do not take any
571 transport-level buffering into account, and doing cprotocol tokenization
572 would represent memory that an attacker can make us spend without triggering
573 a schema violation). Itamar's CLineReceiver is a good example: you tokenize
574 a big buffer as much as you can, pass the tokens upstairs to Python code,
575 then hand the leftover tail to the next read() call. The tokenizer always
576 works on the concatenation of two buffers: the tail of the previous read()
577 and the complete contents of the current one.
578
579 ** Unslicer.doOpen delegation
580
581 Unslicers form a stack, and each Unslicer gets to exert control over the way
582 that its descendents are deserialized. Most don't bother, they just delegate
583 the control methods up to the RootUnslicer. For example, doOpen() takes an
584 opentype and may return a new Unslicer to handle the new OPEN sequence. Most
585 of the time, each Unslicer delegates doOpen() to their parent, all the way
586 up the stack to the RootUnslicer who actually performs the UnslicerRegistry
587 lookup.
588
589 This provides an optimization point. In general, the Unslicer knows ahead of
590 time whether it cares to be involved in these methods or not (i.e. whether
591 it wants to pay attention to its children/descendants or not). So instead of
592 delegating all the time, we could just have a separate Opener stack.
593 Unslicers that care would be pushed on the Opener stack at the same time
594 they are pushed on the regular unslicer stack, likewise removed. The
595 doOpen() method would only be invoked on the top-most Opener, removing a lot
596 of method calls. (I think the math is something like turning
597 avg(treedepth)*avg(nodes) into avg(nodes)).
598
599 There are some other methods that are delegated in this way. open() is
600 related to doOpen(). setObject()/getObject() keep track of references to
601 shared objects and are typically only intercepted by a second-level object
602 which defines a "serialization scope" (like a single remote method call), as
603 well as connection-wide references (like pb.Referenceables) tracked by the
604 PBRootUnslicer. These would also be targets for optimization.
605
606 The fundamental reason for this optimization is that most Unslicers don't
607 care about these methods. There are far more uses of doOpen() (one per
608 object node) then there are changes to the desired behavior of doOpen().
609
610 ** CUnslicer
611
612 Like CSlicer, the unslicing process wants to be able to be implemented (for
613 built-in objects) entirely in C. This means a CUnslicer "object" (a struct
614 full of function pointers), a table accessible from C that maps opentypes to
615 both CUnslicers and regular python-based Unslicers, and a CProtocol
616 tokenization code fed by a CTransport. It should be possible for the
617 python->C transition to occur in the reactor when it calls ctransport.doRead
618 python->and then not come back up to Python until Banana.receivedObject(),
619 at least for built-in types like dicts and strings.
Note: See TracBrowser for help on using the browser.