Opened 5 years ago

Closed 3 years ago

#247 closed task (fixed)

serialize logs with JSON, not pickle

Reported by: Brian Warner Owned by:
Priority: major Milestone: 0.13.0
Component: logging Version: 0.9.1
Keywords: Cc:


split out of #244

There are several places that might serialize log events:

  • via LogFileObserver, when $FLOGFILE= is set at process startup
  • in a locally-written Incident file, via foolscap.logging.log.setLogDir()
  • the log gatherer and incident gatherer
  • via flogtool tail --save-to=
  • in the output of flogtool filter

All of these currently use the stdlib Pickle module to read and write serialized log events. The files (which may be BZ2-compressed, or uncompressed) contain concatenated pickle bytestrings, so they can be read by calling pickle.load(f) repeatedly, until it throws an error.
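The read side described above can be sketched as follows. This is an illustrative loop, not the actual foolscap reader; the helper name `iter_pickled_events` is made up for this example:

```python
import io
import pickle

def iter_pickled_events(f):
    """Yield events from a file of concatenated pickle bytestrings.

    pickle.load() raises EOFError when the file is exhausted, which
    is how the "read until it throws" pattern terminates cleanly.
    """
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

# Round-trip demonstration with an in-memory file:
buf = io.BytesIO()
for event in [{"num": 1, "msg": "start"}, {"num": 2, "msg": "stop"}]:
    pickle.dump(event, buf)
buf.seek(0)
events = list(iter_pickled_events(buf))
```

(A BZ2-compressed logfile would first be wrapped in `bz2.open()` before handing it to the same loop.)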

This ticket is about replacing these pickles with something safer, namely JSON. Pickles are unsafe, and obviously I should have never used them in the first place.

My plan is to delete all the pickle-handling code and replace it with JSON-handling code (newline-separated JSON records, read with f.readlines() and json.loads()). The new version won't import pickle at all, and the only nod to backwards compatibility will be a check that looks at the first few bytes of the logfile to see if it might be a pickle. If we see one, the tool will print an apology message which explains the unsafety, and mentions that installing an old version of Foolscap is the only way to read these logfiles.
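A minimal sketch of that plan, assuming the files are handled in binary mode. The function names are hypothetical, and the magic-byte check is one plausible way to do the "first few bytes" sniff: protocol-2-and-later pickles begin with the opcode byte 0x80, which never starts a valid JSON document:

```python
import json

PICKLE_MAGIC = b"\x80"  # first byte of a protocol-2+ pickle stream

def write_events(f, events):
    """Append events as newline-delimited JSON records."""
    for event in events:
        f.write(json.dumps(event).encode("utf-8") + b"\n")

def read_events(f):
    """Read newline-delimited JSON events; refuse old pickle logfiles."""
    first = f.read(1)
    if first == PICKLE_MAGIC:
        raise RuntimeError(
            "this logfile uses the old (unsafe) pickle format; "
            "only an older Foolscap release can read it")
    data = first + f.read()
    return [json.loads(line) for line in data.splitlines() if line]
```

Protocol-0 and protocol-1 pickles have no fixed magic byte, so a real check would probably also attempt `json.loads()` on the first line and treat a failure as "probably a pickle".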

Change History (5)

comment:1 Changed 5 years ago by Brian Warner

Two problems I've encountered so far with the de-pickle-ization task:

  • foolscap includes an "incarnation record" in the log "header" (stored as the first event in each serialized logfile). This includes a randomly-generated 8-byte "unique" value. We need to change this to hex- or base64-encode the value first.
  • log.err() stores a Failure or CopiedFailure object into the event buffer. This was great when we were serializing with pickle or Banana, but they aren't JSONable.
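The first problem is straightforward: raw bytes are not JSONable, but either encoding makes the incarnation value safe to round-trip. A sketch (the `"incarnation"` key name is illustrative, not the actual header field):

```python
import base64
import json
import os

incarnation = os.urandom(8)  # the random 8-byte "unique" value

# Option 1: hex encoding (16 ASCII characters)
header = {"incarnation": incarnation.hex()}

# Option 2: base64 encoding (12 ASCII characters, including padding)
header_b64 = {"incarnation": base64.b64encode(incarnation).decode("ascii")}

# Both survive a JSON round-trip and decode back to the original bytes:
decoded = bytes.fromhex(json.loads(json.dumps(header))["incarnation"])
```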

We need a way to serialize the relevant data from a Failure into something JSON-compatible. We also need a good place to do the serialization. There is a clause in log.msg() which looks for a failure= argument, which is basically the one polite way to pass in a Failure object (note that log.err() uses log.msg(failure=) internally). This is the most obvious place to catch Failures. The clause currently replaces any Failure with a CopiedFailure, and this could be enhanced to replace it with something else.

What should we replace it with? It'd be nice if the serialization were lazy. I don't know how much work is done when the Failure is created, but it'd be nice if we didn't do a lot of additional work inside log.msg(), since chances are good that the event will be discarded without anybody ever looking at it. If we were using Pickle, we could replace it with a wrapper object whose __getstate__() method did the conversion. Perhaps the json.dumps(default=) hook can help.
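One way the `default=` hook could deliver that laziness, sketched with a hypothetical wrapper class (the names `LazyFailureWrapper` and `to_json` are made up, and a plain exception stands in for a Twisted Failure). The key property is that `default=` is only invoked when the encoder actually meets an object it cannot handle, so the expensive traceback extraction happens at dump time, not inside log.msg():

```python
import json
import traceback

class LazyFailureWrapper:
    """Stored in the event buffer in place of a Failure; conversion to
    JSON-able data is deferred until the event is actually serialized."""

    def __init__(self, exc):
        self.exc = exc  # cheap: just hold a reference

    def to_json(self):
        # Expensive part: format the traceback. Runs only at dump time.
        return {
            "type": type(self.exc).__name__,
            "value": str(self.exc),
            "traceback": traceback.format_exception(
                type(self.exc), self.exc, self.exc.__traceback__),
        }

def _default(obj):
    # json.dumps() calls this only for objects it cannot encode itself.
    if isinstance(obj, LazyFailureWrapper):
        return obj.to_json()
    raise TypeError("not JSON serializable: %r" % (obj,))

try:
    1 / 0
except ZeroDivisionError as e:
    event = {"num": 7, "failure": LazyFailureWrapper(e)}

serialized = json.dumps(event, default=_default)
```

Events that are discarded without ever being serialized never pay for the traceback formatting, which matches the "chances are good that the event will be discarded" concern above.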

comment:2 Changed 5 years ago by Brian Warner


comment:3 Changed 5 years ago by Brian Warner


Milestone renamed

comment:4 Changed 5 years ago by Brian Warner


Moving these tickets from 0.12 to 0.13, since we need 0.12 out quickly to resolve #263

comment:5 Changed 3 years ago by Brian Warner

Resolution: fixed
Status: new → closed

fixed by [0708923]