id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
61	logging: implement Incident handling	Brian Warner		"It is time to finally implement the ""triggered logging"" feature that is the
whole purpose of foolscap logging: dump the circular event buffers when
something serious happens.

My plan is centered around the idea of an ""Incident"". There will be an
""Incident Qualifier"" which watches the event stream and gets to declare when
an incident has occurred: the default one will fire when events above a
certain severity are witnessed, but this can be overridden. Then there will
be an ""Incident Reporter"" class which is instantiated when the qualifier
fires, and is responsible for pulling events out of the buffers and writing
them to a file.
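
A minimal sketch of how the qualifier could be structured, assuming events are
dicts with a numeric 'level' key. The class names, the 'facility' key, and the
threshold value of 30 are hypothetical and are not foolscap's actual API.

```python
# Hypothetical sketch: none of these names come from foolscap itself.
DEFAULT_THRESHOLD = 30  # assumed numeric severity cutoff


class IncidentQualifier:
    def __init__(self, threshold=DEFAULT_THRESHOLD, on_incident=None):
        self.threshold = threshold
        # Callback that would instantiate the Incident Reporter.
        self.on_incident = on_incident

    def check_event(self, event):
        # Default policy: any event at or above the threshold qualifies.
        return event.get('level', 0) >= self.threshold

    def event(self, event):
        # Called for every event in the stream; returns True if it fired.
        if self.check_event(event) and self.on_incident is not None:
            self.on_incident(event)
            return True
        return False


class FacilityQualifier(IncidentQualifier):
    # Example override: only severe events from one facility can trigger.
    def __init__(self, facility, **kwargs):
        super().__init__(**kwargs)
        self.facility = facility

    def check_event(self, event):
        return (event.get('facility') == self.facility
                and super().check_event(event))
```

Overriding only check_event() keeps the policy separate from the mechanism
that starts the reporter.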

The reporter needs access to a working directory. It will populate this with
""TIMESTAMP.incident.flog"" files, each of which is a pickled list of event
dicts (just like the .flog files created by {{{flogtool tail --save-to}}} and
the log-gatherer).

Things I haven't figured out yet:

 * '''remote access''': I think that the flogport protocol
   ({{{RILogPublisher}}}) should have a way to ask about existing incidents,
   fetch their contents, and subscribe to hear about new ones
 * '''marking the triggering event''': Each incident.flog file could have the
   event that triggered the incident marked specially. In general I think the
   flogfiles need an extension mechanism. Perhaps we should declare that the
   first pickled object in the file is a dictionary, with contents that are
   currently ignored. This would be a compatibility-breaking change, but
   perhaps better now than later.
   * A non-breaking approach would be to put a synthetic event as the first
     one in the file, but I'm concerned that this would confuse tools that
     want to use the time or number of the first event to summarize the
     contents of the file.
   * An extension dict like this could also help with another problem: giving
     the remote incident publisher a way to describe the incidents to the
     subscriber, who might only wish to fetch recent incidents rather than
     old ones that they already know about. Without some sort of metadata,
     the incident publisher has only the filename to go on (or it must
     examine the full contents of the flogfile to produce this summary).
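
The header-dict idea could look roughly like the sketch below. It assumes the
file is a sequence of pickled objects whose first object is the extension
dict. The function names and the 'type' and 'trigger' keys are hypothetical
and are not an existing flogfile format.

```python
import io
import pickle


def write_flog(f, header, events):
    # The first pickled object is the extension dict; events follow one by one.
    pickle.dump(header, f)
    for ev in events:
        pickle.dump(ev, f)


def read_header(f):
    # Enough for a publisher to summarize an incident without reading
    # the rest of the file.
    return pickle.load(f)


def read_flog(f):
    # Returns (header, events); readers are free to ignore header contents.
    header = read_header(f)
    events = []
    while True:
        try:
            events.append(pickle.load(f))
        except EOFError:
            break
    return header, events


# Usage with an in-memory file; 'trigger' marks the triggering event here.
buf = io.BytesIO()
write_flog(buf, {'type': 'incident', 'trigger': {'num': 7}},
           [{'num': 6}, {'num': 7}])
buf.seek(0)
header, events = read_flog(buf)
```

Because the trigger is recorded in the header rather than as a synthetic
event, tools that summarize a file from its first real event are unaffected.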

The Incident Reporter should be able to record some number of events that
occur '''after''' the trigger: this might capture the application's response
to the problem. I'm thinking 100 events or 5 seconds, whichever comes first.
I'd like the incident file to be compressed, and I don't want to depend upon
external /usr/bin/bzip2 tools. This is complicated by the fact that we can't
be sure that the app will continue running for much longer. So my plan is:

 * open two files: one compressed, one uncompressed.
 * sort the existing events, pickle them, dump them into both files, flush,
   but leave the filehandles open
 * the Reporter will stick around for 100 events or 5 seconds, subscribed to
   hear about subsequent events. Each event is written into the files.
 * when the post-trigger window closes, close both files. If the .bz2 close
   is successful, delete the uncompressed file.

That will give us an uncompressed file that will survive the app quitting
quickly, or a compressed file if the app lasts long enough.
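
The two-file plan can be sketched with the standard-library bz2 module, so no
external bzip2 binary is needed. The class name, the 'num' sort key, and the
injectable clock are assumptions made for illustration. A real implementation
would also need a timer to close the window when no further events arrive,
since this sketch only checks the deadline when an event is delivered.

```python
import bz2
import os
import pickle
import time


class IncidentRecorder:
    # Hypothetical helper, not foolscap's actual class.
    TRAILING_EVENTS = 100
    TRAILING_SECONDS = 5.0

    def __init__(self, basename, buffered_events, clock=time.time):
        self.plain_name = basename
        self.clock = clock
        self.f_plain = open(basename, 'wb')
        self.f_bz2 = bz2.BZ2File(basename + '.bz2', 'wb')
        self.remaining = self.TRAILING_EVENTS
        self.deadline = clock() + self.TRAILING_SECONDS
        self.closed = False
        # Dump the existing buffer contents in order (assumed key: 'num').
        for ev in sorted(buffered_events, key=lambda e: e['num']):
            self._write(ev)
        # Only the plain file is reliably on disk after this flush; the
        # compressed stream is not complete until it is closed.
        self.f_plain.flush()

    def _write(self, ev):
        data = pickle.dumps(ev)
        self.f_plain.write(data)
        self.f_bz2.write(data)

    def event(self, ev):
        # Called for each event that arrives after the trigger.
        if self.closed:
            return
        if self.clock() >= self.deadline:
            self.close()
            return
        self._write(ev)
        self.f_plain.flush()
        self.remaining -= 1
        if self.remaining <= 0:
            self.close()

    def close(self):
        if self.closed:
            return
        self.closed = True
        self.f_plain.close()
        try:
            self.f_bz2.close()
        except OSError:
            return  # keep the uncompressed copy as the fallback
        os.remove(self.plain_name)
```

If the process dies before close(), the flushed uncompressed file remains and
the partial .bz2 file can be discarded.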

 
"	task	closed	major	0.2.6	logging	0.2.5	fixed