I actually wrote something similar in bash which I use frequently when I need to munge a table of numbers on the command line [1]. The whole time I was thinking I should really be doing this in Common Lisp.
here's the announcement email (very boiled down version of the readme, essentially):
babbage is a library for easily gathering data and computing summary measures in a declarative way.
The summary measure functionality allows you to compute multiple
measures over arbitrary partitions of your input data simultaneously
and in a single pass. You just say what you want to compute:
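(The example itself isn't reproduced in this boiled-down quote. As an illustrative stand-in, not babbage's actual API, the single-pass idea can be hand-rolled in plain Clojure; the :all set, the :x+y field name, and the sum-only accumulation below are assumptions made for the sketch:)

    ;; Illustrative stand-in only, not babbage's API: a "fields" map of
    ;; extractor functions, a "sets" map of membership predicates, and a
    ;; single reduce that sums every field within every set an element
    ;; belongs to.
    (def fields
      {:x   :x
       :y   :y
       :x+y #(+ (or (:x %) 0) (or (:y %) 0))})

    (def sets
      {:all   (constantly true)
       :has-y #(contains? % :y)})

    (defn summarize [xs]
      (reduce
       (fn [acc elt]
         ;; each field function and each set predicate runs once per element
         (let [vals    (reduce-kv (fn [m k f]
                                    (if-some [v (f elt)] (assoc m k v) m))
                                  {} fields)
               members (for [[s pred] sets :when (pred elt)] s)]
           (reduce (fn [acc s] (update acc s #(merge-with + (or % {}) vals)))
                   acc members)))
       {} xs))

    (summarize [{:x 1 :y 2} {:x 3} {:y 4}])
    ;; => {:all {:x 4, :y 6, :x+y 10}, :has-y {:x 1, :y 6, :x+y 7}}

(The library itself supplies richer measures than running sums, per the next paragraph; the sketch only shows the one-pass shape.)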
The functions :x, :y, and #(+ (or (:x %) 0) (or (:y %) 0)) defined in
the fields map are called once per input element no matter how many
sets the element contributes to. The function #(contains? % :y) is also
called once per input element, no matter how many unions,
intersections, complements, etc. the set :has-y contributes to.
A variety of measure functions, and structured means of combining
them, are supplied; it's also easy to define additional measures.
babbage also supplies a method for running computations structured as
dependency graphs; this can make gathering the initial data for
summarizing simpler to express. To give an example that's probably
familiar from another context:
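(That example also isn't reproduced here; the "other context" is perhaps Prismatic's Graph library. Below is a generic, hand-rolled sketch of the idea, not babbage's syntax, with node and key names invented for illustration:)

    ;; Illustrative sketch only: each node declares which keys it needs,
    ;; and a tiny runner repeatedly evaluates whatever has all of its
    ;; dependencies available.
    (def stats-graph
      {:n    {:deps [:xs]     :f (fn [{:keys [xs]}] (count xs))}
       :sum  {:deps [:xs]     :f (fn [{:keys [xs]}] (reduce + xs))}
       :sq   {:deps [:xs]     :f (fn [{:keys [xs]}] (reduce + (map #(* % %) xs)))}
       :mean {:deps [:sum :n] :f (fn [{:keys [sum n]}] (/ sum n))}
       :var  {:deps [:sq :n :mean]
              :f (fn [{:keys [sq n mean]}] (- (/ sq n) (* mean mean)))}})

    (defn run-graph [graph inputs]
      (loop [done inputs]
        (let [ready (for [[k {:keys [deps f]}] graph
                          :when (and (not (contains? done k))
                                     (every? #(contains? done %) deps))]
                      [k (f done)])]
          (if (empty? ready)
            done
            (recur (into done ready))))))

    (run-graph stats-graph {:xs [1 2 3 6]})
    ;; => {:xs [1 2 3 6], :n 4, :sum 12, :sq 50, :mean 3, :var 7/2}

(The point is just the shape: you write independent pieces that name their inputs, and the machinery figures out what to run and in what order; the options below then control how that running happens.)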
Options are provided for parallel, sequential, and lazy computation of
the elements of the result map, and for resolving the dependency graph
in advance of running the computation for a given input, either at
runtime or at compile time.
Cutting to the chase, does this make the summary results available in the midst of the sequence? E.g., if it takes two hours to gather pressure data (or any other time-series data), does this expose the running variance 10 minutes in, an hour in, etc.?
Not currently, but it would certainly be possible to add something like that. Exposing the running stats for partial subsequences of the input sequence would mostly be a matter of replacing the "reduce" in the definition of calculate with "reductions" (plus at least one other change, but at a similar level of complexity). That wouldn't give you results ten, sixty, etc. minutes into the data gathering, because it wouldn't be tied to how long the actual computation of the elements of the input seq was taking (that's outside calculate's purview at the moment), but it would start delivering running answers right away.
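To make the reduce-vs-reductions point concrete, here's a standalone sketch (not calculate itself) of running count/mean/variance via a Welford-style accumulator; swapping reduce for reductions is what turns one final answer into a stream of running answers:

    ;; Standalone sketch, not the library's calculate.
    (defn step
      "Fold one observation x into the running state."
      [{:keys [n mean m2]} x]
      (let [n'    (inc n)
            delta (- x mean)
            mean' (+ mean (/ delta n'))]
        {:n n' :mean mean' :m2 (+ m2 (* delta (- x mean')))}))

    (defn finalize [{:keys [n mean m2]}]
      {:n n :mean mean :variance (if (pos? n) (/ m2 n) 0.0)})

    (def init {:n 0 :mean 0.0 :m2 0.0})

    ;; one final answer, as with reduce:
    (finalize (reduce step init [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0]))
    ;; => {:n 8, :mean 5.0, :variance 4.0} (modulo floating-point rounding)

    ;; running answers after every element, as with reductions:
    (map finalize (rest (reductions step init [2.0 4.0 4.0 4.0 5.0 5.0 7.0 9.0])))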
Running answers right away is fairly useful; a bit of a challenge in that problem domain is with multichannel sensors ("cameras" with multiple frequency bands, satellites like MODIS, radiometric spectrometers, etc.), where the sharpest "image" is produced by using an SVD (singular value decomposition) type transform to reduce (say) 256 input channels to (say) 6 major dimensions and using those to recreate an enhanced image. Producing branchless code to generate basic running stats (min, mean, max, variance, trends) on multiple input channels is a bit of a puzzle; generating an efficient rolling SVD enhancement (best image based on the most recent observations) is a bit trickier.
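(For what it's worth, the single-channel running stats extend to multiple channels fairly directly; a rough sketch below, in Clojure rather than branchless array code, with the sample layout and names assumed for illustration and the rolling SVD left out:)

    ;; Sketch with assumed names: one accumulator per channel holding
    ;; running count, min, max, mean, and M2 (for variance), updated in a
    ;; single pass over multichannel samples.
    (defn channel-step [{:keys [n mn mx mean m2]} x]
      (let [n'    (inc n)
            delta (- x mean)
            mean' (+ mean (/ delta n'))]
        {:n n' :mn (min mn x) :mx (max mx x)
         :mean mean' :m2 (+ m2 (* delta (- x mean')))}))

    (def channel-init
      {:n 0 :mn Double/POSITIVE_INFINITY :mx Double/NEGATIVE_INFINITY
       :mean 0.0 :m2 0.0})

    (defn running-multichannel-stats
      "Lazy seq of per-channel running stats, one entry per arriving
      sample; each sample is a vector of readings, one per channel."
      [n-channels samples]
      (rest (reductions #(mapv channel-step %1 %2)
                        (vec (repeat n-channels channel-init))
                        samples)))

    ;; e.g. two channels, three samples so far:
    (last (running-multichannel-stats 2 [[1.0 10.0] [2.0 20.0] [3.0 30.0]]))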
The application areas are continuous processing of continuously arriving data: unbounded, effectively infinite sequences.
[1] http://eschulte.github.com/data-wrapper/