# MedianTracker

## A pure Python module to track the median of a stream of values "on-line" in reasonably efficient
fashion.

Usage:

from mediantracker import MedianTracker
tracker = MedianTracker()
tracker.add(1)
tracker.add(2)
tracker.add(3)
assert tracker.lower_median() == 2
tracker.add(4)
assert tracker.upper_median() == 3

`MedianTracker` supports efficient median queries on and dynamic
additions to a list of values. It provides both the lower and upper median
of all values seen so far. Any `__cmp__()`-able object can be tracked,
in addition to numeric types. `add()` takes *log(n)* time for a tracker
with *n* items; `lower_median()` and `upper_median()` run in constant
time. Since all values must be stored, memory usage is proportional to the
number of values added (*O(n)*).

The values can be accessed via iteration over the `MedianTracker`,
though they are not ordered in any particular way:

sum = 0.0
for val in tracker:
sum += val
mean = sum / len(tracker)

Use this module if you are processing values "on-line," one at a time, as
they arrive, and need to know the new median after each new value (or
batch of values). If you just want the median of a whole list, there are
much more efficient linear-time median (or more general k-th smallest
selection) algorithms. Using this module will take *O(nlogn)* time and an
extra *O(n)* space to compute the median of *n* items. On the other hand,
a `MedianTracker` will only take *O(nlogn + m)* time for any sequence of
*n* adds and *m* median queries, whereas running a traditional
non-incremental median algorithm *m* times would take *O(n*m)*.

Finally, some sources define the median of an even-length list to be the
average of the middle two values. This is easily and efficiently computed
(in constant time):

tracker = MedianTracker([1, 2, 3, 4])
median = (tracker.lower_median() + tracker.upper_median()) / 2.0
assert median == 2.5

`mediantracker`
can be installed via easy_install
or downloaded from SourceForge.