Stream-lib: Stream Summarizer and Cardinality Estimator
Stream Library is a Java library for summarizing data in streams for which it is infeasible to store all events. More specifically, there are classes for estimating cardinality (i.e., counting distinct things), set membership, top-k elements, and frequency. One particularly useful feature is that cardinality estimators with compatible configurations may be safely merged.
These classes may be used directly in a JVM project or with the provided shell scripts and good old Unix IO redirection.
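The merge property can be illustrated with a small, self-contained sketch. The `BitmapCounter` class below is hypothetical and is not part of Stream Library's API. It implements linear counting, one of several cardinality techniques, chosen here only for brevity. It assumes a fixed bitmap size `m` and a SplitMix64-style hash. Two counters are compatible only when they share `m` and the hash function. In that case, OR-ing their bitmaps produces exactly the bitmap a single counter would have built from both streams.

```java
import java.util.BitSet;

/**
 * Hypothetical, minimal linear-counting estimator (NOT Stream Library's API).
 * Counters are "compatible" only if they use the same bitmap size m and the
 * same hash function; compatible counters merge by bitwise OR.
 */
class BitmapCounter {
    private final int m;        // number of bits in the bitmap
    private final BitSet bits;  // bit i is set if some item hashed to i

    BitmapCounter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    // SplitMix64 finalizer: a bijective mix that spreads input bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Records an item; duplicates set the same bit, so they cost nothing. */
    void offer(Object item) {
        long h = mix(item.hashCode());
        bits.set((int) Long.remainderUnsigned(h, m));
    }

    /** Linear-counting estimate: n ~ -m * ln(V), V = fraction of zero bits. */
    long cardinality() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) {
            throw new IllegalStateException("bitmap saturated; use a larger m");
        }
        return Math.round(-m * Math.log((double) zeros / m));
    }

    /** Returns a new counter equivalent to having seen both streams. */
    BitmapCounter merge(BitmapCounter other) {
        if (other.m != m) {
            throw new IllegalArgumentException("incompatible bitmap sizes");
        }
        BitmapCounter merged = new BitmapCounter(m);
        merged.bits.or(bits);
        merged.bits.or(other.bits);
        return merged;
    }

    public static void main(String[] args) {
        BitmapCounter a = new BitmapCounter(1 << 16);
        BitmapCounter b = new BitmapCounter(1 << 16);
        for (int i = 0; i < 5000; i++) a.offer("user-" + i);     // 5000 distinct
        for (int i = 2500; i < 7500; i++) b.offer("user-" + i);  // overlaps a
        // 7500 distinct items in total across both streams.
        System.out.println("merged estimate: " + a.merge(b).cardinality());
    }
}
```

Because merging is a bitwise OR, it is commutative and idempotent, so the items the two streams share (2500 to 4999 above) are not double-counted. Merging counters built with different `m` values is rejected, which mirrors the "compatible configurations" caveat above.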