Michael.Walker's blog

Stream-lib: Stream Summarizer and Cardinality Estimator

Stream Library is a Java library for summarizing data streams in which it is infeasible to store every event. More specifically, it provides classes for estimating cardinality (i.e. counting distinct elements), set membership, top-k elements, and frequency. One particularly useful feature is that cardinality estimators with compatible configurations can be safely merged, so estimates built on separate partitions of a stream can be combined into one estimate for the whole stream.

These classes may be used directly in a JVM project or with the provided shell scripts and good old Unix IO redirection.
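To make the merging point concrete, here is a minimal HyperLogLog cardinality estimator in plain Java. It is not stream-lib's code or API: the class `SimpleHyperLogLog`, its methods and the hash (64-bit FNV-1a followed by MurmurHash3's fmix64 finalizer) are illustrative assumptions. In this sketch two estimators count as "compatible" when they share the precision `log2m` and the hash function, and merging is simply a register-wise maximum.

```java
import java.nio.charset.StandardCharsets;

// Minimal HyperLogLog for illustration only (not stream-lib's implementation).
class SimpleHyperLogLog {
    private final int log2m;        // number of index bits (precision)
    private final int m;            // number of registers, 2^log2m
    private final byte[] registers; // highest rank observed per register

    SimpleHyperLogLog(int log2m) {
        // The alpha approximation used in cardinality() is only accurate for m >= 128.
        if (log2m < 7 || log2m > 16) {
            throw new IllegalArgumentException("log2m must be in [7, 16]");
        }
        this.log2m = log2m;
        this.m = 1 << log2m;
        this.registers = new byte[m];
    }

    // Assumed hash: 64-bit FNV-1a, then MurmurHash3's fmix64 finalizer to spread bits.
    private static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void offer(String item) {
        long x = hash(item);
        int idx = (int) (x >>> (64 - log2m)); // top log2m bits choose a register
        long rest = x << log2m;                // remaining 64 - log2m bits, left-aligned
        // Rank = 1-based position of the first 1-bit in the remaining bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - log2m) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    double cardinality() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double estimate = alpha * m * m / sum; // alpha first, so no int overflow in m * m
        // Small-range correction: fall back to linear counting while registers are empty.
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);
        }
        return estimate;
    }

    // Only sketches with the same log2m (and the same hash) can be merged.
    void merge(SimpleHyperLogLog other) {
        if (other.log2m != log2m) {
            throw new IllegalArgumentException("incompatible sketches: log2m differs");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }
}
```

Because the merge takes the maximum of each register, merging the estimators of two overlapping streams yields exactly the registers a single estimator would have after seeing both streams, so shared elements are not double-counted. With `log2m = 12` (4096 registers) the typical relative error is about 1.04/√4096, or roughly 1.6%.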

New Kafka Release

Apache Kafka is a high-throughput publish-subscribe messaging system rethought as a distributed commit log. The new Kafka release introduces many new features, improvements and fixes, including:

 - A new Java producer for ease of implementation and enhanced performance.
 - Kafka-based offset storage.
 - Support for deleting topics.
 - Per-topic configuration of the preference for consistency over availability.
 - Support for Scala 2.11; support for Scala 2.8 has been dropped.
 - LZ4 compression.
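As a rough guide to where several of these features are switched on, the following Java sketch only assembles `java.util.Properties` objects; it does not import or start any Kafka client. The property names follow the Kafka configuration documentation as understood for this release and should be checked against the docs for the version in use. The class name `KafkaFeatureConfigSketch`, the server addresses and the value `2` for `min.insync.replicas` are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that only builds configuration objects; no Kafka classes are used.
// Property names are assumed from the Kafka configuration docs; verify for your version.
class KafkaFeatureConfigSketch {

    // New Java producer (org.apache.kafka.clients.producer.KafkaProducer) using LZ4.
    static Properties producer(String bootstrapServers) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("compression.type", "lz4"); // LZ4 compression
        p.setProperty("acks", "all");             // wait for the full in-sync replica set
        return p;
    }

    // Per-topic overrides favouring consistency over availability.
    static Properties consistentTopic() {
        Properties t = new Properties();
        t.setProperty("unclean.leader.election.enable", "false"); // never elect an out-of-sync replica
        t.setProperty("min.insync.replicas", "2");                // hypothetical value; applies with acks=all
        return t;
    }

    // Broker setting required before topics can actually be deleted.
    static Properties broker() {
        Properties b = new Properties();
        b.setProperty("delete.topic.enable", "true");
        return b;
    }

    // High-level consumer: commit offsets to Kafka instead of ZooKeeper.
    static Properties consumer(String groupId, String zookeeperConnect) {
        Properties c = new Properties();
        c.setProperty("group.id", groupId);
        c.setProperty("zookeeper.connect", zookeeperConnect);
        c.setProperty("offsets.storage", "kafka");
        return c;
    }
}
```

The producer properties would be handed to the producer's constructor, topic-level overrides are supplied when a topic is created or altered, and the broker and consumer settings belong in their respective configuration files. Disabling unclean leader election is the consistency side of the trade-off: if every in-sync replica is lost, the partition stays unavailable rather than electing a replica that may be missing committed messages.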