Blogs
Rmux: Redis Connection Pooler and Multiplexer

Rmux is a Redis connection pooler and multiplexer, written in Go. Rmux is meant for LAMP stacks and other applications built on short-lived processes that handle a high volume of requests. It should be run as a client on every server that connects to Redis, to reduce the total number of inbound connections to the Redis servers while also handling consistent multiplexing.
The Emerging Data Stack

Gordon Moore Foundation Giving USD $1.5 Million to Data Scientists

Moore’s new law is that big data will lead to big science. The Gordon and Betty Moore Foundation plans to give $1.5 million grants (paid in yearly installments of $200,000 to $300,000) to 15 worthy interdisciplinary scientists who can develop and use new algorithms, machine learning techniques, and other data-intensive science tricks to turn huge volumes of data into amazing scientific discoveries.
Data Engineering is the foundation of the "big data" buzz.
If you are a well-read professional, chances are you are already aware of the big data era. Still, to provide some background, here is how I define the big data age:
"With the recent advancements in the way we store, manage and process data, companies can afford to get deeper insights into their data at the same or even lower cost than a decade ago."
Deconstructing the Octopus

A colleague of mine, Michael Walker, from the newly formed Data Science Association, sent me an email with an outrageous logo.
Yes, this is the logo for a satellite launched by the United States on December 5, 2013, even against the backdrop of Snowden’s NSA revelations.
Data Size Matters

UCI Machine Learning Repository

Access 264 data sets at http://archive.ics.uci.edu/ml/datasets.html.
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
Datasets from UCI are also available at http://www.sgi.com/tech/mlc/db/.
Real-time data science

Looking back on 2013, the world of Hadoop moved out of the era of batch processing and into stream processing. In the context of "crisp and actionable," actionable often comes with an expiration date: if you don't take action soon enough, it's too late.
Top Ten Algorithms in Data Mining

Identifying some of the most influential algorithms widely used in the data mining community, The Top Ten Algorithms in Data Mining describes each algorithm, discusses its impact, and reviews current and future research.