Updated List of Spark Alternatives

On 5 Dec, 2014 By michaelmalak

Eight months ago, I blogged a list of Spark alternatives. It's time to update that list:

Memory Based

BID Data Project — Out of Berkeley (like Spark) but more recent, from 2013. I blogged about this project a couple of months ago, but I don't believe it has gotten the attention it deserves, and I believe it's the next big thing beyond Spark. The authors claim that a single node with a GPGPU performs logistic regression faster than a 100-node Spark cluster. The Spark team notes, however, that the comparison was against Spark 0.9 and that several improvements to Spark have since been made or are planned.
Distributed R — A cluster computing version of the R programming language
Parallel Processing in Python — A number of systems for cluster computing in the Python programming language (in addition to the one that Monte Lunacek presented last year)
Apache Ignite (Open sourced by GridGain)
Apache Flink (Formerly Stratosphere)
H2O — Either stand-alone or integrated with Spark as Sparkling Water
Piccolo
RAMCloud