Project Iron: More Efficient Spark Execution
A few hours ago, Reynold Xin created a Jira Epic with the compelling title "Improving Physical Execution and Memory Management". It promises:
- Much more efficient memory usage & robustness
- Much more efficient execution
We will start with the DataFrame API, which gives us more application semantics, and eventually improve core as well.
This could help address some surprisingly poor benchmark numbers for Spark SQL 1.1 against Impala from October 2014. That was two versions ago, and the pace of Spark development has been very rapid, but it's the most recent benchmark I've been able to find.
Thanks to Reynold Xin for getting Project Iron underway.
UPDATE 2015-04-28: Project Iron has been renamed to Project Tungsten with details released.