A few hours ago, Reynold Xin created a Jira Epic with the compelling title "Improving Physical Execution and Memory Management". It promises:
- Much more efficient memory usage & robustness
- Much more efficient execution
We will start with the DataFrame API, which gives us more application semantics, and eventually improve core as well.
For April Fools' Day this month, Tesla tweeted that it was introducing a wristwatch, the "Model W":
Reports of data science successfully optimizing businesses have started to appear in both mass media and social media:
Some words are contronyms, with "sanction" being the most famous example (half the time meaning "allow" and the other half meaning "disallow"). "Modeling" is a contronym, too.
In the realm of data science, it usually means statistical modeling, which is entirely empirical. But as I wrote long ago on identifying causality, modeling can also mean constructing an engineering model that explains system behavior from first principles. The former is empirical and the latter is theoretical.
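To make the two senses concrete, here is a minimal Python sketch using a falling object as the example. The function names, the choice of example, and the observations are all made up for illustration: the first model comes from physics alone, while the second learns its single coefficient from data without knowing why the relationship holds.

```python
G = 9.81  # m/s^2, standard gravity


def first_principles_distance(t):
    """Theory-driven model: distance fallen from rest, ignoring air resistance."""
    return 0.5 * G * t ** 2


def fit_empirical_model(times, distances):
    """Empirical model: least-squares fit of d = k * t^2, with k learned from data."""
    xs = [t ** 2 for t in times]
    # Closed-form least squares for a line through the origin: k = sum(x*d) / sum(x*x)
    k = sum(x * d for x, d in zip(xs, distances)) / sum(x * x for x in xs)
    return (lambda t: k * t ** 2), k


# Synthetic observations (invented for this sketch): theory plus small fixed offsets
times = [0.5, 1.0, 1.5, 2.0]
offsets = [0.02, -0.03, 0.01, -0.02]
observed = [first_principles_distance(t) + e for t, e in zip(times, offsets)]

empirical_model, k = fit_empirical_model(times, observed)
print(f"fitted k = {k:.3f}, theory 0.5*g = {0.5 * G:.3f}")
```

The fitted coefficient lands close to the theoretical one here only because the synthetic data was generated from the theory. With real data, the empirical model would absorb whatever the theory leaves out (air resistance, say), but it would not explain it.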
I've blogged about BIDMach before (Single GPU-Powered Node 4x Faster Than 50-node Spark Cluster). It is a project from AMPLab that is much newer than Spark. But although BIDMach has plans for cluster operation, it apparently does not support it yet.