This morning at Spark Summit Europe 2016, Ion Stoica announced during his keynote the Drizzle project, which promises to reduce streaming latency in Spark to below that of Flink and Storm. He announced this in the context of the new RISE Lab at UC Berkeley.
For years now, companies have been hiding behind the moniker "machine learning" as a way to avoid the stigma attached to "artificial intelligence" (a legacy of the AI winters). Well, no more. This week Google announced Google Neural Machine Translation (GNMT), which gets close to human performance.
The free 2016 O'Reilly Data Science Salary Survey has been released. This year, they applied a handy linear regression and spelled out the entire model on page 42. A couple of things jumped out at me. In units of $1,000:
Spark Summit (West) 2016 took place this past week in San Francisco, with the big news of course being Spark 2.0, which, among other things, ushers in yet another 10x performance improvement through whole-stage code generation.
|Algorithm|Unit Vectorized|Context|Training|Architecture|
|---|---|---|---|---|
|Char2Vec|Character|Sentence|Unsupervised|CNN -> LSTM|
|Doc2Vec|Paragraph Vector|Document|Supervised|ANN -> Logistic Regression|
|Video2Vec|Video Elements|Video|Supervised|CNN -> MLP|
The powerful word2vec algorithm has inspired a host of other algorithms listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign vectors to words, and of course vectors are the currency of machine learning. Once you've vectorized your data, you are then free to apply any number of machine learning algorithms.
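To make the "vectors as currency" point concrete, here is a minimal sketch using toy, hand-made 3-dimensional vectors (real word2vec embeddings are learned from a corpus and typically have 100-300 dimensions; the words and values below are purely illustrative). Once words are vectors, standard vector operations like cosine similarity immediately give you capabilities such as nearest-neighbor lookup:

```python
import numpy as np

# Toy "word vectors" -- illustrative stand-ins for learned embeddings.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    # Cosine similarity: the usual metric for comparing embeddings.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word):
    # With words as vectors, nearest-neighbor search is just a max
    # over pairwise similarities -- no NLP-specific machinery needed.
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("king"))  # queen
```

The same vectors could just as easily feed a clustering algorithm or a classifier, which is the point: vectorization decouples the representation problem from the learning problem.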