michaelmalak's blog

Drizzle Brings Low-Latency Streaming to Spark; but RISE Lab is Just a Change in Funding

This morning at Spark Summit Europe 2016, Ion Stoica used his keynote to announce the Drizzle project, which promises to bring Spark's streaming latency below that of Flink and Storm. Ion made the announcement in the context of the new RISE Lab at UC Berkeley.

Table of XX2Vec Algorithms

XX2Vec    | Embeds           | Context  | Supervision  | Algorithms used
----------|------------------|----------|--------------|---------------------------
Char2Vec  | Character        | Sentence | Unsupervised | CNN -> LSTM
Word2Vec  | Word             | Sentence | Unsupervised | ANN
GloVe     | Word             | Sentence | Unsupervised | SGD
Doc2Vec   | Paragraph Vector | Document | Supervised   | ANN -> Logistic Regression
Image2Vec | Image Elements   | Image    | Unsupervised | DNN
Video2Vec | Video Elements   | Video    | Supervised   | CNN -> MLP

The powerful word2vec algorithm has inspired the host of other algorithms listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign a vector to each word, and vectors are, of course, the currency of machine learning: once you've vectorized your data, you are free to apply any number of machine learning algorithms to it.
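To make "assign vectors to words" a little more concrete, here is a minimal from-scratch sketch of one word2vec variant, skip-gram with negative sampling, in plain Python. It is a toy for illustration only, not Spark MLlib's Word2Vec or the original word2vec C code. The function names, the toy corpus, and the hyperparameters (dim, window, negatives, epochs, lr) are hypothetical choices. The last two functions show the payoff described above: once words are vectors, an ordinary technique (here, nearest-neighbor search by cosine similarity) applies directly.

```python
import math
import random


def build_vocab(sentences):
    """Map each distinct word to an integer index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def skipgram_pairs(sentences, vocab, window=2):
    """(center, context) index pairs: each word 'predicts' its neighbors within `window`."""
    pairs = []
    for sentence in sentences:
        ids = [vocab[w] for w in sentence]
        for i, center in enumerate(ids):
            lo, hi = max(0, i - window), min(len(ids), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, ids[j]))
    return pairs


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_word2vec(sentences, dim=8, window=2, negatives=3, epochs=200, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling; returns {word: vector}.

    Simplification (assumption): negatives are drawn uniformly from the vocabulary,
    and may occasionally coincide with the true context word.
    """
    rng = random.Random(seed)
    vocab = build_vocab(sentences)
    n = len(vocab)
    # Input ("word") vectors start small and random; output ("context") vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in range(n)]
    w_out = [[0.0] * dim for _ in range(n)]
    pairs = skipgram_pairs(sentences, vocab, window)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            # One positive target (label 1) plus `negatives` random targets (label 0).
            targets = [(context, 1.0)] + [(rng.randrange(n), 0.0) for _ in range(negatives)]
            grad_in = [0.0] * dim
            for target, label in targets:
                score = sum(a * b for a, b in zip(w_in[center], w_out[target]))
                g = lr * (label - sigmoid(score))  # gradient step on the log-likelihood
                for k in range(dim):
                    grad_in[k] += g * w_out[target][k]
                    w_out[target][k] += g * w_in[center][k]
            for k in range(dim):
                w_in[center][k] += grad_in[k]
    return {word: w_in[idx] for word, idx in vocab.items()}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vectors, word):
    """Downstream 'ML' on the vectors: the most cosine-similar other word."""
    return max((w for w in vectors if w != word), key=lambda w: cosine(vectors[word], vectors[w]))


if __name__ == "__main__":
    corpus = [line.split() for line in [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat chased a mouse",
        "a dog chased a cat",
    ]]
    vecs = train_word2vec(corpus)
    print(nearest(vecs, "cat"))
```

On a corpus this small the neighbors are noisy; the point is only the pipeline of text to vectors to a generic algorithm. In Spark you would normally reach for MLlib's Word2Vec rather than hand-rolling training, and the original word2vec draws negatives from a smoothed unigram distribution rather than uniformly as this sketch does.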