Data Science: Pied Piper of Modern World

Bit by bit they gathered over the years

They bit, they spread, and they flew everywhere!

On land, in air – they left no empty space

They sucked everyone into a pretty mad race!


Megabytes! Gigabytes! Terabytes! Their sizes grew bigger

Petabytes and Zettabytes are now ready to trigger!

They sped through the wires, they rode the air waves

In dots and lines, they came in all shapes!


The ‘likes’, the ‘dislikes’ and even the very ‘neutral’

You are forced to pay attention and cannot be too casual!

Apache NiFi

NiFi is a system for processing and distributing data: a dataflow lifecycle automation tool that acquires and delivers data across enterprise systems in real time. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Table of XX2Vec Algorithms

XX2Vec     Embed             In        Sup/Unsup     Algorithms used
Char2Vec   Character         Sentence  Unsupervised  CNN -> LSTM
Word2Vec   Word              Sentence  Unsupervised  ANN
GloVe      Word              Sentence  Unsupervised  SGD
Doc2Vec    Paragraph Vector  Document  Supervised    ANN -> Logistic Regression
Image2Vec  Image Elements    Image     Unsupervised  DNN
Video2Vec  Video Elements    Video     Supervised    CNN -> MLP

The powerful word2vec algorithm has inspired a host of other algorithms listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign vectors to words, and of course vectors are the currency of machine learning. Once you've vectorized your data, you are then free to apply any number of machine learning algorithms.
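
To make the "vectors as currency" idea concrete, here is a minimal Python sketch of what becomes possible once words are vectors. It does not train word2vec: the three-dimensional vectors in TOY_VECTORS are hand-written, hypothetical values chosen only for illustration, and the helper names (cosine, nearest, sentence_vector) are assumptions of this sketch rather than any library's API. A real model would learn much higher-dimensional vectors from a corpus.

```python
import math

# Hypothetical, hand-written vectors for illustration only.
# A trained word2vec model would learn these from text.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "pear":  [0.2, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors=TOY_VECTORS):
    """Return the other word whose vector is most similar to `word`."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


def sentence_vector(words, vectors=TOY_VECTORS):
    """Average the vectors of the known words into one fixed-length vector."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words to average")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    print(nearest("king"))
    print(sentence_vector(["king", "queen"]))
```

The averaged sentence vector is a crude but common way to turn variable-length text into a fixed-length feature vector, which can then be fed to any standard classifier or clustering algorithm.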