Apache NiFi

NiFi is a system to process and distribute data - a dataflow lifecycle automation tool that acquires and delivers data across enterprise systems in real time. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Table of XX2Vec Algorithms

XX2Vec Embed In Sup/Unsup Algorithms used
Char2Vec Character Sentence Unsupervised CNN -> LSTM
Word2Vec Word Sentence Unsupervised ANN
GloVe Word Sentence Unsupervised SGD
Doc2Vec Paragraph Vector Document Supervised ANN -> Logistic Regression
Image2Vec Image Elements Image Unsupervised DNN
Video2Vec Video Elements Video Supervised CNN -> MLP

The powerful word2vec algorithm has inspired a host of other algorithms listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign vectors to words, and of course vectors are the currency of machine learning. Once you've vectorized your data, you are then free to apply any number of machine learning algorithms.

Introduction to Data Quality

How many times have you heard managers and colleagues complain about the quality of the data in a particular report, system or database? People often describe poor quality data as unreliable or not trustworthy. Defining exactly what high or low quality data is, why it is a certain quality level and how to manage and improve it is often a trickier task.