|Char2Vec|Character|Sentence|Unsupervised|CNN -> LSTM|
|Doc2Vec|Paragraph Vector|Document|Supervised|ANN -> Logistic Regression|
|Video2Vec|Video Elements|Video|Supervised|CNN -> MLP|
The powerful word2vec algorithm has inspired a host of other algorithms, listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign vectors to words, and vectors are the currency of machine learning: once you've vectorized your data, you are free to apply any number of machine learning algorithms.
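word2vec's skip-gram variant learns word vectors by predicting the words around each center word. A minimal sketch of the first step, turning a token sequence into (center, context) training pairs, is below, together with cosine similarity, the usual way to compare vectors once words have been embedded. The function names and the toy vectors are hypothetical illustrations; this code does not train any embeddings.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs, the training examples used by skip-gram."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context words are those within `window` positions on either side of the center.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(skipgram_pairs(["the", "cat", "sat"], window=1))
    # Hypothetical 3-d vectors standing in for learned embeddings.
    king = [0.9, 0.1, 0.4]
    queen = [0.8, 0.2, 0.5]
    print(cosine_similarity(king, queen))
```

A larger `window` captures broader topical context at the cost of more training pairs; once vectors exist, any algorithm that consumes numeric features (clustering, classification, nearest-neighbor search) can use them.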
How many times have you heard managers and colleagues complain about the quality of the data in a particular report, system or database? People often describe poor-quality data as unreliable or untrustworthy. Defining exactly what high- or low-quality data is, why data sits at a particular quality level, and how to manage and improve it is a much trickier task.
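One way to make "quality" concrete is to express it as measurable rules over the records themselves. The sketch below assumes records arrive as Python dicts and scores two illustrative dimensions: completeness (every required field populated) and validity (populated values pass a rule). The metric choices, function names, and field names are hypothetical examples, not a standard definition of data quality.

```python
def is_filled(value):
    """Treat None and blank strings as missing; any other value (including 0) counts as filled."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def completeness(records, required_fields):
    """Share of records in which every required field is populated."""
    if not records:
        # Report an empty extract as 0.0 rather than as vacuously perfect.
        return 0.0
    complete = sum(1 for r in records if all(is_filled(r.get(f)) for f in required_fields))
    return complete / len(records)


def field_validity(records, field, rule):
    """Share of populated values of `field` that satisfy the predicate `rule`."""
    values = [r[field] for r in records if is_filled(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


if __name__ == "__main__":
    records = [
        {"id": 1, "email": "a@example.com", "age": 34},
        {"id": 2, "email": "", "age": 29},
        {"id": 3, "email": "c@example", "age": -5},
        {"id": 4, "email": "d@example.com", "age": None},
    ]
    print(completeness(records, ["id", "email", "age"]))
    print(field_validity(records, "age", lambda a: 0 <= a <= 120))
```

Separating "is it there?" from "is it plausible?" makes it easier to say *why* a data set sits at a given quality level, which is exactly the question that vague complaints leave unanswered.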
To be effective, Big Data must recognize the following voices, in order:
- VOC=Voice of the Customer
- VOB=Voice of the Business
- VOP=Voice of the Process
Datification is the link between the three voices, and it also covers capturing and displaying relevant metrics from all your improvement projects.
DATIFICATION! What's the big deal, and why does your business need it?
What is Datification?
This week Databricks announced GraphFrames, a library posted to spark-packages.org that is based on Spark SQL DataFrames rather than on RDDs (as GraphX is). GraphFrames is still a work in progress (it is currently at version 0.1), so it provides interoperability with GraphX: graphs can be converted back and forth between the two.
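Because GraphFrames is built on DataFrames, a graph is simply two tables: vertices with an `id` column and edges with `src` and `dst` columns, so graph queries become relational operations such as group-bys and joins. The sketch below imitates that model with plain Python lists of dicts so it runs without Spark; it is not the GraphFrames API. In GraphFrames itself, building `GraphFrame(vertices, edges)` and reading its `inDegrees` property returns a DataFrame with `id` and `inDegree` columns, which the hypothetical `in_degrees` function below mirrors.

```python
from collections import Counter


def in_degrees(edges):
    """Group the edge table by 'dst' and count rows, like a relational GROUP BY."""
    counts = Counter(row["dst"] for row in edges)
    # One output row per vertex with at least one incoming edge, sorted by id.
    return [{"id": vid, "inDegree": n} for vid, n in sorted(counts.items())]


if __name__ == "__main__":
    # Vertex table: each row needs an "id"; other columns are arbitrary attributes.
    vertices = [
        {"id": "a", "name": "Alice"},
        {"id": "b", "name": "Bob"},
        {"id": "c", "name": "Charlie"},
    ]
    # Edge table: each row needs "src" and "dst" referring to vertex ids.
    edges = [
        {"src": "a", "dst": "b", "relationship": "friend"},
        {"src": "b", "dst": "c", "relationship": "follow"},
        {"src": "c", "dst": "b", "relationship": "follow"},
    ]
    print(in_degrees(edges))
```

As with `inDegrees` in GraphFrames, vertices that have no incoming edges (here `"a"`) do not appear in the result.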