Table of XX2Vec Algorithms
|Char2Vec||Character||Sentence||Unsupervised||CNN -> LSTM|
|Doc2Vec||Paragraph Vector||Document||Supervised||ANN -> Logistic Regression|
|Video2Vec||Video Elements||Video||Supervised||CNN -> MLP|
The powerful word2vec algorithm has inspired a host of other algorithms listed in the table above. (For a description of word2vec, see my Spark Summit 2015 presentation.) word2vec is a convenient way to assign vectors to words, and of course vectors are the currency of machine learning. Once you've vectorized your data, you are then free to apply any number of machine learning algorithms.
word2vec is able to come up with vectors by leveraging the concept of embedding. In a corpus, a word appears in the context of surrounding words, and word2vec uses those co-occurrences to infer relationships between those words.
All of the XX2Vec algorithms listed in the table above assign vectors to X's, where those X's are embedded in some larger context Y.
But the similarities end there. Each XX2Vec algorithm not only goes about it through means suited for its domain, but their use cases aren't even analagous. Doc2Vec, for example, is supervised learning whereas most of the others are unsupervised learning. The goal of Doc2Vec is to be able to apply labels to documents, whereas the goal of word2Vec and most of the other XX2Vec algorithms is simply to spit out vectors that you can then go and do other machine learning and analyses on (such as analogy detection).
Here is a brief description of each XX2Vec:
Like word2vec but because it operates at the character level, it is much more tolerant of misspellings and thus better for analysis of tweets, user product reviews, etc.
Described above. But one more note: it's one of those unreasonably effective algorithms -- a kind of getting lucky, if you will.
Instead of just getting lucky, there have been a number of efforts to ground the idea of word embeddings in something more mathematical than just pulling weights out of a neural network and hoping they work. GloVe is the current standard-bearer in this regard. Its model is designed from the ground up to support finding analogies, instead of just getting them by chance in word2vec.
Actually, Doc2Vec uses Word2Vec as a first pass. It then comes up with a composite vector for each sentence or paragraph from the contributing Word2Vec word vectors. This composite gives some kind of overall context to the sentence or paragraph, and then this composite vector is plopped down into the beginning of the sentence or pargraph as an "extra word". The paragraph vectors togeher with the word vectors are used to train a supervised-learning classifier using human labels of the documents.
Whereas word2vec intentionally uses a shallow neural network, Image2Vec uses a deep neural network and composes the resultant vectors from the weights from multiple layers of the network. Image elements that might be represented by these weights include image fragments (grass, bird, fence, etc.) or overall image qualities like color.
If machine learning on images involves high dimensions, videos involve even higher dimensions. Video2Vec does some initial dimension reduction by doing a first pass with convolutional neural networks.