March 29, 2015

Data Science Association Announcements

Join us on Wednesday, April 15, 2015 at Rock Bottom Brewery for "Data Storage Trends and Architectures". A free live stream is available for those who cannot attend. Register here.

The Data Science Association's own Michael Malak will present "Extending Word2Vec for Performance and Semi-supervised Learning" at the Spark Summit in San Francisco in June 2015. Register here. Michael is the author of the new book "Spark GraphX In Action", which may be purchased here. Below is the abstract:

MLlib Word2Vec is an unsupervised learning technique that generates feature vectors, which can then be clustered. The weakness of unsupervised learning is that although it can say an apple is close to a banana, it cannot put the label "fruit" on that group. We show how MLlib Word2Vec can be combined with the human-created data of YAGO2 (which is derived from crowd-sourced Wikipedia metadata), along with the NLP metrics Levenshtein and Jaccard, to label categories properly.
Although YAGO2 is a graph, we use Ankur Dave's IndexedRDD, which is slated for inclusion in Spark 1.4 or later, as an alternative to GraphX. IndexedRDD is also used in a second way: to further parallelize MLlib Word2Vec. The use case is labeling columns of unlabeled data uploaded to the Oracle Big Data Prep Cloud Service (OBDPCS), a cloud app for processing big data.
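
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal Python sketch of how Levenshtein distance and Jaccard similarity might be used to attach a label to a cluster of words. It is not the method from the talk: the abstract does not say how the metrics are combined, so the `normalize` and `label_cluster` helpers, the `max_edits` threshold, and the small `categories` dictionary (a toy stand-in for YAGO2) are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaccard(x: set, y: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def normalize(word: str, vocabulary: list, max_edits: int = 1) -> str:
    """Hypothetical helper: snap a word to the closest known term
    if it is within max_edits edits, otherwise leave it unchanged."""
    if not vocabulary:
        return word
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_edits else word


def label_cluster(cluster: list, categories: dict, max_edits: int = 1) -> str:
    """Hypothetical helper: pick the category whose member set has the
    highest Jaccard similarity with the (spelling-normalized) cluster."""
    vocabulary = sorted(set().union(*categories.values()))
    normalized = {normalize(w.lower(), vocabulary, max_edits) for w in cluster}
    return max(categories, key=lambda name: jaccard(normalized, categories[name]))


# Toy stand-in for a knowledge base such as YAGO2 (assumed data).
categories = {
    "fruit": {"apple", "banana", "cherry"},
    "vehicle": {"car", "truck", "bicycle"},
}

label = label_cluster(["aple", "Banana"], categories)
print(label)
```

In this sketch Levenshtein distance absorbs small spelling differences before Jaccard similarity compares the cluster with each category's members; a real pipeline would likely need a weighting or threshold scheme that the abstract does not describe.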

Join us on Thursday, April 2, 2015 for "Hortonworks - Big Data for Business". Register here.


Event: Data Storage Trends and Architectures - Rock Bottom Brewery in Denver - April 15, 2015.

Event: Hortonworks - Big Data for Business - Centennial, CO - April 2, 2015.

Event: Failing Fast & Early: Assertive / Defensive Programming for R Analysis Pipelines - New York City - March 30, 2015.

Event: Intro to Graphs - London - April 7, 2015.

Event: Hands-on Data Visualization with D3.js - San Francisco - April 13, 2015.

New Books from DSA Store:

New DSA Videos:

New DSA Resources:

Data Science News Articles

Featured Data Science Blogs

We urge you to start your own personal data science blog. It is a great way to share ideas, network, and demonstrate your data science acumen.

The following are featured DSA blog posts:

© 2015 Data Science Association, Inc. — All Rights Reserved.