March 29, 2015
Data Science Association Announcements
Join us on Wednesday, April 15, 2015 at Rock Bottom Brewery for "Data Storage Trends and Architectures". Free live-streaming for those who cannot attend. Register here.
The Data Science Association's own Michael Malak will present "Extending Word2Vec for Performance and Semi-supervised Learning" at the Spark Summit in San Francisco in June 2015. Register here. Michael is the author of the new book "Spark GraphX in Action", which may be purchased here. The abstract follows:
MLlib Word2Vec is an unsupervised learning technique that generates feature vectors, which can then be clustered. The weakness of unsupervised learning is that although it can say an apple is close to a banana, it cannot put the label "fruit" on that group. We show how MLlib Word2Vec can be combined with the human-created data of YAGO2 (which is derived from crowd-sourced Wikipedia metadata), along with the NLP metrics Levenshtein and Jaccard, to properly label categories.
Although YAGO2 is a graph, we use Ankur Dave's powerful IndexedRDD, which is slated for inclusion in Spark 1.4 or later, rather than GraphX. IndexedRDD is also used in a second way: to further parallelize MLlib Word2Vec. The use case is labeling columns of unlabeled data uploaded to the Oracle Big Data Prep Cloud Service (OBDPCS), a cloud app for processing big data.
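For readers unfamiliar with the two NLP metrics named in the abstract, here is a minimal Python sketch of how Levenshtein distance and Jaccard similarity might be combined to label a group of words. It is not the method from the talk, which the abstract does not spell out. The tiny TOY_KB dictionary (a stand-in for YAGO2), the label_cluster function, and the max_edits threshold are all hypothetical, and the group of words is simply given rather than produced by Word2Vec clustering.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy knowledge base (entity name -> category), standing in for YAGO2.
TOY_KB = {
    "apple": "fruit", "banana": "fruit", "cherry": "fruit",
    "denver": "city", "london": "city",
}


def label_cluster(words, kb=TOY_KB, max_edits=1):
    """Suggest a category label for a group of words, or None if nothing matches."""
    # Step 1: tolerate misspellings by snapping each word to its nearest
    # knowledge-base name, if it is within max_edits edits (assumed threshold).
    matched = set()
    for word in words:
        nearest = min(kb, key=lambda name: levenshtein(word.lower(), name))
        if levenshtein(word.lower(), nearest) <= max_edits:
            matched.add(nearest)
    if not matched:
        return None
    # Step 2: pick the category whose member set overlaps most with the matches.
    members = {}
    for name, category in kb.items():
        members.setdefault(category, set()).add(name)
    return max(members, key=lambda c: jaccard(matched, members[c]))


print(label_cluster(["aple", "bananna"]))
print(label_cluster(["Londn", "denver"]))
```

In this sketch, Levenshtein handles noisy spellings in the uploaded column values, while Jaccard scores how well the matched names overlap each candidate category. A real system would need a far larger knowledge base and a principled way to break ties.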
Join us on Thursday, April 2, 2015 for "Hortonworks - Big Data for Business". Register here.
Events:
Event: Data Storage Trends and Architectures - Rock Bottom Brewery in Denver - April 15, 2015.
Event: Hortonworks - Big Data for Business - Centennial, CO - April 2, 2015.
Event: Failing Fast & Early: Assertive / Defensive Programming for R Analysis Pipelines - New York City - March 30, 2015.
Event: Intro to Graphs - London - April 7, 2015.
Event: Hands-on Data Visualization with D3.js - San Francisco - April 13, 2015.
New Books from DSA Store:
- Statistics Done Wrong: The Woefully Complete Guide
- Data Mining and Analysis: Fundamental Concepts and Algorithms
- Learning From Data
New DSA Videos:
- Intel and the Role of Open Source - Michael Greene
- A Bigger Lens Through Which to View the World - Adam Kocoloski
- Close Encounters with the Third Kind of Database - Eric Frenkiel
New DSA Resources:
- Spectral Learning of Mixture of Hidden Markov Models
- Sampling for Inference in Probabilistic Models with Fast Bayesian Quadrature
- Nicholas Bloom Interview - Large-scale Measurement - 2014
- NIST Cloud Computing Reference Architecture - V2 2013
Data Science News Articles
- Data Science in Python
- 5 Best Python Libraries For Data Science
- Algorithmia Launches With More Than 800 Algorithms On Its Marketplace
- Using Log Data And Machine Learning To Weed Out The Bad Guys
- PredictionIO (Open Source Version) vs Microsoft Azure Machine Learning
- Where Do Data Scientists Come From?
Featured Data Science Blogs
We urge you to start your own personal data science blog at http://bit.ly/1dZ9bKI. It is a great way to share ideas, network, and demonstrate your data science acumen.
The following are featured DSA blog posts:
- Predictive Analytics Strategy
- Spark Gets GPU in the Lab
- Data Warehouse and Data Management Solutions for Analytics Magic Quadrant 2015
© 2015 Data Science Association, Inc. — All Rights Reserved.