2014-02-26: Druid Data Ingest and Text Mining with Python

University of Colorado Boulder - Wednesday February 26, 2014 @ 6:00pm MST

Location: ATLAS - 1125 18th St Bldg 223, Boulder, CO - Room 100 

Druid Data Ingest

Link to slideshow: PDF Slideshow

Abstract

Most data scientists know and accept that their greatest moments of blinding insight are likely to be preceded by hours, days, or weeks of data retrieval, inspection and cleanup.  Another unheralded area of data science is the setup and administration of a project's big-data tools.  This presentation will be a practical guide to data ingest in Druid, an open-source analytics database designed for scalable, exploratory analysis of large datasets.  In addition to real-time ingest and analytics, Druid supports several options for bulk ingest of historical data.  Druid data ingest can be a little challenging for the first-timer, so this presentation will be a hands-on guide to the practical details. We will start with a quick overview of the topology of a Druid cluster and a brief look at real-time ingest.  Next, we'll cover how to choose which bulk ingest method to use, configuration opportunities and pitfalls, and where to look if things go wrong.  My goal is to leave you with enough information to speed up your deployment if you find yourself getting started with this outstanding database.

Bio

Wayne Adams is a software consultant in Boulder, Colorado.  After obtaining a BS in Physics from Eastern Kentucky University during a tough job market, he was fortunate to procure (civilian) employment with the US Navy, as well as the coolest assignment of his career -- testing ship fragmentation armor at Aberdeen Proving Ground.  A few years and an MS in Electrical Engineering from Colorado State University later, he is happy to enjoy the relative tranquility of business software.  Like all of you, he is interested in all things data, and he especially enjoys providing detailed how-to's to help you get productive as quickly as possible.

 

Text Mining with Python

Link to slideshow: PDF Slideshow


Python source code and IPython notebooks: http://www.williamgstanton.com/#!meetup-slides/cm7g

Abstract

Raw text is the classic example of unstructured, high-dimensional data. Text mining methods allow you to uncover structures, patterns, and sometimes even meaning in text. In this talk, I will introduce the key challenges and methods in text mining, and give examples of how to actually do text mining using Python. This talk will contain example use-cases and big picture ideas for generalists, as well as some real, working code for technical folks.
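As a minimal illustration of why raw text counts as high-dimensional data, the sketch below turns a few documents into a bag-of-words count matrix using only the Python standard library. It is not taken from the talk's code or notebooks; the function names `tokenize` and `bag_of_words` and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters, digits, apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Every distinct word in the corpus becomes one column (one dimension)
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix


docs = [
    "Text mining uncovers structure in text.",
    "Raw text is unstructured data.",
]
vocab, matrix = bag_of_words(docs)
print(len(vocab))  # 9 distinct words, so each document is a 9-dimensional vector
for row in matrix:
    print(row)
```

Even two short sentences yield nine dimensions; a real corpus easily reaches tens of thousands, and most entries in each row are zero, which is why text mining methods lean on sparse representations and dimensionality reduction.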

Bio

Will Stanton is on the analytics team at Return Path, the world's leading email data company. Before starting at Return Path, Will studied probability in the Department of Mathematics at CU Boulder. Will loves learning and teaching data science. You can find him on LinkedIn (http://www.linkedin.com/in/willstanton) or on his personal website (http://www.williamgstanton.com/).

Date: Wednesday, February 26, 2014 - 6:00pm to 9:00pm