2014-02-26: Druid Data Ingest and Text Mining with Python
University of Colorado Boulder - Wednesday February 26, 2014 @ 6:00pm MST
Location: ATLAS - 1125 18th St Bldg 223, Boulder, CO - Room 100
Druid Data Ingest
Abstract
Most data scientists know and accept that their greatest moments of blinding insight are likely to be preceded by hours, days, or weeks of data retrieval, inspection, and cleanup. Another unheralded area of data science is the setup and administration of a project's big-data tools. This presentation will be a practical guide to data ingest in Druid, an open-source analytics database designed for scalable, exploratory analysis of large datasets. In addition to real-time ingest and analytics, Druid supports several options for bulk ingest of historical data. Druid data ingest can be a little challenging for the first-timer, so this presentation will be a hands-on guide to the practical details. We will start with a quick overview of the topology of a Druid cluster and a brief look at real-time ingest. Next, we'll cover how to choose which bulk ingest method to use, configuration opportunities and pitfalls, and where to look if things go wrong. My goal is to leave you with enough information to speed up your deployment if you find yourself getting started with this outstanding database.
Bio
Wayne Adams is a software consultant in Boulder, Colorado. After obtaining a BS in Physics from Eastern Kentucky University during a tough job market, he was fortunate to procure (civilian) employment with the US Navy, as well as the coolest assignment of his career -- testing ship fragmentation armor at Aberdeen Proving Ground. A few years and an MS in Electrical Engineering from Colorado State University later, he is happy to enjoy the relative tranquility of business software. Like all of you, he is interested in all things data, and he especially enjoys providing detailed how-to's to help you get productive as quickly as possible.
Text Mining with Python
Python source code and IPython Notebooks: http://www.williamgstanton.com/#!meetup-slides/cm7g
Abstract
Raw text is the classic example of unstructured, high-dimensional data. Text mining methods allow you to uncover structures, patterns, and sometimes even meaning in text. In this talk, I will introduce the key challenges and methods in text mining, and give examples of how to actually do text mining using Python. This talk will contain example use cases and big-picture ideas for generalists, as well as some real, working code for technical folks.
Bio
Will Stanton is on the analytics team at Return Path, the world's leading email data company. Before starting at Return Path, Will studied probability in the Department of Mathematics at CU Boulder. Will loves learning and teaching data science. You can find him on LinkedIn (http://www.linkedin.com/in/willstanton) or on his personal website (http://www.williamgstanton.com/).