2016-09-07 Akka Streams & Bloom Filters; and Spark ML Pipeline for Text Classification

An Investigation of Akka Streams and Bloom Filters - Abstract 

Link to slideshow: PDF Slideshow

Anthony is involved in replacing a legacy batch-processing system with stream processing, in order to monetize the data more quickly and reduce costs. Akka Streams promised to be fast and good at managing state, while Bloom filters promised to save a lot of costly Cassandra joins while being at least 99% correct. This is the story of his team's investigation into using Akka Streams and Bloom filters for part of their new system.

Anthony May - Bio

Anthony is a Senior Software Engineer with the Oracle Data Cloud, where he works on the Campaign Dev team. His work focuses on ingesting large volumes of unbounded streaming data using Akka, Spark, Kafka, Mesos, etc. Previously he built software for healthcare, education and retail. He originally hails from New Zealand and moved to the USA in 2012.

Spark ML Pipeline for Text Classification - Abstract 

Link to slideshow: PDF Slideshow

Text classification is a broad topic ranging from binary spam filtering to enterprise document management. Spark on Hadoop’s distributed approach to machine learning is ideally suited to the problem of classifying documents at large scale. This talk will cover the use of another Apache product, Tika, together with the open-source Tesseract OCR engine, to get a handle on multi-class document categorization. The focus will be on text extraction and cleansing through Spark ML Pipeline Models.

Adam Hicks - Bio

Adam’s background is where data analyst, full-stack developer and DevOps engineer converge. He has an academic background in philosophy, pure mathematics and graduate computer science. His work spans business automation with Visual Basic, microservice application development, continuous integration for the enterprise, and management of a Hadoop stack from bottom to top, with a focus on Spark development. Adam has an undying passion for all things open source and *nix, and currently works for a small financial services company, helping provide solutions wherever a fresh approach is needed.