2014-12-03: LineageDB Architecture for Big Data Analytics & Data Quality

Register @ http://bit.ly/1Elg3tJ

University of Colorado Boulder - Wednesday December 3, 2014 @ 6:00pm MST

NOTE: If you cannot attend in person, register anyway and we will email you a livestream link 2 hours before the event.

Location: ATLAS - 1125 18th St Bldg 223, Boulder, CO - Room 100 - Map: http://goo.gl/maps/XTJ9v

Agenda:

6:00 - 6:20 Schmooze - Food shall be served in Lobby

6:20 - 6:30 Announcements

6:30 - 7:30 LineageDB Architecture for Big Data Analytics by Charles Clifford

7:30 - 8:30 Data Quality - the Dirty Underbelly of Data Science by Ken Farmer

8:30 - 9:30 Network at Old Chicago at 1102 Pearl St. (western end of Pearl Street pedestrian mall, directly facing Boulder Bookstore). Please support our sponsor, Old Chicago in Boulder, and make new friends. See: http://oldchicago.com/locations/boulder

DSA Announcements

Link to slideshow: PDF Slideshow

LineageDB Architecture for Big Data Analytics - Abstract

Link to slideshow: PDF Slideshow


Traditional data analytics platforms are:

• tightly coupled to expensive relational data services;
• limited to star and snowflake schemas (notoriously difficult to maintain); and
• heavily dependent on brittle, expensive ETLs.

An RDBMS can be scaled vertically (at a big price point), but eventually you run out of runway because a B-tree does not scale linearly. The morphing of relational services into MPP appliances has resulted in platforms that are not flexible enough to support rapidly changing data analytics needs. These limitations can be overcome by adopting the LineageDB architecture, a polyglot composed of the following loosely coupled, open-source components:

• key-value storage service; 
• index service; 
• graph service; 
• SQL service; and 
• in-memory data service.

Charles Clifford - Bio

Charles Clifford has been designing and developing both transactional and analytic business solutions since the early 1990s. He has delivered distributed solutions to a variety of industries, from telecom to capital markets to health care to software powerhouses. His current focus is on the design and delivery of DaaS solutions.

Data Quality - the Dirty Underbelly of Data Science - Abstract


Data quality continues to be one of the chief challenges, costs, and reasons for project failure in data science. Problems in this space limit accuracy, destroy credibility, and can result in harmful solutions. And unlike challenges such as scalability and cost, it has seen no major breakthrough improvements. This presentation will cover the types of problems, as well as their impacts, causes, and various solutions.

Ken Farmer - Bio

Ken Farmer is the senior data architect/wrangler/librarian for ProtectWise, where he is developing their analytical data solution. Previously, he developed, maintained, managed, and consulted on analytical data architectures for IBM, MapQuest, Verizon, and others.

 

 

Date: 
Wednesday, December 3, 2014 - 6:00pm to 9:00pm