2014-12-03: LineageDB Architecture for Big Data Analytics & Data Quality
Register @ http://bit.ly/1Elg3tJ
University of Colorado Boulder - Wednesday December 3, 2014 @ 6:00pm MST
NOTE: If you are unable to attend in person, register anyway and we will email you a livestream link 2 hours prior to the event.
Location: ATLAS - 1125 18th St Bldg 223, Boulder, CO - Room 100 - Map: http://goo.gl/maps/XTJ9v
Agenda:
6:00 - 6:20 Schmooze - Food shall be served in Lobby
6:20 - 6:30 Announcements
6:30 - 7:30 LineageDB Architecture for Big Data Analytics by Charles Clifford
7:30 - 8:30 Data Quality - the Dirty Underbelly of Data Science by Ken Farmer
8:30 - 9:30 Network at Old Chicago at 1102 Pearl St. (western end of Pearl Street pedestrian mall, directly facing Boulder Bookstore). Please support our sponsor, Old Chicago in Boulder, and make new friends. See: http://oldchicago.com/locations/boulder
DSA Announcements
LineageDB Architecture for Big Data Analytics - Abstract
Traditional data analytic platforms are:
• tightly coupled to expensive relational data services;
• limited to star and snowflake schemas (notoriously difficult to maintain); and
• heavily dependent on brittle, expensive ETLs.
An RDBMS can be scaled vertically (at a high price), but eventually you run out of runway because a B-tree does not scale linearly. The morphing of relational services into MPP appliances has resulted in platforms that are not flexible enough to support rapidly changing data analytic needs. These limitations can be overcome by adopting the LineageDB architecture, a polyglot composed of loosely coupled, open-source:
• key-value storage service;
• index service;
• graph service;
• SQL service; and
• in-memory data service.
Charles Clifford - Bio
Charles Clifford has been designing and developing both transactional and analytic business solutions since the early '90s. He has delivered distributed solutions to a variety of industries, from telecom, to capital markets, to health care, to software powerhouses. His current focus is on the design and delivery of DaaS solutions.
Data Quality - the Dirty Underbelly of Data Science - Abstract
Data quality continues to be one of the chief challenges, costs and reasons for project failure in data science. Problems in this space limit accuracy, destroy credibility and can result in harmful solutions. And unlike challenges such as scalability and cost, it has seen no major breakthrough improvements. This presentation will cover the types of problems, as well as their impacts, causes and various solutions.
Ken Farmer - Bio
Ken Farmer is the senior data architect/wrangler/librarian for ProtectWise, where he is developing their analytical data solution. Previously, he has developed, maintained, managed and consulted on analytical data architectures for IBM, MapQuest, Verizon, and others.