Big Data Streaming Analytics Platforms Wave 2014

Near real-time streaming analytics is a hot area with the potential to accelerate “time to insight” from the massive amounts of data generated by market data feeds, sensors, mobile phones, the Internet of Things, Web clickstreams, and transactions. Citing a 2014 survey of 740 decision makers, Forrester reports a 66 percent increase in firms’ use of streaming analytics.

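Real-time streaming platforms have been built on different architectures to suit different analytical requirements and use cases. What they share is an inversion of the usual analytics model: in conventional analytics programming, code execution controls the data (a program runs and reads the data it needs), whereas in stream processing, incoming data controls the code (each arriving event triggers the logic that handles it).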
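The inversion of control described above can be illustrated with a minimal Python sketch. It is not any vendor's API: the `batch_average` function, the hypothetical `StreamingAverage` operator, and its `on_event` callback are assumptions chosen for illustration. In the batch version the program decides when to read the data; in the streaming version each arriving reading drives the computation and produces an updated result immediately.

```python
from typing import Callable, Iterable, List


# Conventional analytics: code execution controls the data.
# The program runs and pulls in everything it needs before computing.
def batch_average(readings: Iterable[float]) -> float:
    values = list(readings)
    return sum(values) / len(values)


# Streaming analytics: incoming data controls the code.
# Hypothetical operator; a real platform would invoke on_event per record.
class StreamingAverage:
    def __init__(self, on_update: Callable[[float], None]) -> None:
        self.count = 0
        self.total = 0.0
        self.on_update = on_update  # downstream consumer of each new result

    def on_event(self, value: float) -> None:
        # Triggered by the arrival of a single record.
        self.count += 1
        self.total += value
        self.on_update(self.total / self.count)


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0]
    print(batch_average(readings))  # 20.0

    updates: List[float] = []
    op = StreamingAverage(on_update=updates.append)
    for r in readings:  # simulates records arriving one at a time
        op.on_event(r)
    print(updates)  # [10.0, 15.0, 20.0]
```

The batch version yields one answer after all data is available, while the streaming operator emits a fresh answer after every record, which is what lets streaming platforms shorten time to insight.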

Most stream processing platforms support customized, user-defined operators for industry-specific use cases such as text analytics, advanced geospatial analytics, signal processing, and calculating erlangs (the standard unit of telecommunications traffic load) in a traffic network.

Open source streaming analytics products such as Apache Storm and Apache Spark are popular with some data engineers for specific use cases, but at this time they are immature and lack key functionality found in proprietary vendors’ offerings. Expect both Storm and Spark to innovate and improve in the future.

Storm is a record-by-record stream processing engine, but it is a very technical platform that lacks the higher-order tools and streaming operators provided by mature vendor platforms. Spark Streaming, by contrast, is not real-time streaming in the strict technical sense: it is a micro-batch framework, and it lacks industry-proven credentials. Because its API batches up events before processing them, Spark Streaming may work well for some limited use cases, but it is less suited than Storm and other record-by-record engines to real-time analytics that must process each individual record as it arrives or take into account when the data was created.
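The latency difference between the two processing models can be sketched as a toy Python simulation. This is not Storm's or Spark Streaming's API: the function names, the `(arrival_time, payload)` event shape, and the one-second batch interval are assumptions for illustration. The sketch ignores processing cost and does not model event (creation) time; it only shows that a micro-batch engine holds each record until its batch interval closes, while a record-by-record engine can act on it immediately.

```python
import math
from typing import Callable, Dict, List, Tuple

Event = Tuple[float, str]  # assumed shape: (arrival time in seconds, payload)


def record_by_record(events: List[Event], handle: Callable[[str], None]) -> List[float]:
    """Handle each event the moment it arrives; return per-event buffering delay."""
    delays = []
    for _, payload in events:
        handle(payload)
        delays.append(0.0)  # idealized: no time spent waiting in a buffer
    return delays


def micro_batch(events: List[Event], interval: float,
                handle_batch: Callable[[List[str]], None]) -> List[float]:
    """Group events into fixed arrival-time intervals and handle each group
    only when its interval closes; return per-event buffering delay."""
    batches: Dict[int, List[Event]] = {}
    for ev in events:
        batches.setdefault(math.floor(ev[0] / interval), []).append(ev)
    delays = []
    for idx in sorted(batches):
        batch = batches[idx]
        close_time = (idx + 1) * interval  # batch is processed at interval end
        handle_batch([payload for _, payload in batch])
        delays.extend(close_time - arrival for arrival, _ in batch)
    return delays


if __name__ == "__main__":
    events = [(0.1, "a"), (0.4, "b"), (1.2, "c")]
    print(record_by_record(events, lambda p: None))  # [0.0, 0.0, 0.0]
    lat = micro_batch(events, 1.0, lambda b: None)
    print([round(x, 3) for x in lat])  # [0.9, 0.6, 0.8]
```

Shrinking the batch interval reduces the delay but never removes it, which is why micro-batching is often described as near real-time rather than true record-by-record streaming.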

Note that Forrester did not include: exciting new streaming products such as Google’s Cloud Dataflow and Amazon Kinesis; vendors playing on the edge of real-time analytics, such as DataTorrent and ZoomData; in-memory data grid developers, such as GridGain and ScaleOut Software; mega-vendors such as Oracle and Microsoft (which do not sell standalone streaming analytics tools); other open source products, such as the S4 Apache Incubator project, Apache Kafka, and Apache Samza; or any of the NoSQL and NewSQL vendors, such as DataStax or VoltDB, that offer some analytic functions atop a fast transactional database.

Seven real-time streaming platforms made the cut:

Software AG: The heart of Software AG’s real-time streaming offering is Apama, a product with a long history as a complex event processing (CEP) platform. Since its release back in 2001, Apama has seen widespread use on Wall Street, where it powers algorithmic trading applications, and it has also been used in retail banking, telecommunications, logistics, government, energy, and manufacturing. Software AG acquired Apama from Progress Software in 2013.

IBM: InfoSphere Streams is “industrial strength” and can support the “gnarliest of use cases.” It scored highest in performance and scalability and is second only to Apama in functionality. It has been deployed in healthcare, financial services, telecommunications, government, energy, and utilities.

SAP: The Event Stream Processor (ESP) has a “long, rich history” as one of the original CEP platforms. SAP’s roadmap calls for integrating ESP into the in-memory HANA database.

TIBCO: The 2013 acquisition of StreamBase gave TIBCO a reputable story to tell in the real-time streaming business and complements the 15-year history of TIBCO (The Information Bus Company) in the high-frequency trading market. StreamBase’s intuitive interfaces scored highly.

Informatica: Its 2011 decision to redesign the RulePoint business rules engine to support streaming analytics means applications can be configured using either streaming operator constructs or business rule constructs.

Vitria Technology: Vitria has a “proven track record” of helping companies in a variety of industries, and its “unified platform” approach gives users unique options.

SQLstream: Its reliance on ANSI SQL gives it a unique place in the world of streaming analytics and shortens the learning curve for developers. A relatively small market presence pulled down SQLstream’s total score.

See: http://bit.ly/1EkmdQd