Apache Arrow - New Columnar In-Memory Analytics Framework
Arrow is a columnar in-memory analytics framework that combines the performance benefits of modern techniques with the flexibility of complex data and dynamic schemas. It grew out of three prevailing trends and business requirements:
Columnar: Big Data has gone columnar. Led by Apache Parquet and other columnar data storage technologies, the industry has rapidly adopted columnar storage for analytical workloads. These formats reduce the storage footprint and increase the performance of such workloads.
In-memory: Systems like SAP HANA and Spark accelerate analytical workloads by holding data in memory. In short, people are no longer willing to wait on systems that do not keep data in memory.
Complex data and dynamic schemas: Real-world objects are easier to represent as hierarchical and nested data structures. This has led to the rise of formats and technologies such as JSON and document databases. Analytical systems must treat this type of data as first-class.
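To make the columnar and nested-data ideas above concrete, here is a minimal sketch in plain Python. It is only an illustration of the layout concept, not Arrow's API or its actual memory format. The sample records and the helper names `to_columns` and `flatten_list_column` are hypothetical, and the sketch assumes every record has the same fields.

```python
# Row-oriented layout: each record keeps all of its fields together.
# (Hypothetical sample data, for illustration only.)
rows = [
    {"id": 1, "price": 9.5, "tags": ["a", "b"]},
    {"id": 2, "price": 3.0, "tags": []},
    {"id": 3, "price": 7.25, "tags": ["c"]},
]


def to_columns(records):
    """Pivot records into one list per field (assumes identical fields)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def flatten_list_column(lists):
    """Encode a column of variable-length lists as (offsets, values).

    Row i owns values[offsets[i]:offsets[i + 1]], so the nested data
    is stored in one contiguous sequence instead of many small lists.
    """
    offsets, values = [0], []
    for item in lists:
        values.extend(item)
        offsets.append(len(values))
    return offsets, values


# Column-oriented layout: each field's values sit next to each other.
columns = to_columns(rows)

# An aggregate over one field reads only that field's column.
total_price = sum(columns["price"])

# The nested "tags" field becomes two flat sequences.
tag_offsets, tag_values = flatten_list_column(columns["tags"])
```

In the columnar form, summing `price` never touches `id` or `tags`, which is the intuition behind the footprint and performance benefits described above. The offsets-and-values pair shows one way nested, variable-length data can still be kept in flat, contiguous columns.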