To be effective, Big Data must recognize the following voices (in order):
- VOC = Voice of the Customer
- VOB = Voice of the Business
- VOP = Voice of the Process
Datification is the link between the three voices. It also captures and displays relevant metrics from all your improvement projects.
DATIFICATION!! What's the big deal? And why does your business need it?
What is Datification?
This week Databricks announced GraphFrames, a library posted to spark-packages.org that is based on Spark SQL DataFrames rather than RDDs (as GraphX is). GraphFrames is still a work in progress (it is currently at version 0.1), so it provides interoperability with GraphX: graphs can be converted back and forth between the two.
Data Science Skills Survey 2015
- Do you have domain knowledge? If yes, construct a better set of “ad hoc” features.
- Are your features commensurate? If no, consider normalizing them.
- Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features, as much as your computer resources allow you (see example of use in Section 4.4).
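As a rough illustration of the last two checklist items, the sketch below rescales features to a common scale and then expands the feature set with pairwise products. It is a minimal sketch under stated assumptions, not the checklist authors' procedure: z-score standardization is only one possible normalization, and the function names `standardize` and `add_pairwise_products` and the sample data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-score).

    Assumption: z-scoring is the chosen normalization; others are possible.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def add_pairwise_products(rows):
    """Append the product of every pair of distinct features to each row."""
    return [
        list(row) + [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]
        for row in rows
    ]


# Hypothetical data: height (cm), income (dollars), age (years).
# The raw scales differ by orders of magnitude, so the features are not commensurate.
rows = [
    [170.0, 30000.0, 25.0],
    [180.0, 60000.0, 40.0],
    [160.0, 45000.0, 31.0],
]

scaled = standardize(rows)
expanded = add_pairwise_products(scaled)
```

With n original features this adds n(n-1)/2 product features, which is why the checklist ties the expansion to the computing resources available.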
Arrow is a columnar in-memory analytics framework that provides the performance benefits of modern techniques while also providing the flexibility of complex data and dynamic schemas. Arrow grew out of three prevailing trends and business requirements: