Kafka 1.0 - New Release

Apache Kafka is a distributed streaming platform for publishing, subscribing to, storing, and processing streaming data at scale and in real time. It has become a go-to tool for building durable data collection platforms.
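To make "publish, subscribe, and store" concrete, here is a minimal conceptual sketch in Python of the idea behind a Kafka topic. A topic is an append-only log, and each consumer group tracks its own read offset, so reading a message does not delete it. The class `ToyTopic` and its methods are hypothetical illustrations, not Kafka's client API. The sketch leaves out partitions, replication, and disk persistence.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory model of a Kafka-style topic (not Kafka's API).

    Records are only ever appended. Each consumer group keeps its own
    offset, so groups read the same data independently.
    """

    def __init__(self):
        self.records = []                # append-only log of messages
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, message):
        """Append a message to the log and return its offset."""
        self.records.append(message)
        return len(self.records) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread messages for a group and advance its offset."""
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["click", "view", "purchase"]:
    topic.publish(event)

print(topic.poll("analytics"))               # ['click', 'view', 'purchase']
print(topic.poll("billing", max_records=2))  # ['click', 'view']
print(topic.poll("analytics"))               # []
```

Because messages stay in the log after they are read, a new consumer group such as `billing` can still read everything from offset 0. This is the durability property that distinguishes a log-based system from a queue that discards messages once they are consumed.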

The new release (Kafka 1.0.0) provides the following enhancements:

Data Visualization in Data Science

“If you’re trying to extract useful information from an ever-increasing inflow of data, you’ll likely find visualization useful – whether it’s to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience.” So writes InfoWorld’s Sharon Machlis.

Standard Methodology for Analytical Models

In this document, the Standard Methodology for Analytical Models (SMAM) is described. A short overview of the SMAM phases can be found in Table 1. The most frequently used methodology is the Cross-Industry Standard Process for Data Mining (CRISP-DM) [1]. It has several shortcomings that frequently cause friction with the business when practitioners start building analytical models.

Curated Medical Data for Machine Learning

A new curated list of medical data for machine learning is available here.

In addition, Stanford is developing Medical Image Net, a petabyte-scale, cloud-based, multi-institutional, searchable, open repository of diagnostic imaging studies for developing intelligent image analysis systems.

These curated data sets are great for experimenting; please respect the data usage restrictions of each one.