Apache Kafka is a distributed streaming platform for publishing, subscribing to, storing, and processing streams of data in real time and at scale. It has become a widely used tool for building durable data collection platforms.
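The publish/subscribe and storage roles described above rest on one idea: a topic is an append-only log, and each subscriber keeps its own read position (its offset), so reading a record does not remove it for anyone else. The toy Python model below illustrates only that idea. The classes `ToyTopic` and `ToySubscriber` are hypothetical and are not the Kafka client API. Real Kafka adds partitions, replication, retention policies, and consumer groups on top of this.

```python
class ToyTopic:
    """Hypothetical in-memory stand-in for a single Kafka topic partition."""

    def __init__(self):
        # Append-only log: in this toy, records are never modified or removed.
        self._log = []

    def publish(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._log.append(record)
        return len(self._log) - 1

    def read_from(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._log[offset:offset + max_records]


class ToySubscriber:
    """Hypothetical subscriber that tracks its own offset into the log."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # next offset this subscriber will read

    def poll(self, max_records=10):
        """Fetch the next batch and advance only this subscriber's offset."""
        batch = self.topic.read_from(self.offset, max_records)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    topic = ToyTopic()
    for event in ["login", "click", "logout"]:
        topic.publish(event)

    analytics = ToySubscriber(topic)
    archiver = ToySubscriber(topic)
    print(analytics.poll())              # ['login', 'click', 'logout']
    print(archiver.poll(max_records=2))  # ['login', 'click']
    print(analytics.poll())              # [] (already caught up)
```

Because each subscriber owns its offset, independent consumers can read the same stored stream at different speeds, which is what lets one durable log feed many downstream systems.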
The new release (Kafka 1.0.0) provides the following enhancements:
A new curated list of medical data sets for machine learning is available here.
In addition, Stanford is developing Medical Image Net, a petabyte-scale, cloud-based, multi-institutional, searchable, open repository of diagnostic imaging studies for building intelligent image analysis systems.
These curated data sets are great for experimenting; please respect the usage restrictions of each data set.
Today, data scientists are using new machine learning techniques to solve complex challenges. In applied data science, speed kills, and the NVIDIA Tesla V100 accelerator is built for speed. Data scientists constantly trade off accuracy against time: a model that takes too long to train or evaluate is often simplified until it fits the schedule. More powerful compute systems reduce that compromise, making it practical to crunch and analyze diverse data sets and to train increasingly complex deep learning models in a reasonable amount of time.