Access 264 datasets at http://archive.ics.uci.edu/ml/datasets.html.
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
Datasets from UCI are also available at http://www.sgi.com/tech/mlc/db/.
In a new HuffPost/YouGov poll, only 36 percent of Americans reported having "a lot" of trust that information they get from scientists is accurate and reliable. Fifty-one percent said they trust that information only a little, and another 6 percent said they don't trust it at all. See: http://huff.to/19Joyn5
Gary Cokins, a Data Science Association Advisory Board Member, has written a new book, "Predictive Business Analytics." Gary is a great writer with a talent for explaining complex subjects clearly. He is a trusted advisor, and I strongly recommend that you purchase this book.
More and more frequently, we see organizations make the mistake of mixing up team roles on a data science or "big data" project, which overloads data scientists with responsibilities that belong to other roles. For example, data scientists are often tasked with the role of data engineer, which misallocates human capital: the data scientist wastes precious time and energy finding, organizing, cleaning, sorting, and moving data. The solution is to add data engineers, among others, to the data science team.
Magazine Luiza, one of the largest retail chains in Brazil, developed an in-house product recommendation system built on top of a large knowledge graph. AWS services such as Amazon EC2, Amazon SQS, and Amazon ElastiCache made it possible for them to scale from a very small dataset to a huge Cassandra cluster. By improving the big data processing algorithms in their in-house solution built on AWS, they increased their revenue conversion rates by more than 25 percent compared with the market solutions they had used in the past.