Cloud Machine Learning Platforms vs. Apache Spark Solutions


Cloud giants like Amazon, Google, Microsoft and IBM have rushed into the big data analytics cloud market. They claim their tools will simplify developers' tasks. For machine learning, they say their cloud products will free data scientists and developers from implementation details so they can focus on business logic.


The big companies have kicked off a race between machine learning platforms. Amazon ML, Azure ML, IBM Watson and Google Cloud Prediction are striving to fold data science workflows into their existing ecosystems. They want to drive the adoption of machine learning algorithms across software development teams and expand data science throughout the business.


But data scientists and big data platform engineers do not always want or need this one-size-fits-all approach. They understand firsthand how powerful and flexible Apache Spark, R and Python are when it comes to machine learning. These people are experts who do not want to be constrained by a cloud GUI. They need access to the command line. And they do not want to be tied to any one vendor.


We would like to cut through the marketing buzz coming from the cloud companies and home in on the facts. So we ask you to help us gather actual statistics that show how heavily companies rely on these cloud platforms versus custom solutions they built themselves with open source technologies and machine learning libraries.


Follow this link to participate in the survey.

You might also want to follow the discussion on Quora or Reddit and comment there.