The Fourth Bubble in the Data Science Venn Diagram: Social Sciences

OK, I have countless times made fun of bloggers who, searching for something to fill their blog with, invent a new "V" to add to the Three V's. I think we're up to six or seven V's now in the blogosphere.

But this is different. Really. I came across this innocuous conclusion to a CIO Insight blog post from earlier this month by Samuel Greengard, Preparing for the Big Data Deluge:

While the need for data scientists, mathematicians and statisticians hasn't gone away, it's also necessary to attract experts in psychographics, sociology, cultural anthropology and other non-IT disciplines.

I realized immediately that this is a critical missing element in many data science teams. While many are diligently collecting data and mining it with statistics and machine learning in search of, say, customer behavioral patterns, they may not be trained or educated in the behavioral sciences. They may have expertise in the domain of the particular business, but not necessarily expertise in psychology and sociology.

So let's take a look at adding "social sciences" as a fourth bubble to Drew Conway's now-venerable Venn Diagram. First, Drew's original:

Now one data science blogger in the field of political economics has already come up with a Venn Diagram that includes Social Sciences, but did so by replacing "substantive expertise" with "social sciences":

That is appropriate if you happen to be doing data science in the domain of the social sciences, but for the vast realm of other domains out there, especially everyday business, it makes more sense to add it as a fourth bubble instead:

Now I did some sleight of hand above. Notice that I renamed Conway's "Substantive Expertise" into "Domain Expertise". One could retroactively apply "Substantive Expertise" to mean the combination of "Domain Expertise" and "Social Sciences," and be left with just the three original bubbles, and perhaps that was the meaning Conway had in mind, but when I first read "Substantive Expertise" a year ago, I took it to mean just "Domain Expertise," and I'm not the only one. Splitting out Social Sciences as a separate bubble more effectively communicates its importance.

An example of "socially-unaware data science" is trying to analyze website clickstream data without knowing any behavioral science or even any CHI/UX (computer-human interface and user experience). Even in domains that don't directly deal with customers, such as hard physical sciences like the study of climate change, social sciences are important when doing predictive analytics to craft corrective policies, for example by modeling citizen behavior in the face of rising sea levels and increasing food and energy prices.

An example of "domain-unaware data science" is hiring a social-sciences-oriented data scientist who lacks knowledge in your particular business domain.

Someone who possesses both domain expertise and social sciences but lacks statistics is someone who thinks they know so much that they prefer their own hunches over what the data would tell them, and do so even more strongly than someone in one of the "single-danger" zones.

You'll notice that a weakness of Venn Diagrams with four bubbles is that not all possible combinations are represented. Two pairs are not represented. One of the pairs, Hacking Skills plus Statistics, is Machine Learning as Conway portrayed, but that didn't make it into the four-way Venn Diagram. The other pair, Social Sciences plus Domain Expertise (without either Hacking Skills or Statistics), equals perhaps the program manager who had some social sciences education -- still merely knowledge and without the tools of either statistics or computers.
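For the curious, the "two missing pairs" can be checked with a little counting. The sketch below is my own illustration, not anything from Conway or Greengard: it assumes the diagram is drawn with four circles, and it uses the standard geometric fact that n circles can enclose at most n^2 - n + 1 distinct regions. The function and variable names are made up for this example.

```python
from itertools import combinations

# The four bubbles of the diagram discussed in this post.
BUBBLES = ("Hacking Skills", "Statistics", "Domain Expertise", "Social Sciences")

# The two pairs this post identifies as having no region of their own.
MISSING_PAIRS = [
    frozenset({"Hacking Skills", "Statistics"}),
    frozenset({"Social Sciences", "Domain Expertise"}),
]


def all_combinations(bubbles):
    """Every non-empty combination of bubbles that a complete diagram would show."""
    return [
        frozenset(combo)
        for size in range(1, len(bubbles) + 1)
        for combo in combinations(bubbles, size)
    ]


def max_circle_regions(n):
    """Upper bound on enclosed regions formed by n circles (assumes circles, not ellipses)."""
    return n * n - n + 1


needed = all_combinations(BUBBLES)            # 2**4 - 1 combinations
drawable = max_circle_regions(len(BUBBLES))   # what four circles can show at best
shortfall = len(needed) - drawable

print(f"combinations needed: {len(needed)}")
print(f"regions four circles can draw: {drawable}")
print(f"combinations left out: {shortfall}")
```

Which combinations get left out depends on how the circles are arranged; in the layout used here it happens to be the two pairs listed in `MISSING_PAIRS`.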

Now where does one find such a purple unicorn who possesses all four bubbles of this Venn Diagram? Fellow DSA board member Michael Walker argued in a presentation last year that data science should not be the burden of a single person, but should rather be done by a team, as outlined by the Gartner table below.

To the above 2012 table then, we should add Social Scientist, as suggested by CIO Insight blogger Samuel Greengard.


Hi Michael,

Many thanks for the very informative post. My colleague and I are writing a paper entitled 'Spatial and temporal analysis data analysis in support of decision making for complex animal problems in the Big Data era'. It has been provisionally accepted for publication in Preventive Veterinary Medicine subject to a few minor revisions, one of which was the suggestion to include the four-bubble Venn diagram, as we discuss it in the text. Would you allow us to include the diagram in our manuscript? You would be cited as the original author, and we would note that it is included with your permission. I can email you a copy of the draft manuscript if you are interested.

Looking forward to hearing from you.