Thursday, March 14, 2013

Data Science in Business/Computational Social Science in Academia?

Nomen Est Omen?

Lately, the terms "data science" and "data scientist" have been turning up at an increasing pace in the R-blog-sphere. Since its first occurrence (to my knowledge, "data scientist" was coined by DJ Patil and Jeff Hammerbacher in 2008), the term has become established and accepted not only in the data-blog-sphere but also in the corporate world and in academia. Its frequent occurrence as a job title, as well as some controversial discussions about the state of the respective labor market, is evidence that we understand "data scientist" as an occupational title. At the same time, data science is being established as a course taught at universities (with the first drafts of dedicated textbooks appearing; see, e.g., Jeffrey Stanton's free book on data science).

Interestingly, because job descriptions for data scientists and their respective skill sets now exist, our understanding of data science is increasingly defined by what the corporate labor market demands - hence, by business. As I see it, this development is also being taken up by the scholars teaching data science at universities. A data science course is quite specifically a preparation for a future job as a data scientist. In that sense, data science is not a science in itself but the application of various sciences (computer science, statistics, etc.). This notion, I think, is also present when reading the JDS.

Empirical Computational Social Science

The corporate labor market asks for data scientists, and universities are offering new courses in order to fill the gap. But is there also room for data science skills in a purely academic research environment?

I think there is very much room for it. At around the same time the term "data scientist" came up, Science magazine published Lazer et al.'s manifesto on data-driven computational social science (or, the term I prefer, empirical computational social science). Historically, the term computational social science has referred rather to the application of numerical methods and simulation (e.g., agent-based modelling) to complex issues of social science research. What Lazer et al. understand as computational social science, however, is social science research that draws on the enormous potential of vast amounts of digital data on social interactions (made available through the Internet, mobile applications, etc.). Handling this data in order to conduct empirical social science research clearly requires data science skills. To come full circle, I have revisited Drew Conway's post and Venn diagram on data science and drafted another Venn diagram to illustrate how data-driven computational social science could be interpreted in the framework discussed above.

Whether or not you share my point of view on data science and computational social science, I am pretty sure you will agree on one thing: R will play an important role in the further development of these fields.