Data Science and Social Science Theory

The interwoven approaches of data science and machine learning have gained attention as ways to use vast amounts of numeric, linguistic, image-based, or other information to understand, explain, and predict phenomena. In this week’s Research Methods Café Conversation, we explored the intersection of data science and social science theory and considered potential affordances of and barriers to productive alliances between the two.

The discussion included contributions from researchers in anthropology, geography, physics, health sciences, and other fields. From the outset, it was clear that participants were apprehensive about the dangers of “theory-free” or theory-ignorant uses of data science, and wary that its usefulness for solving nearly any type of problem might be exaggerated. Several participants stated that it was important for data scientists to have a grounding in the research field in which they were working, and that they should be aware of important concepts, theories, terminology, and literature. This grounding has not always been evident. There have also been instances where data scientists have seemingly over-interpreted data, such as facial expressions in paintings and historical documents, to make bold claims about social phenomena. In these cases, one concern is that the samples may not be representative of the whole population of interest, and another is that the chosen type of data may not be a reliable stand-in for the phenomenon of interest (e.g. facial expressions taken to indicate trustworthiness). As some participants noted, sampling and measurement are key issues in all kinds of research. Whenever quantitative methods are used without careful consideration of the underlying principles, interpretations can be led astray. However, the scale at which data scientists work could magnify these problems. One of the most substantial risks is that analysis of pre-existing data sets could reinforce systems of bias and privilege, especially when these data are collected without the knowledge of the people to whom they refer.

When the discussion turned to how data science could impact their own research, participants agreed they do not expect off-the-shelf solutions from data science to optimize their work, even if the impression is that these are being offered. Instead, it was suggested that researchers work with data scientists and specify the computational tools that would be most helpful for them. Several participants expressed interest in joining collaborations, but some anticipated that it would be difficult to build bridges between their own fields and data science. They mentioned “turf wars” between data science and statistics and the ambiguity around who owns a field of research like education, in which psychologists, sociologists, practitioners, and others have all made important contributions. Some participants expressed doubt that cross-disciplinary publications would “count” as valuable in their own home fields. There was also an awareness of a lack of shared terminology to discuss research, and some participants mentioned they had experienced difficulty in locating data scientists with the necessary skills and interests to collaborate in specific social science fields. Recognizing the practical constraints that keep researchers from developing expertise in both their home disciplines and in data science, participants voiced a belief that what is needed instead are “hybrids” or “translators” who may not be experts in any one field but are multi-talented enough to stand in the gap and facilitate productive cross-talk. A researcher in education suggested that a shared problem was needed to develop cross-disciplinary communication, and several participants thought that current environmental and healthcare crises would be good candidates.

Regarding the suggestion of tackling a shared problem, there were deeper issues to address than just the practicalities of collaboration: What is worth studying, at what level, and what do we hope to gain from it? Participants thought there might be a mismatch in research values between data scientists and other researchers, and one participant pointed out that the “problem with the problem” may appear very differently from different perspectives. Although not stated overtly in the discussion, it seemed accepted that research should be brought to bear on issues of human well-being. The work of Frances Griffiths and colleagues on healthcare was mentioned as an example of how to thoughtfully “mix” methods, including analysis of large quantitative data sets. One participant working in healthcare education discussed her own work with “humanistic benchmarking,” which seeks to scale-up the kinds of priorities that matter to individuals in health outcomes. Another participant’s work uses large-scale data to examine sea ice changes and their impact on indigenous communities, which she further explores through ethnographic methods. These examples illustrate that it is possible to foreground lived experiences while gaining insight from data science, but the perception seemed to be that this did not always happen. 

Returning to the place of theory in research, participants discussed their beliefs that data scientists sought to work outside of theory or to generate theory post hoc. There seemed to be a slight disagreement in the conversation over the meaning of “theory” in research, whether it was synonymous with “hypothesis,” and whether its main purpose was to explain phenomena or make predictions. The idea that (new) theory should account for key concepts and well-understood phenomena within a discipline was reiterated, as was the need to consider conflicting or alternative explanations. Because highly regarded theories can sometimes hold sway over a field, one participant, a physicist, proposed that limited theories or localized models designed to work under specific circumstances would be less likely to reinforce a protectionist research culture. Despite some differences, there seemed to be a consensus that theory should definitely be part of a data scientist’s repertoire. Otherwise, there are likely to be problems in the research that cannot simply be addressed with better techniques.  

Reflecting on this week’s conversation, I was impressed by the critical but still encouraging stance most participants voiced toward the prospect of using data science in their research programmes. Indeed, a very interesting project is already being led by faculty from my home department of education in collaboration with computer scientists. Such efforts reflect my belief that a variety of research designs and methods can address social issues that matter. I also think that seeking to “break down barriers between disciplines,” as one participant stated, is a valuable opportunity to reconsider the foundational principles on which our research is based. Whatever kind of research we engage in, I believe our approaches should be thoughtfully chosen, with their limitations as well as their potential benefits in mind, and we should always seek to apply research outcomes in ways that fit the specific context and affirm the dignity and equality of our stakeholders. While machines may assist us greatly in handling the data, humans alone bear the responsibility for making such judgments.

This blog expresses the author's views and interpretation of comments made during the conversation. Any errors or omissions are the author's.

Loraine Hitt is an Ed.D. candidate in Durham University's School of Education.  

 
