Machine Learning

Machine Learning (ML) has, in the last decade, become something of a “scientific bandwagon”. Starting as theoretical work in computer science and mathematics departments, it has since toured through most parts of the university campus, where its performance has led to new insights and breakthroughs on a wide range of topics. This breadth was reflected in the range of participants’ backgrounds (Computer Science, Archaeology, Psychology and Physics, to name but a few) in this week’s Research Methods Conversation, the focus of which was how ML is and can be used by academics and by university management, and what ethical and methodological difficulties may arise.

The conversation started by mapping out how participants have used ML-based algorithms to achieve advances in their fields. A participant from the Department of Psychology explained how he uses reinforcement learning (an area of ML in which models are trained to make decisions in an uncertain, complex environment) to simulate how the nervous system deals with uncertainty. This line of research holds much potential, as connections between ML and the cognitive system have already been established (e.g. Yamins et al. 2014). Another exciting project was brought up by a researcher in the Department of Archaeology, who described a collaboration with the Institute of Data Science that uses ML to map soil types from aerial photography and satellite imagery. As some participants voiced an interest in learning more about ML, it was pointed out that Advanced Research Computing offers a training course.
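To give a flavour of what reinforcement learning involves, the sketch below trains a tiny tabular Q-learning agent on a toy “corridor” task in which actions occasionally slip, so the agent has to learn to act under uncertainty. The environment, parameters and code are invented purely for illustration and are not the participant’s actual model.

```python
# A minimal sketch of reinforcement learning (tabular Q-learning) on a toy
# uncertain environment: a 1-D corridor where moves occasionally slip.
# Purely illustrative; not the nervous-system simulation discussed above.
import numpy as np

n_states, n_actions = 5, 2           # states 0..4; actions: 0 = left, 1 = right
goal, slip_prob = 4, 0.2             # reward at the right end; 20% chance a move slips
Q = np.zeros((n_states, n_actions))  # action-value table
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    """Apply an action; with probability slip_prob the move goes the other way."""
    move = 1 if action == 1 else -1
    if rng.random() < slip_prob:
        move = -move
    next_state = int(np.clip(state + move, 0, n_states - 1))
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for episode in range(2000):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimate, sometimes explore
        action = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.round(2))  # the learned values should prefer action 1 (right) in every state
```

After training, the printed table shows higher values for moving right in every state, i.e. the agent has learned a sensible policy despite the noisy environment.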

However, the “ML bandwagon” has found even greater resonance outside of academia. As neoliberal policies of the 1980s completed the dismantling of public and private monopolies over digital technologies, data flows intensified profoundly (Bigo, Isin and Ruppert 2017). In the 1990s, ML research shifted from a knowledge-driven to a data-driven approach to reasoning, which led to Deep Blue’s victory over Garry Kasparov and turned data into the new oil of the digital industry. This data dependency of ML led the conversation to overlap with themes touched upon in the previous meeting, where ‘Data Science and Social Science Theory’ was discussed. One ML-specific issue that was mentioned is that models which have been tuned and tweaked to near-perfect performance in the lab often fail in real-world settings. This is typically put down to a mismatch between the data the ML model was trained and tested on and the data it encounters in the world, a problem known as data shift. Another issue, known in statistics as underspecification, has recently been brought to light by researchers at Google; in contrast to data shift, underspecification describes a problem with the way ML models are currently trained and tested. The need for a “human in the loop” with expert knowledge was therefore emphasised, while it was also recognised that this human is an impossible figure once one considers the entanglement of the expert, the algorithm, the training data and the real-world data, as detailed in “Cloud Ethics” by L. Amoore (2020).
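As a rough illustration of data shift, the sketch below trains a simple classifier on synthetic “lab” data and then evaluates it on data whose distribution has drifted. The dataset, the amount of drift and the classifier are all assumptions made for the example, not a real deployment.

```python
# A minimal sketch of data shift: a classifier that looks good on held-out
# "lab" data degrades once the data it sees in the world has drifted.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_data(n, mean):
    """Two classes separated along one feature, centred around `mean`."""
    X = np.vstack([rng.normal(mean - 1.0, 1.0, size=(n, 1)),
                   rng.normal(mean + 1.0, 1.0, size=(n, 1))])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = make_data(1000, mean=0.0)   # training/testing distribution ("the lab")
X_shift, y_shift = make_data(1000, mean=3.0)   # deployment distribution has drifted

clf = LogisticRegression().fit(X_train, y_train)
print("lab accuracy:       ", clf.score(*make_data(1000, mean=0.0)))
print("deployment accuracy:", clf.score(X_shift, y_shift))
```

The model scores well on fresh data from the training distribution but drops to near chance on the shifted data, which is the kind of lab-to-world gap described above.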

On this topic, the important work of a researcher from the Department of Computer Science was mentioned. ML models can be highly non-linear with millions of parameters, so they form a black box that is very hard to understand. Their research focuses on tools that can turn a black box into a white box, that is, tools that provide explanations of a model’s decisions in a human-understandable way. Interpretability and transparency, in turn, are important for ensuring algorithmic fairness and identifying potential bias.
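One widely used family of such explanation tools is model-agnostic feature attribution. The sketch below uses permutation feature importance, which asks how much a trained model’s accuracy drops when each input feature is shuffled; it is a generic illustration of the idea, not the researcher’s own method.

```python
# A minimal sketch of a "black box to white box" tool: permutation feature
# importance on a synthetic task where only 3 of 10 features carry signal.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)  # the "black box"

# Shuffling an informative feature should hurt accuracy; shuffling noise should not.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```

The features the model actually relies on stand out with large importance scores, giving a human-readable account of which inputs drive its decisions.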

However, ML models themselves are not the only bottleneck on the way to algorithmic accountability. The opaque, unverifiable and unchallengeable decision-making processes of many currently deployed data flows and algorithms across the consumer, business and government sectors have led to the coining of the term “black box society”. Many participants in the conversation could immediately think of ways in which ML-based algorithms could impact their professional lives. The widely reported case of Amazon’s automated hiring system made participants wonder how knowing about such a system would alter the way they present themselves, placing emphasis on the buzzwords the algorithm might weigh positively.

As Covid-19 has moved more of our lives into virtual spaces, universities have opened new data flows to reach their now-remote students, flows that in some cases are not far from surveillance technology. In this context, a participant noted a PhD project at Durham Law School which explores the ways in which surveillance can crop up in simulated classroom environments in India; understanding how these technologies are used and interpreted by students, teachers and administrators is needed if a legal framework is sought. Another participant pushed further and imagined use cases that go beyond supporting human decision-making to replacing it altogether. The resulting redundancy of labour was imagined to impact the global South in particular, as it is in the North that most of the computational power, data centres and advances in learning algorithms reside.

This blog expresses the author’s views and interpretation of comments made during the conversation. Any errors or omissions are the author’s.

Christoph Becker is a PhD student working at the Institute for Computational Cosmology and the Institute for Data Science.
