A few months back I was introduced to the book "Weapons of Math Destruction" by Dr. Cathy O'Neil. A mathematician and former data scientist, she now writes about ethics in the tech industry. I got to see her speak in Milwaukee on a major problem in data science: misguided metrics. Specifically, she discussed metrics that ignore the social implications of harmful models.
The highlight of her talk was the "ethical matrix" -- a tool for understanding how an algorithm affects its stakeholders. It teaches data scientists to look beyond the usual performance metrics (e.g., accuracy) and to give more weight to other vital characteristics like data quality, perceived fairness, and transparency. The matrix accounts not only for the entities looking to use the model, but also for those subjected to it, the designers and sellers of the model, and the public at large. Overall, her talk was quite inspiring and offers a great tool for improving social consciousness in data science. You can find out more at her blog, mathbabe.org.
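To make the structure concrete, here is a minimal sketch of an ethical matrix in code. The stakeholders, concerns, and cell entries are hypothetical placeholders of my own, not taken from her talk or book:

```python
# A hypothetical ethical matrix: rows are stakeholders, columns are the values
# they care about, and each cell notes how the model might affect that group.
# All names and entries below are illustrative placeholders, not O'Neil's.
import pandas as pd

stakeholders = ["model users", "people scored by the model",
                "model builders and sellers", "the public"]
concerns = ["accuracy", "data quality", "perceived fairness", "transparency"]

ethical_matrix = pd.DataFrame("", index=stakeholders, columns=concerns)
ethical_matrix.loc["model users", "accuracy"] = "errors are costly"
ethical_matrix.loc["people scored by the model", "perceived fairness"] = "false positives deny opportunities"
ethical_matrix.loc["the public", "transparency"] = "scores are unexplained"

print(ethical_matrix)
```

Filling in every cell, stakeholder by stakeholder, is what forces the conversation beyond accuracy alone.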
Now to her book -- she doesn't focus specifically on the ethical matrix. Rather, she highlights cases where poorly designed, high-stakes algorithms and their implementations caused serious harm to people. She points out how the use of such models creates negative societal feedback loops that actually exacerbate the problems the models claim to solve. Whether it's finding a new job, obtaining a loan, getting accepted into school, having your performance at work ranked, having your insurance rates set, or being convicted of a crime -- she illustrates how people are constantly subject to the judgment of these algorithms, and how their effects feed back into an already faulty system, making fair judgment even less likely each time the model is put to use. While computational algorithms are paraded as tools for unbiased decision making, their greater purpose often seems to be to shirk accountability.
She also shared suggestions on what data scientists can do to stop the spread of bad algorithms. I mention some of those suggestions below, along with some expanded thoughts of my own:
1) Certain models should simply not exist. This should go without saying, but some decisions require context and nuance that are impossible for an automated system to measure. It's fine to gather data (as long as it's relevant) for use in human decision making, but if a model cannot account for variables that are known to be crucial for high-stakes decisions, THE MODEL SHOULD NOT BE USED. For example, the 'value-added model' for judging teacher performance relies heavily on proxies that fail to capture enough relevant information, making it an ill-informed scorer. In high-stakes, complex situations like judging teacher performance, which require the careful judgment and context of an expert, model scores are likely to be of little worth.
2) Feedback is crucial to useful and ethical model development. One cannot unleash a model on the world and insist people be subject to its scoring and recommendations without first determining whether the output makes sense and seems fair. This is an ugly characteristic of the tech world, which often seems far more interested in gaining recognition for deploying the first solution rather than the safest or fairest one. There must be opportunities for stakeholders to voice their grievances if the model does not work for them. If certain people bear the brunt of the model's outputs, then the model must be adjusted. Without these feedback loops, high-stakes models can wreak havoc on a system unabated and wreck lives, as long as those who run them remain ignorant (willfully or not).
3) Interpretable models give greater opportunity for feedback. "Explainable artificial intelligence" (XAI) is a popular topic because the public is now coming to understand the influence of computational models in our daily lives. We often feel they get things wrong, and we want to understand how this happens. Interpretable machine learning, interpretable models, XAI -- these are different names for a broad set of tools and processes meant to make complex computational models (and their apparent decisions) understandable to humans. The literature on this is quite expansive, but for now I will simply say that I believe taking the time to interpret models can only help in making them useful and fair. Investing in interpretation during the initial phases of model development could avoid harm and cost in the future by allowing stakeholders to give sensible feedback, providing accountability and opportunities for model audits, and potentially uncovering useful insight into the data itself.
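As one small illustration of what I mean by taking the time to interpret a model, here is a sketch using permutation importance on a synthetic dataset. It is only one of many interpretability tools, and the data, model, and settings below are placeholders of my own, not anything from the book:

```python
# A minimal interpretability check: shuffle each feature and measure how much
# held-out accuracy drops. A large drop means the model leans on that feature.
# The dataset and model here are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Even a simple ranking like this gives stakeholders something concrete to push back on: if the model leans hardest on a feature they consider a poor or unfair proxy, that is feedback worth having before deployment.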
4) The data science method needs to follow the scientific method more closely. That is, building a useful model requires careful observation, testing, and skepticism. Often, data scientists are tasked with solving complex problems and must work with unfamiliar data in unfamiliar terrain. A scientist in this situation would likely feel it necessary to study the data and understand its meaning, while an industry data scientist may instead lean on their strength of optimizing algorithms and chasing the highest accuracy scores. An industry data scientist may feel completely comfortable without knowing the meaning of the variables in their models, while most highly trained academic scientists likely would not. While the motivations and constraints of data scientists may not afford as much problem-solving time as an academic scientist has, a poor understanding of your data can lead to low-utility (or dangerous) models. Data scientists should approach their problem solving with as much understanding of the data as possible, including the process by which it is gathered and currently used. If a data scientist cannot make sense of the data they are working with, there is no way to determine whether the model is trivial or whether there is leakage across variables. More importantly, if they skip understanding the data and its uses, taking ethical considerations into account becomes a big challenge.
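As a rough sketch of the kind of sanity checks I have in mind (my own illustration, not anything from the book), here is a small helper that flags two common leakage red flags: a single feature that nearly perfectly separates the target, and identical rows shared between train and test. It assumes a binary target and numeric features with no missing values, and the threshold is arbitrary:

```python
# Illustrative-only leakage red flags; assumes a binary target column and
# numeric features with no missing values.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def leakage_red_flags(train: pd.DataFrame, test: pd.DataFrame, target: str,
                      auc_threshold: float = 0.95) -> None:
    y = train[target]
    features = train.drop(columns=[target])

    # 1) A single feature that nearly perfectly separates the target is often
    #    a proxy for the label (leakage) rather than a genuine predictor.
    for col in features.select_dtypes(include=np.number).columns:
        auc = roc_auc_score(y, features[col])
        auc = max(auc, 1 - auc)  # direction of the relationship doesn't matter
        if auc > auc_threshold:
            print(f"suspiciously predictive feature: {col} (AUC = {auc:.3f})")

    # 2) Identical feature rows appearing in both train and test quietly
    #    inflate test scores.
    shared = features.drop_duplicates().merge(
        test[features.columns].drop_duplicates(), how="inner")
    print(f"{len(shared)} identical feature rows shared between train and test")
```

Checks like these are no substitute for actually understanding where the data comes from and how it is used, but they at least force a pause before trusting a suspiciously good score.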
I feel compelled at this point to say something about whether a data scientist needs an advanced degree grounded in the scientific method. It's a controversial topic, as many data scientists do not hold advanced research-focused degrees. But data scientists' actual roles vary greatly -- a data scientist at a large company with thousands of employees is likely to function very differently from one at a startup, and many roles do not require research experience. Roles focused more on analytics, engineering, and database management are not likely to benefit from concentrated scientific study. Thus, I feel that in many cases an advanced education is not necessary to be a good data scientist. However, if you seek insight from your data or research, actual research experience could come in handy. A data scientist with a PhD may be more careful about data integrity, more skeptical of model performance scores, and more conservative with interpretations of results, simply because of their own experience in scientific study. This is not to say that a PhD data scientist can't fall short on judgment, but they have likely spent years honing experimental design, data collection, analysis, and modeling. Additionally, for many PhD researchers, finding a solution to a problem is often not enough -- they must propose and/or uncover the mechanisms of the problem itself. This is often a much tougher task, and it can require endurance and grit, a sense of respect for the data, and a keen understanding of its limitations. A PhD data scientist, I believe, will be more likely to ask whether high-stakes conclusions can be drawn from the data provided than one who has not had to wrestle with this question in their own research. This is also not to say that a PhD is necessary to perform the work of a data scientist who develops models -- but some research experience would likely be helpful, and it demonstrates the ability to sense which questions can and cannot be answered given a set of data.
That being said, a PhD is not a fail-safe against poor decisions, or even bad science. Scientists are known throughout history for moral violations (the unauthorized use of Henrietta Lacks's cells, widespread biocolonialism), coercion of research subjects (the Stanford prison experiment, the Milgram shock experiments), and the spread of harmful untruths (race science, phrenology). Even today, the scientific literature is rife with sensationalized claims, poorly designed or analyzed experiments, and a replication crisis. And many of the harmful models discussed in this book were built by data scientists with PhDs. So we cannot count on research experience alone to remedy the propagation of bad and/or unethical models. Useful and fair models require input and feedback from subject matter experts and stakeholders of all types.
Finally, here are some other thoughts I had while reading this book:
- With all the faults of the pharmaceutical industry, new drugs are likely safer because of regulated clinical trials -- we don't just unleash them on the public the way we do with computational models. Could models be made safer through some form of government-regulated, phased model testing?
- Would the public benefit from IRB-like requirements on high-stakes models?
- Feedback is crucial to building useful and fair models, but it often happens only after a model is widely used and news of its unfairness reaches the public. Testing phases for models are not currently required -- they take time and resources, and would place accountability back onto the model creators. So what's the most effective way to incentivize stakeholder feedback in the model-building process? What's the most effective way to act on it?