Guest blog: AI in healthcare - How economics can help us understand the risks
Federico Cilauro, Manager at Frontier Economics, a leading economics consultancy, writes about the potential of AI to revolutionise medical diagnosis.
AI diagnosis has great potential
Screening for diseases has traditionally been expensive, because of the amount of labour it involves. But AI offers a new frontier, drawing on data to make faster, more cost-effective diagnoses.
For example, in 2020, a deep learning system developed by DeepMind and Moorfields Eye Hospital outperformed five out of six doctors in predicting the onset of macular degeneration. And one recent study showed that smartwatch readings could identify coronavirus infections in 63% of cases, before symptoms were evident.
The risk: AI could lead to unfair healthcare access
But the use of AI in diagnosis is not without its risks. Poorly calibrated algorithms could create bias in healthcare provision, on the basis of sex, race or other characteristics.
How? AI systems depend on the data they use – and certain groups are under-represented in health data, because of previous difficulties in accessing treatment. Another problem is ‘the healthy volunteer effect’: those who volunteer to provide genomic data are typically healthier and better off than average. Different population groups also have different genetic markers for certain diseases.
If healthcare algorithms learn from this biased data, there’s a risk they’ll make biased decisions about who should be treated.
Using economics to understand the trade-offs
There is no single measure of fairness in algorithmic decision making. But an economic approach can shed light on the trade-offs involved by:
- Assessing the risks of getting the diagnosis wrong; and
- Assessing the benefits and costs of using AI for diagnosis, given that risk.
This article focusses on the risk assessment. One measure is predictive accuracy: how successfully does the algorithm predict the disease in question? We can then look at predictive parity: are the rates of accuracy the same for different population groups?
But also important are the risks of false negatives – incorrectly identifying an ill person as healthy – and false positives – identifying a healthy person as ill.
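These three quantities all come from the same confusion matrix. As a sketch (the function name and structure are illustrative, not drawn from any particular library), they can be computed like this:

```python
def diagnosis_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Fairness-relevant metrics from a confusion matrix.

    tp: ill patients correctly flagged      (true positives)
    fp: healthy patients wrongly flagged    (false positives)
    fn: ill patients the system missed      (false negatives)
    tn: healthy patients correctly cleared  (true negatives)
    """
    return {
        # Predictive accuracy (precision): of those flagged, how many are ill?
        "predictive_accuracy": tp / (tp + fp),
        # False negative rate: of those who are ill, how many are missed?
        "false_negative_rate": fn / (tp + fn),
        # False positive rate: of the healthy, how many are wrongly flagged?
        "false_positive_rate": fp / (fp + tn),
    }
```

Predictive parity then amounts to checking whether `predictive_accuracy` comes out (approximately) equal when the function is run separately for each population group.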
An example – error rates
Let’s say a hospital uses AI tech to diagnose diabetes in a group of patients. Twelve of the patients are white and sixteen are black.
The algorithm identifies three white patients as having diabetes, and is correct in two of those cases. That’s a predictive accuracy of 66%. It also flags six of the black patients as high risk, of which four do have the disease. The predictive accuracy is again 66%, so the algorithm’s designers say the tech is fair.
But using statistical analysis – like that used in researching fairness in recidivism prediction – we can look closer and find a difference in error rates.
In particular, there’s a worrying disparity in false negative rates. Among white patients it’s 33% (the system fails to detect diabetes in one of three cases), but among black participants, it’s 43% (three out of seven). The implication: among black patients, diabetes may go undetected for longer, with more serious long-term complications.
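The arithmetic above can be reproduced directly. In this sketch the inputs are the example's own figures; the totals of three and seven diabetic patients per group follow from the detected and missed cases stated above:

```python
def group_rates(flagged: int, correctly_flagged: int, total_with_disease: int):
    """Return (predictive accuracy, false negative rate) for one group."""
    predictive_accuracy = correctly_flagged / flagged
    missed = total_with_disease - correctly_flagged  # false negatives
    false_negative_rate = missed / total_with_disease
    return predictive_accuracy, false_negative_rate

# White patients: 3 flagged, 2 correctly, 3 have diabetes in total
white_acc, white_fnr = group_rates(3, 2, 3)
# Black patients: 6 flagged, 4 correctly, 7 have diabetes in total
black_acc, black_fnr = group_rates(6, 4, 7)

print(f"Predictive accuracy: white {white_acc:.1%}, black {black_acc:.1%}")
print(f"False negative rate: white {white_fnr:.1%}, black {black_fnr:.1%}")
```

Running this shows identical predictive accuracy for the two groups (66.7% each), but false negative rates of 33.3% and 42.9% — the disparity the headline "fairness" figure hides.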
If parity is impossible, risks must be incorporated
Whenever the prevalence of a disease varies between population groups, it’s statistically impossible to achieve predictive parity and identical error rates. And the greater the difference in prevalence, the bigger the disparity in false negative rates.
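This impossibility can be checked with a little algebra. For any imperfect classifier, holding predictive accuracy (PPV) and the false negative rate fixed, each group's false positive rate is fully determined by that group's prevalence — so when prevalence differs, at least one error rate must differ. A sketch using the example's prevalences (3 of 12 white patients and 7 of 16 black patients have diabetes):

```python
def implied_fpr(prevalence: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by a given prevalence, PPV and FNR.

    From PPV = p*TPR / (p*TPR + (1-p)*FPR), with TPR = 1 - FNR:
        FPR = (p / (1-p)) * ((1-PPV) / PPV) * (1 - FNR)
    """
    tpr = 1 - fnr
    return (prevalence / (1 - prevalence)) * ((1 - ppv) / ppv) * tpr

# Suppose we force equal PPV (2/3) and equal FNR (1/3) on both groups
fpr_white = implied_fpr(prevalence=3/12, ppv=2/3, fnr=1/3)
fpr_black = implied_fpr(prevalence=7/16, ppv=2/3, fnr=1/3)

print(f"Implied FPR: white {fpr_white:.3f}, black {fpr_black:.3f}")
```

Even with predictive accuracy and the false negative rate equalised, the implied false positive rates diverge (roughly 0.111 versus 0.259): all three measures cannot match once prevalence differs between groups.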
Understanding these trade-offs is therefore a vital step in the development of AI diagnosis technology. And once the trade-offs are understood, the next step must be to define and measure biases in a systematic way, and to assess the benefits and costs of using AI given the risk of bias, to produce evidence-based recommendations for the systems of the future.