On average, AI and machine learning can deliver faster, more consistent diagnoses than human experts, but many algorithms' limited capacity for nuance can, in some cases, lead to overdiagnosis.
In a recent commentary in the New England Journal of Medicine, Adewole Adamson, MD, an assistant professor in the department of internal medicine at Austin, Texas-based Dell Medical School, and H. Gilbert Welch, MD, a senior investigator in the center for surgery and public health at Boston-based Brigham and Women's Hospital, describe particular limitations of AI in diagnosing early-stage cancer.
In a nutshell, they say, the problem is that “(T)here is no single right answer to the question, ‘What constitutes cancer?’ The clinical interest is in a dynamic process: cancer is a tumor destined to cause symptoms (by means of local infiltration or metastasis to distant sites) and to lead to death if left untreated. Pathological interpretation, on the other hand, is based on a static observation: cancer is defined on the basis of the appearance of individual cells, the surrounding tissue architecture, and the relationship between these characteristics and various biomarkers.”
This lack of a “gold standard,” the authors say, means algorithms are trained only to recognize designated patterns in imaging data, and are therefore likely to classify as cancerous certain abnormalities that a human expert may conclude will never cause symptoms or are completely benign.
“Diagnoses of early-stage cancer made using machine-learning algorithms will undoubtedly be more consistent and more replicable than those based on human interpretation,” they recognize. “But they won’t necessarily be closer to the truth — that is, algorithms may not be any better than humans at determining which tumors are destined to cause symptoms or death.”
On a practical level, the authors identify reasons to worry that machine learning will aggravate the problem of overdiagnosis. For one, algorithms are much faster than humans, to put it mildly, and as the technology spreads, it will become cheaper to use AI than to rely on humans to interpret slides. “Higher throughput — more tissue, more patients — will only increase opportunities for overdiagnosis.”
Still, they conclude, while “(m)achine learning cannot solve the gold-standard problem, . . . it could further expose it. Ultimately, what matters to patients and clinicians is whether the diagnosis of cancer has relevance to the length or quality of life. We believe that the possibility of training machine-learning algorithms to recognize an intermediate category between 'cancer' and 'not cancer' should be given serious consideration before this technology is widely adopted. Highlighting the existence of gray areas could present an important opportunity for pathologists to discuss decisions about what constitutes cancer.”
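As a rough illustration of what such an intermediate category could look like in practice, the sketch below thresholds a binary classifier's predicted probability into three labels: “cancer,” “indeterminate” and “not cancer.” Everything in it is hypothetical — the synthetic data, the logistic-regression stand-in and the 0.25/0.75 cutoffs — and it is not the authors' method, only a minimal way to picture routing gray-area cases to pathologist review.

```python
# Hypothetical sketch: a three-way call ("cancer" / "indeterminate" / "not cancer")
# built by thresholding a binary classifier's predicted probability.
# Data, model, and cutoffs are illustrative stand-ins, not the commentary's proposal.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic features standing in for image-derived measurements;
# a real pathology pipeline would use learned features from whole-slide images.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_cancer = clf.predict_proba(X_test)[:, 1]  # predicted probability of "cancer"

# Hypothetical thresholds: confident calls at the extremes, with a gray zone
# in between that is flagged for human review rather than labeled outright.
LOW, HIGH = 0.25, 0.75
labels = np.where(p_cancer >= HIGH, "cancer",
                  np.where(p_cancer <= LOW, "not cancer", "indeterminate"))

for name in ("cancer", "indeterminate", "not cancer"):
    print(name, int((labels == name).sum()))
```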