Study finds AI matches humans in analyzing images

A new study suggests that current AI tools can at least equal human experts when it comes to classifying diseases from medical imaging. But the research team also cautioned against far-reaching projections about AI’s overall potential, given the small number of high-quality studies to date.

According to the report, which was published recently in The Lancet Digital Health, “medical imaging is one of the most valuable sources of diagnostic information but is dependent on human interpretation and subject to increasing resource challenges. The need for, and availability of, diagnostic images is rapidly exceeding the capacity of available specialists, particularly in low-income and middle-income countries.”

While AI has been considered a potential tool to help human analysts, the report noted that, to date, there has been little in the way of a systematic review of the body of evidence supporting AI-based diagnosis. Indeed, “an initial search turned up more than 20,000 relevant studies. However, only 14 studies – all based on human disease – reported good quality data, tested the deep learning system with images from a separate dataset to the one used to train it, and showed the same images to human experts.

“The team pooled the most promising results from within each of the 14 studies to reveal that deep learning systems correctly detected a disease state 87% of the time – compared with 86% for healthcare professionals – and correctly gave the all-clear 93% of the time, compared with 91% for human experts.”

Summing up the findings, lead author Dr Xiaoxuan Liu observed, “There are a lot of headlines about AI outperforming humans, but our message is that it can at best be equivalent.”

Moreover, the researchers noted a number of problems with the quality of related research efforts thus far, including “assessing deep learning diagnostic accuracy in isolation, in a way that does not reflect clinical practice”; a lack of prospective studies conducted in real clinical environments; and the wide range of metrics used to report diagnostic performance in previous deep learning studies.

Nonetheless, the researchers were optimistic about the growth of AI as an imaging analysis tool.

Prof. Alastair Denniston, a co-author of the study, said deep learning systems could act as a diagnostic tool and help tackle the backlog of scans and images, while Liu added that they could prove useful in places that lack experts to interpret images.

For future studies, the researchers recommended using deep learning systems in clinical trials to assess whether patient outcomes improve compared with current practices.