Governance group releases recommendations for AI reporting standards

By following the DOME recommendations, stakeholders say, healthcare providers and researchers will find emerging AI solutions increasingly relevant and useful.
Jeff Rowe

As the uses of AI and machine learning (ML) in healthcare have multiplied rapidly, so has the need to ensure that new AI models are validated accurately and that uniform standards for reporting ML-based analyses are in use.

To that end, an international group of scientists, including the ELIXIR Machine Learning Focus Group, part of a European intergovernmental organization made up of life scientists, computer scientists and support staff, has developed a set of guidelines for better reporting standards for AI methods aimed at classifying biomedical data.

Published in the journal Nature Methods, the document provides a checklist and recommendations for anyone aiming to build or publish a supervised classification method for the biological and medical sciences.

“The popularity of machine learning and deep learning nowadays gives the impression that novel AI tools can be quickly designed without much thought about the data and the actual objectives. Such is not the case,” said Professor Tom Lenaerts, a member of the ELIXIR Machine Learning Focus Group, in a statement. “Inaccuracy is easily achieved when one does not have full understanding of the nature of the data and features used in a predictive method. In the medical and biological fields, more than half of the time should be spent on designing a high-quality data set and finding the right set of features to train the method.” 

Examples of such methods include machine learning predictors that try to identify, based on genetic and other data, whether someone suffers from a particular rare disease, or predictive methods that aim to identify the drug to which a cancer patient would respond best.

According to the report, through a community-driven consensus the group developed “a list of minimal requirements asked as questions to ML implementers that, if followed, will help to assess the quality and reliability of reported methods more faithfully. We have focused on data, optimization, model and evaluation (DOME) as each component of an ML implementation usually falls within one of these four topics. We do not propose new specific solutions, only recommendations. A reporting checklist is also provided. Our recommendations are made primarily for the case of supervised learning in biological applications in the absence of direct experimental validation, as this is the most common type of ML approach used.”
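For illustration, here is a minimal sketch of how the four DOME topics might be surfaced in the code behind a supervised classification study. It is written in Python with the widely used scikit-learn library; the dataset and model choices are hypothetical stand-ins, not part of the DOME paper itself.

```python
# A hypothetical supervised classifier annotated with the four DOME topics:
# Data, Optimization, Model, Evaluation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Data: report provenance, size, and how train/test splits were made.
X, y = load_breast_cancer(return_X_y=True)  # stand-in for a biomedical dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Optimization: state which hyperparameters were searched and the protocol used.
search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Model: identify the final model and the hyperparameters it was given.
model = search.best_estimator_
print("Selected hyperparameters:", search.best_params_)

# Evaluation: report performance only on data never seen during training or tuning.
print(classification_report(y_test, model.predict(X_test)))
```

The point of the annotations is the reporting discipline the checklist asks for: each of the four topics is answered explicitly rather than left for readers to reverse-engineer from the results.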

Established in 2014, ELIXIR is now a mature intergovernmental European infrastructure for biological data, representing more than 220 research organizations in 22 countries across many aspects of bioinformatics.

“Notwithstanding the benefits that novel predictive AI methods may bring to molecular or disease understanding and potentially patient care, they often suffer from reproducibility and clarity issues, and in worst case design and bias issues associated with the data and methods used in the predictor,” Lenaerts added in the statement. “Inadequate explanations on the main parts of these methods will not only lead to distrust but will also block the transfer of the suggested approaches to clinics and thus patient care. By adding the information requested in this paper to your own manuscript or in an online document that anyone can consult, it becomes possible to separate the wheat from the chaff and raise the standards for machine learning products in these domains to a higher level.”

Photo by monsitj/Getty Images