We recently highlighted a commentary on the importance of federated learning in protecting patient data privacy, and as if on cue, researchers at the Technical University of Munich (TUM) and Imperial College London have published a study that provides a perfect example.
Specifically, the team, in partnership with the non-profit OpenMined, developed a unique, privacy-preserving combination of AI-based diagnostic processes for radiological image data. The processes take the form of a deep learning algorithm that classifies pneumonia in chest x-rays of children.
“Secure and privacy-preserving machine learning (PPML) aims to protect data security, privacy and confidentiality, while still permitting useful conclusions from the data or its use for model development,” the team wrote in their report, published in Nature Machine Intelligence. “In practice, PPML enables state-of-the-art model development in low-trust environments despite limited local data availability. Such environments are common in medicine, where data owners cannot rely on other parties’ privacy and confidentiality compliance. PPML can also provide guarantees to model owners that their model will not be modified, stolen or misused, for example by its encryption during use. This lays the groundwork for sustainable collaborative model development and commercial deployment by alleviating concerns of asset protection.”
In a statement, Alexander Ziller, a researcher at TUM's Institute of Radiology, explained: “For our algorithm we used federated learning, in which the deep learning algorithm is shared, and not the data. Our models were trained in the various hospitals using the local data and then returned to us. Thus, the data owners did not have to share their data and retained complete control.”
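In code terms, the scheme Ziller describes is federated averaging: the coordinator distributes the current model, each hospital trains it on local data, and only the updated weights travel back to be averaged. The sketch below is a minimal PyTorch illustration, assuming placeholder helpers (local_train, hospital_loaders) rather than the study's actual code.

```python
import copy
import torch

def federated_round(global_model, hospital_loaders, local_train):
    """One round of federated averaging (FedAvg): the model travels,
    the hospitals' data never does. `hospital_loaders` and
    `local_train` are illustrative placeholders."""
    local_states = []
    for loader in hospital_loaders:
        # Each site receives a copy of the current global model...
        local_model = copy.deepcopy(global_model)
        # ...trains it on data that never leaves the hospital...
        local_train(local_model, loader)
        # ...and returns only the updated weights.
        local_states.append(local_model.state_dict())

    # The coordinator averages the returned weights into a new global model.
    avg_state = {
        key: torch.stack([s[key].float() for s in local_states]).mean(dim=0)
        for key in local_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```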
To prevent identification of the institutions where the algorithm was trained, the team applied a second technique: secure aggregation, sketched below. The model updates were combined in encrypted form and decrypted only after training with the data of all participating institutions was complete. And to ensure differential privacy, i.e. to prevent individual patients' data from being extracted from the data records, the researchers applied a third technique while training the algorithm.
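One common way to realize secure aggregation is pairwise additive masking: every pair of sites shares a random mask that one adds and the other subtracts, so the masks cancel in the sum and the coordinator only ever sees the aggregate. The toy sketch below illustrates that idea with NumPy arrays standing in for model updates; the study itself used an encrypted protocol, so this is a simplified stand-in.

```python
import numpy as np

def secure_aggregate(updates, seed=0):
    """Toy pairwise additive masking: each pair of sites shares a
    random mask that one adds and the other subtracts, so the masks
    cancel in the sum. The coordinator can recover the aggregate but
    never an individual site's update. Illustrative only."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)
            masked[i] += mask  # site i adds the shared mask
            masked[j] -= mask  # site j subtracts it
    return sum(masked)         # equals sum(updates); masks cancel

# Three hospitals' (fake) model updates: only their sum is revealed.
updates = [np.ones(3), 2 * np.ones(3), 3 * np.ones(3)]
assert np.allclose(secure_aggregate(updates), sum(updates))
```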
"Ultimately, statistical correlations can be extracted from the data records, but not the contributions of individual persons," said project leader Georgios Kaissis of the TUM Institute of Medical Informatics, Statistics and Epidemiology.
According to Daniel Rueckert, Alexander von Humboldt Professor of Artificial Intelligence in Healthcare and Medicine at TUM, "Our methods have been applied in other studies, but we have not yet seen large-scale studies using real clinical data. Through the targeted development of technologies and the cooperation between specialists in informatics and radiology, we have succeeded in training models that deliver precise results while meeting high standards of data protection and privacy."
The team expects that this combination of data protection techniques will also facilitate cooperation between institutions.