New ‘privacy-first’ machine learning set to drive AI expansion

While machine learning has the potential to transform disease diagnosis and detection, it must first overcome patients’ reluctance to share access to sensitive information.
Jeff Rowe

It’s funny how the success of a technological development often comes down to timing.

To wit, in a recent article in MIT Technology Review, writer Karen Hao describes how, a couple of years ago, Google unveiled a new approach to machine learning to which few AI stakeholders paid much attention. Now, however, they’re sitting up and taking notice.

The reason, she says, is that “federated learning,” an approach in which a model learns from data sources distributed across multiple devices rather than requiring all the data to be gathered in one place, is a “privacy-first approach (that) could very well be the answer to the greatest obstacle facing AI adoption in healthcare today.”

Hao points out that in healthcare, “despite many studies showing its promise for detecting and diagnosing diseases, progress in using deep learning to help real patients has been tantalizingly slow.” One major cause of that slowness, according to Ramesh Raskar, an MIT associate professor of computer science whose research focuses on AI in health, “is a false dichotomy between the privacy of patient data and the utility of the data to society. People don’t realize the sand is shifting under their feet and that we can now in fact achieve privacy and utility at the same time.”

While current state-of-the-art algorithms require immense amounts of data to learn, Hao explains that with federated learning an AI model can be trained “using data stored at multiple different hospitals without that data ever leaving a hospital’s premises or touching a tech company’s servers. It does this by first training separate models at each hospital with the local data available and then sending those models to a central server to be combined into a master model. As each hospital acquires more data over time, it can download the latest master model, update it with the new data, and send it back to the central server. Throughout the process, raw data is never exchanged—only the models, which cannot be reverse-engineered to reveal that data.”
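For readers curious about the mechanics Hao describes, here is a minimal sketch of the idea in Python. It illustrates federated averaging under toy assumptions; the simulated hospital datasets, the simple linear model, and all function names are hypothetical stand-ins for illustration, not Google’s actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: each "hospital" holds its own (features, labels)
# data, e.g. patient measurements and a disease risk score. In a real
# deployment these raw arrays would never leave the hospital's premises.
def make_local_data(n_patients):
    X = rng.normal(size=(n_patients, 3))        # 3 features per patient
    true_w = np.array([0.5, -1.2, 2.0])         # shared underlying signal
    y = X @ true_w + rng.normal(scale=0.1, size=n_patients)
    return X, y

def train_local_model(global_w, X, y, lr=0.1, epochs=50):
    """Hospital-side step: refine the current master model on local data
    only (here, linear regression via gradient descent). Only the
    resulting weights are sent onward, never X or y."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(local_weights, sample_counts):
    """Server-side step: combine the local models into a master model,
    weighting each hospital by how much data it contributed."""
    total = sum(sample_counts)
    return sum(w * (n / total) for w, n in zip(local_weights, sample_counts))

hospitals = [make_local_data(n) for n in (200, 500, 120)]
master_w = np.zeros(3)

# Each round: hospitals download the master model, update it on local
# data, and send their models back to be re-averaged.
for round_num in range(5):
    local_ws = [train_local_model(master_w, X, y) for X, y in hospitals]
    master_w = federated_average(local_ws, [len(y) for _, y in hospitals])
    print(f"round {round_num + 1}: master weights = {master_w.round(3)}")
```

In the real systems Hao describes, the local models are deep networks and the combination step is more elaborate, but the privacy property is the same: only model parameters travel to the central server; the raw patient data never does.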

Some challenges remain, but according to Raskar they aren’t insurmountable.

“More work needs to be done, but it’s mostly Band-Aid work,” he said, adding that, over time, the applications of distributed learning could also extend far beyond healthcare to any industry where people don’t want to share their data.

“In distributed, trustless environments, this is going to be very, very powerful in the future,” he predicted.