How “real” do data sets need to be?

The pressing need for real-world data collection can be an impediment to AI development, but emerging synthetic data may be just what the doctor ordered.
Jeff Rowe

Can computers effectively generate data sets that then are used to train other computers?

That’s one way to sum up the underlying question our colleague Bill Siwicki recently discussed over at HealthcareITNews with Michael Naber, CEO and cofounder of Simerse, a company that creates “synthetic data” to train AI and machine learning models.

According to Naber, AI and machine learning models “can be trained faster, more cost-effectively and without the constraints of real-world data collection.”

Simply put, synthetic data is computer-generated data that imitates real-world data, and Naber says health IT executives should prepare for the use of synthetic data because, in his view, it represents the future of, among other things, robotic surgery and data-powered medicine.

“Companies that want to be on the forefront of this revolution should be seriously investigating the concept of synthetic data,” he argues, “or at least partnering with a company focused on synthetic data. Healthcare CIOs may not want to build this capability in-house, but should prepare their businesses for lowered AI model-training costs as a result of this oncoming technological wave.”

Looking into the future, Naber says, “AI likely will be assisting doctors not only in diagnosing injuries, but in giving medical recommendations as well. Synthetic datasets can help evaluate the impact of AI-powered diagnosis by creating a circular feedback loop, and may even be able to act as a testbed for AI-derived recommendations.”

One particular use for synthetic data that Naber says is coming soon is training machine learning algorithms “to detect diseases, bone fractures and a whole number of other ailments from medical images. Synthetic data will only accelerate this research. By leveraging synthetic data, researchers will be able to artificially create synthetic injuries in medical images, and then teach computers to detect and analyze those medical images.”
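To make the idea of "artificially created" injuries concrete, here is a toy sketch in Python. It is not Simerse's method or any real imaging pipeline. The function names (make_image, add_synthetic_fracture, make_dataset), the noise-only "scan" and the jagged dark line standing in for a fracture are all hypothetical choices made for illustration. What it shows is the general pattern: because the computer draws the injury, every image comes with an exact label and pixel mask at no annotation cost.

```python
import random


def make_image(size, rng):
    """Return a size x size grayscale 'scan' as nested lists of floats in [0, 1].

    Assumption: plain Gaussian noise stands in for real anatomy.
    """
    return [
        [min(1.0, max(0.0, rng.gauss(0.5, 0.05))) for _ in range(size)]
        for _ in range(size)
    ]


def add_synthetic_fracture(image, rng):
    """Draw a dark, jagged top-to-bottom line on a copy of the image.

    Returns (new_image, mask), where mask marks the altered pixels with 1.
    """
    size = len(image)
    out = [row[:] for row in image]
    mask = [[0] * size for _ in range(size)]
    col = rng.randrange(size // 4, 3 * size // 4)
    for row in range(size):
        out[row][col] = 0.0          # darken one pixel in this row
        mask[row][col] = 1           # record exactly where the "injury" is
        # Let the line wander left or right by at most one pixel per row.
        col = min(size - 1, max(0, col + rng.choice((-1, 0, 1))))
    return out, mask


def make_dataset(n, size=32, seed=0):
    """Build n labeled samples: (image, mask, label), label 1 = fracture added."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        image = make_image(size, rng)
        if rng.random() < 0.5:
            image, mask = add_synthetic_fracture(image, rng)
            label = 1
        else:
            mask = [[0] * size for _ in range(size)]
            label = 0
        samples.append((image, mask, label))
    return samples


if __name__ == "__main__":
    data = make_dataset(100)
    positives = sum(label for _, _, label in data)
    print(f"{len(data)} synthetic scans, {positives} with a drawn-in fracture")
```

A detection model would then be trained on the (image, label) or (image, mask) pairs. Whether a model trained this way transfers to real scans depends on how faithfully the synthetic images imitate real ones, which this toy makes no attempt to do.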

Not that such uses are going to be “shovel-ready” tomorrow, but Naber thinks it will be important for healthcare and life sciences companies “to budget in research and development expenditure to ensure that companies are prepared for the coming advancements in life science and healthcare robotics. . . . There are so many applications for AI, machine learning and computer vision within the field of healthcare, that I think it's important for companies to not bite off more than they can chew.

“Synthetic data and simulations as a whole will be challenging to create and will have to be tailored to a particular task, so companies should go after one robotic action at a time rather than trying to solve everything under the sun with AI.”

Photo by sanjeri/Getty Photos