ONC launches challenge to expand research data supply

The Synthetic Health Data Challenge encourages researchers and developers to validate the realism of synthetic health records generated by Synthea and spur novel uses of synthetic health data.
Jeff Rowe

Large amounts of health data are essential to an array of academic, research, industry, and government initiatives, but privacy concerns are a significant barrier to obtaining adequate amounts of anonymized data.

To overcome those barriers, researchers have increasingly turned to so-called "synthetic" data, which can reflect the characteristics of a population of interest and serve as a useful resource for researchers without containing any real patient data.

To encourage the development of big data analytics tools built on realistic, but not real, health record data containing complete medical histories from birth to death, the Office of the National Coordinator for Health Information Technology (ONC) has launched the Synthetic Health Data Challenge.

“Synthetic data like those created by Synthea can augment the infrastructure for patient-centered outcomes research by providing a source of low-risk, readily available, synthetic data that can complement the use of real clinical data,” said Teresa Zayas-Cabán, ONC chief scientist, in a statement.

Synthetic health data can be used without cost or restriction and is meant to support the specific interests of researchers and developers testing the effectiveness of tools, data analytics algorithms, and disease modeling approaches.

The challenge is part of ONC’s Synthetic Health Data Generation to Accelerate Patient-Centered Outcomes Research (PCOR) project. Through the Synthetic Health Data Challenge, participants will create and test novel solutions to further refine the capabilities of Synthea, an open-source synthetic patient generator that models the medical histories of synthetic patients.

Said Zayas-Cabán, “By enhancing Synthea with new clinical data modules or demonstrating novel uses of Synthea-generated synthetic data, Challenge participants will support PCOR research and development efforts by enhancing PCOR researchers' ability to conduct rigorous analyses and generate relevant findings.”

Participants can submit their Phase I proposals in one of two categories: enhancements to Synthea or novel uses of Synthea-generated synthetic data. The best proposals will move on to Phase II of the challenge: prototype or solution development.

Phase II will feature awards totaling up to $100,000, with up to two first-place winning solutions receiving $25,000 each; up to two second-place solutions receiving $15,000 each; and up to two third-place solutions receiving $10,000 each.

According to HHS, the overall focus of the project is to enhance the ability of Synthea to produce high-quality synthetic data for opioid, pediatric, and complex care use cases.

The project will pursue that goal in three ways: identifying and convening a multidisciplinary panel of experts to advise on the selection of use cases and on module development; developing opioid, pediatric, and complex care data generation modules for Synthea to increase the number and diversity of synthetic patient health records available for PCOR; and engaging the broader community of researchers and developers, through the challenge, to validate the realism of the generated synthetic health records and demonstrate their potential uses.