Study unlocks value of unstructured data for predictive analytics

Unstructured data has long posed hurdles for health system analytics initiatives, but AI may be the way to unearth a depth of new insights, helping population health and value-based care efforts.
Jeff Rowe

Structured or unstructured? That is the question.

Well, the full question would probably go something like this: Which kinds of medical notes have greater potential when it comes to medical research, structured or unstructured?

As our colleague Mike Miliard pointed out recently in Healthcare IT News, the structured clinical notes found in EHRs are obviously more readily accessible. But he also points to a recent report in the Journal of the American Medical Informatics Association that “suggests that real-world data captured in unstructured notes offers more accuracy when trained algorithms are used to mine it.”

In other words, it’s a question for which AI may have an answer.

"With growing availability of digital health data and technology, health-related studies are increasingly augmented or implemented using real world data," wrote the researchers, led by Tina Hernandez-Boussard, associate professor of biomedical informatics, data science and surgery at Stanford University School of Medicine.

"Recent federal initiatives promote the use of RWD to make clinical assertions that influence regulatory decision-making," the researchers said. "Our objective was to determine whether traditional real world evidence techniques in cardiovascular medicine achieve accuracy sufficient for credible clinical assertions, also known as 'regulatory-grade' RWE.”

The researchers mined six years' worth of de-identified EHR data, both structured and unstructured, for a specified set of clinical concepts. They mined the structured data with standard query techniques and used AI for the unstructured data.

According to the researchers, "The dataset included 10,840 clinical notes. Individual concept occurrence ranged from 194 for coronary artery bypass graft to 4502 for diabetes mellitus."

The results?

With structured EHR data, or EHR-S, "average recall and precision were 51.7% and 98.3%, respectively," according to the report. For unstructured data, or EHR-U, recall and precision were 95.5% and 95.3%.
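
For readers less familiar with the two measures: recall is the share of true cases a method finds, and precision is the share of flagged cases that are correct. The Python sketch below shows the arithmetic. The function name and all counts are hypothetical, chosen only to land near the reported percentages; they are not the study's data or the researchers' code.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw counts."""
    flagged = true_positives + false_positives    # everything the method reported
    actual = true_positives + false_negatives     # everything that is really there
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the study): 1,000 patients truly have a condition.
# A structured-field query flags few records, nearly all correct, but misses many.
structured = precision_recall(true_positives=510, false_positives=10,
                              false_negatives=490)

# A text-mining model flags more records and misses far fewer.
unstructured = precision_recall(true_positives=950, false_positives=50,
                                false_negatives=50)

for label, (precision, recall) in [("structured", structured),
                                   ("unstructured", unstructured)]:
    print(f"{label}: precision={precision:.1%}, recall={recall:.1%}")
```

In this illustration the structured query is almost never wrong about what it flags, yet it overlooks roughly half of the true cases, which is the pattern the reported EHR-S figures describe.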

The researchers concluded that, "overall, EHR-S did not meet regulatory grade criteria, while EHR-U did. These results suggest that recall should be routinely measured in EHR-based studies intended for regulatory use. Furthermore, advanced data and technologies may be required to achieve regulatory grade results."

As Miliard sums up the implications, “Now that AI tools are sufficiently mature and widespread to extract some of that value, more and more researchers will be making use of real-world evidence, and both federal agencies and technology developers are honing their efforts to ensure providers and life sciences organizations can capitalize on those insights.”