Diamonds in the rough: Unstructured clinical data buries valuable information for population health management and analytics

Jonathan M. Niloff, MD, MBA, Chief Medical Officer, Diameter Health - Wednesday, August 15th, 2018

The promise of precision medicine and population health analytics to revolutionize healthcare is clear.

However, the effective transmission and consumption of clinical data from the electronic health record (EHR) is a key success factor for many facets of population management and related analytics. And, all too frequently, chaotic and unstructured clinical data defeats the promise of advanced analytic techniques.

Granted, every certified EHR is required to export a standard clinical document, most commonly a CCD or C-CDA. There are, however, several challenges with these documents and how they are used in clinical scenarios. These challenges effectively bury important clinical data and make the data difficult to consume in downstream systems. Much of the data is overlooked and never put to use. Why does this happen? In this article we’ll discuss three factors that contribute to dirty clinical data: 1) variation between and within EHRs, 2) variation between clinicians documenting care, and 3) clinical data being stored as unstructured text.

The clinical document is not a tight standard. Every EHR produces clinical documents differently. Even documents for different versions of the same brand of EHR can be different. This heterogeneity comes in many forms. The terminology is not uniform. Consider the documentation for “Heart failure”, for example. There are at least 91 different ways to document heart failure, including:

• 10091002 High output heart failure (disorder)
• 10335000 Chronic right-sided heart failure (disorder)
• 10633002 Acute congestive heart failure (disorder)
• 111283005 Chronic left-sided heart failure (disorder)
• 128404006 Right heart failure (disorder)
• 428.0 Congestive heart failure, unspecified
• 428.1 Left heart failure
• 428.20 Systolic heart failure, unspecified
• 428.21 Acute systolic heart failure
• 428.22 Chronic systolic heart failure

Transforming and normalizing these data for analysis without technology is a long, arduous task.
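
As a minimal sketch of what that normalization involves, the ten codes listed above can be collapsed into a single "Heart failure" concept with a lookup table. The code-system labels are inferred from the code formats (the "(disorder)" entries look like SNOMED CT, the 428.x entries like ICD-9-CM), and the names `CODE_TO_CONCEPT`, `normalize_condition` and `count_heart_failure` are hypothetical. A production system would rely on a maintained terminology service rather than a hand-built table.

```python
# Hypothetical lookup built only from the codes listed in this article.
# Code-system labels are an assumption inferred from the code formats.
CODE_TO_CONCEPT = {
    ("SNOMED CT", "10091002"): "Heart failure",
    ("SNOMED CT", "10335000"): "Heart failure",
    ("SNOMED CT", "10633002"): "Heart failure",
    ("SNOMED CT", "111283005"): "Heart failure",
    ("SNOMED CT", "128404006"): "Heart failure",
    ("ICD-9-CM", "428.0"): "Heart failure",
    ("ICD-9-CM", "428.1"): "Heart failure",
    ("ICD-9-CM", "428.20"): "Heart failure",
    ("ICD-9-CM", "428.21"): "Heart failure",
    ("ICD-9-CM", "428.22"): "Heart failure",
}


def normalize_condition(system, code):
    """Return the shared concept for a coded entry, or None if it is unknown."""
    return CODE_TO_CONCEPT.get((system.strip(), code.strip()))


def count_heart_failure(entries):
    """Count (system, code) pairs that normalize to 'Heart failure'."""
    return sum(
        1 for system, code in entries
        if normalize_condition(system, code) == "Heart failure"
    )
```

With a table like this, an analyst counts heart failure once instead of writing a query that enumerates every code by hand.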

We also often see free text in structured sections of the clinical documents, and data in the wrong sections. For example, it is common to see immunizations in the procedure section rather than in the immunization section, where they belong.

Compounding all this document heterogeneity is the variation in how clinicians document care in their EHRs. Consider a medication like atorvastatin (used for cholesterol control). The representations seen in the medical record include:
• Lipitor Oral
• 1483793 atorvastatin calcium propylene glycol solvate
• 1297766 atorvastatin calcium trihydrate
• 63629-3366 Lipitor 80 mg Oral Tablet
• 63629-3366-1 Lipitor 80 mg Oral Tablet, 30 tablet, film coated in 1 bottle

Among hundreds of other variations…

Let’s assume a data analyst was tasked with studying the use of the drug category “Atorvastatin.” To build a complete query the analyst would need to include all the variants of the category and the brand names that exist within the clinical data. A more effective approach would be leveraging technology which enriches the data so all these variants may simply be queried by their category name: Atorvastatin.
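
A minimal sketch of that enrichment step, assuming medication records arrive as dictionaries with an optional `code` and an optional free-text `text` field: each record is tagged with a category once, and the analyst then queries by category alone. The table `VARIANT_TO_CATEGORY`, the record layout and both function names are hypothetical, and the table holds only the variants listed above.

```python
# Hypothetical enrichment table containing only the variants listed above.
# Keys are lower-cased codes or free-text names as they appear in the record.
VARIANT_TO_CATEGORY = {
    "lipitor oral": "Atorvastatin",
    "1483793": "Atorvastatin",
    "1297766": "Atorvastatin",
    "63629-3366": "Atorvastatin",
    "63629-3366-1": "Atorvastatin",
}


def enrich(record):
    """Return a copy of the record with a 'category' field (None if unknown)."""
    key = (record.get("code") or record.get("text") or "").strip().lower()
    return {**record, "category": VARIANT_TO_CATEGORY.get(key)}


def query_by_category(records, category):
    """Enrich every record, then keep those in the requested category."""
    return [r for r in map(enrich, records) if r["category"] == category]


records = [
    {"patient": "A", "code": "63629-3366"},
    {"patient": "B", "text": "Lipitor Oral"},
    {"patient": "C", "code": "4603"},
]
atorvastatin_patients = [
    r["patient"] for r in query_by_category(records, "Atorvastatin")
]
```

The query itself never mentions a brand name or package code; that knowledge lives in the enrichment table, where it can be maintained in one place.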

All this document variability has consequences. Substantial amounts of data can be lost when the documents are used for clinical applications or as substrate data for population health analytics.

As another example, consider an actual medication prescription like:
• Furosemide 20mg Furosemide 20mg PO qD

For this unstructured data to be most analytically useful, technology is required to interpret it and add value to it. This occurs in a series of steps:

1. Codification - Context-aware technology can be used to associate this medication with RxNorm code 4603
2. Targeted Natural Language Processing (NLP) can be used to normalize the medication instructions, which are written in Latin abbreviations: “take 20 mg by mouth (PO, or ‘per os’ for Latin fans) once daily (qD, or ‘quaque die’ in Latin)”
3. Additional metadata is also added to the medication: Furosemide can be categorized as a medication in the “Loop Diuretic” family

Analysts using data normalized and standardized in this way can more productively study prescription patterns for Loop Diuretics without hunting and pecking for Furosemide and all the other variations that exist in clinical records.
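
The three steps can be sketched on the furosemide example as follows. This is a toy illustration under stated assumptions, not a real NLP engine: the regular expression handles only the "name dose route frequency" shape of this one prescription, and the lookup tables, the `interpret` function and its output fields are hypothetical, populated solely with the values mentioned above.

```python
import re

# Hypothetical lookup tables holding only the values named in this article.
INGREDIENT_TO_RXNORM = {"furosemide": "4603"}
INGREDIENT_TO_CLASS = {"furosemide": "Loop Diuretic"}
ROUTES = {"po": "by mouth"}
FREQUENCIES = {"qd": "once daily"}

# Matches the trailing "<name> <dose>mg <route> <frequency>" portion, so a
# repeated drug name earlier in the string is skipped.
SIG_PATTERN = re.compile(
    r"\b(?P<name>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg)"
    r"\s+(?P<route>[A-Za-z]+)\s+(?P<freq>[A-Za-z]+)\s*$"
)


def interpret(raw):
    """Codify, normalize and enrich one raw prescription string."""
    match = SIG_PATTERN.search(raw)
    if match is None:
        return None
    name = match.group("name").lower()
    route = ROUTES.get(match.group("route").lower())
    freq = FREQUENCIES.get(match.group("freq").lower())
    if name not in INGREDIENT_TO_RXNORM or route is None or freq is None:
        return None
    dose, unit = match.group("dose"), match.group("unit")
    return {
        "rxnorm": INGREDIENT_TO_RXNORM[name],                # step 1: codification
        "instructions": f"take {dose} {unit} {route} {freq}",  # step 2: normalized sig
        "drug_class": INGREDIENT_TO_CLASS[name],             # step 3: added metadata
    }


result = interpret("Furosemide 20mg Furosemide 20mg PO qD")
```

Anything the sketch cannot recognize returns `None` rather than a guess, which keeps unparsed prescriptions visible for review instead of silently miscoding them.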

Clinicians face another challenge. Each encounter generates a new clinical document, and much of the data is redundant among these documents. When clinicians are seeing patients in the clinic and want to review a clinical history from CCDs or C-CDAs, they often need to look through multiple, duplicative multi-page documents to find the clinical data they are looking for. This is time-consuming and inefficient, and the problem only worsens over time as patients accumulate more such documents. One aggregated, de-duplicated document is much more efficient to review.

Fortunately, technology is available to help clinicians overcome these challenges. Solutions that normalize the data in CCDs and C-CDAs, or that create a single aggregated document with all the clinical data de-duplicated, greatly increase the useful data yields from clinical documents. There are multiple use cases. The aggregated, de-duplicated document can be reviewed much more quickly, increasing practice efficiency and allowing more time to talk with patients. It can also be used to populate a longitudinal patient record. The normalized documents populate analytic solutions and population health analytic databases with much more robust and complete data than the native CCDs and C-CDAs.
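
A minimal sketch of the aggregation idea, assuming each clinical document has already been parsed and normalized into a list of entries with `section`, `system` and `code` fields. That record layout, the choice of de-duplication key and the `aggregate` function are all hypothetical; real solutions must also reconcile dates, statuses and conflicting values.

```python
def aggregate(documents):
    """Merge entries from several documents, keeping the first copy of each.

    Two entries count as duplicates when they share the same section,
    code system and code (an assumed, deliberately simple matching rule).
    """
    seen = set()
    merged = []
    for document in documents:
        for entry in document:
            key = (entry["section"], entry["system"], entry["code"])
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged


visit_1 = [
    {"section": "problems", "system": "ICD-9-CM", "code": "428.0"},
    {"section": "medications", "system": "RxNorm", "code": "4603"},
]
visit_2 = [
    {"section": "problems", "system": "ICD-9-CM", "code": "428.0"},
    {"section": "medications", "system": "RxNorm", "code": "4603"},
    {"section": "problems", "system": "ICD-9-CM", "code": "428.22"},
]
longitudinal_record = aggregate([visit_1, visit_2])
```

Here five entries across two visits reduce to three distinct ones, which is the single view a clinician would review instead of paging through both documents.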

Diamonds in the rough are simply not sufficient. Technology to normalize, standardize, de-duplicate and enrich disparate sources of clinical data is critical both to the care of a patient with chest pain in the ER and to the study of opioid use in large patient populations. In each case, the availability of the right information, at the right time, in the right place improves patient outcomes and reduces cost. Those are goals we can all support.

Jonathan M. Niloff, MD, MBA, is Chief Medical Officer of Diameter Health.
