Data normalization for semantic translation

Why normalizing your clinical and claims data into standard terminologies is critical to supporting forward-thinking initiatives such as big data analytics, population health management and semantic interoperability

A classic good news/bad news situation

First the good news: the steady growth of healthcare information technology (HIT) in recent years has ushered in a new era of automation. The future of healthcare will be built on data—actionable patient information to support population health management and analytics to improve outcomes.

But there's a downside to this unprecedented adoption of healthcare technology. Today, patient data is scattered across an array of rapidly proliferating IT systems, each with its own way of representing clinical terms. For instance, does a patient have 'MI', 'Myocardial Infarction' or a 'bad ticker'?

The lack of a common clinical vocabulary across disparate systems creates communication barriers, which hinder the ability to coordinate care and aggregate data for analysis. These disparate terminology lexicons must be normalized (semantically translated) into standard terminologies so that the meaning of the clinical data is unambiguous.

Trends Driving Data Normalization

Health information exchange, interoperability, big data, analytics, quality measurement, population health and risk sharing are all leading buzzwords across the healthcare landscape, and each holds great promise for moving the industry closer to improving quality, reducing costs and increasing patient satisfaction. As the industry looks to build momentum behind these movements, key stakeholders increasingly realize that data normalization is a fundamental component of the equation.

There is a lot of noise in the system today, but two key trends consistently rise to the top: semantic interoperability to support new delivery models and clinical data repositories (CDRs) to support big data analytics.

Semantic Interoperability to Support New Delivery Models

Emerging healthcare delivery models such as Accountable Care Organizations, patient-centered medical homes and pay-for-performance all depend to some degree on interoperability—the ability to share data.

By adopting data transport and syntactic standards such as HL7, which define messaging structure, the industry has gone a long way toward establishing the foundational elements required for interoperability among disparate IT systems. Healthcare organizations are now turning their attention to semantic interoperability: the ability of IT systems to understand the meaning of the data that is being shared. Two local drug codes, for instance, may describe the same drug in different terms. A normalized information model lets organizations share a common terminology and promotes semantic interoperability.

CDRs to Support Big Data Analytics

A CDR is an aggregation of granular, patient-centric health data, usually collected from multiple source IT systems and intended to support multiple uses such as collecting data for quality measures or identifying at-risk populations for intervention. Collectively, the industry refers to these initiatives as Big Data.

The fragmentation of data contained within CDRs is a significant obstacle to leveraging Big Data. Clinical information, for example, might be stored in a variety of forms, including 1) text (such as "Diabetes Mellitus"), 2) standardized diagnosis codes generated by a claims system (such as an ICD-9-CM code), 3) standardized problems generated by an EMR (such as a SNOMED CT® code) or 4) local, proprietary codes that have meaning only in internal applications. A data normalization solution acts as a mediation layer that enables enterprises to interrogate their data irrespective of the underlying terminology contained within the CDR.
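To make the mediation-layer idea concrete, here is a minimal Python sketch under stated assumptions. The Entry record, the CONCEPT_MAP table, the concept key "diabetes_mellitus" and the local code "DX-0042" are all hypothetical names invented for illustration; a real solution would draw its mappings from maintained terminology content rather than a hand-written dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One problem-list or claims entry as it might sit in a CDR (hypothetical shape)."""
    patient_id: str
    system: str  # "text", "ICD-9-CM", "SNOMED CT" or "LOCAL"
    value: str


# Hypothetical mapping from (source system, source value) to a shared concept key.
CONCEPT_MAP = {
    ("text", "diabetes mellitus"): "diabetes_mellitus",
    ("ICD-9-CM", "250.00"): "diabetes_mellitus",
    ("SNOMED CT", "73211009"): "diabetes_mellitus",
    ("LOCAL", "DX-0042"): "diabetes_mellitus",  # made-up proprietary code
}


def concept_for(entry):
    """Return the shared concept key for an entry, or None if it is unmapped."""
    value = entry.value.strip()
    if entry.system == "text":
        value = value.lower()  # free text is matched case-insensitively
    return CONCEPT_MAP.get((entry.system, value))


def patients_with(concept, entries):
    """Find patients with a concept, whatever terminology recorded it."""
    return sorted({e.patient_id for e in entries if concept_for(e) == concept})


entries = [
    Entry("p1", "text", "Diabetes Mellitus"),
    Entry("p2", "ICD-9-CM", "250.00"),
    Entry("p3", "SNOMED CT", "73211009"),
    Entry("p4", "LOCAL", "DX-0042"),
    Entry("p5", "ICD-9-CM", "401.9"),  # not mapped to diabetes in this sketch
]

diabetic = patients_with("diabetes_mellitus", entries)
```

The query names a concept, not a code, which is the point of the mediation layer: the caller does not need to know which of the four forms each source system used.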

Laying the Foundations

As data is aggregated from disparate systems across the continuum, organizations must be able to 1) standardize local content to terminology standards and 2) semantically translate data between standards to eliminate ambiguity of meaning.

Standardizing Local Content to Terminology Standards

The key to making analytic initiatives work is ensuring the ecosystem consists of structured data—data that can be mined and shared with other systems. Structured data is available in a controlled format or terminology, rather than free text.

Take the case of tracking hemoglobin A1c values across a multi-institutional health system. The same test may be referred to as "HbA1c" by one institution, "A1c" at a second and "glycosylated hemoglobin" at a third. Normalizing these to a common LOINC code allows for a comprehensive apples-to-apples, cross-institutional view for population management.
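The hemoglobin A1c case can be sketched in a few lines of Python. This is an illustration only: the lookup table, the function name normalize_test_name and the sample results are assumptions, and the LOINC code shown (4548-4, Hemoglobin A1c/Hemoglobin.total in Blood) is one common choice that should be confirmed against the current LOINC release for the method actually in use.

```python
# Assumed target code; verify against the current LOINC release before use.
LOINC_HBA1C = "4548-4"

# Hypothetical table of local test names mapped to a LOINC code.
LOCAL_TO_LOINC = {
    "hba1c": LOINC_HBA1C,
    "a1c": LOINC_HBA1C,
    "glycosylated hemoglobin": LOINC_HBA1C,
}


def normalize_test_name(local_name):
    """Return the LOINC code for a local test name, or None if unmapped."""
    key = " ".join(local_name.lower().split())  # fold case and extra spaces
    return LOCAL_TO_LOINC.get(key)


# Made-up results from three institutions that label the same test differently.
results = [
    {"site": "A", "test": "HbA1c", "value": 7.2},
    {"site": "B", "test": "A1c", "value": 6.8},
    {"site": "C", "test": "Glycosylated  Hemoglobin", "value": 8.1},
    {"site": "C", "test": "Serum widget", "value": 1.0},
]

normalized = [dict(r, loinc=normalize_test_name(r["test"])) for r in results]

# One cross-institutional view of the same test.
hba1c_values = [r["value"] for r in normalized if r["loinc"] == LOINC_HBA1C]

# Unmapped names are surfaced for review rather than silently dropped.
unmapped = [r["test"] for r in normalized if r["loinc"] is None]
```

Keeping the unmapped names visible matters in practice, because local vocabularies change and every unmapped name is a gap in the population view.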

Semantic Translations Among Different Terminology Standards

Myriad terminology standards are used across systems, each serving a distinct purpose. Organizations that can semantically translate (or map) between these standards are better positioned to use the data generated by these systems for secondary analysis.

As an example, consider the case where a health system's CDR has been populated with inpatient data coded in ICD-10-PCS and outpatient data coded in CPT-4 (Current Procedural Terminology). In order to interrogate the data, the health system needs the ability to semantically translate between these code systems. Data normalization enables health systems to look for key clinical conditions irrespective of the underlying terminology used to populate their CDRs.
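A translation between two procedure code systems can be sketched as a bidirectional crosswalk. The Python below is a minimal illustration, not a production map: the two code pairs (appendectomy, open and laparoscopic) are illustrative and should be verified against the licensed ICD-10-PCS and CPT code sets, and the names CROSSWALK_PAIRS, build_index and translate are hypothetical.

```python
from collections import defaultdict

# Illustrative (ICD-10-PCS, CPT) pairs; verify against licensed code sets.
CROSSWALK_PAIRS = [
    ("0DTJ0ZZ", "44950"),  # appendectomy, open approach
    ("0DTJ4ZZ", "44970"),  # appendectomy, laparoscopic approach
]


def build_index(pairs):
    """Index the pairs in both directions; sets allow one-to-many mappings."""
    forward, backward = defaultdict(set), defaultdict(set)
    for pcs, cpt in pairs:
        forward[pcs].add(cpt)
        backward[cpt].add(pcs)
    return forward, backward


PCS_TO_CPT, CPT_TO_PCS = build_index(CROSSWALK_PAIRS)


def translate(code, source_system):
    """Translate a code to the other system; returns a sorted list of matches."""
    if source_system == "ICD-10-PCS":
        table = PCS_TO_CPT
    elif source_system == "CPT":
        table = CPT_TO_PCS
    else:
        raise ValueError(f"unsupported source system: {source_system}")
    return sorted(table.get(code, set()))


inpatient_as_cpt = translate("0DTJ0ZZ", "ICD-10-PCS")
outpatient_as_pcs = translate("44970", "CPT")
no_match = translate("99999", "CPT")
```

Returning a list rather than a single code reflects that mappings between standards are often one-to-many, so a query against the CDR may need to expand one code into several.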


In general, data normalization establishes a foundation for achieving semantic interoperability and creates an infrastructure that enables data sharing and aggregation.

By integrating all the claims and clinical data within a healthcare delivery organization, data normalization enables that organization to provide an accurate picture of health across the patient population it manages.


Jason Wolfson is Vice President of Marketing and Product Management at Health Language, Wolters Kluwer Health, Clinical Solutions. Jason can be reached at


The views, opinions and positions expressed within these guest posts are those of the author alone and do not represent those of Becker's Hospital Review/Becker's Healthcare. The accuracy, completeness and validity of any statements made within this article are not guaranteed. We accept no liability for any errors, omissions or representations. The copyright of this content belongs to the author and any liability with regards to infringement of intellectual property rights remains with them.

Copyright © 2024 Becker's Healthcare. All Rights Reserved.

