4 Steps to Activating a Hospital's Big Data for Population Health Management

Almost 10 percent of U.S. residents now receive their healthcare from an accountable care organization.1 The momentum toward outcome-based reimbursement is approaching a tipping point. As ACOs begin to serve larger patient volumes, it is more imperative than ever that they activate their "big data" for effective health and risk management. "Big data" is the collection of raw data points (patient, clinical and financial) housed in a hospital's information systems. When collected, integrated and analyzed, these data points evolve from simple zeroes and ones into a treasure trove of actionable information.

If a healthcare provider can accurately estimate the amount of care, or utilization, that a group of patients will need, it can contain the costs and minimize the financial risk of caring for that group. When statistical algorithms or data mining models are applied to a hospital's big data set, financial and clinical executives can gain critical insight into areas ranging from revenue enhancement opportunities to proactive patient management and the tangible links between the cost of treatment and outcomes.

Here are four steps to activating a hospital's "big data" assets to support population health management.

1. Collect, integrate and categorize the universe of patient data into data sets based on a specific health condition to prepare the data for analysis. Typically, all hospital, patient and billing data points are stored in disparate information technology systems within the hospital. To make the terabytes of data useful and ready for analysis, the provider must capture and categorize all relevant data elements in a data warehouse. These quantifiable elements should include patient demographics (age, gender, location), behavioral data elements (such as frequency of clinical visits, diet, proactive care), all procedural and diagnostic codes, and lab and pharmacy data.
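
The collection and integration described above can be sketched in Python, assuming three hypothetical extracts (demographics, billing encounters and lab results) that share a patient ID. The field names and the `integrate` function are illustrative assumptions, not the schema of any particular hospital system or warehouse product.

```python
def integrate(demographics, encounters, labs):
    """Merge records from separate source systems into one row per patient.

    All field names are hypothetical; a real warehouse would map each
    source system's own schema onto a common model.
    """
    warehouse = {}
    for row in demographics:
        warehouse[row["patient_id"]] = {
            "age": row["age"],
            "gender": row["gender"],
            "zip": row["zip"],
            "visits": 0,
            "total_charges": 0.0,
            "dx_codes": [],
            "px_codes": [],
            "labs": {},
        }
    for enc in encounters:
        rec = warehouse.get(enc["patient_id"])
        if rec is None:
            continue  # no matching demographics: skipped here (a data-quality issue)
        rec["visits"] += 1
        rec["total_charges"] += enc["charges"]
        rec["dx_codes"].extend(enc["dx_codes"])
        rec["px_codes"].extend(enc["px_codes"])
    for lab in labs:
        rec = warehouse.get(lab["patient_id"])
        if rec is not None:
            # Later results overwrite earlier ones, so labs are assumed to arrive in date order.
            rec["labs"][lab["test"]] = lab["value"]
    return warehouse


demographics = [{"patient_id": 1, "age": 30, "gender": "F", "zip": "53703"}]
encounters = [
    {"patient_id": 1, "charges": 120.0, "dx_codes": ["278.00"], "px_codes": []},
    {"patient_id": 2, "charges": 50.0, "dx_codes": [], "px_codes": []},
]
labs = [{"patient_id": 1, "test": "HbA1c", "value": 5.9}]
warehouse = integrate(demographics, encounters, labs)
print(warehouse[1]["visits"], warehouse[1]["total_charges"])
```

Encounters that match no demographic record are simply skipped in this sketch; in practice they would be flagged during the verification step.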

The data must then be categorized, so that related data elements are logically linked to one another, and verified, so that decisions based on information extracted from the source data are accurate and valid. Finally, a separate data set is created for the patients with each condition of interest, based on the relevant attributes (e.g., one set of patients with ICD-9 primary diagnoses related to obesity, another for patients with a primary diagnosis of type 2 diabetes, etc.).
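
The condition-specific data sets can be built with simple diagnosis-code rules. The sketch below assumes each warehouse record carries a single `primary_dx` field and uses a deliberately small, illustrative set of ICD-9-CM codes: 278.00 and 278.01 for obesity, and 250.x0 and 250.x2 (the type II or unspecified-type diabetes codes) for type 2 diabetes. The function and rule names are hypothetical; a production mapping would come from the hospital's clinical coding staff.

```python
# Illustrative ICD-9-CM rules only; a production mapping should come from coding staff.
OBESITY_CODES = {"278.00", "278.01"}  # obesity, unspecified; morbid obesity


def is_type2_diabetes(code):
    """ICD-9-CM 250.x0 / 250.x2: fifth digit 0 or 2 means type II or unspecified type."""
    return len(code) == 6 and code.startswith("250.") and code[-1] in "02"


CONDITION_RULES = {
    "obesity": lambda code: code in OBESITY_CODES,
    "type2_diabetes": is_type2_diabetes,
}


def build_condition_sets(patients):
    """Split warehouse records into one data set per condition, by primary diagnosis."""
    data_sets = {name: [] for name in CONDITION_RULES}
    for patient in patients:
        for name, matches in CONDITION_RULES.items():
            if matches(patient["primary_dx"]):
                data_sets[name].append(patient)
    return data_sets


patients = [{"id": 1, "primary_dx": "278.00"}, {"id": 2, "primary_dx": "250.02"},
            {"id": 3, "primary_dx": "250.01"}, {"id": 4, "primary_dx": "401.9"}]
condition_sets = build_condition_sets(patients)
print({name: [p["id"] for p in rows] for name, rows in condition_sets.items()})
```

Patient 3 (250.01, a type I code) and patient 4 (401.9, a hypertension code) fall into neither set.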

2. Identify new patient data sets based on similar elements to lay the foundation for analysis.
The next step is to apply statistical clustering algorithms to each condition-specific data set (e.g., patients diagnosed with type 2 diabetes or with congestive heart failure) to identify patient sub-populations, or clusters of patients, that share common characteristics across the thousands of data elements collected. These sub-populations lay the foundation for analysis. In minutes, the clustering algorithms sift through the thousands of data elements summarizing each patient and group very similar patients into smaller sub-populations. Typically, these sub-populations correspond to different health outcomes and/or different financial risks related to treatment over time.
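
The article does not name a specific clustering algorithm. As one common stand-in, the sketch below uses k-means (Lloyd's algorithm) on standardized numeric features, with deterministic farthest-first seeding so that results are repeatable. The feature columns (visits per year, average charge per visit, age) are hypothetical; a real model would also need to encode categorical elements such as diagnosis codes.

```python
import math


def standardize(rows):
    """Z-score each column so that dollar amounts do not swamp visit counts."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first (maximin) seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centroids.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical features per patient: [visits per year, avg charge per visit, age]
rows = [[2, 100.0, 25], [3, 120.0, 28], [2, 110.0, 30],
        [12, 2500.0, 60], [11, 2700.0, 65], [13, 2600.0, 62]]
scaled, means, sds = standardize(rows)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

Because the seeding is deterministic, rerunning the model on the same data reproduces the same sub-populations. The number of clusters, k, still has to be chosen separately, for example by comparing fits across several candidate values.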

The clustering algorithms zero in on similarities among patients such as diagnoses, demographics, frequency of treatment, lab results, medications and outcomes. For example, the algorithms may identify a sub-group of young patients diagnosed with obesity who have frequent but low-cost interactions, likely brief checkups, and planned visits with their healthcare provider, indicating that they are actively managing their condition. Clustering algorithms will typically identify between six and 12 distinct sub-populations of patients with similar behaviors or risk. This granularity is key: the more closely the patients within each sub-population resemble one another, the more accurate the subsequent analysis.

3. Estimate the health and financial risks of patient sub-populations to improve clinical outcomes while containing costs. After the clustering algorithm has assigned patients to sub-populations, the next step is to analyze each sub-population's profile. By reviewing information such as visit frequency, average total charges per visit, common diagnosis and procedure codes, summary lab results and prescription usage, the overall health of the patients in a sub-population becomes clearer and can be rated, for example, as "good," "average" or "poor." As a result, the health-related risks of each sub-population start to emerge.
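
A sketch of the profiling and rating step, assuming each patient record has hypothetical `visits`, `total_charges` and `hba1c` fields. The rating rules are illustrative only: 5.7 percent and 6.5 percent are the commonly used HbA1c cut-offs for prediabetes and diabetes, while the charge-per-visit thresholds are arbitrary placeholders that a clinical team would need to set.

```python
from statistics import mean


def profile(patients, labels):
    """Summarize each cluster; assumes every patient has at least one visit."""
    summary = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(patients, labels) if lab == c]
        summary[c] = {
            "size": len(members),
            "visits": mean(p["visits"] for p in members),
            "charge_per_visit": mean(p["total_charges"] / p["visits"] for p in members),
            "hba1c": mean(p["hba1c"] for p in members),
        }
    return summary


def rate(s):
    """Map a cluster summary to "good" / "average" / "poor" (illustrative rules)."""
    if s["hba1c"] >= 6.5 or s["charge_per_visit"] >= 2000:
        return "poor"
    if s["hba1c"] < 5.7 and s["charge_per_visit"] < 500:
        return "good"
    return "average"


patients = [{"visits": 4, "total_charges": 800.0, "hba1c": 5.4},
            {"visits": 2, "total_charges": 600.0, "hba1c": 5.5},
            {"visits": 1, "total_charges": 3000.0, "hba1c": 8.1}]
summary = profile(patients, [0, 0, 1])
for c, s in summary.items():
    print(c, s["size"], rate(s))
```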

The patient sub-populations identified by the clustering algorithms also provide insight into the financial risks associated with a given group of patients. By analyzing the variance in healthcare utilization within each cluster, providers can quickly home in on the sub-populations with low financial risk (where utilization can be accurately estimated) versus those with high financial risk (where utilization is difficult to estimate accurately). As more providers take on capitated risk, it is imperative to know which patients are high-risk. These patients can be managed more efficiently by "carving out" their sub-population into a separate payment category or by partnering with an organization that may be able to manage their care more predictably.
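
One way to quantify "variance in healthcare utilization" is the coefficient of variation (standard deviation divided by mean) of each sub-population's annual charges: a low value means spending can be predicted accurately. The sketch below uses that measure with a hypothetical threshold of 0.5; both the choice of metric and the threshold are assumptions, not part of the original method.

```python
from statistics import mean, pstdev


def financial_risk(annual_charges_by_cluster, cv_threshold=0.5):
    """Label each cluster "low" or "high" financial risk by the predictability of its spend.

    Returns {cluster: (label, coefficient of variation rounded to 2 places)}.
    """
    result = {}
    for cluster, charges in annual_charges_by_cluster.items():
        m = mean(charges)
        cv = pstdev(charges) / m if m else float("inf")  # guard against a zero mean
        result[cluster] = ("low" if cv < cv_threshold else "high", round(cv, 2))
    return result


risk = financial_risk({
    "A": [1000, 1100, 900, 1000],   # tightly grouped spend
    "B": [200, 5000, 300, 9000],    # widely scattered spend
})
print(risk)
```

Cluster B's scattered spending marks it as a candidate for the "carve-out" or partnership arrangements described above.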

4. Conduct ongoing, real-time data modeling to assist with early intervention, improve health outcomes and lower financial risk. Managing care for thousands of patients is daunting, let alone doing it proactively. The cluster models and patient sub-populations determined in the previous steps provide a technical foundation for algorithmically determining whether a patient's health (and associated financial risk) is likely to deteriorate. The clustering algorithms run continuously, evaluating patients' interactions with the provider in real time and alerting care coordinators to new trends, which they in turn can use to proactively reach out to patients whose risk profile has changed.
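
Continuous monitoring can be sketched as re-scoring a patient against fixed cluster centroids each time a new encounter arrives. The `PatientTracker` class, its three features (visit count, average charge per visit, latest HbA1c) and the standardization parameters are hypothetical; they stand in for whatever the offline clustering run produced.

```python
class PatientTracker:
    """Re-assigns patients to the nearest existing cluster after each encounter.

    centroids are in standardized units; means/sds convert raw features into
    those units. All three come from an offline clustering run (hypothetical).
    """

    def __init__(self, centroids, means, sds):
        self.centroids = centroids
        self.means = means
        self.sds = sds
        self.state = {}  # patient_id -> running totals

    def nearest_cluster(self, features):
        z = [(x - m) / s for x, m, s in zip(features, self.means, self.sds)]
        return min(range(len(self.centroids)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(z, self.centroids[j])))

    def add_encounter(self, patient_id, charges, hba1c):
        """Update a patient's running features and return the current cluster."""
        s = self.state.setdefault(patient_id, {"visits": 0, "charges": 0.0, "hba1c": hba1c})
        s["visits"] += 1
        s["charges"] += charges
        s["hba1c"] = hba1c  # keep the most recent lab value
        features = [s["visits"], s["charges"] / s["visits"], s["hba1c"]]
        return self.nearest_cluster(features)


# Toy model with two clusters; identity scaling (means 0, sds 1) keeps it in raw units.
tracker = PatientTracker([[2, 100.0, 5.5], [1, 3000.0, 8.0]], [0, 0, 0], [1, 1, 1])
print(tracker.add_encounter("p1", 120.0, 5.4))
print(tracker.add_encounter("p1", 6000.0, 8.2))
```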

The clustering algorithm identifies sub-populations for common conditions, and these sub-populations are then assessed for health- and financial-related risks. Based on these assessments, the cluster models track changes in each patient's data elements and identify patients whose data now associate them with a different sub-population than the one originally determined.

These transitions typically happen when new data from a patient's recent encounter (or set of encounters) start to indicate that he or she belongs to a different sub-population. If this new sub-population has poorer health or higher risk, an automated alert can be issued to caregivers so they can intervene proactively in the patient's care, both improving clinical outcomes and reducing financial risk.
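
A minimal sketch of the transition check, assuming each cluster has already been rated "good," "average" or "poor" as in step 3 and that `notify` is a hypothetical callback into the care coordinators' alerting system.

```python
RANK = {"good": 0, "average": 1, "poor": 2}


def check_transition(patient_id, old_cluster, new_cluster, cluster_rating, notify):
    """Alert when a patient moves into a cluster with a worse health rating.

    cluster_rating maps cluster id -> "good" / "average" / "poor".
    Returns True if an alert was sent.
    """
    if new_cluster == old_cluster:
        return False
    old_rating = cluster_rating[old_cluster]
    new_rating = cluster_rating[new_cluster]
    if RANK[new_rating] > RANK[old_rating]:
        notify(f"Patient {patient_id}: moved from cluster {old_cluster} ({old_rating}) "
               f"to cluster {new_cluster} ({new_rating})")
        return True
    return False  # moved to an equal or better cluster: no alert


alerts = []
ratings = {0: "good", 1: "poor", 2: "average"}
check_transition("p42", 0, 1, ratings, alerts.append)
print(alerts)
```

Moves toward a better-rated cluster are deliberately silent here, so care coordinators only see the transitions that call for outreach.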

Case study: Using statistical modeling to limit the health and financial risk of obese patients     

With its prevalence in the United States increasing year over year, obesity is one of the greatest risk factors for developing type 2 diabetes and a growing financial risk for health systems nationwide. A $4 billion hospital system examined the financial costs and risks associated with patients diagnosed with obesity and those diagnosed with type 2 diabetes. The goal was to identify the specific factors that indicate early on that an obese patient is likely to develop type 2 diabetes. Identifying such factors early means clinicians and care coordinators can intervene proactively before obese patients become diabetic. Early identification also links costs to the treatments used to manage these conditions, which clarifies which care plan will bring the greatest value for patients.

Analyzing the "big data"

Two data sets were prepared: one for patients diagnosed with obesity and one for patients diagnosed with type 2 diabetes. The data sets included patient demographic data (age, gender, zip code), diagnosis codes (ICD-9 Dx codes), procedures (ICD-9 Px codes), service charges and attending physician information. Two data mining cluster models were then built: the first clusters the patients diagnosed with obesity into sub-populations, and the second does the same for the patients diagnosed with type 2 diabetes.

The clustering algorithms found eight distinct sub-populations among the patients diagnosed with obesity. The largest cluster contained 22.3 percent of these patients and was characterized by a relatively large number of hospital visits, high total charges and relatively young patients (under 35 years old). A "prototypical" patient in this cluster was a 16-year-old male diagnosed with obesity and chest pain. Common procedures for this patient included a lipid panel and thyroid hormone treatment.
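
The article does not say how the "prototypical" patient was chosen. One common approach, sketched here as an assumption, is to pick the cluster member whose feature vector lies closest to the cluster centroid. The `prototype` function and the toy data are hypothetical.

```python
def prototype(points, labels, centroids, cluster):
    """Return the index of the member of `cluster` nearest its centroid."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    return min(members,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(points[i], centroids[cluster])))


# Toy standardized feature vectors and cluster assignments.
points = [[0.0, 0.0], [1.0, 1.0], [0.4, 0.6], [9.0, 9.0]]
labels = [0, 0, 0, 1]
centroids = [[0.5, 0.5], [9.0, 9.0]]
print(prototype(points, labels, centroids, 0))
```

A real record picked this way would then be described in plain terms (age, sex, diagnoses, procedures), as the case study does.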

The clustering algorithms identified 10 distinct sub-populations within the type 2 diabetes data set. One sub-population consisted of patients who were managing their condition, meaning they had regular visits with low total charges. This "proactive" care sub-population accounted for 13.5 percent of the type 2 diabetes population. A second sub-population consisted of patients who were not managing their care well. These patients had fewer visits with higher total charges per visit and accounted for 6.8 percent of the type 2 diabetes population. The remaining sub-populations segmented type 2 diabetes patients by age and by frequency of inpatient and outpatient visits into various intermediate levels of health- and financial-related risk.
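
Figures such as "13.5 percent of the type 2 diabetes population" are each cluster's share of its condition data set. Given the cluster labels produced by a model, those shares can be computed as follows; the labels below are illustrative, not the hospital system's data, and `cluster_shares` is a hypothetical helper.

```python
from collections import Counter


def cluster_shares(labels):
    """Percentage of the condition population assigned to each cluster (1 decimal)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: round(100 * n / total, 1) for c, n in sorted(counts.items())}


print(cluster_shares([0, 0, 1, 2, 2, 2, 2, 1]))
```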

Using information about the attributes that drive sub-population membership, the data-driven, proactive monitoring system tracked young patients diagnosed with obesity through their encounters with the hospital system and determined their risk of becoming diabetic. Automated alerts were developed and sent directly to clinicians when obese patients were tracking toward the type 2 diabetes sub-population. Based on this information, care coordinators and physicians intervened to increase the likelihood that these patients would, over time, transition to the sub-population that was managing its condition well.
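
The article does not define how "tracking toward" the type 2 diabetes sub-population was measured. One hedged interpretation, sketched below, checks whether a patient's distance to a target centroid has shrunk at each of the last few encounters. It assumes both cluster models share the same standardized feature space; `tracking_toward` and its `window` parameter are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tracking_toward(history, target_centroid, window=3):
    """True if the last `window` feature vectors move steadily closer to the target.

    history: chronological list of standardized feature vectors, one per encounter.
    """
    if len(history) < window:
        return False  # not enough encounters to judge a trend
    d = [sq_dist(v, target_centroid) for v in history[-window:]]
    return all(later < earlier for earlier, later in zip(d, d[1:]))


diabetes_centroid = [2.0, 2.0]  # hypothetical target sub-population
history = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(tracking_toward(history, diabetes_centroid))
```

Requiring a strictly shrinking distance over several encounters, rather than a single move, is one way to keep one unusual visit from triggering an alert.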

Applying statistical clustering algorithms to the "big data" assets that a healthcare organization already has can lay the technical groundwork for jump-starting better health outcomes and reduced financial risk. Data mining and statistical predictive modeling techniques applied to these patient-focused data sets reveal patterns and correlations between costs and outcomes. Data analytics gave ACO managers the right tools and firepower to improve patient outcomes while containing costs and minimizing financial risk.


1 N. Gandhi and R. Weil. The ACO Surprise. November 2012 (Fierce Practice Management, 11/28/12).

Paul Bradley, PhD, is chief scientist at MethodCare Inc., where he oversees research and development functions, including the development of new processes, technologies and products. Dr. Bradley earned his PhD and MS degrees in computer science and a BS degree in mathematics from the University of Wisconsin.

© Copyright ASC COMMUNICATIONS 2019.

