De-identified patient data: Treasure trove for research or privacy nightmare?

EHR vendors Epic and Cerner have each recently unveiled initiatives that aim to propel the use of de-identified patient data for medical research.

Last month, Epic CEO Judy Faulkner announced the EHR giant's new program called Cosmos, which is designed to mine data from millions of patient medical records at various health systems across the country in support of treatment research.

Currently, Cosmos collects de-identified patient data from 8 million individuals at nine health systems. An additional 31 healthcare organizations have pledged to participate in the program, which is expected to bring the total number of included records to 25 million in the next few months.

Cerner partnered with the Duke Clinical Research Institute in August to pilot the Cerner Learning Health Network, which aims to automate data collection and expand researchers' access to patient health data. At the Durham, N.C.-based research institute, Duke researchers will use Cerner's network to analyze the use and impact of proven therapies for cardiovascular disease.

The study will examine de-identified patient data from Columbia-based University of Missouri Health Care and Ascension Seton Medical Center Austin, in partnership with Dell Medical School at the University of Texas at Austin. After the pilot and study finish, Cerner clients will be able to leverage the EHR vendor's HealtheDataLab tool, which uses Cerner's big data and insights platform in conjunction with the Learning Health Network to aggregate de-identified patient data from both Cerner and non-Cerner EHRs.

De-identified patient data extracted from Cerner's initiatives can be used to create research models and algorithms that aid care decisions, such as early detection of patients who may be at risk for costly episodes of care.

Privacy concerns

Despite the research advantages that de-identified patient data from EHRs may offer, its use has raised privacy concerns.

In June, University of Chicago Medical Center and Google were sued for allegedly violating HIPAA by sharing thousands of patients' records without properly de-identifying the data. The plaintiff, a former University of Chicago Medical Center patient, claimed the records included date stamps of when patients checked in and out of the hospital as well as physicians' notes. The records were used for the technology giant's research on predictive medical data analytics technology.

University of Chicago Medical Center and Google filed motions to dismiss the lawsuit in August, claiming they used secure and HIPAA-compliant data sharing methods. The organizations also argue the plaintiff never claimed Google identified him, only that the company has the technological ability to do so.

While de-identified data is increasingly being used for research, HHS warns there is a small chance that even properly de-identified data could still be re-identified.

"Although the risk is very small, it is not zero, and there is a possibility that de-identified data could be linked back to the identity of the patient to which it corresponds," according to HHS's guidance for de-identification of protected health information. HHS does not restrict the use of de-identified data under the privacy rule because it is not considered PHI.
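The date stamps cited in the University of Chicago lawsuit are one example of a field that de-identification is meant to address: HIPAA's Safe Harbor method treats date elements more specific than the year, such as admission and discharge dates, as identifiers. The following is a minimal Python sketch of coarsening such fields to the year. The field names (`admit_date`, `discharge_date`), the record layout and the sample values are hypothetical, and this is not a complete de-identification procedure; free-text fields such as physicians' notes would need separate handling.

```python
from datetime import date

# Hypothetical field names; a real EHR export will use its own schema.
DATE_FIELDS = ("admit_date", "discharge_date")


def coarsen_dates(record: dict) -> dict:
    """Return a copy of the record with the listed date fields reduced to the year.

    Assumes each date is an ISO-formatted string (YYYY-MM-DD).
    """
    cleaned = dict(record)  # shallow copy so the input record is left unchanged
    for field in DATE_FIELDS:
        value = cleaned.get(field)
        if value is not None:
            cleaned[field] = date.fromisoformat(value).year
    return cleaned


# Made-up sample record, for illustration only.
record = {
    "admit_date": "2016-03-14",
    "discharge_date": "2016-03-19",
    "unit": "example-unit",
}
print(coarsen_dates(record))
```

Reducing dates to the year lowers, but does not eliminate, the chance that a record can be linked back to a patient, which is consistent with HHS's statement that the risk is small but not zero.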

Artificial intelligence may offer a new way to re-identify de-identified health data. Researchers from the U.S. and China collaborated on a project last December to analyze whether two different AI algorithms could re-identify patients from a national de-identified physical activity dataset of activity tracker users. Both algorithms were fairly successful, identifying users by learning daily patterns in step data and matching them to study participants' demographic data, according to the study published in JAMA.