AI can re-identify de-identified health data, study finds

Activity trackers store a variety of personal data from users, including demographic information and exercise metrics. While businesses that collect this information have said existing practices for de-identifying user data are sufficient to protect users' privacy, a recent study casts doubt on this claim.

Researchers from the U.S. and China collaborated on a project to re-identify users in a national de-identified physical activity dataset using machine learning, a type of artificial intelligence. They used data from 14,451 individuals included in the National Health and Nutrition Examination Surveys from 2003-04 and 2005-06. The data had been stripped of geographic and protected health information.

The research team tested whether two different AI algorithms could re-identify the individuals in the dataset. Both algorithms were fairly successful, according to study results published in JAMA last month. The algorithms identified users by learning daily patterns in step data and matching those patterns to the users' demographic data.

One of the algorithms, for example, was based on a machine learning technique called the random forest method. This algorithm accurately matched physical activity data and demographic information to 95 percent of adults in the 2003-04 dataset and 94 percent of adults in the 2005-06 dataset.
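To make the matching idea concrete, here is a minimal Python sketch of a linkage attack. It is not the study's random forest model; it swaps in a much simpler nearest-pattern rule. All records, field names and the `reidentify` function are hypothetical and invented for illustration.

```python
from math import dist

# Hypothetical "de-identified" release: names are gone, but demographics
# and a week of daily step counts remain. All values are invented.
released = [
    {"record_id": "r1", "age_band": "30-39", "sex": "F",
     "steps": [4200, 5100, 3900, 8800, 4000, 12000, 2500]},
    {"record_id": "r2", "age_band": "30-39", "sex": "F",
     "steps": [9800, 10200, 9900, 10100, 9700, 3000, 2800]},
    {"record_id": "r3", "age_band": "50-59", "sex": "M",
     "steps": [9700, 10100, 10000, 10000, 9800, 3100, 2900]},
]

# Hypothetical auxiliary data held by an attacker, with a name attached
# (for example, slightly noisy step counts from a phone app).
auxiliary = {"name": "Alice", "age_band": "30-39", "sex": "F",
             "steps": [9900, 10000, 10050, 9950, 9600, 3200, 2700]}


def reidentify(aux, records):
    """Return the record_id that best matches the auxiliary profile.

    Step 1: keep only records whose demographics match exactly.
    Step 2: among those, pick the record whose daily step pattern is
            closest (Euclidean distance) to the auxiliary pattern.
    Returns None when no record shares the demographics.
    """
    candidates = [
        r for r in records
        if r["age_band"] == aux["age_band"] and r["sex"] == aux["sex"]
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: dist(r["steps"], aux["steps"]))
    return best["record_id"]


match = reidentify(auxiliary, released)
print(auxiliary["name"], "is probably record", match)
```

In this toy data, demographics narrow the field to two records and the step pattern settles which one it is. The study's models learned such patterns from the data rather than relying on a fixed distance rule.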

"If you strip all the identifying information, it doesn't protect you as much as you'd think. Someone else can come back and put it all back together if they have the right kind of information," Anil Aswani, PhD, lead author of the study and an engineer at UC Berkeley, said in a news release.

"In principle, you could imagine Facebook gathering step data from the app on your smartphone, then buying healthcare data from another company and matching the two," he added. "Now they would have healthcare data that's matched to names, and they could either start selling advertising based on that or they could sell the data to others."

The study's findings may suggest the need for new regulations to safeguard individuals' health information, according to the researchers.

"This study suggests that current practices for de-identification of [physical activity] data might be insufficient to ensure privacy," the study authors concluded. "This finding has important policy implications because it appears to show the need for de-identification that aggregates the [physical activity] data of multiple individuals to ensure privacy for single individuals."
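The kind of aggregation the authors describe could look something like the following Python sketch. The study does not prescribe a specific procedure, so the `aggregate_steps` function, the group size `k` and the sample data are all assumptions made for illustration.

```python
from statistics import mean


def aggregate_steps(records, k):
    """Replace individual step series with per-group daily averages.

    records: list of equal-length daily step lists, one per person.
    k: minimum number of people per group (a hypothetical parameter).
    A trailing group smaller than k is merged into the previous group,
    so no released series describes fewer than k people.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Copy each slice into its own list so the caller's data is not modified.
    groups = [list(records[i:i + k]) for i in range(0, len(records), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    # zip(*group) lines up the same day across everyone in the group.
    return [[mean(day) for day in zip(*group)] for group in groups]


# Invented data: five people, three days of step counts each.
people = [
    [1000, 2000, 3000],
    [3000, 4000, 5000],
    [2000, 2000, 2000],
    [4000, 6000, 8000],
    [6000, 1000, 5000],
]
print(aggregate_steps(people, 2))
```

Each released series now describes at least `k` people rather than one. Grouping by list order is a simplification, and averaging on its own is not a guarantee of privacy.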

Copyright © 2024 Becker's Healthcare. All Rights Reserved.

 
