De-identified ≠ risk-free: What health systems need to know about data linkage

Linking de-identified datasets can power game-changing healthcare research and AI — but can also raise serious privacy risks if not handled with care.

During October’s Becker’s Healthcare webinar sponsored by Privacy Analytics (an IQVIA company), Brian Rasquinha, PhD, Associate Director of Solution Architecture at Privacy Analytics, discussed common pitfalls healthcare organizations can fall into when de-identifying and linking patient data — and steps they can take to improve patient privacy while enabling the safe use of data.

Five key insights were:

  1. HIPAA de-identified sources aren’t a blanket safeguard. When linking de-identified data assets, including data from EHRs, devices, prescriptions, consumer wearables and mortality data, re-identification risks can increase, undermining the validity of existing HIPAA protections.

  2. Organizations need solid tokenization strategies and tactics. The tokenization approach defines how linkage occurs: what identifiers can be compared, and how. Choosing the right approach will maximize the benefits of linkage, while minimizing data errors.

  3. Good data linkage must account for data variability while protecting patient privacy. There are different approaches to matching patient information across different datasets, such as matching by patient IDs, birthdates, names, or other combinations of these (and other) identifiers, which are obfuscated for matching by tokenization.

    However, real-world data can be challenging to match. When inevitable gaps, data errors, or other inconsistencies arise, the matching approach needs to be flexible enough to avoid excessive false negatives (splitting one patient’s information across multiple patient records) and false positives (overly generous matching that combines two patients into one record).

  4. Tokenization addresses privacy issues raised by linking on identifiers. “The idea of tokenization is that instead of sharing private identifiers, which would be of concern from a privacy perspective, tokenization replaces the identifiers with scrambled values,” Dr. Rasquinha said.

    Good tokenization has three essential features: it must be practically irreversible, repeatable, and able to keep distinct patients distinguishable. With these features, identifying information is transformed while retaining what is needed for linkage, without exposing the identifying values themselves.

  5. Expert determination ensures tokenized data is truly de-identified. This method, which relies on statistical analysis, is one of two de-identification methods available under HIPAA. While the rules-based Safe Harbor method can be more straightforward to implement, organizations turn to expert determination when they need more detailed data, such as for AI applications, and to verify that tokenization maintains patient privacy.
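To make the tokenization properties above concrete, here is a minimal Python sketch using a keyed hash (HMAC-SHA256). It is illustrative only: the key handling, normalization rules, and choice of identifier fields are assumptions for this example, not Privacy Analytics’ or any vendor’s actual implementation.

```python
import hmac
import hashlib
import unicodedata

# Assumption: in practice this key would be generated and managed
# securely, never stored alongside the data it tokenizes.
SECRET_KEY = b"replace-with-a-securely-managed-secret"

def normalize(value: str) -> str:
    """Reduce formatting variability (case, accents, extra spaces)
    so trivially different records still produce the same token."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    return " ".join(value.lower().split())

def tokenize(*identifiers: str) -> str:
    """Replace identifiers with a scrambled value.

    - Practically irreversible: a keyed HMAC cannot be undone
      without the secret key.
    - Repeatable: the same inputs always yield the same token,
      which is what makes linkage across datasets possible.
    - Distinguishing: different patients yield different tokens."""
    message = "|".join(normalize(i) for i in identifiers).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

# Repeatable: the same patient, formatted differently, still links.
t1 = tokenize("MARY  O'BRIEN", "1984-03-12")
t2 = tokenize("mary o'brien", "1984-03-12")
assert t1 == t2

# Distinguishing: a different birthdate yields a different token.
t3 = tokenize("mary o'brien", "1984-03-13")
assert t1 != t3
```

The normalization step also illustrates the data-variability point from insight 3: deciding which formatting differences should be erased before tokenization is part of tuning the trade-off between false negatives and false positives.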

“As [organizations] get more and more information about patients, they need to balance risks from the identifiability perspective, as well as the benefits from the utility perspective which motivates the linkage in the first place,” Dr. Rasquinha concluded.

