Shine a light on the unknown: Accurately profiling patient privacy risk in healthcare

Identifying the Abnormal
When considering risk in the healthcare environment – risk involving patient privacy and protected health information – profiling is critical and represents a key component of a layered analytics approach.

Profiling enables you to pinpoint when somebody has performed an action that is not only abnormal, but is also potentially risky to the organization. It does so by comparing the actions of each individual either against their own personal norms or against the normal behaviors of others in that role.

For example, if a certain hospital pharmacist usually looks at 10 elderly patients' records per week, but then looks at 80 records in a specific week, profiling would note that activity as abnormal and increase the pharmacist's risk score. Likewise, if lab technicians on the whole do not modify demographic data, but one technician does so, an alert would be generated.
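
The two comparisons in these examples can be sketched in a few lines of Python. This is a minimal illustration, not any product's actual logic: the function names, the 3x multiplier, the 1% share cutoff, and the lab technician action counts are all assumptions chosen only so that the pharmacist and lab technician examples trigger.

```python
from statistics import mean


def exceeds_personal_norm(weekly_history, current_week, factor=3.0):
    """Compare a user against their own baseline.

    Returns True when this week's count is more than `factor` times the
    user's historical weekly average. `factor` is an assumed cutoff.
    """
    baseline = mean(weekly_history)
    return current_week > factor * baseline


def unusual_for_role(action, role_action_counts, min_share=0.01):
    """Compare a user's action against what peers in the same role do.

    Returns True when `action` accounts for less than `min_share` of all
    actions recorded for the role. `min_share` is an assumed cutoff.
    """
    total = sum(role_action_counts.values())
    if total == 0:
        return False  # no peer data, so nothing to compare against
    return role_action_counts.get(action, 0) / total < min_share


# Pharmacist who normally views about 10 elderly patients' records per week.
pharmacist_history = [10, 9, 11, 10]
print(exceeds_personal_norm(pharmacist_history, 80))  # True

# Hypothetical action counts for all lab technicians combined.
lab_tech_actions = {"view_result": 4200, "enter_result": 3100, "modify_demographics": 0}
print(unusual_for_role("modify_demographics", lab_tech_actions))  # True
```

In both cases the flag would raise the user's risk score or generate an alert for review; it is not a verdict on its own.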

Profiling provides the perfect complement to the rules that are part of an organization's risk engine. Rules are excellent at zeroing in on defined risk factors (e.g. known user actions), but they have their limitations. For instance, it is impossible to create a rule for every risky behavior. There are simply too many variables at play. Plus, those variables are dynamic: jobs change, acceptable norms change, and people change. Any attempt to create a universal set of rules to manage all risk would be fraught with frustration. The number of rules would continually grow to keep up with new risk factors, and a portion of the rules would be out of date at any time.

Profiling does not have these limitations. By sounding the alarm for abnormal behavior compared simply to typical behavior, profiling can identify anomalous behavior for many risk factors – even for those that have not been defined previously. While rules handle known risk factors, profiling is able to shine a light on the unknown.

The Data that Feeds Profiling
Not all profiling is created equal, however. Some profiling is more significant than others; that is, it identifies valid risks with a higher level of accuracy. Two aspects come into play when looking at significance: the data that feeds the profiling, and the profiling tools that are applied to that data.

First, the data. A profile, put simply, is only as good as the data that goes into it. The more data points there are and the richer the information those data points represent, the better the profiling will be.

Think about it in terms of a fingerprint match that will place someone at the scene of a crime. If two fingerprints have seven data points in common, that doesn't give you much confidence in the accuracy of a match – hundreds or thousands of prints might match those seven data points equally well. But if there are 20 data points in common between two fingerprints, you can have a high degree of confidence that you have caught the culprit. Similarly, the fewer data points there are in a profile, the more likely you are to get false positive alerts. The more data points there are, the more confidence you can have that your alerts are pointing to real issues.

There are two primary ways of collecting data: log files and network capture. Log files are the most basic and limited. Because log files were originally intended for application performance insight, they contain a constrained number of data points that communicate limited information. This limited information often does not contain the user behavior data necessary to perform accurate profiling.

For example, a log file may note that a worker conducted a search. However, it typically won't show the results that search returned, how many records were involved, whether a patient SSN was included, and so on. There are also codes, sub-codes, and additional data points at the application layer that aren't in the logs, all of which can help the analytics engine make better decisions and reduce false positive alerts.

Network capture, on the other hand, provides a real-time view and a record of every activity that takes place between user and host – including what is happening at the network and application layers. For instance, when a worker attempts to "Break the Glass," network capture will show when the attempt was made, the "Break the Glass" security pop-up that appeared, the reason code selected for access, what patient information screens were then viewed, and what actions were taken on those screens. The same is true for Search and Print actions: in addition to the user action, network capture includes the PHI and PII that the worker sees as the result of the action. The depth and richness of that data provide a more complete profile, which can then be used to more accurately detect abnormal behavior or broken business processes.

Additionally, by providing full visibility of activity, network capture aids in the investigation of alerts. For example, if an investigation takes place two weeks after the event, the investigator is still able to view exactly what the user saw and did – screen by screen, and action by action – providing critical context and detail. A log file, by contrast, might tell you only that an employee's credentials were used to gather patient information. Network capture could provide the additional information necessary to determine whether the action was performed by the employee, or whether external malware had hijacked the employee's credentials.

The Profiling Toolbox
Once data has been obtained, the real work of profiling can begin. Mature profiling makes use of a variety of measurement and statistical tools, including percentage, high water mark, and Z-score (standard deviation).

Percentage
The most basic profiling tool is percentage: the current value expressed as a percentage of the norm. Percentage is useful when you do not have a lot of deviation in your data, that is, when the normal data is reliable and repeatable. If you graphically portrayed the normal data, it would appear as tightly knit scatter points. A high or low percentage therefore represents a significant deviation from the norm – a point well outside the range of the regular scatter points.
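
A percentage check can be sketched as follows. The function names and the 50% and 200% bounds are assumptions for illustration; the figures reuse the pharmacist example from earlier (a norm of 10 records per week against 80 in one week).

```python
def percent_of_norm(current, baseline):
    """Express the current value as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * current / baseline


def percentage_alert(current, baseline, low_pct=50.0, high_pct=200.0):
    """Flag values at or outside the assumed low/high percentage bounds."""
    pct = percent_of_norm(current, baseline)
    return pct <= low_pct or pct >= high_pct


print(percent_of_norm(80, 10))   # 800.0
print(percentage_alert(80, 10))  # True
print(percentage_alert(12, 10))  # False
```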

High Water Mark
The high water mark helps you understand the largest number ever recorded for an activity that did not have negative consequences. For example, if the average number of social security numbers (SSNs) accessed by any one worker is 100 in a given week, but at one point a hospital worker accessed 200 for perfectly valid reasons, then the high water mark is set at 200. It represents unusual activity, but is still acceptable activity. However, if a worker accesses 400 SSNs in a week, that would raise a red flag, as it would be double the high water mark. Such activity could be a sign of a takeover via malware or compromised credentials.
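
The SSN example can be sketched like this. The function names, the starting mark of 150, and the 2x alert multiplier are assumptions; the article only says that double the high water mark would raise a red flag, not where the exact cutoff lies.

```python
def update_high_water_mark(current_mark, observed, reviewed_as_legitimate):
    """Raise the mark only when a larger value was reviewed and found valid."""
    if reviewed_as_legitimate and observed > current_mark:
        return observed
    return current_mark


def high_water_mark_alert(observed, mark, multiplier=2.0):
    """Flag activity at or above `multiplier` times the high water mark."""
    return observed >= multiplier * mark


# A worker accessed 200 SSNs in one week for valid reasons, so the mark rises.
mark = update_high_water_mark(150, 200, reviewed_as_legitimate=True)
print(mark)  # 200

print(high_water_mark_alert(400, mark))  # True: double the high water mark
print(high_water_mark_alert(150, mark))  # False: below the mark
```

Note that the mark moves only after a review confirms the activity was legitimate; an unreviewed spike should not silently become the new ceiling.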

Z-Score (Standard Deviation)
Perhaps the most important profiling tool is standard deviation, applied through the Z-score. A Z-score tells you how far a data point lies from the average, measured in units of the standard deviation: in other words, relative to how much the data normally varies, rather than relative to the average alone. Standard deviation is particularly useful in cases where data points are normally widely scattered, and where percentages can be misleading.

Consider standard deviation in comparison with percentage. Suppose a nurse accesses an average of 50 patient records per week. If you only used percentage as a profiling tool, you might say, "Any time the nurse accesses 200% of the norm, send out an alert." After all, 200% sounds statistically significant.

However, 200%, or 100 records, might not be statistically significant at all. If the nurse accesses 50 patient records per week on average, she might easily access 10 records one week and 120 the next – the data points are scattered very widely. Accessing 100 or more patient records could easily represent normal activity for this worker. Using a percentage as a guide would likely give you a lot of false positives, and require you to investigate alerts that should never have been generated.

Using standard deviation as a tool greatly reduces this confusion and the false positives that it engenders. Standard deviation recognizes that there is, say, a 60-record spread on either side of 50 that is still completely "normal" for this worker. That's the middle of the bell curve. Another 60 records beyond that would still be within the bounds of acceptable activity. To truly get an "abnormal" reading according to standard deviation (more than two standard deviations above the average), therefore, the nurse might have to access over 170 records in a single week.
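
The nurse example works out as follows in a short Python sketch. The average of 50 and the spread of 60 come from the text; the function names, the use of the population standard deviation, and the alert threshold of two standard deviations are assumptions.

```python
from statistics import mean, pstdev


def z_score(value, mu, sigma):
    """Distance of `value` from the mean, in units of standard deviation."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mu) / sigma


def z_score_from_history(value, history):
    """Same as z_score, with mean and deviation taken from past observations."""
    return z_score(value, mean(history), pstdev(history))


def z_alert(value, mu, sigma, threshold=2.0):
    """Flag values more than `threshold` standard deviations above the mean."""
    return z_score(value, mu, sigma) > threshold


# Nurse averaging 50 records per week with a standard deviation of 60.
print(round(z_score(100, 50, 60), 2))  # 0.83
print(z_alert(100, 50, 60))            # False: 200% of the norm, yet ordinary
print(z_alert(170, 50, 60))            # False: exactly two deviations
print(z_alert(180, 50, 60))            # True: over 170 records
```

The same 100 records that trip a 200% percentage rule score well under one standard deviation here, which is why the Z-score produces fewer false positives on widely scattered data.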

Implementing Optimal Profiling
Knowing when to use each of the profiling tools is the key to effective profiling. A solid profiling solution will provide you with guidelines or templates that you can use out of the box, or customize for your organization's needs. These guidelines or templates will define the tools that should be used in various circumstances to identify data that is statistically significant and warrants an investigation. By choosing the right tools, you will prioritize alerts and reduce the likelihood of generating hundreds or even thousands of false positives every month.
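
The article does not say how a profiling solution decides which tool to apply. One plausible heuristic, following the earlier descriptions (percentage for tightly clustered data, Z-score for widely scattered data), is to look at the coefficient of variation of the historical data. The function name and the 0.25 cutoff below are assumptions for illustration only.

```python
from statistics import mean, pstdev


def choose_tool(history, cv_threshold=0.25):
    """Pick a profiling tool from how scattered the historical data is.

    Uses the coefficient of variation (standard deviation / mean).
    Low scatter suggests a percentage check; high scatter suggests a Z-score.
    """
    mu = mean(history)
    if mu == 0:
        return "z_score"  # ratio to the mean is undefined, so avoid percentage
    cv = pstdev(history) / mu
    return "percentage" if cv <= cv_threshold else "z_score"


print(choose_tool([50, 52, 48, 50]))   # percentage
print(choose_tool([10, 120, 20, 50]))  # z_score
```

A high water mark check could run alongside either choice, since it compares against the largest accepted value rather than the average.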

But profiling should never be a "standalone" feature. It should be an integrated part of a holistic analytics strategy, working in concert with rules and risk scoring (predictive analytics). Just as mature profiling considers which tool to use in a certain situation, a holistic analytics approach determines which method is most appropriate at any given time, such as rules or profiling.

By implementing a robust profiling system as part of your analytics strategy and feeding it with rich data from network capture, you will be well-positioned to have the best possible insight into your employees, your patients, and your business.

Boaz Krelbaum joined Bottomline Technologies as the General Manager and CTO of the Cyber Fraud and Risk Management line of business, following the acquisition of Intellinx Ltd. by Bottomline. Boaz co-founded Intellinx Ltd. and oversaw its US operations, including responsibility for strategic alliances. In his role as CTO, Boaz was responsible for Research & Development as well as the direction of Intellinx's patented technologies. Boaz has over 20 years of experience in software development of middleware, database products and enterprise applications. Boaz holds a B.Sc. cum laude in Mathematics and Computer Science from Tel-Aviv University, Israel, and an LL.B. from Tel-Aviv University's Faculty of Law. Boaz is also a certified lawyer.

The views, opinions and positions expressed within these guest posts are those of the author alone and do not represent those of Becker's Hospital Review/Becker's Healthcare. The accuracy, completeness and validity of any statements made within this article are not guaranteed. We accept no liability for any errors, omissions or representations. The copyright of this content belongs to the author and any liability with regards to infringement of intellectual property rights remains with them.
