Choosing the right analytics tools for healthcare

Yogesh Parte - Wednesday, October 10th, 2018

With the big data boom in recent years, the number of analytics tools and platforms has grown exponentially to meet the demands of the healthcare industry.

Technology has emerged to enable healthcare organizations to collect and analyze vast amounts of data to address complex problems related to image analytics, genomics and real-time data from patient sensor devices. The last decade has been dominated by the challenges of analyzing unstructured and semi-structured data from clinical notes, claims, and medical images, including X-rays, CT scans and MRIs. Now, the focus has shifted to integrating the full spectrum of structured, semi-structured and unstructured data.

Data Characteristics Drive Solution Evaluation
As technology continues to evolve, healthcare organizations can benefit from new analytics approaches and tools to meet their unique and complex analytics needs. However, to leverage these tools effectively, healthcare analytics professionals and technology decision makers must have a clear understanding of the nature of data across their ecosystem. There are three key aspects to consider:

• Nature (structured, semi-structured, unstructured)
• Complexity (in terms of dimensions and attributes)
• Volume

While many other variables play their part, the choice of analytics tools depends largely on the nature, complexity and volume of the data. Analyzing simple structured data, such as claims, is relatively straightforward and does not require complex system configurations or extensive processing power. Conversely, complex CT scan data requires analytics platforms with more computing resources for faster processing and reduced latency.

Next, data can be classified into distinct categories based on complexity (high or low) and volume (high or low), as shown in the quadrants below.
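
The classification described above can be sketched in a few lines of Python. This is only an illustration: the thresholds, the Dataset fields and the sample figures are hypothetical, and each organization would set its own cut-offs.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen only for illustration; real thresholds
# depend on an organization's infrastructure and workloads.
VOLUME_THRESHOLD_GB = 1_000
COMPLEXITY_THRESHOLD_ATTRIBUTES = 500


@dataclass
class Dataset:
    name: str
    size_gb: float
    attributes: int    # rough proxy for dimensions and attributes
    structured: bool   # False for semi-structured or unstructured data


def quadrant(ds: Dataset) -> int:
    """Return the quadrant (1 to 4) that a dataset falls into."""
    high_complexity = (not ds.structured) or ds.attributes > COMPLEXITY_THRESHOLD_ATTRIBUTES
    high_volume = ds.size_gb > VOLUME_THRESHOLD_GB
    if not high_complexity:
        return 2 if high_volume else 1
    return 4 if high_volume else 3


# Made-up sample datasets, one per quadrant.
samples = [
    Dataset("regional claims", 50, 120, True),
    Dataset("national claims", 5_000, 120, True),
    Dataset("clinician notes", 20, 10, False),
    Dataset("CT and MRI archive", 80_000, 10, False),
]
quadrants = {ds.name: quadrant(ds) for ds in samples}
```

Treating every unstructured dataset as high complexity is a simplification made for this sketch; in practice complexity is a judgment call that also reflects the processing the data needs.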

Optimizing Your Analytics Technology Stack

Quadrant 1: Low Complexity and Low Volume
Sample Use Case: Claims data is a good example of healthcare data with low complexity and relatively low volume. Performing analysis, such as denials management or evidence-based reimbursements, does not require significant resources.

If claims data is linked with other data sources, such as clinical, administrative and socio-economic data, then open-source environments can be used for statistical analysis of the combined data sets. Such tools also support advanced analytics and machine learning, and can include an easy-to-use user interface (UI) that helps users perform exploratory analysis and build machine learning models.

Quadrant 2: Low Complexity and High Volume
Sample Use Case: When a large organization, e.g. Blue Cross Blue Shield, processes millions of claims spanning inpatient, outpatient, pharmacy, and lab services, it generates huge volumes of data to be analyzed and processed.

Organizations should choose platforms that perform distributed and parallel data processing to generate faster results. Consider tools that speed up computations by performing them in memory.
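
As a rough illustration of the partition-and-merge pattern behind distributed processing, the sketch below splits claims into partitions, counts each partition separately and merges the partial results. It uses threads from Python's standard library as a stand-in for worker nodes; the function names and sample records are hypothetical, and a real platform would spread the partitions across machines and keep them in memory.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts roughly equal slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_by_service(part):
    """Map step: count claims per service type within one partition."""
    return Counter(claim["service"] for claim in part)


def parallel_claim_counts(claims, n_parts=4):
    """Run the map step on each partition, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = pool.map(count_by_service, partition(claims, n_parts))
        total = Counter()
        for partial in partials:
            total.update(partial)  # reduce step: add partial counts together
    return total


# Made-up sample records, for illustration only.
claims = [{"service": s} for s in ["inpatient", "outpatient", "pharmacy", "lab"] * 250]
counts = parallel_claim_counts(claims)
```

Because each partition is counted independently, the same map and merge steps carry over to an engine that runs them on separate machines.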

Quadrant 3: High Complexity and Low Volume
Sample Use Case: Complex data, such as clinician notes from electronic health records (EHRs), require significant data processing and transformation. However, these are considered low volume compared to large, unstructured datasets.

The options organizations can choose from present speed tradeoffs, depending on whether computations are done in memory and on the overhead of creating Open Database Connectivity (ODBC) connections.
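
The two access patterns behind that tradeoff can be sketched as follows. This is a minimal illustration, not a benchmark: Python's built-in sqlite3 module stands in for an ODBC data source, and the table, notes and function names are made up.

```python
import os
import sqlite3
import tempfile

# Made-up clinician notes, for illustration only.
NOTES = [
    (1, "Patient reports chest pain and shortness of breath."),
    (2, "Follow-up visit, no chest pain since last admission."),
    (3, "Routine check, blood pressure within normal range."),
]


def create_store(path):
    """Create a small notes table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?)", NOTES)
    conn.commit()
    conn.close()


def count_per_connection(path, term):
    """Open a new connection for every note, paying the setup cost each time."""
    hits = 0
    for note_id, _ in NOTES:
        conn = sqlite3.connect(path)
        (text,) = conn.execute(
            "SELECT text FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
        conn.close()
        hits += term in text.lower()
    return hits


def count_in_memory(path, term):
    """Open one connection, load the notes once, then compute in memory."""
    conn = sqlite3.connect(path)
    texts = [row[0] for row in conn.execute("SELECT text FROM notes")]
    conn.close()
    return sum(term in text.lower() for text in texts)


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "notes.db")
    create_store(db_path)
    per_connection_hits = count_per_connection(db_path, "chest pain")
    in_memory_hits = count_in_memory(db_path, "chest pain")
```

Both functions return the same count; they differ only in how often a connection is opened. Loading the data once suits this quadrant because low-volume data is more likely to fit in memory.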

Quadrant 4: High Complexity and High Volume
Sample Use Case: Diagnostic systems such as CT and MRI scanners, clinical texts and genomics generate huge volumes of unstructured data. Analyzing large, unstructured data sets is simply not practical on a single CPU. These data sets call for Graphics Processing Units (GPUs) with thousands of Compute Unified Device Architecture (CUDA) cores, far more than the two to ten cores of a typical CPU.

Organizations should choose a solution that supports a broad range of libraries and deep learning algorithms, including those used for image analytics on CT and MRI scans. Unstructured text from clinical notes can be processed using Recurrent Neural Networks (RNNs).

Match Data Analytics Tools to Your Goals
The need for healthcare organizations to perform accurate analysis and gain better insights from all data sources, including clinical, claims, EHRs, diagnostic systems, sensors, genomics and more, will only continue to increase. Technology continues to evolve to meet these demands, enabling healthcare organizations to take advantage of tools best-suited for specific objectives.

For example, the security of Protected Health Information (PHI) is a critical concern for healthcare applications and is usually addressed by handling patient data in line with HIPAA compliance requirements. In essence, HIPAA requirements take the form of administrative, physical and technical safeguards that do not depend on any programming language or framework. HIPAA compliance of models developed in open-source tools can therefore be ensured by adopting best practices, security policies and frameworks.

Ultimately, as healthcare organizations become more sophisticated in the way they create, manage and analyze data, they need to develop a clear and structured roadmap for all their analytics needs. As the data environment grows in volume, variability and complexity, choosing the right set of analytics tools for various healthcare use cases, data sets and end-user needs will become a key success factor in the long term.

Yogesh Parte is a senior data scientist for CitiusTech.
