Benchmarks aren’t as helpful as you might think

Spanning clinical, financial and operational metrics, benchmarks are published on virtually every aspect of the healthcare system in the United States.

They often drive the ranking of hospitals along various dimensions and are even factored into the annual bonuses for health system leadership teams. Naturally, benchmarks receive a fair amount of attention -- possibly too much.

Benchmarks can be useful, but only at the highest level of abstraction or at a finely detailed "micro" level. For instance, a high-level benchmark comparing the operating margins of similarly sized hospitals in zip codes with comparable socioeconomic demographics could provide useful insight into the operational efficiency of each hospital relative to its peers.

Similarly, a benchmark about the readmission rate for patients undergoing a very specific procedure at these hospitals could highlight differences in practices that might be worth exploring.

Metrics in between, however, such as average length of stay (LOS), reimbursement per procedure, or utilization of the operating rooms, are subject to a myriad of factors and "adjustments" that render the comparisons meaningless.

Here we will drill into benchmarks for operating rooms in particular to highlight the intricacies that prevent mid-level benchmarks from being useful. On our own blog, we will share our perspectives on how these metrics can instead be used to drive a more productive determination of the specific actions to take.

Why some OR benchmarks aren’t useful
Operating room metrics and benchmarks attract special attention from health system executive teams, as ORs are the financial engine of a hospital and generate a sizable proportion of overall revenue each year. Yet these figures are unhelpful for predicting or managing that financial performance, for several reasons.

1. Non-standard metric definitions: The definitions of the key metrics are more nuanced than one might expect. Take OR utilization:

  • For the numerator, most hospitals calculate adjusted utilization rather than raw utilization, meaning they include setup and/or cleanup minutes in the numerator. However, it is tricky to determine exactly how much setup or cleanup time to credit, and even when the credit is standardized, it inadvertently favors rooms running shorter cases.
  • The denominator has its own set of complexities. Some hospitals count any physical room equipped for surgery as a valid room, while others count a room only if it has the proper staff to run it. The latter yields fewer rooms than the former, because at any given time some ORs may be closed due to insufficient demand or staffing.
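To make the sensitivity concrete, here is a rough sketch of how the numerator and denominator choices above can swing the same day's utilization figure. All case times, credit minutes, and room counts are invented for illustration; they are not drawn from any real hospital's data.

```python
# Hypothetical example: one day's cases in one OR, scored under
# different (non-standard) utilization definitions.

cases = [110, 95, 120]         # invented in-room minutes per case
setup_cleanup_credit = 15      # invented per-case credit for setup/cleanup

staffed_minutes = 480          # one staffed room, 8-hour prime-time block
physical_room_minutes = 960    # two physical rooms exist, only one is staffed

raw_used = sum(cases)                                       # 325 minutes
adjusted_used = sum(c + setup_cleanup_credit for c in cases)  # 370 minutes

# The numerator choice alone moves the figure:
raw_util = raw_used / staffed_minutes          # ~67.7% (raw)
adjusted_util = adjusted_used / staffed_minutes  # ~77.1% (adjusted)

# The denominator choice moves it again:
physical_util = adjusted_used / physical_room_minutes  # ~38.5%
```

The same rooms, the same cases, and the same day produce figures from roughly 38% to 77% depending on definitional choices, which is why comparing one hospital's "utilization" against another's benchmark is hazardous without knowing how each was computed.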

There are many more such nuances to any metric in the perioperative management world. Without a deep understanding, and some experience of how sensitive these metrics are to varying definitions and parameter configurations, one can easily be misled.

2. Lack of context: Benchmarks often fail to put the measured number in its full context. For example, the turnover time between cases in an OR is defined as the interval from the previous patient leaving the OR to the subsequent patient entering it. The turnover metric and a comparison against a benchmark show neither the why nor the how. Turnover may be lengthy because the prior case finished earlier than planned and the subsequent team could not start early. It may also run long because the surgeon, some staff, or even the patient arrived late. The benchmark alone won't uncover the reasons behind an underperforming metric.
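The computation itself is trivial, which is exactly the point: the number carries no explanation. A minimal sketch of the definition above, with invented timestamps:

```python
from datetime import datetime

def turnover_minutes(prev_patient_out: datetime, next_patient_in: datetime) -> float:
    """Turnover: gap between the previous patient leaving the OR
    and the next patient entering it."""
    return (next_patient_in - prev_patient_out).total_seconds() / 60

# Invented timestamps for illustration only.
prev_out = datetime(2023, 5, 1, 10, 42)
next_in = datetime(2023, 5, 1, 11, 15)

gap = turnover_minutes(prev_out, next_in)  # 33.0 minutes
```

A 33-minute gap versus a 25-minute benchmark says nothing about whether the room was slow to turn over or the prior case simply finished ahead of schedule with no team ready to start early; that context lives outside the metric.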

3. Lack of actionability: The third problem with benchmarks is that comparing a generic metric to a benchmark rarely points to a clear action. The mere existence of the benchmark may not lead the team to do anything materially different. Following the turnover example, a perioperative team struggling with poor turnover metrics may investigate the underlying causes and use them to drive improvement initiatives. But such investigations should happen regardless of whether a benchmark exists.

Finally, human beings are predictably irrational when it comes to attribution bias. Those who see themselves underperforming on a specific benchmark tend to blame external environmental factors: our patients are older or sicker than average, our procedures are more complex, our cleaning standards are higher, our EVS teams cover a wider geographic footprint, and so on. Meanwhile, those whose metric puts them ahead of their peers tend to credit their own superior management practices. This inherent bias leaves people either defensive or complacent when presented with any benchmark, which does not foster an urgent drive toward continuous improvement.


Copyright © 2023 Becker's Healthcare. All Rights Reserved.
