Benchmarking medical laboratory performance: survey validation and results for Europe, Middle East, and Africa

Objectives: Medical laboratory performance is a relative concept, as are quality and safety in medicine. Therefore, repetitive benchmarking appears to be essential for sustainable improvement in health care. The general idea in this approach is to establish a reference level, upon which improvement may be strived for and quantified. While the laboratory community traditionally is highly aware of the need for laboratory performance and public scrutiny is more intense than ever due to the SARS-CoV-2 pandemic, few initiatives span the globe. The aim of this study was to establish a good practice approach towards benchmarking on a high abstraction level for three key dimensions of medical laboratory performance, generate a tentative snapshot of the current state of the art in the region of Europe, Middle East, and Africa (EMEA), and thus set the stage for global follow-up studies.
Methods: The questionnaire used and previously published in this initiative consisted of 50 items, roughly half relating to laboratory operations in general with the other half addressing more specific topics. An international sample of laboratories from EMEA was approached to elicit high fidelity responses with the help of trained professionals. Individual item results were analyzed using standard descriptive statistics. Dimensional reduction of specific items was performed using exploratory factor analysis and assessed with confirmatory factor analysis, resulting in individual laboratory scores for the three subscales of "Operational performance", "Integrated clinical care performance", and "Financial sustainability".
Results: Altogether, 773 laboratories participated in the survey, of which 484 were government hospital laboratories, 129 private hospital laboratories, 146 commercial laboratories, and 14 were other types of laboratories (e.g. research laboratories). Respondents indicated the need for digitalization (e.g. use of IT for order management, auto-validation), automation (e.g. pre-analytics, automated sample transportation), and establishment of formal quality management systems (e.g. ISO 15189, ISO 9001) as well as sustainably embedding them in the fabric of laboratory operations. Considerable room for growth also exists for services provided to physicians, such as "Diagnostic pathways guidance", "Proactive consultation on complex cases", and "Real time decision support", which were provided by less than two thirds of laboratories. Concordantly, the most important kind of turn-around time (TAT) for clinicians, sample-to-result TAT, was monitored by only 40% of respondents.
Conclusions: Altogether, the need for stronger integration of laboratories into the clinical care process became apparent and should be a main trajectory of future laboratory management. Factor analysis confirmed the theoretical constructs of the questionnaire design phase, resulting in a reasonably valid tool for further benchmarking activities on the three aimed-for key dimensions.


Introduction
Laboratory performance is a relative concept but nevertheless of prime importance, as are quality and safety in the total testing process [1]. Therefore, repetitive benchmarking appears to be essential for sustainable improvement in health care. The general idea of this approach is to establish a reference level, upon which the status quo may be quantified and improvement strived for. For patient safety in general, the seminal report "To Err is Human" might arguably play such a role [2]. And even though the performance of medicine in general has certainly greatly improved over the last decades, patient safety has debatably stayed about the same since its publication.
The landmark follow-up report "Improving Diagnosis in Health Care" examined ways to improve quality and safety in medicine [3]. A central pillar of these efforts is diagnostic quality which rests on laboratory performance in general. The latter influences the majority of clinical decisions, while accounting for only about 2% of direct healthcare cost [4,5]. The last two years of the pandemic have more than adequately demonstrated the cost effectiveness of well-established diagnostic processes on the population level. Laboratory operations have been vital for "rapid and effective contact tracing, implementation of infection prevention and control measures in accordance with national recommendations, and adequate support to the patient", and will probably also be essential for the vaccination phase [6,7].
But even though laboratory operations play a central role in medicine and may exhibit a high level of standardization compared to other medical specialties, standardization still appears to lag behind when compared to other industries [8][9][10]. In addition to the work on harmonizing external quality assessment schemes, there have been some international endeavors towards benchmarking of overall medical laboratory performance, among others the Q-Probes program of the College of American Pathologists and the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Working Group on Laboratory Errors and Patient Safety (WG-LEPS). Overall though, the strategic drive towards standardizing medical laboratory operations appears rather low [11][12][13][14].
Even before the pandemic, there appeared to be an interest in standardizing quality indicators in laboratory medicine, and the role of laboratories during the past two years has certainly further increased it. However, the number of laboratories actually participating in international quality indicator schemes has been rather stagnant. Plebani et al. have thus coined the term "quality indicator paradox", describing the discrepancy between the general interest of laboratories to improve on efficiency, quality, and patient safety and the extent of effective activity in this regard [15].
Clearly, conclusive relations between the high-level concept of general laboratory performance and various underlying concepts need to be established to be able to effectively counter deeply engrained practice challenges [16]. Laboratory management methods have traditionally been developed rather hands-on and are thus not well established in the academic literature. Accordingly, the number of commonly used key performance indicators is limited [17]. A few notable exceptions might be grand totals (e.g. number of patients, number of orders, number of samples), temporal measures (various definitions of turnaround times), and resource measures (e.g. number of full-time equivalents, laboratory space).
Overall, the literature on medical laboratory performance benchmarking is sparse [18]. Prior to starting the multi-stage initiative that also resulted in this specific study, a comparable approach was published for the Asia-Pacific region, but no publications existed for other regions or the global community [19]. Due to the heterogeneity of healthcare systems, direct generalizability of estimates from one part of the globe to other parts or even from one country to nearby countries, as in Europe, is limited [20]. Thus, we aimed to generate common estimators for the high-level concept of medical laboratory performance with similar data gathering procedures in different countries as a basis for measurably better health care.
To deal with the rather unconstrained situation, a multistage approach was chosen. A questionnaire-based tool was designed to allow for effective and efficient sampling of medical laboratory performance. In the first stage, the data elicitation approach was to be tested on a pilot sample, the results of which have meanwhile been published together with the questionnaire [18]. In the second stage, the questionnaire itself was to be validated on a larger sample, which is the topic of this manuscript. In the third stage, which is currently under preparation, insights gained during stages one and two should inform the first global rollout.

Questionnaire
The questionnaire used consisted of 50 items and was identical to that used and published during stage one as described in the introduction [18]. About half of the items were designed to provide a general overview for laboratory operations and the other half were aimed to specifically address the three key dimensions of medical laboratory performance "Operational performance", "Integrated clinical care performance", and "Financial sustainability". The iterative design process considered previous knowledge from the literature and author experience as well as feedback from informal focus groups. The latter consisted of about 20 people each, varying slightly due to availability of participants, and included medical doctors, technicians, workflow experts, biologists, and laboratory directors.

Sample
The target population for this ongoing strategic effort are medical laboratories, the general approach aiming to be vendor independent. Due to considerations of practicality, at this point a convenience sample with a broad range of diagnostic providers was aimed for, including e.g. Abbott, Roche, Siemens, Sysmex, Beckman, Werfen, Biomerieux, Becton Dickinson, Stago, and Diasorin. Abbott representatives in the EMEA region (Europe, Middle East, and Africa) were asked to approach general medical laboratories both with and without Abbott equipment, requesting participation in the study. Given consent by the laboratory, the questionnaire was then filled out online using the platform SurveyMonkey with support of specifically trained Abbott customer representatives. Only where filling out online was not directly possible, the survey was completed in paper form and subsequently entered manually.

Statistical methods
Statistics and visualization were performed using the free software environment R version 4.1.2 [21]. For descriptive statistics, results generally are presented at least as numbers and percentages for nominal scale, median and inter-quartile range (IQR) for ordinal scale and mean and standard deviation for higher measurement levels. A set of 20 items (for the specific items see online Supplemental Material) was selected for dimensional reduction using exploratory factor analysis with OBLIMIN rotation. The resulting allocation of items to factors (see Table 1) was then submitted to confirmatory factor analysis using the R psych package [22].
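The reporting convention described above (counts and percentages for nominal data, median and IQR for ordinal data, mean and standard deviation for higher measurement levels) can be illustrated with a minimal sketch. The study itself used R; Python's standard library is used here purely for illustration, the function names are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the source does not state which quantile definition was applied.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def describe_nominal(values):
    """Nominal scale: count and percentage per category."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


def describe_ordinal(values):
    """Ordinal scale: median and inter-quartile range as (Q1, Q3).

    ASSUMPTION: quartiles use the 'inclusive' method; the source does not
    specify a quantile definition.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": (q1, q3)}


def describe_metric(values):
    """Interval/ratio scale: mean and sample standard deviation."""
    return {"mean": mean(values), "sd": stdev(values)}


# Hypothetical toy data, not taken from the survey
lab_types = ["government", "government", "private", "commercial"]
likert = [1, 2, 3, 4, 5]
tubes_per_day = [2, 4, 4, 4, 5, 5, 7, 9]

nominal_summary = describe_nominal(lab_types)
ordinal_summary = describe_ordinal(likert)
metric_summary = describe_metric(tubes_per_day)
```

Keeping one summary function per measurement level makes it harder to report a mean for an ordinal item by accident.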

Results
Overall, 773 laboratories from 39 countries across EMEA (Europe, Middle East, and Africa) responded to the survey, including the initial pilot sample of 65 laboratories from Germany, Austria, and Switzerland [18]. The top 10 countries accounted for 77% (n=597) of responses: Italy (n=126), Russia (n=112), Saudi Arabia (n=90), Spain (n=64), United Kingdom (n=43), Germany (n=42), Greece (n=40), United Arab Emirates (n=35), South Africa (n=24), and Serbia (n=21). About 80% were hospital laboratories, most of them from governmental hospitals (n=484), about a fifth were private hospitals (n=129), the rest were commercial laboratories (n=146) and some other laboratory types (n=14), e.g. research laboratories (see online Supplemental Material, section Q1). Commercial laboratories were defined as medical laboratories "that are not associated with hospitals or healthcare facilities and that often provide a broad range of services over a wide geographical area" [23].
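The sample composition quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses the counts stated in this paragraph; the variable names are illustrative.

```python
# Responses per country for the ten most frequent countries (from the text)
top10 = {
    "Italy": 126, "Russia": 112, "Saudi Arabia": 90, "Spain": 64,
    "United Kingdom": 43, "Germany": 42, "Greece": 40,
    "United Arab Emirates": 35, "South Africa": 24, "Serbia": 21,
}

# Responses per laboratory type (from the text)
lab_types = {
    "government hospital": 484,
    "private hospital": 129,
    "commercial": 146,
    "other": 14,
}

total = sum(lab_types.values())            # all responding laboratories
top10_total = sum(top10.values())          # responses from the top 10 countries
top10_share = round(100 * top10_total / total)
hospital_total = lab_types["government hospital"] + lab_types["private hospital"]
hospital_share = round(100 * hospital_total / total, 1)

print(total, top10_total, top10_share, hospital_share)
```

The counts add up to the 773 laboratories and the 597 top-10 responses (77%) reported in the text, and the hospital share of 79.3% matches the stated "about 80%".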
For further analysis, the 14 "Other" laboratories were excluded due to relevant differences between the three main categories of laboratories and other types of laboratories, resulting in a sample size of 759 for the remainder of this publication. As a brief statement on laboratory sizes surveyed: government hospital laboratories served on average 1,353 patients per day, private hospital laboratories served on average about half this number (625 patients per day), and commercial laboratories served about a quarter more than government hospitals (1,645 patients per day). Results beyond those presented in the body of this manuscript can be found in the online Supplemental Material; for the concrete numbers presented e.g. for patients per day here see online Supplemental Material, section "Items 1-10", subsection "Q4 patients per day".

Operational performance
Items subsumed under the header "Operational performance" during survey design roughly correspond to items 11 through 24 in the published questionnaire. Overall, formal quality improvement programs such as LEAN Six Sigma or Activity Based Costing (ABC) were used rather rarely in all three categories of laboratories compared to informal approaches. Among the latter, continuous development programs for employees were applied most frequently, followed by survey-based approaches (patient satisfaction survey, clinician satisfaction survey, employee satisfaction survey, descending in that order; see subsection "Q11 Laboratory improvement program" in the online Supplemental Material). In this benchmark survey, 10 key performance indicators (KPIs) for operational performance were assessed in addition to turn-around time (see subsection "Q12 Key Performance Indicators" in the online Supplemental Material). These were: employee productivity, workspace utilization, extent of auto-validation, reduction of consumable waste, reduction of expired reagents, return on investment (ROI), instrument noise levels, systems uptime, rerun rates, and blood smear review rates. Reduction of expired reagent stock was the most commonly measured KPI, with more than half of all participating laboratories actively engaged. Systems uptime/downtime and rerun rates were measured slightly less often, coming in as second and third most frequently measured KPIs. The measurement of workspace utilization and ROI was indicated by only around 10 to 20% of participating laboratories; all other KPIs were actively measured by about one third of the laboratories. Overall, measurement of KPIs was more common in the commercial than the hospital sector, in particular the use of ROI calculations (e.g. Total Cost of Ownership, TCO; Total Value of Ownership, TVO).
Turn-around time (TAT), as the most frequently monitored KPI, was additionally examined in more detail. "Lab TAT" (see Figure 1 in [18]) was by far most often monitored (80% of the laboratories using TAT as KPI; Figure 1), followed by "sample-to-result TAT" and "analytic TAT" that were monitored by about half of respondents. Compared to the overall average, the proportion of private hospital laboratories and commercial laboratories using sample-to-result TAT was higher (about 50% of respondents), whereas that of government hospital laboratories was lower (about 30% of respondents). All other types of TAT were monitored by about one third of the laboratories or less, varying slightly across the different lab types.
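To make the relation between the TAT types concrete, the following minimal sketch computes them from timestamps. The Discussion describes "sample-to-result TAT" as the combination of "pre-lab TAT" and "lab TAT"; the event boundaries used here (sample collection, receipt in the laboratory, result reporting) are an assumption for illustration, as the exact definitions are given in Figure 1 of [18] and are not reproduced in this text. The function names and example times are hypothetical.

```python
from datetime import datetime


def minutes_between(start, end):
    """Elapsed time between two timestamps in minutes."""
    return (end - start).total_seconds() / 60


def tat_breakdown(collected, received, reported):
    """Split the turn-around time of one sample into its components.

    ASSUMPTION: pre-lab TAT runs from collection to receipt in the lab,
    and lab TAT from receipt to result reporting.
    """
    pre_lab = minutes_between(collected, received)
    lab = minutes_between(received, reported)
    return {
        "pre_lab": pre_lab,
        "lab": lab,
        "sample_to_result": pre_lab + lab,  # combination of both phases
    }


# Hypothetical sample: collected 08:00, received 08:45, reported 10:15
example = tat_breakdown(
    datetime(2022, 3, 1, 8, 0),
    datetime(2022, 3, 1, 8, 45),
    datetime(2022, 3, 1, 10, 15),
)
print(example)
```

A laboratory that monitors only lab TAT would see 90 minutes for this sample, while the clinician experiences the full 135 minutes from sampling to result.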
Regarding frequency of TAT monitoring, laboratories appeared to prefer either real-time or monthly retrospective procedures, each of the two categories indicated by about one third to half of respondents, depending on laboratory type (see subsection Q16 in the online Supplemental Material). Considering that the majority of laboratories indicated the usage of TAT as KPI but less than half of them regularly reviewed TAT, frequency of TAT monitoring appears to be of minor importance. The monitoring of TAT for very specific assays (Cardiac, Renal, Liver, Blood gas, Blood count or PCR) was positively indicated by more than 55% of responding hospital laboratories compared to about 40% of responding commercial laboratories (see subsection Q21 in the online Supplemental Material). The laboratory information system (LIS) was engaged in the measuring of TAT in half of the laboratories that used TAT as KPI. More often the LIS was used for basic IT functions like result reporting, age/gender related rules and basic statistical reporting (in descending order). Advanced IT solutions on the other hand were not as widely used. Most common was the monitoring of reagent supply, with 37% of the government and private hospital laboratories compared to 48% of the commercial laboratories engaged, with "Audit trail end to end traceability for reagents controls and consumables" and "Sample location tracking throughout the lab" a close second and third, respectively. Information dashboards to display KPIs as well as smartphone related alerts and applications were used by less than 20% of the laboratories, with commercial laboratories using "Smartphone related alerts and applications" at about double the rate of the hospital sector (see Figure 2).
In the commercial sector advanced IT solutions were generally more widely used compared to the hospital sector, with 40-60% of the commercial laboratories using Audit trail end to end traceability for reagents controls and consumables, monitoring reagent supply, tracking sample location throughout the lab and using Smartphone related alerts and applications, compared to about a third of hospital laboratories (Figure 2).

Integrated clinical care performance
Two thirds of all laboratories provided services above and beyond simple measurements to physicians. Among these were (in descending order, see Figure 3) alerts, results and interpretations, reflexive test suggestions and proactive consultation on complex patient cases. Real time decision support and diagnostic pathway guidance appeared to be applied by fewer laboratories. In addition, the proportion of private hospital laboratories providing these services was still smaller compared to government hospital and commercial laboratories, especially when comparing reflexive test suggestions.
Around 80% of hospital laboratories (87% for private hospital laboratories, 78% for government hospitals) participated in various diagnostic committees, commercial laboratories trailing with 57%. Similar proportions across the laboratory types were found for regularly reviewing guidance given for diagnostic pathways, with annual reviews being the most common. Notably, about a third (28%) of commercial laboratories never reviewed diagnostic pathway guidance.
About half of the government hospital laboratories measured appropriate testing to avoid over- and underutilization, which is more common compared to the private hospital sector (38%) and commercial laboratories (35%). About a third of laboratories combined data digitally with other disciplines (e.g. imaging, histopathology, clinical functional tests). Only about a quarter of laboratories attempted to directly measure patient outcomes, while general feedback regarding high-quality laboratory performance was gathered more frequently (by about two thirds of respondents) using patient satisfaction surveys, clinician satisfaction surveys, and employee satisfaction surveys.

Financial sustainability
The most common certifications/accreditations were ISO15189 and ISO9000. Less than 10% of the laboratories were accredited according to JCIA (Joint Commission International Accreditation) or CAP (College of American Pathologists) and only four laboratories used Medical Laboratory Evaluation (MLE). The implementation of international certification/accreditation varied across countries. In the United Kingdom, the United Arab Emirates, and South Africa only about 20% of laboratories had no international certification/accreditation out of the set of ISO15189, ISO9000, CAP, and JCIA (see Figure 4). In contrast, in Russia and Serbia about 80% of laboratories had none of the certifications/accreditations mentioned above.
Not only the general approach to sustainable management differed widely between laboratories, but also the amount of digitalization aiding in process optimization. The proportion of private hospital laboratories using autovalidation was lower (about a third) for clinical chemistry, immunoassays and haematology compared to the other two laboratory types (about half). A fully electronic test ordering process was established for only 26% of hospital laboratories and only 16% of commercial laboratories.
Staff productivity, on the other hand, was higher in the commercial sector, processing on average about 20% more tubes per full-time equivalent (FTE) than government hospital laboratories and even 70% more than private hospital laboratories. Staff productivity was highest for all laboratory types in clinical chemistry, immunoassays, and haematology (Figure 5).
Pre-analytical automation was used by 45% of government hospital laboratories, followed by 33% of commercial laboratories, and private hospital laboratories lagging behind with only 22% using pre-analytical automation (see online Supplemental Material, section Q42). Almost two thirds of the commercial laboratories (60%) indicated that they work with integrated (and individual) clinical chemistry (CC) and immunoassay (IA) instruments in the same laboratory, compared to 50% of the hospital laboratories (see online Supplemental Material, section Q44). Altogether, 11% of the participants maintained dedicated IA and CC laboratories.

Factor analysis
Exploratory factor analysis was calculated using three factors as suggested by design and scree plot (see online Supplemental Material, section "Factor analysis", first plot in subsection "Exploratory factor analysis"); the resulting allocation is shown in Table 1. Loadings suggested combining to factor 1 (termed "Operational performance") the items 11 (laboratory improvement program), 12 (key performance indicators), 15 (type of TAT monitored), 16 (frequency of TAT measurement), 21 (TAT monitored for very specific assays), 34 (use of basic IT solutions), and 35 (use of advanced IT solutions). Factor 2 was termed "Financial sustainability" and comprised items 9 (certification/accreditation), 13 (use of auto-validation), 33 (percent of ordering in digital form), 43 per 36 (primary tubes clinical chemistry per full-time equivalent), 43 per 39 (primary tubes clinical chemistry per square meter), 42 (use of pre-analytical automation), and 44 (integration). Factor 3 was termed "Integrated clinical care performance" and comprised items 25 (services provided to physicians), 27 (diagnostic committees), 28 (review of diagnostic pathways), 29 (utilization management), 30 (combining data digitally with other disciplines), and 31 (measurement of patient outcomes).
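The item allocation described above can be written down as a simple lookup structure, which also allows a consistency check against the 20 items selected for dimensional reduction. This is only a bookkeeping sketch: the identifiers "43/36" and "43/39" are shorthand introduced here for the two derived ratio items, and the helper function name is hypothetical.

```python
# Allocation of questionnaire items to the three factors, as listed in the text
FACTOR_ITEMS = {
    "Operational performance": ["11", "12", "15", "16", "21", "34", "35"],
    "Financial sustainability": ["9", "13", "33", "43/36", "43/39", "42", "44"],
    "Integrated clinical care performance": ["25", "27", "28", "29", "30", "31"],
}


def factor_of(item):
    """Return the factor an item was allocated to, or None if not used."""
    for factor, items in FACTOR_ITEMS.items():
        if item in items:
            return factor
    return None


all_items = [item for items in FACTOR_ITEMS.values() for item in items]
items_per_factor = {factor: len(items) for factor, items in FACTOR_ITEMS.items()}

print(items_per_factor)
print(len(all_items), "items in total")
```

The three lists contain 7, 7, and 6 entries, which adds up to the 20 items mentioned in the statistical methods, with no item assigned to more than one factor.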
Correlations were 0.28 between subscales "Operational performance" and "Integrated clinical care performance", 0.18 between subscales "Operational performance" and "Financial sustainability", and 0.07 between subscales "Integrated clinical care performance" and "Financial sustainability". All three subscales exhibited distributional symmetry to a reasonable degree, with the third one hinting at bimodality and positive skewness (Figure 6). Projections to all three subscales can be found in Figure 7, with the median for each subscale set to 100 and about half of observations for each subscale falling in the range of 80-120.
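One way to obtain a scale with a median of 100 and roughly the middle half of laboratories between 80 and 120 is a linear rescaling around the median. The sketch below is an assumption for illustration only: the source does not state the actual transformation, and both the function names and the use of the inter-quartile range as the scaling unit are hypothetical. The color bands follow the description that accompanies the projection figure (<80 red, 80-120 amber, >120 green).

```python
from statistics import median, quantiles


def rescale_to_median_100(scores, iqr_width=40.0):
    """Linearly rescale raw subscale scores so that the median maps to 100.

    ASSUMPTION: the inter-quartile range is stretched to `iqr_width` points,
    so that for a symmetric distribution Q1 and Q3 land near 80 and 120.
    The source does not specify the transformation actually used.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    spread = q3 - q1
    if spread == 0:
        raise ValueError("scores have zero inter-quartile range")
    center = median(scores)
    return [100 + iqr_width * (s - center) / spread for s in scores]


def band(score):
    """Traffic-light band for a rescaled subscale score."""
    if score < 80:
        return "red"
    if score > 120:
        return "green"
    return "amber"


# Hypothetical raw factor scores for five laboratories
raw = [1, 2, 3, 4, 5]
rescaled = rescale_to_median_100(raw)
bands = [band(s) for s in rescaled]
print(list(zip(rescaled, bands)))
```

With skewed or bimodal data, as hinted at for the third subscale, the quartiles will not fall exactly on 80 and 120 under this rescaling, which is consistent with the text stating only that "about half" of observations lie in that range.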

Discussion
Medical laboratories routinely use transversal and longitudinal comparison to interpret individual patient results. Indeed, the use of both is deemed indispensable for quality and safety in clinical medicine [24]. Widening the focus from the individual patient to the individual laboratory, the corresponding tools of external quality assessments and quality control in general are not as standardized as one might wish. Widening the focus even more from individual parameters to a high-level view of laboratory operations, corresponding benchmarking data is almost non-existent. To begin to alleviate the lack of comparability is the goal of this ongoing effort.
The results of this study can be viewed from two perspectives: firstly, 773 laboratories actively shared performance data, which are described extensively in the online Supplemental Material; secondly, the data available allowed validating, to some degree, the general process and specific questionnaire used for data elicitation. One might argue that at this point the latter might even be more important than the former because of the necessity to establish a streamlined process for long-term longitudinal observation. Indeed, even for the relatively simple and generally accepted tool of external quality assessment (EQA) programs, legal obligation often appears to be necessary for continued participation [25].
The future path for this endeavor has been outlined to some degree in the introduction: following the pilot study for the region of Germany, Austria, and Switzerland, the current activity widened the geographic region under consideration to all of Europe, the Middle East, and Africa. But since only a global estimate can truly benchmark the laboratory community as a whole, global rollout certainly needs to be the next step. In this spirit, the questionnaire used has, on the one hand, been shown to have a reliable core for its subscales "Operational performance", "Integrated clinical care performance", and "Financial sustainability"; on the other hand, some items lack discriminative power and thus need to be further examined for the next survey round. As questionnaire validation is an ongoing endeavor, for benchmarking activity to stay relevant the set of items needs to have not only a stable core but also some relevance for transient phenomena, such as the current pandemic.
Considering the latter and the necessity of delivering not only accurate but also timely results in order to be able to curb the spread of the virus, one might be surprised that only 567 (75%) medical laboratories use TAT as a KPI.
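As a quick plausibility check, the percentage quoted above can be recomputed against the two candidate denominators mentioned in the Results: all 773 respondents, or the 759 laboratories remaining after exclusion of the 14 "Other" laboratories. The helper name is illustrative.

```python
def share_percent(count, total):
    """Share of `count` in `total`, rounded to whole percent."""
    return round(100 * count / total)


tat_users = 567          # laboratories using TAT as a KPI (from the text)
all_respondents = 773    # all participating laboratories
analysis_sample = 759    # after excluding the 14 "Other" laboratories

print(share_percent(tat_users, analysis_sample))   # matches the quoted 75%
print(share_percent(tat_users, all_respondents))
```

The quoted 75% is reproduced only with the analysis sample of 759 as denominator, which suggests that percentages in the Discussion refer to that sample.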
Indeed, quality has traditionally been the main focus of medical laboratories, up to the degree that no result is better than the wrong result, while speed has traditionally been the focus of clinicians [26]. And while the response of the laboratory community to the challenges of the pandemic has been impressive, the continued need for stronger involvement of the laboratory in the entire diagnostic-therapeutic cycle appears evident.
In this regard, the current trend towards international certification (e.g. ISO 9001) or accreditation (e.g. ISO 15189) needs to be strengthened [27,28]. In addition, the need for quality should be balanced with the need for timely results delivery [29]. As a minimal requirement, each and every laboratory should use TAT as a KPI, preferentially not only one kind of TAT but several complementary measures, such as "pre-lab TAT" and "lab TAT" and their combination, "sample-to-result TAT" (see Figure 1 in [18] and Figure 1). A fully electronic test ordering process may be a crucial step in this direction, but today is only used by 176 (23%) medical laboratories responding to the survey.
Figure caption: Each point represents one individual laboratory on the three subscales "Operational performance", "Integrated clinical care performance", and "Financial sustainability"; its approximate positioning in relation to others is identifiable by value or color for the lower quartile (<80 or red), middle 50% (80-120 or amber), or upper quartile (>120 or green).
As can be expected by design and seen in Figure 2, the subscales of "Operational performance", "Integrated clinical care performance", and "Financial sustainability" correlate to some degree. On a similar trajectory, one might ponder the correlation of adherence to valid quality control schemes, success in EQA rounds and laboratory performance in general [28]. The former two might be furthered via legal obligation driving more frequent and informative EQA participation and availability of public databases. The latter would be expected to be rather market driven but nevertheless critical for a high level of quality and safety in the entire diagnostic-therapeutic process. Indeed, it is time to start discussions in the medical community and with insurance providers whether data enrichment and additional services provided by the laboratory might contribute to overall better value and thus merit additional remuneration for the laboratory.
The typical cost of laboratory service amounts to only about 2% of total healthcare expenditures, so there is little opportunity to optimize cost in the laboratory per se. But the upstream and downstream (i.e. preanalytic and postanalytic) phases of the total testing process provide tremendous opportunity [30]. Initial steps in this regard would add value to the services provided to physicians that are currently offered by only about two thirds of laboratories, e.g. reflexive test suggestions, proactive consultation, real time decision support, and diagnostic pathway guidance. Follow-up activities might use even more sophisticated systems to target human attention, as it has been shown that this approach increases safety in high-throughput environments [31,32]. Indeed, complex challenges might currently better be approached by combining human and computer reasoning than by either of the two approaches alone [33].
Of course, there are several limitations of this study. One of them is the data generating process. Due to resource considerations, a representative sample could not be achieved. Thus, a convenience sample was chosen, with an emphasis on optimizing survey completion rates. Consequently, the typical measurement quality criteria of objectivity, reliability, and validity are probably met less than ideally. This might be compounded by social desirability bias, as human-human interaction was chosen as the main interface for data elicitation. Altogether, however, the instrument developed should work as a good intermediate step towards global medical laboratory performance benchmarking [34].
To conclude, this study presents a first high-level view on laboratory performance benchmarking with a focus on management for the EMEA region and serves as a starting point towards global benchmarking to this regard. In combination with the set of quality indicators already established by WG-LEPS of the IFCC, this has the potential to improve overall laboratory performance. As quality and safety in medicine in general and diagnostics in particular are relative measures, benchmarking is of prime importance to internationally establish a minimal standard of service. Indeed, standardization is highly relevant because otherwise participants tend to overestimate their performance, as can be highlighted by the 98% of laboratories in this study rating their overall operational performance at or above industry average when directly prompted (see Q8 in the online Supplemental Material). Of course, benchmarking is an ongoing endeavor, as is questionnaire validation. Thus, future studies are needed to further improve benchmarking on a global level.