Method evaluation in the clinical laboratory

Method evaluation is one of the critical components of the quality system that ensures the ongoing quality of a clinical laboratory. As part of implementing new methods or reviewing best practices, the peer-reviewed published literature is often searched for guidance. From the outset, Clinical Chemistry and Laboratory Medicine (CCLM) has had a rich history of publishing methods relevant to clinical laboratory medicine. An insight into submissions, from editors’ and reviewers’ experiences, shows that authors still struggle with method evaluation, particularly the appropriate requirements for validation in clinical laboratory medicine. Here, we consider through a series of discussion points an overview of the status, challenges, and needs of method evaluation from the perspective of clinical laboratory medicine. We identify six key high-level aspects of clinical laboratory method evaluation that potentially lead to inconsistency: (1) standardisation of terminology; (2) selection of analytical performance specifications; (3) experimental design of method evaluation; (4) sample requirements of method evaluation; (5) statistical assessment and interpretation of method evaluation data; and (6) reporting of method evaluation data. Each of these areas requires considerable work to harmonise the practice of method evaluation in laboratory medicine, including more empirical studies to be incorporated into guidance documents that are relevant to clinical laboratories and are freely and widely available. To further close the loop, educational activities and fostering professional collaborations are essential to promote and improve the practice of method evaluation procedures.


Introduction
Method evaluation is one of the critical components of the quality system that ensures the ongoing quality of a clinical laboratory. Yet, it is one of the tasks that laboratories still find challenging to perform [1-3]. Contributing factors may include an insufficient understanding of the principles underlying method evaluation, varying requirements from different local regulatory agencies, and the availability of multiple guidelines that are not always directly applicable to laboratory medicine [1,2]. The lack of standardisation in the requirements and guidance leaves this important task susceptible to subjective interpretation and selective implementation.
As part of implementing new methods or reviewing best practices, the peer-reviewed published literature is often searched for guidance [4]. In this context, clinical laboratory medicine journals have a supporting role in appropriate method evaluation through their acceptance of manuscripts that have used guidelines appropriate for laboratory medicine. Moreover, high-impact clinical laboratory journals effectively play a leadership role for the profession by providing guidance through their publications and creating a forum for discussion and debate. For example, the mandate of the Journal of Clinical Endocrinology and Metabolism that manuscript data related to sex steroids be measured by mass spectrometry provided guidance, raised debate and defined acceptance criteria for publication [5,6]. Likewise, Clinical Chemistry and Laboratory Medicine (CCLM) has led the discussion on defining quality in laboratory medicine, with a few examples cited here [4,7-19].
From the outset, CCLM has had a rich history of publishing methods relevant to clinical laboratory medicine. Looking at the first issue of successive decades, we see CCLM's clear role in supporting appropriate methods: in 1963 (the first issue of the journal), there were a number of methods, including a classic recognised method for plasma steroids [20]; in 1973, an article on standardisation of temperature for enzyme assays [21]; in 1983, the recognition of interference in a bilirubin spectrophotometric assay [22]; and in 1993, articles on pre-analytical variation, biological variation, analytical variation and reference ranges [23,24]. Using the keywords method evaluation and method validation, a PubMed search revealed over 2000 articles published by CCLM since the millennium. An insight into submissions, from editors' and reviewers' experiences, shows that authors still struggle with method evaluation, particularly the appropriate requirements for validation in clinical laboratory medicine.
Here, we consider through a series of discussion points an overview of the status, challenges, and needs of method evaluations from the perspective of clinical laboratory medicine.

Defining method evaluation
Method evaluation (as either validation or verification) is the systematic execution of laboratory experiments to collect and analyse objective data to characterise the analytical performance of a laboratory method to ensure it is fit for its intended clinical purpose [25]. This evaluation usually occurs before the formal implementation of the method, but can also occur post-implementation, depending on specific needs:

Pre-implementation
In general, there are regulatory and/or accreditation requirements to perform method evaluation before a laboratory method is implemented into routine clinical practice. Thorough method evaluation processes provide valuable insights into the behaviour of the method under routine operational conditions that can inform other areas of laboratory quality planning, such as quality control practices. It also serves as the baseline performance the laboratory should, at the very least, strive to maintain.

Post-implementation
Method evaluation processes may also be performed after a laboratory method is implemented as part of a troubleshooting strategy when analytical issues are encountered. Under this scenario, the objective data can assist laboratory practitioners in identifying the likely cause(s) of failure. When performed following resolution of the issue, the evaluation procedures assure that the performance of the laboratory method has been restored. A prominent recent example of the need for post-implementation evaluation is the biotin interference in immunoassays using the biotin-streptavidin interaction, whereby the recent clinical practice of prescribing or self-medicating with high doses of biotin led to analytical interference in previously well-established laboratory methods [26,27].
Method evaluation can generally be divided into two parts: method validation, in which the performance of a laboratory method is primarily established, and method verification, in which the performance of a laboratory method is verified against an established claim, generally provided by the manufacturer [28-30]. In practice, the boundary between the experiments required for verification of a commercial kit and the more extensive needs for validation of laboratory-developed tests is often confused. A brief explanation of the differences is provided below, along with the components examined for each part in Table 1.
Method validation should establish the longitudinal performance of a laboratory method under diverse operational conditions to capture all sources of variability with subsequent incorporation into baseline performance characteristics. To facilitate capture of the underlying method variability, this may involve but is not limited to evaluating over a successive number of days, across multiple instruments and laboratory staff, a large number of replicates or samples, varying environmental conditions and finally, multiple calibrators and reagent lot changes. Method validation is a highly resource-intensive process, requiring significant staff time and access to patient samples. It is generally performed when establishing a new laboratory method or when there is a significant change to the method, such as a change in reagent formulation.
Method verification can be considered as an abbreviated version of method validation that the end-user laboratory performs. Although it may be less resource intensive than a method validation exercise, method verification can nonetheless be burdensome to an end-user laboratory with modest resources and time constraints. When implementing a new laboratory automation system or new generation platform, a large repertoire of laboratory methods may be required to be verified simultaneously.
There are several key components when undertaking method evaluations (Figure 1). Precision, accuracy, patient sample comparisons, linearity (measurement/reportable ranges) and analytical sensitivity are generally well accounted for in method evaluation processes [1,2]. On the other hand, analytical specificity, sample carryover, dilution recovery and collection device verification may not always be performed. For collection devices, it is commonly assumed that the performance of a laboratory method is transferable between different manufacturers of blood tubes containing a similar preservative/anticoagulant. However, this may not always hold, as differences in components and materials used in collection devices can significantly impact laboratory analysis [26,31,32].
A central part of method evaluation often overlooked in manuscripts submitted for publication in clinical laboratory medicine journals is the inclusion of peer comparison data through external quality assurance (EQA) providers. In our collective experience as peer reviewers, this omission occurs too frequently. EQA is considered central to evaluating method performance and is a requirement of laboratory accreditation [7]. Whilst EQA may not exist for a method developed to measure a new biomarker, participation in EQA programs assists clinical laboratories in assessing and monitoring their long-term accuracy. We therefore suggest that including EQA performance (when available) as part of the published method supports the robustness of the evaluation and the likelihood that the results can be reproduced if the method is implemented by another laboratory.

Discussion points
In this section, we consider six key high-level aspects of clinical laboratory method evaluation that potentially lead to inconsistencies.

Standardisation of terminology
The concept and practice of metrology have developed differently under various organisations related to the clinical laboratory, such as the Clinical and Laboratory Standards Institute (CLSI), the International Organization for Standardization (ISO) and the European Committee for Standardization (CEN). Consequently, the definitions and terminology used for method evaluation vary in the guidance documents produced by these organisations and change as concepts evolve.
For example, in making analytical performance claims, manufacturers use a variety of terms for detection capabilities such as sensitivity, analytical sensitivity, minimum detection limit, the lower limit of detection, the limit of blank, biological detection limit, limit of detection, functional sensitivity, the lower limit of quantitation and limit of quantitation [33]. The CLSI guideline EP17-A2 recommends standardising terminology, including statistical definitions for detection capabilities with the use of the following three terms: the limit of blank (LOB), the limit of detection (LOD) and the limit of quantitation (LOQ) [34].
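To illustrate how two of these terms are distinguished statistically, the following is a minimal sketch of the classical parametric estimates often quoted alongside EP17-type protocols: LOB as the mean of blank results plus 1.645 standard deviations, and LOD as the LOB plus 1.645 standard deviations of a low-concentration sample. The function names, the example data and the fixed 1.645 multiplier (Gaussian data, 5% one-sided error rates, no small-sample correction) are our assumptions for illustration only; the guideline itself should be consulted for the full procedure. The LOQ is not computed here because it depends on predefined goals for bias and imprecision.

```python
from statistics import mean, stdev

# One-sided 95th percentile of the standard normal distribution (assumed
# multiplier; no small-sample correction is applied in this sketch).
Z_95 = 1.645


def limit_of_blank(blank_results):
    """Parametric LOB estimate: mean of blanks + 1.645 * SD of blanks."""
    return mean(blank_results) + Z_95 * stdev(blank_results)


def limit_of_detection(blank_results, low_sample_results):
    """Parametric LOD estimate: LOB + 1.645 * SD of a low-level sample."""
    return limit_of_blank(blank_results) + Z_95 * stdev(low_sample_results)


# Hypothetical replicate results (arbitrary concentration units).
blanks = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2]
low_sample = [0.5, 0.7, 0.6, 0.8, 0.4, 0.6]

lob = limit_of_blank(blanks)
lod = limit_of_detection(blanks, low_sample)
print(f"LOB = {lob:.3f}, LOD = {lod:.3f}")
```

By construction, the LOD from this sketch can never fall below the LOB, which is one reason the two terms should not be used interchangeably in performance claims.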
Therefore, there is an urgent need to harmonise the terms to facilitate the global application of standards and guidelines.

Selection of analytical performance specifications
Analytical performance specifications define the boundary within which a laboratory method should perform [35-39]. Analytical performance specifications for method evaluation should be defined a priori. This ensures that the clinical requirements and context of the laboratory method are considered when determining the performance specifications.
Analytical performance specifications can be determined from a hierarchical model based on clinical outcomes, followed by biological variation and finally, state-of-the-art [40]. Analytical performance specifications can also differ under varying clinical scenarios [36]. The decision for selecting an analytical performance specification should be defined and documented, so that the rationale can be referenced and considered by other laboratory practitioners and clinical end users [2]. Analytical performance specifications should not be selected to fit the observed method evaluation data, as specifications chosen in this way may not meet the clinical needs. Other statistical assessments of laboratory analytical performance may include the total error (with the calculation of sigma metrics) or measurement uncertainty. These have recently been discussed in CCLM and elsewhere, including the recent opinion paper from the editors of CCLM [9].
The current variation in the number and type of analytical performance specifications creates inconsistencies in the interpretation of method evaluation data and, whilst academic in nature, adds to the confusion over method evaluation acceptance criteria [41].
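For readers less familiar with these metrics, the sketch below shows the commonly used formulations: total error as |bias| + z × CV and the sigma metric as (TEa - |bias|) / CV, where TEa is the allowable total error. The function names, the default z of 1.65 and the example figures are assumptions for illustration, and other formulations exist in the literature.

```python
def total_error(bias_pct, cv_pct, z=1.65):
    """Total error estimate (%), using the linear model |bias| + z * CV.

    z = 1.65 is an assumed, commonly quoted one-sided 95% multiplier.
    """
    return abs(bias_pct) + z * cv_pct


def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma metric: (allowable total error - |bias|) / CV, all in percent."""
    if cv_pct <= 0:
        raise ValueError("CV must be greater than zero")
    return (tea_pct - abs(bias_pct)) / cv_pct


# Hypothetical method: allowable total error 10%, bias 2%, CV 2%.
print(total_error(bias_pct=2.0, cv_pct=2.0))
print(sigma_metric(tea_pct=10.0, bias_pct=2.0, cv_pct=2.0))
```

Both results depend directly on the allowable total error and the performance specification chosen, which is why the specification needs to be fixed before the data are examined.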

Experimental design of method evaluation
The experimental design is the protocol by which the method evaluation is carried out. Many different protocols can exist for a particular component of method evaluation [2]. A recent multi-country survey showed wide variation in the practice of method evaluations in clinical laboratories [2]. These laboratories use a mix of protocols, split fairly evenly across internally developed protocols (26%), national/international guidance documents (28%) or a combination of both (22%).
For example, precision profiles of quantitative laboratory methods can be established by repeated testing over a varying number of days, the number of runs per day and the number of replicates per run. Often underappreciated, experimental design has a large impact on the false acceptance and rejection rates of manufacturers' performance claims, and consideration should be given to an objective approach for the intended clinical use case [42].
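To make the link between design and precision estimates concrete, the sketch below decomposes results from a simplified balanced design (several days, a fixed number of replicates per day, one run per day) into repeatability and between-day components using a one-way random-effects analysis of variance. The function name, the returned keys and the example data are hypothetical; fuller designs with multiple runs per day need a nested model, such as those provided by dedicated variance component software.

```python
from math import sqrt
from statistics import mean


def precision_components(results_by_day):
    """Estimate repeatability and within-laboratory SD.

    results_by_day: list of per-day replicate lists (balanced design,
    at least 2 days and 2 replicates per day).
    """
    k = len(results_by_day)                      # number of days
    n = len(results_by_day[0]) if k else 0       # replicates per day
    if k < 2 or n < 2 or any(len(day) != n for day in results_by_day):
        raise ValueError("need a balanced design with >=2 days and >=2 replicates")

    day_means = [mean(day) for day in results_by_day]
    grand_mean = mean(day_means)

    # Within-day (repeatability) mean square.
    ms_within = sum(
        (x - m) ** 2 for day, m in zip(results_by_day, day_means) for x in day
    ) / (k * (n - 1))
    # Between-day mean square.
    ms_between = n * sum((m - grand_mean) ** 2 for m in day_means) / (k - 1)

    # Negative variance estimates are truncated to zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return {
        "sd_repeatability": sqrt(ms_within),
        "sd_between_day": sqrt(var_between),
        "sd_within_lab": sqrt(ms_within + var_between),
    }


# Hypothetical data: 3 days x 2 replicates.
print(precision_components([[10, 12], [14, 16], [10, 12]]))
```

With few days, the between-day component is estimated from very few degrees of freedom, which illustrates why the number of days, runs and replicates chosen can change whether a manufacturer's claim appears to be met.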
There are many guidance documents available on method evaluation. These documents are often prescriptive, may have differing approaches, lack detail when describing the underlying principles for the recommendation, and some are likely to be less aligned to optimal method evaluation principles. There is often also a lack of robust objective data showing the clinical performance (false acceptance rates, false rejection rates) of the recommended protocol, or the impact on the outcome of the method evaluation, particularly experimental power, of varying specific components of the experimental design, such as reducing the number of samples or replicates.
This lack of evidence-based information makes it challenging to determine the optimal experimental design that balances clinical performance and resource availability. As a result, a method evaluation may be performed purely to satisfy regulatory requirements without sufficiently assessing the fitness for clinical purposes.

Sample requirements of method evaluation
Additional constraints and evaluation needs may exist in specific population groups and sample types. It is generally desirable to use clinically relevant and matrix-matched material when performing method evaluation to ensure the performance data is commutable with patient samples. However, this may not always be entirely feasible. Non-matrix-matched materials may be used to overcome difficulties in obtaining large volumes of samples (e.g., for an extended precision study of uncommon sample types such as pancreatic cyst fluid), samples with extreme analyte concentrations and samples with unstable analytes (e.g., blood gases).
As an example, it is often necessary to pool patient samples together or enrich the analyte concentration through spiking or altering the sample matrix to remove undesirable constituents and/or improve stability. These manipulations are a pragmatic solution to produce materials that meet the analytical and operational requirements for method evaluation. However, they often alter the sample matrix to the extent that these materials may no longer be commutable with patient samples. In addition, the pooling of samples may reduce the impact of individual sample-specific effects and artificially reduce analytical variability [43].
The capability to analyse specialised sample matrices, such as dried blood spots and salivary samples, has greatly expanded [44]. Compared to more conventional matrices such as serum, there are additional important considerations with the method evaluation study design and acceptance criteria for clinical applications. Examples include specific evaluation of collection variability, sample-to-sample variability, sample homogeneity, punch point effect, sample stability, matrix effect, and extraction recovery [45].
Through guidance and education, an improved understanding of the analytical and clinical critical components associated with method evaluation for specific sample types and population groups is strongly encouraged.

Statistical assessment and interpretation of method evaluation data
Experimental design and subsequent statistical analysis are intimately linked and must be defined before commencing the evaluation procedure. The experimental design plays an important role in ensuring the method evaluation produces results suitable for an unbiased and objective statistical analysis. Like the experimental design, many different statistical assessments can be performed on method evaluation data.
Different statistical techniques applied to the same data set can lead to different conclusions. At the same time, the optimal statistical approach may depend on the experimental design and characteristics of the observed results. For example, a dataset that conforms to Gaussian assumptions may require a different statistical approach from one that does not. Hence, it is imperative that the correct statistical approach is defined and appropriately applied. This requires an understanding of the underlying principles and assumptions of the statistical technique.
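As a simple illustration of how the choice of technique can change the result, the sketch below fits the same hypothetical method comparison data with ordinary least squares, which assumes the comparator method is free of measurement error, and with Deming regression, which allows error in both methods. The function names, the data and the assumed error variance ratio of 1 are ours and are for illustration only.

```python
from math import sqrt
from statistics import mean


def _centred_sums(x, y):
    """Return means and centred sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols_fit(x, y):
    """Ordinary least squares: all measurement error is attributed to y."""
    mx, my, sxx, _, sxy = _centred_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def deming_fit(x, y, error_ratio=1.0):
    """Deming regression.

    error_ratio is the assumed ratio of the error variance of y to the
    error variance of x (1.0 means equal imprecision in both methods).
    """
    mx, my, sxx, syy, sxy = _centred_sums(x, y)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    d = syy - error_ratio * sxx
    slope = (d + sqrt(d * d + 4 * error_ratio * sxy * sxy)) / (2 * sxy)
    return slope, my - slope * mx


# Hypothetical paired results from a comparator (x) and a candidate method (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]
print("OLS slope, intercept:   ", ols_fit(x, y))
print("Deming slope, intercept:", deming_fit(x, y))
```

The two slopes differ even on this small data set, and the difference grows as the imprecision of the comparator method increases, so a proportional bias judged acceptable by one technique may not be by the other.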
There is a need for more publications that objectively assess the different statistical approaches to provide guidance on which method may be optimal under different circumstances (e.g., expertise, resources). Additionally, there is a need for more accessible statistical tools, either using common software such as Excel or freely available online statistical tools [42,46]. While R is free statistical software, it requires some time to achieve proficiency. However, the work of the Roche Diagnostics Biostatistics division should be noted in the development of free add-on packages such as the Method Comparison and Regression (mcr) and Variance Components Analysis (VCA) packages, which laboratories may find useful in their method evaluation procedures [47,48].
A recent and long-awaited publication in CCLM provides guidance on statistical analysis in laboratory medicine. This document is important for laboratory medicine professionals as many guidance documents lack clear descriptions of the statistical principles, assumptions, and limitations of the statistical approaches. This impedes the application of the appropriate statistical approach in method evaluation procedures. To further compound the challenge, many laboratories may not have access to, or the expertise to operate, more complex statistical software packages, with the costs of some custom software being beyond the reach of laboratories in limited resource settings [16].

Reporting of method evaluation data
Many method papers published in clinical laboratory journals use guidelines that are not necessarily applicable to laboratory medicine, e.g., the US Food and Drug Administration guideline that is intended to validate drugs and biologicals. This may be partly related to the lack of appropriate guidance documents developed by clinical laboratory professionals that are freely available or open access. Another consequence of such a lack of guidance documents is the wide variation in reporting method validation findings for clinical laboratory publications (including new and emerging technologies), which limits the ability to reliably reproduce these findings. In turn, this may lead to the suboptimal application of a measurement procedure and to the recommendation of measurement procedure requirements in international clinical guidelines on the basis of such findings.
Additionally, there is a lack of standardised reporting of method evaluation results by manufacturers. This lack of transparency impedes the ability of the laboratory to select the laboratory method that meets its clinical needs. It also hinders the ability to verify the performance of the method in the laboratory against the manufacturer-derived performance benchmark. Worryingly, there is a trend for manufacturers to provide 'design' claims (particularly for imprecision), either in the Instructions For Use (IFU) or as separate documents, that are not supported by an objective experimental design or statistical analysis and can be several-fold larger than those observed experimentally. If such claims are not carefully scrutinised, a laboratory may inadvertently consider an observed performance that is not fit for clinical purpose as 'verified'. Examples include the Siemens SARS-CoV-2 serology IFU (experimental CV: 1.8-4.0%; 'design' claims: 12-15%) [49] and the Thermo Fisher procalcitonin assay (experimental CV: 2.2-12.8%; 'design' claims: 15-25%) [50].
Appropriate reporting of method evaluation data is hampered by: (1) use of inappropriate guidelines; (2) lack of open access guidelines suitable for laboratory medicine; and (3) lack of standardised reporting of method evaluation results by the manufacturers. Leadership and advocacy from professional societies (e.g., IFCC) and high-impact clinical laboratory medicine journals (e.g., CCLM) will support improvement in these areas, benefiting the practice of clinical laboratory medicine.

Potential solutions
While there is still considerable work required to harmonise the practice of method evaluation in laboratory medicine, there are pathways forward. A prerequisite for this aim is harmonising the surrounding terminology and setting minimum standards, including the analytical performance specifications and study design of the components for evaluation. More empirical studies are needed to demonstrate the trade-offs between resource requirements (emphasising limited resource settings) and clinical performance (in terms of power, false acceptance and rejection rates) of different experimental designs and statistical approaches. These works can then be incorporated into guidance documents that are relevant to clinical laboratories and are freely and widely available. To further close the loop, educational activities and fostering professional collaborations are essential to promote and improve the undertaking of method evaluation procedures.
On a final note, CCLM has a long and illustrious history of providing peer-reviewed published methods. For each method evaluation component, there are varying experimental protocols used, with some considered more robust than others. Hence, there is a strong need to provide direction to the appropriate protocols for clinical laboratories. With the work of the IFCC (Working Group on Method Evaluation Protocols) and close links with this journal, there is an opportunity to provide the appropriate method evaluation guidance needed to support publication quality in the future.