Performance specifications for measurement uncertainty of common biochemical measurands according to Milan models

Objectives: Defining and fulfilling analytical performance specifications (APS) for measurement uncertainty (MU) makes laboratory determinations clinically usable. The 2014 Milan Strategic Conference proposed models to objectively derive APS based on: (a) the effect of analytical performance on clinical outcome; (b) components of biological variation; and (c) the state of the art of the measurement, defined as the highest level of analytical performance technically achievable. Applying these models appropriately, we propose here APS for standard MU for some common biochemical measurands. Methods: We allocated 13 measurands, selected among the most commonly requested laboratory tests, to one of the three Milan models on the basis of their biological and clinical characteristics. Both minimum and desirable quality levels of APS for standard MU of clinical samples were defined using information obtained from available studies. Results: Blood total hemoglobin, plasma glucose, blood glycated hemoglobin, and serum 25-hydroxyvitamin D3 were allocated to model 1, and the corresponding desirable APS were 2.80, 2.00, 3.00, and 10.0%, respectively. Plasma potassium, sodium, chloride, total calcium, alanine aminotransferase, creatinine, urea, and total bilirubin were allocated to model 2, and the corresponding desirable APS were 1.96, 0.27, 0.49, 0.91, 4.65, 2.20, 7.05, and 10.5%, respectively. For C-reactive protein, allocated to model 3, a desirable MU of 3.76% was defined. Conclusions: The APS for MU of clinical samples derived in this study are essential to objectively evaluate the reliability of results provided by medical laboratories.


Background
The main purpose of laboratory medicine is to provide useful information for correct clinical decision-making, thereby contributing significantly to the quality of health care [1]. To make laboratory determinations clinically usable, and to ensure that measurement variability does not obscure the clinical information supplied by test results, it is essential to estimate the measurement uncertainty (MU) of results produced by procedures measuring biological measurands and to verify that the estimated MU is within objectively defined analytical performance specifications (APS) [2]. Recently published documents have recommended how medical laboratories should estimate MU, essentially endorsing the so-called "top-down" approach based on appropriate internal quality control (IQC) data and commercial calibrator information [2][3][4]. MU at the level of clinical samples should combine all uncertainty contributions accumulated across the entire traceability chain, starting with the MU of reference materials, extending through the in vitro diagnostic (IVD) manufacturers and their processes for assigning calibrator values and MU, and ending with the random variability of the measuring systems. Of particular relevance to the present paper, ISO Technical Specification 20914 on MU estimation also emphasizes that the magnitude of MU should be suitable for a result to be used in medical decision-making [3]. Therefore, defining an allowable MU is essential to ascertain whether the estimated MU of a given laboratory result may significantly affect its interpretation [5][6][7].
The Strategic Conference organized in Milan in 2014 by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) defined three models for establishing APS, i.e., model 1, based on the effect of analytical performance on clinical outcome; model 2, based on components of biological variation (BV) of the measurands; and model 3, based on the state of the art of the measurement (defined as the highest level of analytical performance technically achievable) [8,9]. The models use different principles and do not constitute a hierarchy; therefore, some models are better suited to certain measurands than to others, and attention should be directed primarily toward the measurand and its biological and clinical characteristics. Accordingly, criteria for allocating laboratory measurands to the different models were elaborated and practical principles were proposed [10]. Briefly, model 1 applies to measurands that have a central role in the diagnosis and monitoring of a specific disease; model 2 should be used for measurands under strict metabolic control; and model 3 should be used for measurands that cannot be assigned to the first two models.

Defining performance specifications for measurement uncertainty
For MU, the relevant goal to be fulfilled is the one related to the allowable random variability of patient results, as correct trueness transfer along the traceability chain should yield unbiased (or negligibly biased) results [2,11,12].
The Milan model 1 ("outcome-based APS") evaluates, directly or indirectly, the impact of a certain analytical variation on the interpretation of laboratory data and the related clinical consequences. As studies investigating the direct impact of the performance of laboratory measurements on clinical outcome are seldom available, indirect outcome studies are usually performed [13]. To be included in this model, the measurand should have a central and well-defined role in decision-making for a specific disease or clinical situation, and test results should be interpreted through established decision limits [9,10]. The model therefore applies better to measurands whose measurements are harmonized, because a common, method-independent threshold can then be defined and, consequently, the impact of MU in terms of patient misclassification can be assessed. The APS for MU are defined by identifying the MU level corresponding to the misclassification rate considered clinically acceptable [14,15].
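The indirect outcome approach can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that true patient values follow a lognormal distribution centred near a hypothetical decision limit of 6.5 units and that measurement error is Gaussian with a constant CV. It estimates the fraction of subjects classified on the wrong side of the limit and then scans candidate CVs for the largest one whose misclassification rate stays within a chosen tolerance. The population parameters, the limit, and the 2% tolerance are assumptions, not values taken from the studies cited here.

```python
import math
import random


def misclassification_rate(cv_percent, decision_limit, true_values, seed=1):
    """Fraction of subjects whose measured value falls on the other side of the limit.

    Measurement error is modelled (assumption) as Gaussian with a constant CV,
    i.e. SD proportional to the true value. The same seed reuses the same
    standard-normal draws, so the rate cannot decrease as cv_percent grows.
    """
    rng = random.Random(seed)
    cv = cv_percent / 100.0
    misclassified = 0
    for x in true_values:
        z = rng.gauss(0.0, 1.0)
        measured = x * (1.0 + cv * z)
        if (x >= decision_limit) != (measured >= decision_limit):
            misclassified += 1
    return misclassified / len(true_values)


def max_allowable_cv(decision_limit, true_values, tolerance, candidates):
    """Largest candidate CV (%) whose misclassification rate is <= tolerance."""
    allowed = [cv for cv in candidates
               if misclassification_rate(cv, decision_limit, true_values) <= tolerance]
    return max(allowed) if allowed else None


# Hypothetical population: lognormal true values centred on the decision limit.
pop_rng = random.Random(42)
population = [math.exp(pop_rng.gauss(math.log(6.5), 0.15)) for _ in range(20000)]

for cv in (0.0, 1.0, 3.0, 5.0):
    print(cv, round(misclassification_rate(cv, 6.5, population), 4))
print(max_allowable_cv(6.5, population, tolerance=0.02,
                       candidates=[0.5 * k for k in range(1, 21)]))
```

Real outcome-based studies replace the synthetic population with actual patient data and justify the tolerated misclassification rate clinically. The sketch only shows the mechanics of linking a CV to a misclassification rate.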
The Milan model 2 ("BV-based APS") should be adopted when the measurand concentrations in a given biological fluid are under strict homeostatic control or when the measurand has stable concentrations, i.e., is in a "steady state" when a subject is not sick. The APS for MU are calculated from the within-subject (intra-individual) BV (CV_I) of the measurand through an adaptation of the classical formula for deriving analytical goals for random variability [11,16]. This approach is based on the concept that the total variation of laboratory results depends on standard MU and CV_I, with the two sources of variation added as variances: total variation $= \sqrt{\mathrm{MU}^2 + \mathrm{CV}_I^2}$.
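To show why MU must be kept small relative to CV_I, the short sketch below evaluates the total-variation formula for MU set at different fractions of CV_I. CV_I is fixed at an arbitrary 10%, an assumption chosen only for readability, because the relative inflation depends only on the ratio MU/CV_I. For example, an MU equal to half of CV_I inflates total variation over CV_I alone by only about 12%.

```python
import math


def total_variation(mu_percent, cv_i_percent):
    """Combine standard MU and within-subject CV as variances: sqrt(MU^2 + CV_I^2)."""
    return math.sqrt(mu_percent ** 2 + cv_i_percent ** 2)


def inflation_over_biology(mu_percent, cv_i_percent):
    """Relative increase (%) of total variation over CV_I alone caused by MU."""
    return 100.0 * (total_variation(mu_percent, cv_i_percent) / cv_i_percent - 1.0)


# MU expressed as fractions of CV_I (CV_I arbitrarily set to 10%).
for ratio in (0.25, 0.50, 0.75, 1.00):
    print(ratio, round(inflation_over_biology(ratio * 10.0, 10.0), 1))
# prints 0.25 3.1 / 0.5 11.8 / 0.75 25.0 / 1.0 41.4
```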
Because the contribution of CV_I to total variation cannot be reduced, MU must be minimized and modulated according to the specific CV_I of a given measurand. It is agreed that the desirable standard MU should not exceed 0.50 × CV_I.
When a measurand cannot be included in either model 1 or model 2, it should be placed under model 3 ("state of the art-based APS"). Deriving APS for MU with this model requires identifying the highest quality of analytical performance that is currently technically achievable [8]. The major components of the process are as follows: (1) assess the combined MU for the selected measurand, as recommended by ISO Technical Specification 20914, on widely employed measuring systems; (2) identify the MU of the best-performing system as the 'desirable' APS; and (3) set the 'minimum' APS 50% greater than the desirable one. We previously reported the example of the measurement of intact human chorionic gonadotropin (hCG), intended as a test to detect pregnancy [4]. Intact hCG is indeed not primarily used for clinical diagnosis (a condition for using model 1), and it is not normally produced in the healthy state (a condition for using model 2).
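The three-step model 3 procedure can be sketched as follows. The system names and MU values are hypothetical placeholders, not data from any study. The combined MU of each system is assumed to have already been estimated according to ISO TS 20914.

```python
def state_of_the_art_aps(system_mu_percent):
    """Model 3 APS from standard MUs (%) estimated for widely used measuring systems.

    desirable = MU of the best-performing system (the lowest MU)
    minimum   = desirable APS made 50% larger (1.5 x desirable)
    """
    if not system_mu_percent:
        raise ValueError("at least one measuring system MU is required")
    desirable = min(system_mu_percent.values())
    return {"desirable": desirable, "minimum": 1.5 * desirable}


# Hypothetical MU estimates (%) for four measuring systems; not real data.
systems = {"system_A": 4.1, "system_B": 5.3, "system_C": 3.6, "system_D": 6.0}
print(state_of_the_art_aps(systems))
```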
Finally, grading different quality levels for APS (e.g., minimum and desirable) is also important because it may stimulate IVD manufacturers to improve the quality of assays in case of unacceptable or minimum performance [9,17]. In particular, if the desirable APS is defined, the minimum level can be derived as follows: minimum APS = desirable APS + 0.50 × desirable APS. If only the minimum APS is established, the desirable level can be derived as 0.67 × minimum APS.
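The two grading rules are near inverses of each other, since 0.67 is a rounded 1/1.5. A minimal sketch makes the small round-trip discrepancy visible. The function names are illustrative only.

```python
def minimum_from_desirable(desirable):
    """minimum APS = desirable APS + 0.50 x desirable APS."""
    return desirable + 0.50 * desirable


def desirable_from_minimum(minimum):
    """desirable APS = 0.67 x minimum APS (0.67 approximates 1/1.5)."""
    return 0.67 * minimum


d = 2.0
m = minimum_from_desirable(d)         # 3.0
print(m, desirable_from_minimum(m))   # 3.0, then about 2.01 rather than exactly 2.0
```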

Selection of measurands and their categorization according to the models
Now that the theory has been consolidated, it is necessary to move to practice. In particular, each measurand assayed in the medical laboratory must be assigned to one of the three models described above, and APS for its MU must be defined for suitable clinical application of the test. In this study, in line with the premises discussed above, we defined APS for the MU of the most commonly requested biochemical measurands after categorizing them according to the proper model. We identified 13 measurands by examining the yearly number of test requests received by the laboratory of the 'Luigi Sacco' academic hospital in Milan and selecting the most requested measurands in different biochemistry categories (e.g., metabolites, ions, enzymes), excluding coagulation factors and cell components. We considered blood total hemoglobin, plasma glucose, blood glycated hemoglobin (HbA1c), plasma creatinine, plasma urea, plasma total bilirubin, plasma sodium, plasma potassium, plasma chloride, plasma total calcium, plasma alanine aminotransferase (ALT), plasma C-reactive protein (CRP), and serum 25-hydroxyvitamin D3.
For plasma glucose, tighter APS may also be required to allow for the pre-analytical variation caused by the variable in vitro stability of glucose [20].

Blood HbA1c
HbA1c testing also plays a crucial role in the monitoring and diagnosis of diabetes, with clearly defined decision limits [23]. A simulation study by Nielsen et al. [24] evaluated the influence of the analytical variability of HbA1c measurement on the number of undiagnosed patients with diabetes. With a maximal analytical variability of 3.0%, the percentage of undiagnosed patients with diabetes in the study population was 2%; this misclassification rate increased to 3.7% when the analytical variation rose to 3.7% (data derived from Fig. 2 of ref. [24]).

Total hemoglobin in blood
The measurement of total hemoglobin in blood is central to the diagnosis of anemia, and decision levels for its clinical application are available (i.e., 130 g/L in men, 120 g/L in nonpregnant women, and 110 g/L in children) [25]. Therefore, model 1 should be adopted. However, among all the selected measurands belonging to this model, total hemoglobin was the one for which we and other authors [13] were unable to identify solid studies evaluating the impact of analytical variability on diagnostic accuracy. A number of old studies asking clinicians which changes in total hemoglobin results they judge important are available, but the consensus at the EFLM Strategic Conference considered this approach too subjective [8]. One paper represents a partial exception by providing a model that accepts a false positive rate of 5% in classifying patients [26]. In this study, the authors identified a 2.8% goal for the analytical variability of total hemoglobin measurements.

Serum 25(OH)D3
The serum concentration of 25(OH)D3 is currently thought to be the best estimate of an individual's vitamin D status. Although different guidelines for defining clinically relevant states of vitamin D status, especially deficiency and insufficiency, have been proposed [27], model 1 should be preferred for APS derivation in light of the clinical role of 25(OH)D3. Stöckl et al. [28] analyzed the impact of analytical variability on the interpretation of clinical results by using decision limits commonly applied in assessing 25(OH)D3 results for the diagnosis of mild or moderate vitamin D insufficiency (30 µg/L or 20 µg/L, respectively) or deficiency (12 µg/L). Assuming a tolerance limit of 20% misclassifications, the quality goal for analytical variability was defined as ≤15%. However, the authors themselves noted that the 20% misclassification rate was chosen arbitrarily and is probably too permissive an outcome in the field of vitamin status evaluation. In a subsequent clarification, they indeed stated that a maximum analytical variability of 10% would be desirable [29]. More recently, Cavalier et al. [30] proposed APS for MU based on the physiological variation of 25(OH)D3 concentrations over time, as follows: MU <13.6% to detect a difference at p<0.05 ('minimum' MU) and 9.6% to detect a difference at p<0.01 ('desirable' MU). Interestingly, these APS are not very different from the above-mentioned proposals formulated by Thienpont's group using the simulation approach [28,29].

Measurands belonging to model 2

Plasma ions
Plasma ions (i.e., sodium, potassium, chloride, and total calcium) are typical measurands to be allocated to model 2 [10]. Their concentrations are tightly controlled by homeostatic mechanisms, including hormonal control and renal function. For the BV data of these measurands, only one paper graded 'A' according to the BIVAC-QI was retrieved [31]. The CV_I estimates of sodium (0.53%), potassium (3.92%), chloride (0.98%), and total calcium (1.81%) were used to derive APS for MU.

Plasma creatinine and urea
As kidney function finely controls their plasma concentrations and ensures their stability when a subject is in good health, these measurands should be assigned to the BV-based model. The paper by Aarsand et al. previously mentioned for plasma ions [31] is also the reference for deriving the CV_I of urea (14.1%), while another paper from the EuBIVAS project provided the CV_I of creatinine (4.40%) [32].

Total bilirubin in plasma
Bilirubin metabolism is a physiological process devoted to the elimination of catabolic products arising from the destruction of red blood cells. Bilirubin concentrations in plasma are therefore well controlled, and the measurand can be allocated to model 2. Aarsand et al. [31] estimated the CV_I of this measurand at 20.9%.

Plasma ALT
Allocating plasma ALT to the correct model for deriving APS is a tricky issue. Although the definition of its cut-offs has been widely debated [33], this enzyme certainly has a clinical role in various liver diseases, which would call for model 1. However, outcome-based data enabling APS setting for ALT are not currently available. The EFLM Working Group on Biological Variation has suggested that, since the enzyme shows rather stable catalytic concentrations in healthy individuals, it is rational to use the BV-based model to derive APS. In a paper that the group itself graded 'A', it reported a mean CV_I estimate of 9.3%, which in our opinion does not reflect strong enzyme stability, particularly because a number of subjects and/or blood samples in the study were excluded a priori from the analysis as outliers [34].
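The model 2 arithmetic for all the measurands discussed above can be collected in one short sketch. It applies desirable APS = 0.50 × CV_I and minimum APS = 1.5 × desirable to the CV_I estimates quoted in the text. The resulting desirable values agree with those given in the abstract within rounding. The dictionary and function names are illustrative.

```python
# Within-subject biological variation (CV_I, %) as quoted in the text above.
CV_I = {
    "sodium": 0.53,
    "potassium": 3.92,
    "chloride": 0.98,
    "total calcium": 1.81,
    "creatinine": 4.40,
    "urea": 14.1,
    "total bilirubin": 20.9,
    "ALT": 9.3,
}


def bv_based_aps(cv_i):
    """Model 2 APS for standard MU (%): desirable = 0.50 x CV_I, minimum = 1.5 x desirable."""
    desirable = 0.50 * cv_i
    return desirable, 1.5 * desirable


for measurand, cv_i in CV_I.items():
    desirable, minimum = bv_based_aps(cv_i)
    print(f"{measurand:16s} CV_I={cv_i:5.2f}%  desirable={desirable:5.2f}%  minimum={minimum:5.2f}%")
```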

Measurands belonging to model 3

Plasma CRP
CRP is the most sensitive of the acute phase proteins, and its plasma concentrations increase rapidly in many diseases involving tissue damage or inflammation. Besides lacking a role in decision-making for a specific disease, this measurand is biologically challenging, which makes it impossible to derive reliable CRP BV data [35,36]. Since neither of the first two Milan models is suitable for CRP, model 3 should be used to derive APS for this measurand. A recent paper, estimating the MU of four widely used measuring systems, defined 3.76% as the desirable MU for CRP [37].
Table 1 summarizes the Milan model allocation and the APS for standard MU on clinical samples for the selected biochemical measurands discussed above. To provide a preliminary perspective on the suitability of these APS, we also added for comparison information on the current state-of-the-art performance of our medical laboratory. The MU associated with the laboratory measurements was estimated by using IQC data to derive the random components of MU, together with commercial calibrator information, as previously described [2]. In particular, the random variability of measuring systems was estimated as the CV from 6 months of consecutive measurement data of a serum-based fresh-frozen control material, randomly analyzed daily during routine laboratory activity [4]. As can be seen, most of the desirable MU in the table appear achievable, so laboratories can expect to meet the corresponding APS. Some are barely achievable (i.e., total calcium and CRP), and some (i.e., plasma sodium and chloride) are not achievable, which signals to laboratory professionals that industry improvements are required [17].
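The comparison with laboratory performance can be sketched as follows. The sketch assumes that the standard uncertainty of the commercial calibrator value (u_cal, %) is available from the manufacturer and that long-term imprecision is taken as the CV of IQC results. The two components are combined as a root sum of squares, and the result is graded against the desirable and minimum APS. The IQC values and u_cal are hypothetical. The APS shown are the model 2 values for potassium (desirable 1.96%, minimum 1.5 × 1.96 = 2.94%).

```python
import math
import statistics


def top_down_mu(iqc_results, u_cal_percent):
    """Standard MU (%) = sqrt(u_cal^2 + CV_IQC^2), with CV_IQC computed from IQC data."""
    cv_iqc = 100.0 * statistics.stdev(iqc_results) / statistics.mean(iqc_results)
    return math.sqrt(u_cal_percent ** 2 + cv_iqc ** 2)


def grade(mu_percent, desirable, minimum):
    """Compare an MU estimate with the APS quality levels."""
    if mu_percent <= desirable:
        return "desirable"
    if mu_percent <= minimum:
        return "minimum"
    return "unacceptable"


# Hypothetical potassium IQC results (mmol/L) and calibrator uncertainty (%).
iqc = [4.02, 3.98, 4.05, 3.95, 4.01, 4.00, 3.97, 4.03, 3.99, 4.04]
mu = top_down_mu(iqc, u_cal_percent=0.80)
print(round(mu, 2), grade(mu, desirable=1.96, minimum=2.94))
```

In practice, the IQC series would span a long period (6 months in the study described above), so that the CV captures the long-term random variability rather than only within-run variability.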

Concluding remarks
In this paper, we elaborated a synopsis of APS for MU of the most commonly requested biochemical measurands, according to models developed and recommended by the EFLM Strategic Conference held in Milan in 2014, to be used in laboratory practice to validate MU of employed measuring systems. We developed this process according to two main aspects: (1) allocation of each measurand to one of the three models on the basis of its biological and clinical characteristics, and (2) definition of APS for MU by reviewing available literature and selecting adequate information.
Identifying the most appropriate model for deriving the APS of a specific measurand is an essential step towards defining a suitable MU. We believe that it is not acceptable to use the BV-based model to derive APS for all measurands simply because BV information is more easily obtainable. Measurands with a defined role in the diagnosis and monitoring of a specific disease should be tested in outcome-based studies so that appropriate APS can be defined. In recent years, however, much progress has been made in improving the quality of BV estimates, whereas in our study the assignment of an APS under model 1 often rests on a single study. Performing high-quality outcome-based studies is therefore a fundamental requirement for making stronger recommendations about APS for measurands that should be allocated to this model. Model 2 should not be used for measurands lacking sufficient homeostatic control. In general, one should always be wary of excessively high biological CV estimates. The formula for combining standard MU and CV_I assumes that both distributions are Gaussian, and with a large CV_I this assumption may not hold [38]. Furthermore, even if an incorrect estimate due to a non-Gaussian distribution of the evaluated data can be excluded, high biological CVs certainly reflect high individual variability and a lack of "steady state" conditions. On the other hand, the myth of the state of the art as a 'rescue' model, to be invoked when APS correctly obtained with other models appear too stringent for a certain measurand, should be dispelled. An MU estimate gives objective information about the quality of a measuring system, but it becomes helpful in identifying measurands that need analytical improvement for their clinical use only when it is validated against suitable APS [2,6].
The case of plasma electrolyte measurement recently reported by us showed the efficacy of MU evaluation by using objectively derived APS in driving laboratories to improve the quality of provided results [17].
The APS proposed in this paper represent the total MU budget that should be fulfilled at the level of patient results to make the laboratory test information fit for purpose, when the MU of the measuring system employed in the individual laboratory is combined with that accumulated along all steps of the metrological traceability chain. To facilitate achieving these goals, it is therefore essential to define the magnitude of all MU contributions across the different steps of the metrological traceability chain in use and to understand how much of the total MU budget can be used at each level. We have recommended that no more than one-third of the total MU budget be consumed by the MU of higher-order references and that no more than half of the budget be used when MU is combined at the commercial calibrator level [5,7]. By applying an upside-down approach, it is therefore possible to calculate, from the APS for MU proposed in this paper, the MU goals that should be fulfilled at each preceding step of the metrological traceability chain [39]. This may help achieve the MU APS at the bottom of the metrological traceability chain (i.e., on patient samples), enabling laboratory results to satisfy clinical needs.
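The upside-down budget calculation can be sketched as follows. The sketch assumes that the one-third and one-half fractions apply to standard uncertainties (not variances), that the calibrator-level MU already includes the reference MU, and that the calibrator uses its full allowance. Under these assumptions, the share left for the laboratory's random variability follows from combining uncertainties as variances. The example uses the desirable APS for plasma creatinine given in this paper (2.20%). The function name is illustrative.

```python
import math


def traceability_budget(aps_percent):
    """Split a total standard MU budget (APS, %) along the traceability chain.

    Assumptions: higher-order references <= 1/3 of the budget, calibrator
    level (which includes the reference MU) <= 1/2 of the budget, and the
    calibrator consuming its full allowance. The remainder for the
    laboratory's random variability is then sqrt(APS^2 - u_cal^2).
    """
    u_reference_max = aps_percent / 3.0
    u_calibrator_max = aps_percent / 2.0
    u_random_max = math.sqrt(aps_percent ** 2 - u_calibrator_max ** 2)
    return {
        "reference": u_reference_max,
        "calibrator": u_calibrator_max,
        "measuring_system_random": u_random_max,
    }


# Desirable APS for plasma creatinine (2.20%).
for level, u in traceability_budget(2.20).items():
    print(f"{level:24s} <= {u:.2f}%")
# prints reference <= 0.73%, calibrator <= 1.10%, measuring_system_random <= 1.91%
```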
Research funding: None declared. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.