The impact of measurement uncertainty on the uncertainty of ordinal medical scores based on continuous quantitative laboratory results

*Corresponding author: Marith van Schrojenstein Lantman, MSc, Department of Laboratory Medicine, Radboud University Medical Centre, Geert Grooteplein Zuid 10, 6500HB, Nijmegen, The Netherlands; Laboratory for Clinical Chemistry, Part of Result Laboratorium, Amphia Hospital, Breda, The Netherlands; and SKML, Organisation for Quality Assurance of Medical Laboratory Diagnostics, Radboud University, Nijmegen, The Netherlands, E-mail: marith.vanschrojensteinlantman@radboudumc.nl
Marc H. M. Thelen, Department of Laboratory Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands; Laboratory for Clinical Chemistry, Part of Result Laboratorium, Amphia Hospital, Breda, The Netherlands; and SKML, Organisation for Quality Assurance of Medical Laboratory Diagnostics, Radboud University, Nijmegen, The Netherlands. https://orcid.org/0000-0003-1771-669X
Clin Chem Lab Med 2021; 59(8): e309–e312

To the Editor,

The publication of the ISO/TS 20914:2019 practical guidance on measurement uncertainty (MU) has triggered a discussion on the calculation of MU for derived biological quantities [1–4]. These papers postulate that the combined uncertainty of a derived biological quantity can be estimated by combining the MUs of the individual measurands [1–5]. To obtain an accurate MU, it is important to correct for correlations between the measurands [1, 5]. These correlations may cause interactions between the measurands that can either increase or decrease the total combined uncertainty [1, 6].
However, neither ISO/TS 20914:2019 nor the papers mentioned above suggest approaches for ordinal medical algorithms based on continuously expressed measurands with known uncertainty. Here we present how the uncertainty of continuous measurements affects the uncertainty of ordinal medical algorithms, and we provide guidance on the calculation as an extension of the ISO/TS 20914:2019 examples.
An ordinal medical algorithm is a score-based system in which several continuously expressed measurands are translated into individual ordinal scores by applying absolute thresholds to the original continuous results. The scores for the individual elements are then summed into an overall combined score, which is used to make medical decisions. When the combined score exceeds a predefined diagnosis or treatment limit, the patient is diagnosed or treated, respectively. As the individual continuous measurands have an uncertainty, so will the total end score of the medical algorithm.
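The translation from continuous results to a summed ordinal score can be sketched as follows. This is a minimal illustration only: the item names, cut-off values and the convention that the lowest band scores 1 are hypothetical and do not correspond to any published scoring system.

```python
from bisect import bisect_right

# Hypothetical cut-offs (assumption, not taken from any real score):
# ascending thresholds per item; a result in the i-th band (0-based)
# receives the ordinal score i + 1.
THRESHOLDS = {
    "item_1": [10.0, 20.0, 30.0],
    "item_2": [1.0, 2.0, 3.0],
}

def item_score(value, cutoffs):
    """Translate one continuous result into an ordinal score (1, 2, 3, ...)."""
    return bisect_right(cutoffs, value) + 1

def total_score(results):
    """Sum the ordinal item scores into the overall combined score."""
    return sum(item_score(value, THRESHOLDS[name]) for name, value in results.items())

# A hypothetical patient scoring 3 on item 1 and 1 on item 2.
patient = {"item_1": 25.0, "item_2": 0.5}
print(total_score(patient))  # 4
```

The decision step would then be a simple comparison of the total against the predefined diagnosis or treatment limit.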
However, the impact of the MU of the measurands on the end score will differ per patient, as the proximity of the measurands to their respective scoring cut-off values differs. Therefore, the uncertainty approach for ordinal measurands (e.g., by Bashkansky et al. [7]) does not apply, since it does not do justice to the continuous basis of the data, in which values close to a threshold are more prone to score uncertainty than values further away from thresholds.
It is important to know the score uncertainty, as patients with additional risks or benefits could become apparent and currently used scoring systems could perhaps be improved.
In line with other calculations in ISO/TS 20914:2019, we propose that the score uncertainty is determined by the following steps:

(1) Calculate the relative expanded uncertainty %U(y) of each measurand y involved in the medical score in line with ISO/TS 20914:2019. We have used examples A3 and A4. To calculate %U(y), we used the following data:
– u_rw: Calculate the uncertainty under intermediate precision conditions using internal quality control data from routinely used commercial quality control materials of the past 2 years, for every level used.
– u_cal: Obtain the uncertainty of the calibrator(s) from the manufacturer. Note that manufacturers may report the expanded uncertainty rather than the standard uncertainty.
– u_bias: Since none of the measurands involved had an unacceptable bias that needed immediate correction by the laboratory during the period involved, no uncertainty for bias correction (u_bias) needed to be applied.
– k: To determine %U(y) we opted for a 95% confidence level, which results in a coverage factor of 2.

A graphical representation of this theory is shown in Figure 1. The examples make clear that the same uncertainty in the continuous results can result in a different score uncertainty between patients, depending on how close the individual measurands are to the scoring thresholds. Therefore, the ordinal medical scoring uncertainty depends not only on the measurement uncertainty of the underlying measurements, but also on the distribution of the measurement results over the measurement range.
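Step (1) can be sketched in a few lines. The sketch assumes that the relative standard uncertainties are combined as a root sum of squares before applying the coverage factor, which is the usual approach for independent components; the function names and the numerical inputs are hypothetical and are not the values of Table 1.

```python
import math

def standard_from_expanded(expanded_pct, k=2.0):
    """Convert a manufacturer's expanded uncertainty (%) to a standard uncertainty (%)."""
    return expanded_pct / k

def relative_expanded_uncertainty(u_rw_pct, u_cal_pct=0.0, u_bias_pct=0.0, k=2.0):
    """Combine relative standard uncertainties (%) and expand with coverage factor k.

    Assumes independent components combined as a root sum of squares.
    """
    u_combined = math.sqrt(u_rw_pct**2 + u_cal_pct**2 + u_bias_pct**2)
    return k * u_combined

def interval(result, expanded_pct):
    """Lower and upper bound of result +/- %U(y)."""
    half_width = abs(result) * expanded_pct / 100.0
    return result - half_width, result + half_width

# Hypothetical inputs: u_rw = 3.0 %, calibrator reported as expanded 8.0 % (k = 2).
u_cal = standard_from_expanded(8.0)                 # 4.0 % standard uncertainty
U_pct = relative_expanded_uncertainty(3.0, u_cal)   # 2 * sqrt(3^2 + 4^2) = 10.0 %
print(U_pct, interval(40.0, U_pct))
```

Where u_rw differs between quality control levels, the value for the level closest to the patient result would be passed in; that choice is left to the caller here.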
To demonstrate the impact of MU on scoring uncertainty, we applied our approach to two commonly used medical scoring algorithms, using real patient data from our hospital and the MU data of the laboratory tests involved from our laboratory.
The Child-Pugh score is used to decide whether to operate on liver patients, and provides a mortality prognosis for patients [8]. The Child-Pugh classes A, B and C differ in the mortality associated with surgery: 10, 30 and 70-80%, respectively. The overall 1-year survival differs between Child-Pugh A, B and C patients as well: 100, 80 and 45%, respectively. Because of these major mortality differences and the altered clinical decisions based on this classification, it is important to incorporate the MU into the clinical decision limit. We included patients if values of albumin, bilirubin and INR were recorded in the laboratory information system (LIS). The u_rw and, if available, u_cal used are shown in Table 1. As the encephalopathy and ascites scores were not available, they were scored randomly and MU was not calculated for these factors. Of the 6,558 observations in the dataset, inclusion of the MU [...]

Figure 1: Figurative representation of the effects of measurement uncertainty on a hypothetical score system for two patients. Patient 1 (yellow) has a continuous result that translates into a score of 3 on item 1 and a continuous result that translates into a score of 1 on item 2. The 95% intervals (shown as horizontal bars) are fully within the regions of these scores; therefore, inclusion of measurement uncertainty does not lead to a different score. For patient 2, however, identical uncertainties in the continuous results span more than one scoring category for both item 1 and item 2. Inclusion of the measurement uncertainty does lead to a different score, as in item 1 the horizontal bar crosses the cut-off point between score 1 and 2, whereas in item 2 the horizontal bar crosses the cut-off point between score 3 and 4. Thus, in patient 2, the score is 5, with a 95% interval of 4-6.

The APACHE II score is an often-used score in many clinical decisions at intensive care units. It is based on age, Glasgow coma score (GCS), pO2, temperature, mean arterial pressure, arterial pH, sodium, potassium, creatinine, haematocrit and white blood cell count [9]. We included patients in the analysis if values of the APACHE II measurands were recorded in the LIS, and assessed whether inclusion of the MU would result in a different outcome relative to the clinical decision limit (APACHE II score > 13.5 [10]). The u_rw and, if available, u_cal used are shown in Table 1. The markers GCS and temperature were scored randomly and MU was not calculated for these factors. Of the 509 patients in the dataset, inclusion of the MU resulted in a potentially different APACHE II medical decision in 41 patients (8.05%).
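The per-patient assessment described in the two examples above can be sketched as follows, reproducing the two hypothetical patients of Figure 1. All cut-offs, results, uncertainties and the decision limit are invented for illustration. The sketch also assumes that each item score rises monotonically with the measurand and that the items vary independently, so the lowest and highest total scores are the sums of the per-item extremes; items scored in both directions, as some APACHE II measurands are, would need every threshold inside the interval to be checked.

```python
from bisect import bisect_right

def item_score(value, cutoffs):
    """Ordinal score (1, 2, 3, ...) for a continuous result; cutoffs ascend."""
    return bisect_right(cutoffs, value) + 1

def item_score_range(value, expanded_pct, cutoffs):
    """Scores at the lower and upper end of the interval value +/- %U(y)."""
    half_width = abs(value) * expanded_pct / 100.0
    return (item_score(value - half_width, cutoffs),
            item_score(value + half_width, cutoffs))

def total_score_range(results, expanded_pct, thresholds):
    """Sum the per-item lowest and highest scores into a total score range."""
    low = high = 0
    for name, value in results.items():
        item_low, item_high = item_score_range(value, expanded_pct[name], thresholds[name])
        low += item_low
        high += item_high
    return low, high

def decision_may_change(score_range, decision_limit):
    """True when the score range straddles the decision limit."""
    low, high = score_range
    return low <= decision_limit < high

# Hypothetical data (assumptions for illustration only).
THRESHOLDS = {"item_1": [10.0, 20.0, 30.0], "item_2": [1.0, 2.0, 3.0]}
EXPANDED_PCT = {"item_1": 10.0, "item_2": 10.0}   # %U(y) per item
DECISION_LIMIT = 4.5
cohort = {
    "patient_1": {"item_1": 25.0, "item_2": 0.5},   # far from any cut-off
    "patient_2": {"item_1": 10.5, "item_2": 2.9},   # close to a cut-off on both items
}

ranges = {pid: total_score_range(res, EXPANDED_PCT, THRESHOLDS) for pid, res in cohort.items()}
flagged = [pid for pid, rng in ranges.items() if decision_may_change(rng, DECISION_LIMIT)]
print(ranges)                      # {'patient_1': (4, 4), 'patient_2': (4, 6)}
print(len(flagged) / len(cohort))  # 0.5
```

The last line gives the fraction of the cohort whose decision could change, the same type of figure as the percentage reported for the APACHE II dataset.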
These data are in line with recent publications and an editorial regarding the effects of MU on calculated biological variables, and such research is important for further advances in clinical chemistry. Yet no consideration has been given to medical algorithms, for which, due to their mathematical construct, a standardised uncertainty cannot be calculated. On a patient level, one can assess the individual score uncertainty, which can aid clinicians in their decision-making process. The Child-Pugh and APACHE II examples shown above indicate that the impact of MU on the clinical decision may be substantial, which warrants future research on the uncertainty of other medical scoring systems.
Even when the MU of the measurands involved is identical between institutions, the impact of the scoring uncertainty on the associated clinical decision may differ between institutions if their patients differ in their pre-test probability for the condition assessed by the algorithm. Therefore, every healthcare institution should calculate the effects of MU for its own patient population to assess the possible effects.
This method of establishing the effects of MU on ordinal medical algorithm scores is based on the assumption that scores are summed. When scores are multiplied, attention has to be given to the effect of correlation, as described in the papers mentioned earlier [1, 4–6].
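For orientation only, the general first-order propagation formulas for two correlated continuous quantities a and b show where the correlation term enters. They are given as a reminder of the continuous case treated in the cited papers, not as a method for ordinal scores; the symbols a, b and the correlation coefficient r(a,b) are introduced here for illustration.

```latex
% Sum y = a + b: absolute standard uncertainties combine
u^2(y) = u^2(a) + u^2(b) + 2\, r(a,b)\, u(a)\, u(b)

% Product y = a b: relative standard uncertainties combine (first-order approximation)
\left(\frac{u(y)}{y}\right)^{2}
  = \left(\frac{u(a)}{a}\right)^{2}
  + \left(\frac{u(b)}{b}\right)^{2}
  + 2\, r(a,b)\, \frac{u(a)}{a}\, \frac{u(b)}{b}
```

A positive correlation increases the combined uncertainty in both cases, and a negative correlation decreases it.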
Our approach to scoring uncertainty can promote new studies assessing whether the MU should be within, outside or crossing the medical decision limit. In this way, the acceptable decision uncertainty can set the goal for the allowable MU of the measurands involved in its calculation. Due to the compensating effect of correlation between the different elements in a scoring algorithm, certain measurands might require stricter analytical performance specifications (APS) when used as standalone markers than when they are used in medical algorithms [2]. We hope this will give an impulse to new research into more APS that are based on medical need, formulated at the first EFLM strategic meeting in Milan in 2014 as Milan models 1a and 1b, and that it will improve patient outcome and optimize the healthcare that is already in place [11].

.%
Per measurand, the relative standard uncertainty of the calibrator %u_cal and, if available, %u_cal of a second concentration are displayed. The relative imprecision of the measurement procedure %u_rw at different concentration levels is also shown. The last column shows the relative total uncertainty %u(y) of the measurands, which combines the information in the u_cal and u_rw columns at particular concentration levels.