Indirect estimation of reference intervals using first or last results and results from patients without repeated measurements

Abstract Objectives Indirect methods for the estimation of reference limits (RLs) use large data pools stored in modern laboratory information systems (LIS). To avoid correlation between observations, repeated results from each patient should be excluded. Some data pools are anonymized, and the data cannot be re-identified afterwards. The effect of the data selection procedure on the estimates has not yet been investigated. Methods We considered four parameters. Data sets were obtained from two sources: a university hospital and a laboratory primarily reflecting a patient population from medical practitioners. Four algorithms were used for data selection, which generate first, last, all and non-repeated values. RLs were estimated from these data sets and compared. Results Reference ranges estimated by indirect methods were broader when the whole data set was used than when first/last values or non-repeated values were used. Conclusions The use of all data without a filtering step results in a significant bias, whereas the choice of first or last values has nearly no impact. The exclusion of repeated measurements results in narrower RLs. This influence restricts the use of anonymous data sets, where filtering is impossible, for the estimation of RLs by indirect methods.


Introduction
The interpretation of the results of medical laboratory tests is largely based on reference limits (RLs) and decision limits [1]. In direct methods, a preselection process aims to identify healthy individuals in order to select the reference population [2,3]. As an alternative, large data sets from laboratories, consisting of a mixture of non-diseased and diseased individuals, can be used to estimate RLs using indirect methods [4-6]. Numerous algorithms for the indirect estimation of RLs have been published, such as those of Hoffmann and Bhattacharya [7] and more sophisticated algorithms such as the truncated minimum chi-square (TMC) [8] or the truncated maximum likelihood (TML) [9,10]. The main idea of the last two approaches is to identify the distribution of the non-pathological results and separate it from the total laboratory data set. These methods are based on the following assumptions: first, non-pathological values can be modelled by a distribution family; second, the main part of the data set contains only non-pathological values; and third, the overlap between the distributions of non-pathological and pathological values is only partial. The TML method has been applied to different data sets and the results have been published [10-13].
Indirect methods for the estimation of RLs require model assumptions. The most prominent one is that the prevalence of pathological values in the data set used is small. From a statistical point of view, the absence of correlation between observations needs to be ensured. This leads to the requirement to include only one measurement per individual and to exclude repeated results from the same patient [14]. As an option to ensure both conditions, only results from patients without repeated measurements ("non-repeated population") could be included. The medical rationale for excluding data from subjects who have more than one measurement of the analyte of interest is that these subjects are more likely to be diseased [14].
Typical LIS data sets allow this filtering because they usually contain information about the individual, even if they are pseudonymized. On the other hand, some large data pools obtained from LIS or hospital information systems are anonymized, and the data cannot be re-identified. It is then not possible to filter for first or last values and to fulfil the assumption of non-correlated data. The aim of this study was to investigate the effect on the estimated RLs of filtering for first or last values or of excluding patients with repeated measurements. The different filtering strategies were applied to two real-world data sets: university hospital data containing a high percentage of patients with repeated measurements, and data from a laboratory analysing mainly specimens from medical practitioners.

Data sets
To investigate the effect of filtering for first or last values or of excluding patients with repeated measurements, we chose four representative parameters: alkaline phosphatase (AP), gamma-glutamyltransferase (GGT) and sodium (NA) in plasma, and thrombocytes in whole blood. The first data set (C) is from the University Hospital Cologne and includes 1,566,409 measurements for AP (2004-2018), 1,166,020 for GGT (2014-2018), 2,176,200 for thrombocytes (2010-2018) and 2,093,722 for NA (2009-2017). The second data set (P) is from a laboratory in the same geographical region primarily reflecting a patient population from medical practitioners. This data set consists of data from seven years (2013-1/2020) and includes 651,579 measurements for AP, 1,307,076 for GGT, 1,356,691 for thrombocytes and 711,875 for NA. We defined two indexes (measures) for each data set to describe the incidence of repeated measurements:

$$I_1 = \frac{\text{number of measurements overall}}{\text{number of patients overall}}$$

$$I_2 = \frac{\text{number of patients overall}}{\text{number of patients without repeated measurements}}$$

where $I_1$ gives the average number of measurements per patient and $I_2$ gives the ratio of the total number of patients to the number of patients without repeated measurements. For $I_1 = I_2 = 1$, the data set contains only one measurement for each patient. Note that $I = I_1 \times I_2$ gives the ratio of the total data to the data from patients without repeated measurements. The values of $I_1$ and $I_2$ for AP, GGT, thrombocytes and NA for the first data set (C) were: $I_1$(AP)=4.8, $I_2$(AP)=1.8, $I_1$(GGT)=5.5, $I_2$(GGT)=1.9, $I_1$(thrombocytes)=6.2, $I_2$(thrombocytes)=2 and $I_1$(NA)=6, $I_2$(NA)=2.3. In short, patients in the University Hospital Cologne had on average 5-6 measurements in the corresponding time interval, and approximately 8-10% of the data collected in that interval come from the "non-repeated" subpopulation.
The second data set (P) also includes patients with repeated measurements, but to a lesser extent: $I_1$(AP)=2.8, $I_2$(AP)=1.6, $I_1$(GGT)=2.9, $I_2$(GGT)=1.7, $I_1$(thrombocytes)=2.7, $I_2$(thrombocytes)=1.9 and $I_1$(NA)=2.8, $I_2$(NA)=1.8. In short, patients in this laboratory had on average 2-3 measurements in the corresponding time interval, and approximately 20-22% of the data collected in that interval come from the "non-repeated" subpopulation. In summary, the average number of measurements per patient ($I_1$) in the University Hospital Cologne was twice as high as in the laboratory (P) (5 vs. 2.5), and the ratio of all collected data to the data from the "non-repeated" subpopulation alone was also twice as high in the University Hospital Cologne as in the laboratory (P) (10:1 vs. 5:1).
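As an illustration of the two indexes defined above, the following minimal sketch computes I1, I2 and their product I from a list holding one patient identifier per measurement. The function name `repetition_indexes` and the toy identifiers are assumptions made for this example; they are not part of the study's software.

```python
from collections import Counter


def repetition_indexes(patient_ids):
    """Return (I1, I2, I) for a sequence with one patient identifier per measurement.

    I1 = measurements overall / patients overall
    I2 = patients overall / patients without repeated measurements
    I  = I1 * I2 = measurements overall / patients without repeated measurements
    """
    counts = Counter(patient_ids)  # number of measurements per patient
    n_measurements = sum(counts.values())
    n_patients = len(counts)
    n_non_repeated = sum(1 for c in counts.values() if c == 1)
    if n_non_repeated == 0:
        raise ValueError("no patient without repeated measurements; I2 is undefined")
    i1 = n_measurements / n_patients
    i2 = n_patients / n_non_repeated
    return i1, i2, i1 * i2


# Toy data (hypothetical): patient "a" measured three times, "b" once, "c" twice.
i1, i2, i = repetition_indexes(["a", "a", "a", "b", "c", "c"])
# i1 == 2.0, i2 == 3.0, i == 6.0; the share of data from the
# "non-repeated" subpopulation is 1 / i, here one measurement in six.
```

The reciprocal 1/I is the share of all measurements that come from patients with a single measurement, which is how the percentages quoted for the "non-repeated" subpopulation relate to the indexes.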

Analytical methods
In the first data set, thrombocytes were analysed in EDTA blood on Sysmex counters (Sysmex; Norderstedt). Platelet counts were carried out with the impedance method on analysers of the XE and XN series. NH4-heparin plasma was used for the analysis of sodium, AP and GGT on Roche/Hitachi Cobas systems (Roche, Penzberg): GGT with an enzymatic colorimetric assay, AP with the IFCC method (colorimetric assay) and sodium by ISE.
In the second data set, thrombocytes were counted in EDTA blood on Sysmex analysers (Sysmex; Norderstedt) using the impedance method of the XN series. NH4-heparin plasma or serum was used for the analysis of sodium, AP and GGT on Siemens ADVIA Chemistry systems and Siemens Atellica CH systems (Siemens Healthcare Diagnostics; Eschborn): GGT and AP with modified IFCC methods (colorimetric assays) and sodium by ISE.

Statistical methods
Firstly, four subsets were generated from each data set: (1) "all values", which contains all data; (2) "first values", containing only the first value from each patient; (3) "last values", containing only the last value from each patient; and (4) the "non-repeated subset", containing only subjects who have a single measurement of the analyte of interest. These four subsets were generated both from the University Hospital Cologne data set (C) and from the laboratory data set (P: specimens from medical practitioners), after elimination of non-valid values. We display the distribution of each subset as boxplots grouped by age (for male and female subjects separately) in order to compare the trend lines of the quantiles of these subsets with each other. The boxes and whiskers of the boxplots are defined by the 95th, 75th, 25th and 5th percentiles.
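The construction of the four subsets can be sketched as follows. This is a minimal illustration, assuming each record is a tuple of patient identifier (or pseudonym), a sortable time stamp and a measured value; the function name `build_subsets` and the record layout are assumptions for this example and do not describe the software used in the study.

```python
from collections import defaultdict


def build_subsets(records):
    """Split records into the "all", "first", "last" and "non_repeated" subsets.

    records: iterable of (patient_id, timestamp, value) tuples.
    Returns a dict mapping subset name to a list of measured values.
    """
    by_patient = defaultdict(list)
    for patient_id, timestamp, value in records:
        by_patient[patient_id].append((timestamp, value))

    subsets = {"all": [], "first": [], "last": [], "non_repeated": []}
    for results in by_patient.values():
        results.sort(key=lambda r: r[0])             # chronological order per patient
        subsets["all"].extend(v for _, v in results)
        subsets["first"].append(results[0][1])       # earliest result of the patient
        subsets["last"].append(results[-1][1])       # latest result of the patient
        if len(results) == 1:                        # patient measured only once
            subsets["non_repeated"].append(results[0][1])
    return subsets


# Hypothetical toy records: p1 has three results, p2 one, p3 two.
toy = [("p1", 1, 70), ("p1", 3, 90), ("p1", 2, 80),
       ("p2", 5, 60), ("p3", 1, 100), ("p3", 2, 110)]
subsets = build_subsets(toy)
```

Note that the "first", "last" and "non_repeated" subsets all require a patient identifier, so none of them can be built from fully anonymized data; only "all" can.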
Secondly, after these summary statistics, the lower and upper reference limits (lRL, uRL) were estimated from these four data sets and compared with each other. To estimate the RLs we applied the TML method of Arzideh et al., implemented in the software R. The estimations were performed separately for male and female adult subjects (age ≥18). To compare the RLs estimated from the four data sets, we considered the permissible uncertainty (pU) of the calculated RLs. As discussed in [15,16], we use the pU to evaluate the clinical relevance of a possible difference between RLs obtained from different subpopulations, i.e. to check whether the reference limits can be considered comparable. For the lower RL (lRL), we calculate pU(lRL) from the "first" value data set, derive from it the lower (lRL1[first]) and upper (lRL2[first]) limits of uncertainty, and check whether the lRLs estimated from the other three data sets lie within or outside these limits. The same procedure was applied to the upper RLs.
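The comparison step described above reduces to an interval check once the pU limits of the "first" estimate are available. The sketch below assumes that these limits have already been calculated as described in [15,16]; the pU formula itself is deliberately not reproduced here. The function name `equivalent_to_first` and all numbers are hypothetical and are not results of this study.

```python
def equivalent_to_first(estimates, lower_limit, upper_limit):
    """Check which RL estimates lie within the pU limits of the "first" estimate.

    estimates:   dict mapping subset name to its estimated RL
    lower_limit: lower pU limit of the "first" estimate (e.g. lRL1[first])
    upper_limit: upper pU limit of the "first" estimate (e.g. lRL2[first])
    Returns a dict mapping subset name to True (equivalent) or False.
    """
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    return {name: lower_limit <= value <= upper_limit
            for name, value in estimates.items()}


# Invented illustrative numbers only (not taken from Table 2).
flags = equivalent_to_first(
    {"last": 36.0, "all": 31.0, "non_repeated": 35.5},
    lower_limit=33.0,
    upper_limit=38.0,
)
# flags == {"last": True, "all": False, "non_repeated": True}
```

The same function can be called a second time with the pU limits of the upper RL (uRL1[first], uRL2[first]) to check the upper limits.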
In the third step, we considered the age-dependency of the RLs and compared the performance of the subsets defined above accordingly. The indirect method of Arzideh et al. was applied to predefined age classes, and the estimated lower and upper RLs (lRL, uRL) were smoothed using spline functions. The resulting age-dependent RLs for the four data sets were compared.
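The descriptive step of this section, boxplots grouped by age and defined by the 5th, 25th, 75th and 95th percentiles, can be sketched with the standard library alone. This covers only the empirical summary per age class; it is not the TML estimation and does not include the spline smoothing. The function name `percentiles_by_age_class`, the class width of 10 years and the record layout are assumptions for this example, since the age classes used in the study are not specified in the text.

```python
from collections import defaultdict
from statistics import quantiles


def percentiles_by_age_class(records, class_width=10, min_age=18):
    """Empirical 5th/25th/75th/95th percentiles per age class (adults only).

    records: iterable of (age_in_years, value) tuples.
    Returns {class_start_age: (p5, p25, p75, p95)}.
    """
    classes = defaultdict(list)
    for age, value in records:
        if age < min_age:
            continue  # the paper concentrates on adults (age >= 18)
        start = min_age + ((age - min_age) // class_width) * class_width
        classes[start].append(value)

    summary = {}
    for start, values in sorted(classes.items()):
        if len(values) < 2:
            continue  # too few observations in this class for a percentile summary
        cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
        summary[start] = (cuts[4], cuts[24], cuts[74], cuts[94])
    return summary


# Hypothetical toy data: 101 values (0..100) at age 30, plus one excluded child.
toy = [(30, v) for v in range(101)] + [(10, 999)]
summary = percentiles_by_age_class(toy)
# summary == {28: (5.0, 25.0, 75.0, 95.0)}
```

Running the function on each of the four subsets, separately for male and female subjects, gives the quantile trend lines that are compared visually in the boxplots.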

Results
Alkaline phosphatase (AP)
Figure 1 shows the boxplots, and thus the distribution, of AP values vs. age for female subjects for the four subsets ("all", "first", "last" and "non-repeated" populations) of data set C. As the number of data for non-adults (defined as age <18) was insufficient, we concentrated on the results for adults. As expected, the quantile lines indicate an increasing trend with increasing age, especially after 40 years of age, in all four data sets for female subjects. The trend lines for the "first", "last" and "non-repeated" data sets appear similar, whereas the "all" data set shows a wider spread of the quantile lines. Comparing the results for "all" subjects between the two data sets (the results for male subjects and for data set P are displayed in Supplemental Material B, Figure B4), it is obvious that the data from the University Hospital (C) contain more pathological values than the data from the laboratory (P), as expected. This difference vanishes when the "first", "last" and "non-repeated" subpopulations from the two sources are compared. Additionally, the quantile lines of the "non-repeated" data sets are lower than those of the other data sets, especially for older patients. For male subjects, as for female subjects, the "all" data set appears to contain more pathological values than the other three data sets, especially in the case of the data from the University Hospital (see Supplemental Material, Figure B4).
The reference limits were estimated by the indirect method and are shown in Table 1. As an age-dependency of AP values for female subjects has been reported in some publications [17], we also analysed the data for a younger age group (18-39 years, Table 1). To compare the RLs estimated from the four data sets (first, last, all and non-repeated values), the permissible uncertainty (pU) of the RLs estimated from the "first value" data set was calculated, and we checked whether the RLs estimated from the other data sets ("last", "all" and "non-repeated" subsets) lie within the corresponding intervals (see Table 2: lRL1(first) and lRL2(first) for the lower RL, uRL1(first) and uRL2(first) for the upper RL). For (C), the RLs estimated from "last" and "non-repeated" values are equivalent to those estimated from "first" values, but those estimated from "all" values are not, as shown in Table 2. The same is observed for the younger age class (18 ≤ age < 40) (data not shown). For (P), with specimens from medical practitioners, the RLs estimated from "last" and "non-repeated" values are not only equivalent to, but also nearly identical with, those estimated from "first" values. Furthermore, the RLs estimated from "all" values are also equivalent to those estimated from "first" values, although they are not nearly identical and are always wider, as shown in Table 2.

Figure 1: Boxplots of AP values vs. age for female subjects, data set C. The whiskers of the boxplots are defined as 95/75/25 and 5%. Red lines mark the lower/upper RLs used in the corresponding laboratory (35-105 U/L). From top to bottom: "all", "first", "last" and "non-repeated" data sets. Y-axis in U/L, X-axis age in years.
Age-related continuous RLs: the estimated splines for the lower and upper RLs for the four data sets (all/first/last/non-repeated values) for female and male subjects are shown in Figure 2 (for data set C and data set P separately). As expected, and as already seen in Figure 1, the RLs for AP estimated from all four subsets indicate that the upper and lower RLs increase with increasing age for female subjects, especially between 40 and 60 years of age. Additionally, the uRLs estimated from the "first" and "non-repeated" populations are comparable to those from the literature. The uRLs estimated from "all" values are overestimated. For male subjects, the estimated curves for the upper and lower RLs from "first", "last" and "non-repeated" values are similar (see Figure A2, Supplemental Material).
In summary:
Thrombocytes: The estimated uRLs for female subjects were higher than those for male subjects. The estimated lRL and uRL indicate no considerable age trend for female subjects. However, the estimated lRL for male subjects decreases with increasing age. This trend has been reported in [18]. The lRL and uRL estimated from "all" values are underestimated and overestimated, respectively, and therefore do not seem to be valid (see Table A1 and Figure B1).
GGT: Generally, the RLs estimated from "all" values were not equivalent to the estimations from the other data sets (Table B4). The estimated continuous curves for the uRLs indicate considerable dynamic trends with increasing age for female and male subjects. While for female subjects the upper RL increases monotonically with increasing age, for male subjects the upper RL increases with age until 55-60 years and decreases thereafter. This phenomenon has also been reported in the literature [19]. See Table A2 and Figure B2.
Sodium: For the data from the University Hospital, the lower RLs estimated from "all" values were considerably lower in all subpopulations than the estimations from the other data sets; this was not the case for the data from the laboratory (P) (see Table A3 and Figure B3).

Discussion
Up to 70% of clinical decisions are affected by in vitro diagnostics [20]. These decisions are based on reference intervals (RIs), so the reference interval is a vital part of the information supplied by clinical laboratories to support interpretation [14]. Consequently, the quality of a laboratory test result is affected by the quality of the reference interval. The indirect methods for the estimation of reference limits use data sets from patient archives, usually stored in laboratory information systems. These data-mining strategies generally rely on large data sets of good quality. In some cases, it is mandatory to pool the data from different laboratories, which is only possible if the prerequisites described by Zierk et al. are fulfilled [21]. For the estimation of RIs, the minimal information used by most approaches is the measured value, unit, method, identifier/pseudonym, gender and age and, for some methods, the date and time of analysis (to analyse and exclude drift effects) or the ward. Depending on local legal regulations, the data often need to be anonymized as described by Holst et al. [22], sometimes even before pooling. Anonymization requires the removal of identifiers or characteristics, thus ensuring that a link between the data and the individual is impossible. For the estimation of RIs, however, information about the individual is needed to fulfil the assumption of analysing non-correlated data. This prerequisite also applies to direct methods, where 120 results from different individuals are used for the calculation. Filtering, e.g. for first or last values, ensures that only one result per individual contributes to the estimated RLs. This strategy aims to prevent a possible bias caused by repeated measurements and is supported by a number of authors [14].

Table 1: Estimated RLs (lRL and uRL) for AP (U/L) values obtained from a hospital (C) and a laboratory (P) through data sets of all, first, last and non-repeated values, for male and female adult subjects and for the young population (age between 18 and 39 years).

Data specifications
If only anonymized data are available, such an upstream filtering step is impossible. We tested for the effect using two real-world data sets: university hospital data and data from a laboratory analysing specimens from medical practitioners. In our approach, the data set "all" reflects the anonymized data, and the "first", "last" and "non-repeated" data sets represent pseudonymized data to which filtering for first or last values or the exclusion of patients with repeated measurements was applied. In these data sets we observed a considerable difference in the proportion of patients with multiple measurements: in the university hospital data set, only 8-10% of the data collected in the corresponding time interval come from the "non-repeated" subpopulation, whereas in the general practitioners' laboratory approximately 20-22% do. This difference, and the fact that the results from the practitioners' laboratory reflect a "healthier" subpopulation, support the view of some authors who prefer data from outpatients [23]. However, one can argue against restricting the analysis to the "non-repeated" subpopulation that some healthy subjects with high or low values may have been excluded, so that the standard deviation may be underestimated.
Our results for four representative analytes (thrombocytes, AP, GGT and sodium) show that using the whole data set instead of first results, last results or results from patients without repeated measurements has a significant impact on the indirect estimation of reference intervals. Typically, the analysis of the whole data set leads to wider RIs and thus to a lower quality of the estimated RI. This is especially true for data from large hospitals, where the number of repeated measurements is high.
In conclusion, the results clearly show the influence of the different filtering strategies, which lead to non-identical reference limit estimations even from a clinical view.
In particular, the use of all data without a filtering step results in a significant bias, whereas the choice of first or last values has nearly no impact. The exclusion of repeated measurements results in narrower RLs. This influence restricts the use of anonymous data sets, where filtering is impossible, for the estimation of RLs by indirect methods.

Table 2: Comparison of permissible uncertainty (pU) of estimated RLs (lRL and uRL) for AP values obtained from a hospital (C) and a laboratory (P) through "first" data set with estimated RLs through "all", "last" and "non-repeated" data sets (U/L). Results are given separately for male and female subjects. lRL1 (first) and lRL2 (first): lower and upper limits of pU of the lRL estimated through the "first" value data set, respectively. uRL1 (first) and uRL2 (first): lower and upper limits of pU of the uRL estimated through the "first" value data set, respectively (only results for age ≥18 years are shown). Bold row(s)/values indicate estimations that are not equivalent to those from first values. Nrep, non-repeated.

Conclusions
In conclusion, the results clearly show the influence of the different filtering strategies, which lead to non-identical reference limit estimations even from a clinical view.
In particular, the use of all data without a filtering step results in a significant bias, whereas the choice of first or last values has nearly no impact. The exclusion of repeated measurements results in narrower RLs. For the estimation of RLs by indirect methods, the indispensable filtering steps must be carried out before anonymization.
Research funding: None declared.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Ethical approval: Not applicable.