Establishment of reference intervals for immunoassay analytes of adult population in Saudi Arabia

Background: This is the second part of a report on the IFCC global multicenter study conducted in Saudi Arabia to derive reference intervals (RIs) for 20 immunoassay analytes, including five tumor markers, five reproductive hormones, seven other hormones and three vitamins. Methods: A total of 826 apparently healthy individuals aged ≥18 years were recruited in three clinical laboratories located in western, central and eastern Saudi Arabia using the protocol specified for the global study. All serum specimens were measured using Abbott Architect analyzers. Multiple regression analysis (MRA) was performed to explore sources of variation of each analyte: age, body mass index (BMI), physical exercise and smoking. The magnitude of variation of reference values (RVs) attributable to sex, age and region was calculated by ANOVA as a standard deviation ratio (SDR). RIs were derived by the parametric (P) method. Results: MRA revealed that region, smoking and exercise were not relevant sources of variation for any analyte. Based on SDR and actual between-sex differences in upper limits (ULs), we chose to partition RIs by sex for all analytes except α-fetoprotein and parathyroid hormone (PTH). Age-specific RIs were required in females for ferritin, estradiol, progesterone, testosterone, follitropin, luteotropin and prolactin (PRL). Because insulin and C-peptide increased prominently with BMI, their RIs were derived after excluding individuals with BMI > 32 kg/m². Individuals taking vitamin D supplements were excluded in deriving RIs for vitamin D and PTH. Conclusions: RIs of major immunoassay analytes specific for Saudi Arabians were established with careful consideration of various biological sources of variation.


Introduction
The reference interval (RI) is the most widely used tool for interpreting laboratory test results. It helps clinicians to differentiate between healthy and non-healthy individuals. RIs are typically derived from healthy subjects. The multicenter study conducted by the IFCC Committee on Reference Intervals and Decision Limits (C-RIDL) was unique in the number of participating countries and the number of included tests. In the first part of this study we derived RIs for 28 clinical biochemistry analytes, such as electrolytes, enzymes, glucose, lipids, iron, uric acid and proteins [1]. In this part we derived RIs for the Saudi population for the most common analytes measured by immunoassay, including hormones, tumor markers, vitamins and iron stores, according to the harmonized protocol of the global study. We also investigated the association of those analytes with body mass index (BMI), sex, age, exercise and smoking status. In Saudi Arabia, previous studies have targeted RIs for a few immunoassay analytes [2][3][4], but none has systematically investigated RIs, and the factors influencing their derivation, for a diverse set of immunoassay analytes.
The importance of our study resides in the following points: 1. It is part of the international multicenter project led by the IFCC C-RIDL. The other collaborating countries, as of now, are the USA, Turkey, Japan, the UK, China, India, South Africa, Argentina, Russia, Pakistan, the Philippines, Nepal, Bangladesh, Kenya, Nigeria, Ghana, Egypt and Malaysia, in the order of joining. 2. Twenty major immunoassay tests are targeted in the study. 3. Current analytical methods and the latest automated instruments were used. 4. The harmonized protocol developed in C-RIDL by consensus was adopted for recruitment and data analyses. 5. The Saudi population, similar in ethnic composition to those of the Arab Gulf countries, is the only population involved in this global study in which alcohol intake is prohibited. Therefore, this study provides a great opportunity to compare immunoassay reference values (RVs) with those from countries with widely different demographic profiles in terms of typical partitioning factors such as BMI, alcohol intake and smoking.
As with part one of this study, it will be an opportunity to address the controversies over the selection between parametric (P) and non-parametric (NP) methods for derivation of RIs [5,6] and over what criteria to use in partitioning RVs by sex and age.

Materials and methods
The framework of the study was described in the first part of this report [1]. In brief, 826 apparently healthy Saudi subjects were recruited from across the Kingdom of Saudi Arabia. All subjects were recruited using the C-RIDL protocol, with some modifications to make it suitable for the culture of Saudi Arabia [7,8]. Twenty-two of the subjects were excluded, either because they did not qualify under the protocol or because they had overtly extreme values among their test results. The remaining subjects were distributed across the regions as follows: western [Jeddah] (n = 409, 51%), central [Riyadh] (n = 152, 19%) and eastern [Hassa] (n = 243, 30%). The ages of subjects were between 18 and 65 years (48.2% males, 52.8% females). Names, abbreviations, assay methods and imprecision of the analytes are listed in Table 1.
The questionnaire items were adapted from the one used in the previous Asian project, but modified to meet Saudi local needs [7]. In each region, apparently healthy subjects were recruited from more than one city and from various professions as described previously. To eliminate the factor of different analytical methods, all samples were analyzed collectively in the central laboratory located in Jeddah as described in the global protocol.

Blood collection and handling
Participants were asked to avoid excessive eating and drinking the night before sampling and to fast for at least 10 h before sample collection. The amount of blood drawn was between 15 and 20 mL. The time of sampling was set at 7-10 AM. The blood was drawn after the participant had sat quietly for 30 min to avoid variations due to postural influence and physical stress. The waiting time was used for checking the questionnaire. For blood collection, four plain serum tubes containing clot activator were used. The tubes were left at room temperature before centrifugation, which was performed within 1 h. After separation of the serum, the specimens were promptly divided into 5 to 10 aliquots of 1 mL each in well-sealed freezing containers (CryoTubes) and immediately stored at −80 °C. All the aliquots were shipped in a box filled with a sufficient amount of dry ice to the central laboratory in the western region (Jeddah) for collective measurement.

Quality control
The routine quality control (QC) program was run on a daily basis as usual.
For the purpose of this study, dedicated QC monitoring was also performed using multiple commutable specimens prepared in the central laboratory (Jeddah), as suggested by the C-RIDL common protocol and standard operating procedure 2 (SOP 2) [9]. Specifically, a mini-panel of sera from five healthy individuals (two males and three females) was prepared and measured over the period of collective measurements in order to assess between-day variations of test results. RIs determined centrally in the Jeddah laboratory were converted to those for each of the other participating laboratories (Riyadh and Hassa) through cross-check results: before samples were sent to the central laboratory in Jeddah, 20 specimens were measured in each participating laboratory for the purpose of comparison. The linear structural relationship (major axis regression) was used to convert RIs established by the centralized assay to the values of each participating laboratory. The participating laboratories in Jeddah, Riyadh and Hassa all used the same Abbott Architect analyzers.
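The conversion step can be illustrated with a minimal sketch of major axis (orthogonal) regression. It assumes paired results for the same specimens from the central and a local laboratory; the function names `major_axis_fit` and `convert_ri` and all numbers are hypothetical and are not taken from the study.

```python
import math

def major_axis_fit(x, y):
    """Fit y = intercept + slope * x by major axis (orthogonal) regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired results")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("results are uncorrelated; slope is undefined")
    # Slope of the first principal axis of the (x, y) scatter.
    slope = ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def convert_ri(ll, ul, slope, intercept):
    """Map RI limits from the central-laboratory scale to the local scale."""
    return intercept + slope * ll, intercept + slope * ul

# Hypothetical paired results (central vs. local laboratory), not study data.
central = [1.0, 2.0, 3.0, 4.0, 5.0]
local = [1.2, 2.3, 3.4, 4.5, 5.6]
slope, intercept = major_axis_fit(central, local)
local_ll, local_ul = convert_ri(1.5, 4.5, slope, intercept)
```

Unlike ordinary least squares, this fit treats both laboratories' results as subject to measurement error, which is why it suits method comparison.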

Methods, instruments and reagents
In this second part of the Saudi multicenter RI study, we targeted 20 commonly tested analytes (Table 1), all of which were measured centrally on an Abbott Architect i2000 analyzer. The assays were performed at the King Abdulaziz Medical City Laboratory located in Jeddah, Saudi Arabia, using the manufacturer's reagents, calibrators and controls.
The results were retrieved from stored data and verified using the Abbott i2000 software.

Statistical analysis
The test results were evaluated using the same statistical methods as in the previous studies [7][8][9][10]; the methods are also summarized here.
Data validation: Data validation started with the exclusion of subjects with overtly abnormal conditions (e.g. diabetes, hepatic or renal disease, active viral infection). This step represents the primary exclusion, applied during recruitment and before blood sampling. Twenty-two subjects were excluded on the basis of ineligibility criteria such as overt diabetes, early pregnancy, delivery within the previous year, chronic disease, blood donation within the previous 4 weeks and the presence of abnormal hemoglobin variants.
Because we targeted a heterogeneous combination of analytes, the scheme for secondary exclusion of inappropriate subjects was customized analyte by analyte. For VitD and PTH, we excluded 123 individuals taking VitD supplements; for ferritin, we excluded 24 individuals taking iron supplements. For TSH, we identified 10 individuals on thyroxine replacement therapy but retained them because all were euthyroid. For reproductive hormones, we excluded 29 individuals who were using hormone replacement therapy (HRT) or contraceptive pills.
Analyses of biological sources of variation: Multiple regression analysis (MRA) was performed separately for each sex with the RVs of each analyte set as the objective variable. The explanatory variables were fixed as age, BMI, level of regular exercise (0 [none] to 7 [every day], with the value halved for light exercise) and level of smoking (0 [none] to 2 [heavy smoker]). The association of a given explanatory variable with the objective variable was expressed as the standardized partial regression coefficient (r_p), which corresponds to a partial correlation coefficient and takes values between −1.0 and 1.0. Because the dataset comprises hundreds of observations, even trivial associations can reach statistical significance, so the p-value is not an appropriate basis for evaluating r_p. We therefore set the threshold of practical significance (effect size) at |r_p| ≥ 0.2, the midpoint between Cohen's small (0.1) and medium (0.3) correlation coefficients [11], and interpreted its magnitude as "slight" for 0.20 ≤ |r_p| < 0.30, "moderate" for 0.30 ≤ |r_p| < 0.50 and "strong" for |r_p| ≥ 0.50. Because the distributions of RVs of all analytes except cortisol were skewed with a long tail toward higher values, they were converted to a near-Gaussian shape prior to MRA (and ANOVA, as described below) by either square-root or log transformation: the former for tHCY and folate, and the latter for all other analytes.
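As a rough illustration of how r_p is obtained and graded, the sketch below uses the closed-form solution for a regression with only two explanatory variables (for example age and BMI). This is a simplification: the study used four explanatory variables, which would require a general least-squares solver. The function names, the label "negligible" for |r_p| < 0.20 and the data are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def standardized_coefs_two(y, x1, x2):
    """Standardized partial regression coefficients for y ~ x1 + x2."""
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    denom = 1 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

def effect_label(r_p):
    """Grade |r_p| with the thresholds quoted in the text."""
    size = abs(r_p)
    if size < 0.20:
        return "negligible"  # label assumed here; the text names no category
    if size < 0.30:
        return "slight"
    if size < 0.50:
        return "moderate"
    return "strong"

# Hypothetical data: the (transformed) analyte rises with age, not with BMI.
age = [20, 30, 40, 50]
bmi = [28, 24, 24, 28]
analyte = [1.0, 2.0, 3.0, 4.0]
r_age, r_bmi = standardized_coefs_two(analyte, age, bmi)
```

With these made-up values the age coefficient is graded "strong" and the BMI coefficient "negligible".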
Partitioning criteria: The need for partitioning (subgrouping) RVs by sex, region and age was judged primarily from the actual magnitude of between-subgroup differences, expressed as a standard deviation (SD) ratio (SDR). In brief, the between-region SD (SDreg), between-sex SD (SDsex), between-age SD (SDage) and net between-individual SD (SDindiv) were computed by three-level nested ANOVA, and the SDR for each factor was calculated as the ratio of its SD to SDindiv: i.e. SDRreg, SDRsex and SDRage. The sampling sites were categorized into three regions represented by their main cities (western: Jeddah; central: Riyadh; eastern: Hassa). Age was stratified into four groups: 18−29, 30−39, 40−49 and 50−65 years. The threshold SDR requiring partition by a given factor was set to 0.4 [11,12]. For analytes that had been transformed, each SD component was first calculated on the transformed scale and then reverse-transformed to the original scale [12].
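The idea behind the SDR can be sketched for a single factor using one-way ANOVA variance components. This is a deliberately simplified stand-in for the three-level nested ANOVA used in the study, and it omits the reverse transformation of SDs; the function name `sdr_one_factor` and the data are hypothetical.

```python
import math

def sdr_one_factor(groups):
    """SD ratio for one factor: between-subgroup SD / within-subgroup SD."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total
    means = [sum(g) / len(g) for g in groups]
    ms_between = sum(n * (m - grand_mean) ** 2
                     for n, m in zip(sizes, means)) / (k - 1)
    ms_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g) / (total - k)
    # Effective subgroup size, which also handles unbalanced subgroups.
    n0 = (total - sum(n * n for n in sizes) / total) / (k - 1)
    # A negative variance-component estimate is truncated at zero.
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return math.sqrt(var_between) / math.sqrt(ms_within)

# Hypothetical transformed RVs for two subgroups (e.g. males and females).
males = [10, 11, 12, 13, 14]
females = [14, 15, 16, 17, 18]
sdr_sex = sdr_one_factor([males, females])
needs_partition = sdr_sex >= 0.4  # threshold quoted in the text
```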
Because SDR occasionally fails to represent the actual between-subgroup difference at the lower or upper limit (LL, UL) of the RI, we adopted a secondary criterion, the bias ratio (BR) at the UL (or LL), denoted BR_UL (or BR_LL), for judging the need to partition RVs. For the case of partitioning by sex, it was defined by the following formula:

$$\mathrm{BR_{UL}} = \frac{\left|\mathrm{UL_M} - \mathrm{UL_F}\right|}{\left(\mathrm{UL_{MF}} - \mathrm{LL_{MF}}\right)/3.92}$$

where UL_M, UL_F and UL_MF denote the ULs for males (M), females (F) and males + females (MF), respectively, and LL_MF denotes the LL for MF. The numerator represents the actual between-sex bias in the UL, and the denominator represents the SD comprising the RI, which corresponds to the between-individual SD. Therefore, by analogy with the conventional specification for allowable analytical bias [13], we adopted 0.375 as the threshold for BR_UL (or BR_LL).
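A worked example of the bias ratio may help. The sketch assumes the usual form in which the SD comprising the RI is taken as the width of the combined RI divided by 3.92 (that is, 2 × 1.96 SD); the function names and all limits below are hypothetical.

```python
def bias_ratio(limit_a, limit_b, ll_all, ul_all):
    """Bias ratio between two subgroup limits (e.g. UL for males vs. females).

    The SD comprising the combined RI is assumed to be (UL - LL) / 3.92.
    """
    sd_ri = (ul_all - ll_all) / 3.92
    return abs(limit_a - limit_b) / sd_ri

def needs_partition(limit_a, limit_b, ll_all, ul_all, threshold=0.375):
    """Apply the 0.375 threshold quoted in the text."""
    return bias_ratio(limit_a, limit_b, ll_all, ul_all) >= threshold

# Hypothetical limits: UL_M = 25, UL_F = 20, combined RI = 3.0 to 22.6.
br_ul = bias_ratio(25.0, 20.0, 3.0, 22.6)  # SD = 19.6 / 3.92 = 5, so BR is about 1.0
```

In this made-up case the between-sex difference in ULs equals one between-individual SD, well above the 0.375 threshold, so sex-specific RIs would be indicated.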

Derivation of RI: For analytes that showed a close association with BMI by MRA, we examined the effect of excluding individuals with BMI ≥ 32 kg/m² by calculating BR_UL. For calculation of RIs, the P method was primarily used, based on Gaussian transformation of RVs using the modified Box-Cox formula [14]. For comparison, RIs were also calculated by the NP method. In both methods, confidence intervals (CIs) of the lower and upper limits (LL and UL) of the RI were determined by the bootstrap method: the final dataset after the secondary exclusion steps was randomly resampled with replacement until the resample reached the size of the source dataset, and RIs were computed from the resampled dataset. This resampling and recalculation of RIs was repeated 50 times, and the CIs for LL and UL were estimated from the resulting sets of LLs and ULs.
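The bootstrap procedure can be sketched as follows. Two simplifications are assumed and should not be read as the study's method: a plain log transformation stands in for the modified Box-Cox formula, and the CI is taken as a percentile interval of the bootstrap estimates, since the text does not state how the CI was derived from the 50 replicates. All names and data are hypothetical.

```python
import math
import random
import statistics

def parametric_ri_log(values):
    """Parametric RI (central 95%) after log transformation (assumed stand-in)."""
    logs = [math.log(v) for v in values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return math.exp(mean - 1.96 * sd), math.exp(mean + 1.96 * sd)

def bootstrap_ci(values, n_boot=50, level=0.90, seed=0):
    """Percentile CIs of LL and UL from n_boot resamples with replacement."""
    rng = random.Random(seed)
    lls, uls = [], []
    for _ in range(n_boot):
        # Resample to the same size as the source dataset.
        resample = rng.choices(values, k=len(values))
        ll, ul = parametric_ri_log(resample)
        lls.append(ll)
        uls.append(ul)

    def percentile_interval(estimates):
        ordered = sorted(estimates)
        alpha = (1 - level) / 2
        lo = ordered[round(alpha * (len(ordered) - 1))]
        hi = ordered[round((1 - alpha) * (len(ordered) - 1))]
        return lo, hi

    return {"LL": percentile_interval(lls), "UL": percentile_interval(uls)}

# Hypothetical right-skewed reference values, not study data.
demo_rng = random.Random(1)
demo_values = [demo_rng.lognormvariate(1.0, 0.5) for _ in range(300)]
ri = parametric_ri_log(demo_values)
ci = bootstrap_ci(demo_values)
```

With only 50 replicates the percentile limits are coarse; a larger `n_boot` would give smoother CIs at the cost of computation.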

Profile of the subjects
The demographic profile of the participants from each region was summarized in the first part of this study. The tabulation was made after deleting those with extreme values based on the criteria described in the Methods section. Overall, there were more females than males (51.2% vs. 48.8%). According to the questionnaire, 65% of males and 80% of females exercised <1 day per week.
More details on the evaluation of between-region, between-sex and smoking-related differences were given in the first part of this study.

QC monitoring
QC monitoring using routine QC specimens and a set of five sera (mini-panel) from healthy individuals was performed over the period of collective measurements. Between-day and within-day CVs computed from daily test results of mid-normal QC sera are shown in Table 1. The critical level of CV was set to 1/2 of CV_I (within-individual CV), which was taken from the Biological Variation Database of the European Federation of Clinical Chemistry and Laboratory Medicine [15].

Sources of variation evaluated by MRA
MRA was performed separately for each sex to identify source factors associated with changes in RVs from among age, BMI, and levels of regular exercise and smoking. As shown in Table 2, for age-related changes in males, a moderate increase with age was observed for AFP (r_p = 0.395) and FSH (0.392), a slight increase with age for VitD (0.220) and PSA (0.243), and a slight decrease with age for progesterone (−0.266).
Females generally showed more pronounced age-related changes: a strong increase with age for FSH (r_p = 0.638), a moderate increase for LH (0.425) and VitD (0.397), a moderate decrease for estradiol (−0.409) and testosterone (−0.373), a slight increase for AFP (0.265), CEA (0.225), ferritin (0.248) and VitB12 (0.263), and a slight decrease for PRL (−0.278) and progesterone (−0.237). These changes are illustrated in Supplementary Figure 1. Representative changes are shown in Figure 1.
For BMI-related changes, a slight to moderate increase with BMI was noted in males and females for insulin (0.289, 0.354) and CPep (0.368, 0.372), respectively. A moderate reduction due to the increase of BMI was observed for testosterone only in males (−0.344). In females, a slight increase proportionate to BMI was noted for PTH (0.251). These BMI-related changes are illustrated in Figure 2 for representative analytes. For the exercise and smoking-habit related changes, none of the analytes showed appreciable changes.

SDR for sex and age-related changes
In judging the need for partitioning RVs by sex, region and age, we calculated the magnitude of between-subgroup variations of RVs as an SDR by use of ANOVA: a three-level nested ANOVA for combined analysis of the three factors, followed by a two-level nested ANOVA targeting region and age, performed separately for each sex.
The results are summarized in Table 3. With 0.4 adopted as the threshold of a significant effect size for SDR, between-sex differences expressed as SDRsex were significantly high for 11 analytes (PSA, testosterone, ferritin, FSH, LH, estradiol, progesterone, tHCY, CA125, CEA and PRL, in that order of magnitude). The SDRs for region (SDRreg) were below the threshold for all analytes. SDRs for age (SDRage) were significant for two analytes (AFP and FSH) in males, and for six analytes (FSH, estradiol, LH, testosterone, VitD and CPep) in females.

Derivation of RIs
RIs were computed by both the P and NP methods, and the two sets of RIs are compared in Supplementary Figure 2. For the majority of analytes, the ULs tended to be higher by the NP method than by the P method, and the 90% CIs of the UL were generally wider by the NP method. On the other hand, the validity of the P method was confirmed by successful Gaussian transformation for nearly all analytes, as indicated by the linearity of the probability paper plot (over the range of 10-90% cumulative frequency) and by the non-significant Kolmogorov-Smirnov (KS) test results shown in Supplementary Figure 3. The only exception was the bimodal distribution of progesterone RVs in females below 45 years of age, for which we adopted the RI by the NP method. We adopted RIs by the P method for all other analytes. TSH required additional handling because we encountered an unnatural tailing of RVs toward the higher side, which we attributed to latent autoimmune thyroiditis. We therefore examined the effect of truncating TSH RVs above 8 mIU/L, which are obviously abnormal. As shown in Supplementary Figure 4, we applied probability paper plot analysis for each sex, with cumulative frequency on the Y-axis and TSH values on the X-axis, before and after the exclusion. The LL and UL of each RI were determined by fitting a least-squares regression line and reading off the values at cumulative frequencies of 2.5% and 97.5%. After this confirmation, we applied the P method to the truncated TSH RVs.
A list of RIs derived by various means is shown in Supplementary Table 1 for all analytes. The need for deriving RIs partitioned by sex, age or BMI status (≤32 kg/m²) was decided primarily on the basis of the magnitude of SDRsex and SDRage, and of r_p for BMI. For deriving age-specific RIs for sex hormones in females, we partitioned RVs by menopause (MP) status, which was judged from age and the questionnaire information on menstruation, as well as from the balance among values of estradiol, progesterone, FSH and LH. As a result, 90 females belonged to the MP group and 317 to the pre-MP group. Of the latter, 288 were valid for calculating the RIs after excluding females under HRT or taking contraceptive pills. For males, and for analytes other than sex hormones, we chose to partition RVs arbitrarily at 45 years of age to ensure 100 or more subjects in the higher age group.
From the extensive list of RIs in Supplementary Table 1, we chose to adopt RIs partitioned by sex for CEA, CA125, ferritin, tHCY, estradiol, progesterone, testosterone, FSH and LH, based on their high SDRsex. As an exception, sex-specific RIs were also adopted for analytes with SDRsex < 0.4 when the between-sex difference in ULs was large, i.e. BR_UL ≥ 0.375, as noted in the column named "SDR & decision" of Supplementary Table 1. These include VitB12, folate, VitD, insulin, CPep and TSH. For AFP and cortisol, the between-sex differences at the UL exceeded the threshold of BR_UL, but the actual differences seen in Supplementary Figure 1 are obviously minor, and thus we chose not to partition RVs for these two analytes.
In deciding whether to adopt age-specific RIs, we took into consideration both SDRage and the between-age-subgroup difference in the UL (BR_UL), in the same manner as above. As a result, we found age-partitioned RIs necessary for FSH and testosterone in males and for ferritin, estradiol, progesterone, testosterone, FSH, LH and PRL in females.
Because of the high association (r_p) between BMI and the test results of insulin, CPep and testosterone (Table 2), and because of the high prevalence of increased BMI in Saudi individuals, we examined the effect of five graded BMI restrictions on RIs: all BMI, ≤32, ≤30, ≤28 and ≤26 kg/m², as shown in Supplementary Table 2. As the BMI restriction was tightened stepwise, data sizes decreased progressively, whereas the changes in RIs were less prominent or mixed. Therefore, as a compromise to avoid reducing the data size, we chose the threshold of BMI = 32 kg/m², which we had adopted in deriving RIs for chemistry analytes [1]. We also examined the effect of applying the latent abnormal values exclusion method, which we had applied in the study of chemistry RIs. However, its effect was slightly smaller than that of BMI restriction (data not shown). Consequently, based on the BR of the UL with and without BMI restriction shown in Supplementary Table 2, we adopted the BMI-restricted RIs of insulin for females and of CPep for both sexes.
The final RIs adopted accordingly from Supplementary Table 1 are listed in Table 4.

Discussion
Due to the heterogeneity of the tests included in this study, we encountered various difficulties in finding appropriate schemes for deriving RIs optimized for each analyte. For example, for insulin, CPep, testosterone and PTH, we noted an appreciable association of their RVs with BMI and examined the effect of excluding individuals with BMI ≥ 32 kg/m² (n = 165, or 20.5%). An appreciable lowering of the UL was noted for insulin and CPep but not for testosterone and PTH.
One of the difficulties in deriving RIs was that the RVs of all analytes except cortisol were prominently skewed, with a long tail toward the high-concentration side. In such a case, SDR, which represents the variation of subgroup means around the grand mean, tends to give a low value, whereas the actual between-subgroup difference at the ULs, measured as BR_UL, tends to be wide. This was the case for between-sex differences in RVs of VitB12, folate, VitD, insulin, CPep, PRL and TSH, for which SDRsex was below the threshold of 0.4 but BR_UL well exceeded the threshold of 0.375. This finding led us to evaluate BR_UL in making the final decision on the need for partitioning RIs when the distribution of RVs is highly skewed.
In relation to the finding of highly skewed distributions of RVs in nearly all immunoassay analytes, it is of note that the P method was consistently better than the NP method, because the NP method is known to be highly sensitive to the presence of extreme values in the periphery of a skewed RV distribution [16].
Regarding regional differences in test results among the three regions of Saudi Arabia, we found no major differences in RVs for any of the 20 analytes based on SDRreg. However, a slight tendency toward between-region differences was observed for folate, tHCY, cortisol and PRL in males and for insulin and testosterone in females. These may be attributable to the fact that the age of participants was lower in Riyadh and the BMI of participants from Hassa was higher, together with some differences in lifestyle among the three regions of Saudi Arabia.
Results from the other countries involved in the global study were compared with those of Saudis in the first part of this study [17,18]. One of the most significant findings was that the median BMI of Saudis was the highest among the participating countries for both males and females. Unfortunately, the number of included immunoassays varies among the countries involved (Japan, China, India, Turkey, Russia, the United Kingdom, South Africa and the USA). In the interim report of the global study, it was possible to align RVs based on test results of a serum panel measured in common. The panel test results for 17 immunoassays (five tumor markers, six reproductive hormones, four other hormones and two vitamins) were compared across five to nine countries, depending on the analyte. Although variable degrees of between-laboratory bias existed for the majority of analytes, the narrow scatter around the regression line made it possible to align values across the countries. In this study, we re-aligned the test results of selected analytes from other countries to those of Saudi Arabia based on all-pairwise comparisons of the panel test results. From this analysis, we found that RVs for insulin are peculiar to Saudis, with the highest values among the countries in both sexes (Supplementary Figure 5). The increased levels of insulin can be explained by the high prevalence of obesity and diabetes in the Saudi population. Another interesting finding was the decreased level of ferritin in Saudi females, which is compatible with a previous report of low serum ferritin in Saudi females [19].
In Table 4, the derived RIs are compared with the manufacturer's RIs (kit inserts). For several immunoassays, the ULs and LLs differed appreciably from the kit insert RIs. For example, the ULs for insulin and PRL in males are appreciably higher than the kit insert ULs. For PTH, the derived LL and UL are almost 60% and 72% higher than the kit insert LL and UL, respectively. The high PTH can be explained as compensation for VitD deficiency in the adult Saudi population, as explained below.
The ferritin, folate, VitB12 and CPep immunoassays had lower ULs than their kit insert limits. The lower ULs for ferritin, VitB12 and folate in Table 4 strongly support our conclusion in the first report [1] regarding the low iron level in Saudis compared to other populations, and the findings agree with a previous study [19]. Unfortunately, the kit inserts give little information about sample collection, methods and procedures. Therefore, most of the above observations cannot be discussed in detail. For example, the ULs for PRL in Saudi males and females are about 53% and 81% higher, respectively, than the kit insert ULs.
In general, compared to the kit insert RIs, our derived RIs are more reliable because they are based on a well-defined harmonized protocol that specifies sampling procedures, measurements and statistical analyses that consider the sources of variation of each analyte.
The low VitD LLs for males and females (22 and 18 nmol/L, respectively), compared to the LL suggested by the expert panel (75 nmol/L) as indicated in the kit insert [20], are evidence of the high prevalence of VitD deficiency in the adult Saudi population. This finding is supported by previously published studies [3,21,22]. The recommended minimum level of VitD is 50 nmol/L, as advised by the Food and Nutrition Board at the Institute of Medicine of the National Academies [23]. Our data show that about 79.3% of males and 70.4% of females in the adult Saudi population who participated in this study were deficient in VitD (Supplementary Figure 6A). In addition, we found that the median level of PTH differed appreciably between individuals below and above the VitD cut-off (<50 vs. ≥50 nmol/L) regardless of sex [p < 0.001 by the Mann-Whitney U test for both sexes] (Supplementary Figure 6B).
One limitation of this study was the relatively small number of subjects recruited to partition RVs of female fertility hormones according to the status of MP. For TSH, we did not test for antithyroid autoantibodies for the definitive exclusion of individuals with autoimmune thyroiditis. Therefore, we had to resort to the empirical graphical method to reduce the influence of individuals with latent thyroid dysfunction.

Conclusions
This study was conducted as a part of the international collaborative project for the derivation of RIs. It is the first and largest study in the Saudi population to establish RIs for the major immunoassay tests of high clinical demand by use of the internationally harmonized protocol. Partitioning of RIs by sex, age and BMI status was necessary for many analytes. As stated in our first report, the outcome of this study should be applicable to neighboring countries in the Middle East, taking into account the common culture, religion, language, lifestyle, foods and prevalence of related conditions such as increased BMI and diabetes.