Methodological considerations in determining sex steroids in children: comparison of conventional immunoassays with liquid chromatography-tandem mass spectrometry

Objectives: In laboratory medicine, external quality assessment (EQA) schemes have become versatile tools for detecting analytical flaws. However, EQA schemes are lacking for pediatric sex steroid levels. We aimed to investigate the suitability of different estradiol and testosterone immunoassays in a pediatric setting in comparison with clinical liquid chromatography-tandem mass spectrometry (LC-MS/MS) assays. Methods: The study was conducted by staff and the advisory group on endocrinology at Equalis, the Swedish provider of EQA schemes for laboratory medicine. The test material consisted of five pooled serum samples from children who were either prepubertal or in puberty. Clinical laboratories enrolled in Equalis EQA schemes for estradiol and testosterone were invited to participate, as were clinical laboratories using LC-MS/MS assays. Samples were analyzed by either routine immunoassays (n=18) or in-house LC-MS/MS assays (n=3). Results: For estradiol, LC-MS/MS assays showed a high degree of conformity with interlaboratory coefficients of variation (CV) below 24.2 %. Reported levels were between 4.9 ± 1.2 and 33.9 ± 1.6 pmol/L (group mean ± standard deviation). The direct immunoassays had lower precision; their CVs were up to 81.4 %. Reported concentrations were between 25.3 ± 18.1 and 45.7 ± 19.4 pmol/L, an overestimation compared to LC-MS/MS. Testosterone LC-MS/MS also showed a high degree of conformity, CVs were below 13.4 %, and reported concentrations were from 0.06 ± 0.00 to 1.00 ± 0.11 nmol/L. The direct immunoassays had a larger discrepancy between results; CVs were up to 95.8 %. Concentrations were between 0.12 ± 0.11 and 0.85 ± 0.23 nmol/L. Conclusions: For the safe diagnosis and determination of sex steroids in children, analysis with mass spectrometry-based methods is recommended.


Introduction
Sex steroid determinations are invaluable tools in the investigation and treatment of adrenal and sex development disorders in children. Therefore, specific, sensitive and quantitative assays are of crucial importance. Based on studies primarily using samples from adults, a growing amount of evidence indicates a significant analytical bias of sex steroid immunoassays in the lower concentration ranges [1][2][3][4][5][6]. There is a need for a more elaborate approach to the problem with regard to different assays. Moreover, there is a lack of awareness concerning the flaws in immunoassays and what reasonable hormone levels are in children.
Most clinical laboratory methods used today for the determination of sex steroids are commercially available chemiluminescent immunoassays (CLIA) that use patient sera that have not been pretreated or purified. These methods have the advantages of being user-friendly, suitable for high-throughput automated platforms and capable of fast turnaround times. For the most part, they are designed for fertility investigations in adults and are not validated to quantify the low levels of sex steroids found in prepubertal children, which typically are 100 times lower [7,8]. Immune-based methods without sample pre-extraction generally do not quantify estradiol below 50-100 pmol/L or testosterone below 1-3 nmol/L with acceptable reproducibility [1][2][3][4]9]. Since external quality assessment (EQA) schemes for pediatric samples are not commonly available, the analytical problems associated with different types of assays are likely to be very hard to detect for many clinical laboratories, and the problems might be underestimated. Against that background, one may question the relevance of recently published pediatric reference intervals for estradiol and testosterone determined by CLIA.
Since the late 1990s, clinical practitioners in Sweden have had access to sensitive assays in the form of an extraction estradiol radio-immunoassay (RIA) and a sensitive testosterone RIA, together with sex- and puberty-specific reference intervals adopted in-house and validated at a specialist laboratory [10][11][12][13]. In the last decade, sensitive in-house-developed clinical assays for sex steroid determinations based on liquid chromatography-tandem mass spectrometry (LC-MS/MS) have become more available and are now generally considered gold-standard assays for many steroid hormones. These assays have technical advantages that significantly improve the analytical selectivity required for differentiation between closely structurally related steroids. Moreover, visual presentation of results in the form of chromatograms allows possible analytical interferences to be detected.
The aim of this study was to investigate the suitability of different estradiol and testosterone immunoassays to quantify levels of these hormones in a pediatric setting, and to compare the analytical performance of the immunoassays to available validated clinical LC-MS/MS assays. The study is based on results from a national distribution of serum samples from children via Equalis, the Swedish provider of EQA schemes for laboratory medicine.

Samples
This method comparison study was conducted by staff and the advisory group on endocrinology at Equalis AB, Uppsala, Sweden in August 2020. The test material consisted of five pooled serum samples from children 4-18 years of age, without any additives. In each material, about 30 leftover samples, either from prepubertal children (samples A and E), boys in early puberty (sample B), girls in early-mid puberty (sample C) or boys in early-mid puberty (sample D), were pooled together.
The sera were leftover samples from hormone determinations at Tillväxtlaboratoriet, Gothenburg, Sweden. The deidentified samples were stored at −20 °C for a maximum of three months and sent frozen in Falcon tubes (Sarstedt) to Equalis AB, Uppsala, Sweden. After being divided into 1 mL aliquots, samples were sent to participants in polypropylene micro tubes (Sarstedt) at 21 °C (±1 °C). Stability tests were conducted in previous studies. Evaluation showed that serum sex steroids were not affected by storage at 21 °C (±1 °C) for three days, at +2 to +8 °C for up to three weeks, long-term storage at −20 °C or repeated thaw/freeze cycles [12,13].

Methods
Clinical laboratories in Sweden enrolled in the Equalis EQA scheme were invited to participate in a specially composed EQA round (Article number: KSP 014; Round 2020:01). Participants were informed that the samples were pooled from patient plasma, but not that the samples were derived from children. They were instructed to analyze the samples according to their normal routine procedures either within three days if stored at +2 to +8 °C, or within 16 days if stored in a freezer.
Participants analyzed the external quality controls either by using commercially available CLIAs (n=17) or RIA (with in-house pre-extraction for estradiol, n=1). In addition, clinical laboratories that had a validated in-house LC-MS/MS assay were asked to analyze the samples in both their CLIA and LC-MS/MS assay (n=3).
After determining estradiol and testosterone concentrations, participants registered the data with Equalis, where it was compiled and the analytical performance was reported back to the participants.
For estradiol and testosterone determinations most participants used Modular E & cobas e601/e602/e801 from Roche Diagnostics. Further information on the instruments used in the comparison study is given in Tables 1A and 1B.
The method validation data obtained for the three different LC-MS/MS instruments provided by the laboratories are summarized in Table 2. Further detailed information for laboratory 1 has been published elsewhere [14].

Data analysis
For evaluation purposes, the data in this study were primarily analyzed with approaches other than, or additional to, those of the regular EQA schemes, which use ISO 13528:2015 (statistical methods for use in proficiency testing by interlaboratory comparison). Three approaches to immunoassay performance evaluation were applied. Absolute bias in relation to LC-MS/MS was calculated and graphically presented in a Bland-Altman plot. Relative bias was calculated as relative bias (%) = (immunoassay concentration − LC-MS/MS concentration) × 100/LC-MS/MS concentration. Acceptable imprecision between methods was defined as 20 %, in accordance with criteria used in international EQA schemes for these analyses. Results within each method group were presented as mean, median, standard deviation (SD) and coefficient of variation (CV). Some results from the Ortho Vitros, Roche Cobas and Siemens Advia Centaur CLIAs were registered as <reporting limit. For evaluation purposes, these results were assigned the values of reporting limit −1.0 pmol/L for estradiol and reporting limit −0.01 nmol/L for testosterone. In addition, results were calculated when outliers and results below the reporting limit were omitted.
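The calculations described above can be sketched in a few lines of Python. This is a minimal illustration and not the software used by Equalis: the function names and example values are hypothetical, and the use of the sample standard deviation for the CV is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def absolute_bias(immunoassay, lcms):
    """Absolute bias: immunoassay result minus the LC-MS/MS value."""
    return immunoassay - lcms


def relative_bias(immunoassay, lcms):
    """Relative bias (%) as defined in the text."""
    return (immunoassay - lcms) * 100 / lcms


def cv_percent(values):
    """Coefficient of variation (%); sample SD is an assumption here."""
    return stdev(values) / mean(values) * 100


def substitute_censored(reported, offset):
    """Turn a '<limit' entry into (limit - offset); pass numbers through.

    offset would be 1.0 pmol/L for estradiol and 0.01 nmol/L for
    testosterone according to the text.
    """
    if isinstance(reported, str) and reported.startswith("<"):
        return float(reported[1:]) - offset
    return float(reported)


def exceeds_limit(rel_bias, limit=20.0):
    """True if the relative bias is outside the +/-20 % criterion."""
    return abs(rel_bias) > limit


# Hypothetical example values, not taken from the study data.
bias = relative_bias(30.0, 10.0)
print(bias, exceeds_limit(bias))
print(cv_percent([9.0, 10.0, 11.0]))
print(substitute_censored("<20", 1.0))
```

In this sketch a result registered as "<20" for estradiol would enter the statistics as 19.0 pmol/L, which mirrors the substitution rule stated above.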

Results
Nineteen out of twenty-one invited clinical laboratories in Sweden participated in this explorative external control round focusing on sera from children, including two sera from children without signs of puberty and three sera from children with early signs of pubertal development. Three laboratories reported results from LC-MS/MS-based methods for both estradiol and testosterone. Of these three laboratories, all also reported estradiol results with CLIA, while only one reported testosterone. The number of users for each instrument and their respective reporting limits are summarized in Table 3.

Estradiol
The results for estradiol quantified by LC-MS/MS are shown in Figure 1A and Table 4. The LC-MS/MS assays showed a high degree of conformity at the following measured levels (mean ± SD): 4.9 ± 1.2 pmol/L, CV 24.2 %; 8.5 ± 1.3 pmol/L, CV 15.3 %; 9.4 ± 1.0 pmol/L, CV 10.7 %; 16.6 ± 0.6 pmol/L, CV 3.3 %; 33.9 ± 1.6 pmol/L, CV 4.7 % for samples E, A, B, D and C, respectively. Thus, in the analytical range of 9-34 pmol/L, the concentrations determined with the three different assays deviated no more than ±11 %. However, for the two sera with the lowest concentration, the relative bias was greater and represented concentrations close to and below the lower limit of quantitation (LLOQ) in the LC-MS/MS assays. Only one of the LC-MS/MS assays was presented with pediatric reference intervals. The extraction-RIA [15] showed a high level of conformity with the LC-MS/MS-based methods, from which it deviated −9 % to +12 % in the concentration range 17-35 pmol/L (samples D and C) and ±33 % at 8-9 pmol/L (samples A and B) (Figure 2B). However, for sample E, the relative bias was greater and represented a concentration close to the LLOQ.
The results for estradiol determined by the CLIAs are shown in Figure 2 and Table 4. In general, the CLIAs overestimated the estradiol concentrations as the determined levels were (mean ± SD): 25.3 ± 18.1 pmol/L, CV 71.6 %; 32.6 ± 25.3 [...] 2A). The other four CLIAs deviated between −29 % and +196 % at 17 pmol/L and between +15 % and +68 % at 34 pmol/L (Figure 2B). Most laboratories used the Cobas Roche instrument with variable results (Figure 2C and D). At 17 pmol/L and below, all results were higher compared to LC-MS/MS, but at 34 pmol/L, both under- and overestimating results were seen, ranging between −49 % and +95 %.

Testosterone
The results for testosterone quantified by LC-MS/MS are shown in Figure 1B. As for estradiol, the LC-MS/MS assays showed a high level of agreement. The determined concentrations were (mean ± SD): 0.06 ± 0.00 nmol/L, CV 5.[...]
The results for testosterone determined by immune-based assays are shown in Figure 3 and Table 4. The evaluation revealed both under- and overestimations compared to LC-MS/MS, and a larger discrepancy between method results (mean ± SD): 0.12 ± 0.11 nmol/L, CV 95.8 %; 0.11 ± 0.05 nmol/L, CV 46.1 %; 0.46 ± 0.12 nmol/L, CV 26.9 %; 0.80 ± 0.28 nmol/L, CV 34.7 %; and 0.85 ± 0.23 nmol/L, CV 27.3 %; for samples E, A, B, D and C, respectively. In prepubertal samples (A and E), the immune-based assays both under- and overestimated the testosterone concentration with −52 % to +158 % relative bias at 0.19 nmol/L. At 0.62 nmol/L all methods gave 2-34 % lower results compared to LC-MS/MS, except the CT RIA, which overestimated by +18 % (Figure 3B). In samples from pubertal children containing 0.99 nmol/L and 1.00 nmol/L testosterone, respectively, the relative bias ranged between −32 % and +46 %. A distinctive trend was that methods that overestimated at low testosterone concentrations underestimated at higher concentrations and vice versa (Figure 3A and B). The most striking result was the consistent underestimation by the Cobas Roche analyzer, with up to −72 % deviation in samples from children during puberty (0.6-1.0 nmol/L; samples B-D) (Figure 3D).

Discussion
Since the introduction of EQA schemes in clinical laboratory medicine, they have become versatile tools for the detection of analytical flaws and for following the performance of diagnostic assays. For maximum usefulness, EQA schemes need to include samples that represent the diagnostic challenges for which the assay is intended. The schemes also serve as tools to increase the conformity of assays, especially in terms of the accuracy of levels determined by different methods, facilitating the clinical interpretation and exchange of results between different sites. Since samples in the EQA schemes are generally derived from an adult population, the current study aimed at investigating the analytical aspects of the low concentrations found in pediatric serum samples. Access to accurate and sensitive estradiol assays is crucial for a pediatric endocrinologist in order to distinguish between prepubertal and early pubertal levels, and in monitoring the treatment of pubertal disorders, as well as monitoring the impact of aromatase-inhibitor treatment on different disorders in children [16][17][18]. Therefore, established and reliable sex- and puberty-specific reference intervals for sex steroids are required, which is a great challenge for each analytical laboratory. Finding healthy volunteers from different age and pubertal stage groups is quite difficult. Just dividing into age groups is not very helpful as individuals mature more or less independently of chronological age. Hence, due to the necessity of pubertal assessments and methodological challenges, pediatric reference intervals are poorly defined, which complicates the interpretation of test results [8,9,19]. In this survey, only one laboratory presented puberty-specific reference intervals for the CLIA used, and another one for RIA and one for LC-MS/MS. Still, reference intervals supplied by the manufacturer, taken from published data or determined at another laboratory cannot be adopted and implemented without caution. In this study we show that even when using the same CLIA, test results may differ significantly between laboratories.
To date there are only four publications on estradiol concentrations in girls during pubertal development performed by MS-based methods [20][21][22] or extraction-RIA [23]. In these studies, plasma concentrations of around 10-20 pmol/L are associated with the start of pubertal development in girls. In addition, there are few published scientific articles that make an effort to establish puberty- or age-specific reference intervals with CLIA [24][25][26][27][28][29]. Unfortunately, due to lack of specificity and sensitivity in the CLIA, the studies result in intervals with wide overlaps between ages or throughout pubertal stages, and estradiol concentrations in prepubertal children are 10-20 times higher, compared with results from MS-based methods. Hence, these references are not useful for the diagnosis of pubertal disorders, since based on these intervals it is not possible to distinguish between prepubertal and pubertal stages. Since high estradiol concentrations in young girls are associated with precocious puberty, spontaneous onset of puberty and malignancy, overestimation and false results may lead to unnecessary anxiety for the parents and the child, unnecessary investigations of the child, or inadequate therapeutic drug decisions or monitoring. On the other hand, elevated or pathological test results may also be masked by falsely high reference intervals.
In this specially composed round of the EQA scheme, using sample materials from children at prepubertal and pubertal levels, the five CLIAs showed a large spread in estradiol results as well as a clear overestimation.None of these estradiol methods could distinguish between samples from a prepubertal child or children in early puberty.
However, there was good agreement between the three LC-MS/MS methods, and the extraction-RIA showed a high degree of conformity with the LC-MS/MS methods, in line with previous recommendations [8].
In recent years awareness has increased in Sweden that extraction RIA or MS-based assays are required to achieve adequate accuracy, sensitivity and specificity for pediatric applications. This is a result of persistent collaboration and networking between laboratory scientists with high expertise, clinical chemists and pediatricians. Several users in Sweden forward child samples to a laboratory with highly sensitive methodology, others determine the estradiol concentration with an immune-based method and re-run samples with LC-MS/MS if the result is below LLOQ, while other laboratories leave the responsibility to the pediatric endocrinologist to choose the adequate method. The latter approach requires that the pediatrician is familiar with the principles of the methods used, which is impossible or at least very difficult for the uninitiated. The clients also get deceptive information from assay vendors about the intended use of the CLIA analysis. Based on this study's results, we argue that the intended purpose stated in manufacturers' inserts and on laboratories' websites should be narrowed to analysis of sex steroids in adults, not children.
Over a decade ago, the largest problem with commercially available testosterone CLIAs was overestimation [1]. Ever since, life science companies have focused on standardization and obtaining higher sensitivity in testosterone assays, and to some extent, they have succeeded. However, this study reveals that significant analytical bias with respect to the LC-MS/MS assay persists. The immune-based methods showed a large spread in results where both under- and overestimation occurred. The most frequently used method, Cobas Roche, differed considerably in results between the 11 participating laboratories, with consistent underestimation of testosterone results in samples below 1.0 nmol/L, as well as high inter-laboratory variability.
There was good agreement between the three in-house developed and validated LC-MS/MS methods. When comparing testosterone LC-MS/MS methods, the variation was roughly ±10 % throughout the concentration range. This is a typical pattern if one (or more) of the assays is calibrated a bit differently. Thus, we believe all testosterone assays have good precision, but they differ a bit more in accuracy compared to the estradiol assays. This is a good illustration of why EQA schemes for estradiol/testosterone LC-MS/MS assays are important. Such programmes would probably help laboratories detect imperfectly calibrated assays, which may be difficult to detect with an EQA scheme intended for higher concentrations.
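To illustrate why a roughly constant relative deviation points to calibration and not to imprecision, the sketch below applies a hypothetical multiplicative calibration factor of 1.10 to the testosterone concentrations quoted in the Results. The factor and the names used are assumptions made purely for illustration; they do not describe any of the participating assays.

```python
def relative_bias(measured, reference):
    """Relative bias (%) of a measured value against a reference value."""
    return (measured - reference) * 100 / reference


# Testosterone concentrations (nmol/L) quoted in the Results section.
reference_levels = [0.06, 0.19, 0.62, 0.99, 1.00]

# Hypothetical assay whose calibration reads 10 % high at every level.
CALIBRATION_FACTOR = 1.10

biases = [
    relative_bias(level * CALIBRATION_FACTOR, level)
    for level in reference_levels
]

for level, bias in zip(reference_levels, biases):
    print(f"{level:.2f} nmol/L -> relative bias {bias:+.1f} %")
```

A purely proportional calibration difference gives the same relative bias at every concentration, whereas random imprecision would scatter around zero and typically grow towards the lowest levels.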
Reliable testosterone determinations are crucial for a pediatric endocrinologist to distinguish between prepubertal and pubertal plasma levels in boys, to diagnose hyperandrogenism in girls, and to distinguish between healthy and pathological states [30][31][32]. Testosterone concentrations above 0.5 nmol/L are associated with the onset of male puberty [12,20,33,34]. In this study, the absolute bias of CLIAs ranged from −0.3 to +0.1 nmol/L around 0.6 nmol/L, and from −0.4 to +0.5 nmol/L around 1 nmol/L, which makes it impossible to separate prepubertal from pubertal levels. This is reflected by the published pediatric reference intervals analysed by CLIA, in which intervals are presented with wide and sometimes complete overlaps between ages or pubertal stages [25-27, 29, 35]. False results or results that are difficult to interpret may lead to unnecessary investigations of the child or inadequate treatment. In comparison, the three LC-MS/MS methods had an absolute bias of −0.09 to +0.07 nmol/L at 0.6 nmol/L, and −0.11 to +0.11 nmol/L at 1 nmol/L, which makes them considerably more useful and reliable.
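The consequence of these bias ranges for the 0.5 nmol/L threshold can be checked with simple arithmetic. The sketch below treats the LC-MS/MS mean as the true concentration, which is a simplification, and the function names are hypothetical.

```python
# nmol/L; level associated with the onset of male puberty in the text.
PUBERTY_THRESHOLD = 0.5


def reported_range(true_conc, bias_low, bias_high):
    """Range of reported values for a sample, given an absolute bias range."""
    return true_conc + bias_low, true_conc + bias_high


def straddles_threshold(value_range, threshold=PUBERTY_THRESHOLD):
    """True if reported values can fall on both sides of the threshold."""
    low, high = value_range
    return low < threshold < high


# Bias ranges at about 0.6 nmol/L as stated in the text.
clia_range = reported_range(0.6, -0.3, 0.1)
lcms_range = reported_range(0.6, -0.09, 0.07)

print(clia_range, straddles_threshold(clia_range))
print(lcms_range, straddles_threshold(lcms_range))
```

Under these assumptions, a sample at 0.6 nmol/L could be reported anywhere from about 0.3 to 0.7 nmol/L by CLIA, that is, on either side of the threshold, whereas the LC-MS/MS range (about 0.51 to 0.67 nmol/L) stays above it.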
Altogether, false test results may not only affect pediatric care. The present results are also applicable for postmenopausal women. In this respect, commercially available CLIAs may mislead investigation and diagnosis of peri- and postmenopausal symptoms, which include hormone levels in the studied range [33,36].

Conclusions
This study enabled the following general conclusions to be drawn from pediatric samples.

Figure 2: Graphical representation of differences in estradiol quantitation, in order of evaluated concentration, in five child samples labelled A-E, between various immunoassays and liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods. Upper panel: Six immunoassays' analytical biases (A) and relative biases (B) estimated using the mean value of three LC-MS/MS. Each symbol represents the estradiol mean value of each method's results with standard deviations as error bars. ■ = Abbott Alinity I (n=1), • = Beckman UniCel DxI 800 (n=2), ; = Cobas Roche e601/e801 analyzer series and e411 analyzer (n=10), : = Ortho Vitros 3600 (n=1), A = Siemens Advia Centaur (n=2), = = Cisbio CT RIA + in-house extraction (n=1). Two results from Abbott Alinity were designated as outliers (1,591 % relative bias at 4.9 and 1,303 % at 8.5 pmol/L) and therefore omitted in the graphic presentation. Lower panel: Detailed plot of analytical bias (C) and relative bias (D) for ten Cobas Roche e601/e801 analyzer series and e411 analyzer, estimated using the mean value of three LC-MS/MS. Each connected line represents one laboratory's estradiol results.

Figure 3: Graphical representation, in evaluated concentration order, of differences between testosterone liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods and the various immunoassays for five serum samples labeled A-E. Upper panel: Six immunoassays' analytical bias (A) and relative bias (B) estimated using the mean value of three LC-MS/MS. Each symbol depicts the testosterone mean value of each method's results with standard deviation error bars. ■ = Abbott Alinity I (n=1), • = Beckman UniCel DxI 800 (n=2), ; = Cobas Roche e601/e801 analyzer series and e411 analyzer (n=11), : = Ortho Vitros 3600 (n=1), A = Siemens Advia Centaur (n=1), = = Cisbio CT RIA (n=1). One result from Ortho Vitros and one from Siemens Advia Centaur were designated as outliers (reported values <0.49 nmol/L and <0.24 nmol/L at 0.06 nmol/L) and therefore omitted in the graphic presentation, panel B. Lower panel: Detailed plot of analytical bias (C) and relative bias (D) for the eleven Cobas Roche e601/e801 analyzer series and e411 analyzer, estimated using the mean value of three LC-MS/MS. Each connected line represents one laboratory's testosterone results. One laboratory result was designated as an outlier (217 % relative bias at 0.06 nmol/L) and therefore omitted in the graphic presentation.

(1) Commercially available estradiol immunoassays are not suitable for diagnosis in children.
(2) Commercially available testosterone immunoassays have an uncertainty of reproducibility in the low range that each individual user should consider.
(3) For the safe diagnosis and determination of sex steroids in children, analysis with MS-based methods is recommended.
(4) Every pediatric endocrinologist or laboratory scientist should be familiar with the principles and pitfalls of the sex steroid methods they use.
(5) To increase conformity of methods used for diagnostics in children, participation in an EQA scheme is highly recommended.
(6) We recommend that manufacturers of sex steroid CLIA tests and laboratories that analyse them advertise these tests as recommended for adults, not children.
(7) Since reliable sex steroid quantitation in children requires extremely high expertise, collaboration between biomedical laboratory scientists and clinicians is highly advantageous.

Table A :
Summary of performance characteristics for the routine immunoassays used for estradiol analysis.Pediatric reference intervals were downloaded from the attended laboratories' website, if present; otherwise, they were taken from published data.Puberty-specific intervals were given priority over age-specific intervals.For Siemens Advia Centaur there were no published data available.

Table B :
Summary of performance characteristics for the routine immunoassays used for testosterone analysis.Pediatric reference intervals were downloaded from the attended laboratories' website, if present; otherwise, they were taken from published data.Puberty-specific intervals were given priority over age-specific intervals.For Siemens Advia Centaur there were no published data available.Functional sensitivity defined as the lowest analyte concentration that can be reproducibly measured with an intermediate precision CV less than or equal to  %. b Consistent puberty-specific reference intervals in relation to Tanner stages were reported by two laboratories.Another three laboratories reported reference intervals divided into age groups, data not shown.c According to in-house validation.

Table 2:
Summary of estradiol and testosterone liquid chromatography-tandem mass spectrometry (LC-MS/MS) validation data provided by the users.
LLOD, lower limit of detection; LLOQ, lower limit of quantification; TS, Tanner stage. Additional information on the mass spectrometric assays: Extraction methods; Laboratory  used SLE plates (Biotage) and methyl tert-butyl ether in heptane, Laboratory  used SLE plates (Biotage) and ethyl acetate in heptane, Laboratory  used liquid-liquid extraction using tert-butyl ethyl ether. Analytical columns used; Laboratory  used XBridge C . µm, . ×  mm followed by Kinetex Biphenyl . µ, . ×  mm, Laboratory  used XBridge C . µm, . ×  mm followed by BEH Phenyl  × . mm, . µm. Mobile phases; All laboratories used deionized water/methanol with ammonium fluoride as an additive. Internal standards; All laboratories used C labelled internal standards except Laboratory  which used D Estradiol as internal standard for estradiol.

Table 3:
Number of users and their reporting limits for the immunoassays used (participants in EQA round 2020:01 [article number: KSP 014], Equalis AB, Uppsala, Sweden).

Table 4:
Serum estradiol and testosterone results from participants in EQA round 2020:01 (article number: KSP 014), Equalis AB, Uppsala, Sweden. Samples A and E were derived from prepubertal children and samples B-D from children in early pubertal development.