The versatility of external quality assessment for the surveillance of laboratory and in vitro diagnostic performance: SARS-CoV-2 viral genome detection in Austria

Objectives: External quality assessment (EQA) schemes provide information on individual and general analytical performance of participating laboratories and test systems. The aim of this study was to investigate the use and performance of SARS-CoV-2 virus genome detection systems in Austrian laboratories and their preparedness to face challenges associated with the pandemic. Methods: Seven samples were selected to evaluate performance and estimate variability of reported results. Notably, a dilution series was included in the panel as a measure of reproducibility and sensitivity. Several performance criteria were evaluated for individual participants as well as in the cohort of all participants. Results: A total of 109 laboratories participated and used 134 platforms, including 67 different combinations of extraction and PCR platforms and corresponding reagents. There were no false positives and 10 (1.2%) false negative results, including nine in the weakly positive sample (Ct ∼35.9, ∼640 copies/mL). Twenty (22%) laboratories reported results of mutation detection. Twenty-five (19%) test systems included amplification of human RNA as evidence of proper sampling. The overall linearity of Ct values from individual test systems for the dilution series was good, but inter-assay variability was high. Both operator-related and systematic failures appear to have caused incorrect results. Conclusions: Beyond providing certification for participating laboratories, EQA provides the opportunity for participants to evaluate their performance against others so that they may improve operating procedures and test systems. Well-selected EQA samples allow additional inferences to be made about assay sensitivity and reproducibility, which have practical applications.

(IVD) to be monitored by external quality assessment (EQA) schemes [1]. While the contribution of EQA schemes to postmarket surveillance is not yet fully developed and standardized, EQA for medical laboratories has a tradition going back decades [2]. In such schemes, EQA providers distribute samples with known but undisclosed properties or content. Participating laboratories analyze these samples and report results to the EQA provider. Reported results are compared with assigned targets and scored as either matching (correct) or non-matching (false). Participants learn in individual reports how comparable their results are with those of their peers and gain additional educational effects from the general report [3]. Therefore, their motivation to participate in EQA is to have results assessed by an independent third party and to compare them with those obtained with both identical and alternative analytical systems. However, mere assessment of performance does not fully exploit the real potential of EQA schemes. The underutilized potential of such schemes lies in the aggregation of results, which provides an overview of the analytical performance and competencies of laboratories and test systems. This data is therefore useful for both post-market surveillance and epidemiological purposes.
Molecular diagnostics for SARS-CoV-2 infection from nasopharyngeal, nasal or oro-pharyngeal swabs were available in early 2020 and were quickly adopted by laboratories around the world. SARS-CoV-2-specific real-time reverse transcription quantitative PCR (RT-qPCR) is currently the gold standard for confirmation or exclusion of infection, monitoring COVID-19 therapy and infectious and epidemiologic follow-up testing [4,5]. According to manufacturers' specifications, the reliability of such test results should be high. By and large, previously reported EQA results have confirmed that the various participating systems perform well for SARS-CoV-2 viral genome detection [6][7][8][9][10][11]. However, we believe an EQA scheme offers an opportunity to further probe the specific skills and competencies of participants by including samples with specific properties; to our knowledge, this approach has not yet been reported.
During a pandemic, the preparedness of laboratories to face arising challenges is of great interest. For example, it has been reported that several mutations in the SARS-CoV-2 genome are associated with reduced sensitivity of routine RT-qPCR systems [12][13][14]. To reduce the risk of positive samples escaping detection, the International Federation for Clinical Chemistry and Laboratory Medicine (IFCC) recommended the use of at least two SARS-CoV-2-specific gene targets in test systems [15]. An EQA can check to what extent this recommendation has been implemented. Another interesting point that EQA schemes can clarify is the preparedness of laboratories to interpret and report divergent or unexpected results (e.g., target gene "dropout"). This may include, for example, further examinations of putative SARS-CoV-2-positive samples for possible mutations or including amplification of human RNA as an internal control of sample quality [16].
The Austrian Association for Quality Assurance and Standardization (ÖQUASTA) is the national EQA provider in Austria and serves most of the relevant laboratories in the country. The EQA scheme "SARS-CoV-2 virus genome detection" was established in cooperation with the National Reference Laboratory for Respiratory Viruses at the Center for Virology of the Medical University Vienna, Austria, in early 2020, and two rounds have since been performed. For the third round (the first round in 2021), samples were selected so that the measurement results provided information in addition to detection of viral nucleic acid. Herein, we evaluated the current nationwide performance in SARS-CoV-2 genome detection and assessed specific properties of laboratories and test systems using a specifically designed panel of sample materials.

Materials and methods
Organization of the EQA scheme
The "SARS-CoV-2 virus genome detection" scheme of ÖQUASTA and the Center for Virology of the Medical University Vienna corresponds to the general EQA scheme as summarized by the European Organization for External Quality Assurance Providers in Laboratory Medicine (EQALM) [17].
For quality control, all samples were stored at −80°C and one panel was tested before shipment as follows: after thawing, samples were kept at ambient temperature for 2 days (anticipated maximum shipping time) before further storage at 4°C for 4 days (maximal expected storage time before testing). Thereafter, samples were measured with an in-house SARS-CoV-2 molecular detection assay, and quantified by digital droplet PCR, as described previously [20]. Briefly, nucleic acid (NA) was extracted from a 200 μL aliquot of the original sample using the NucliSENS easyMAG platform (bioMérieux, France). The elution volume was 50 μL. For RT-qPCR, each 25 μL reaction mixture contained 12.5 μL of 2× reaction buffer, 0.4 μL of a 50 mM magnesium sulfate solution, 1 μL of SuperScript III RT/Platinum Taq Mix (SuperScript III One-Step RT-PCR System, Invitrogen, Germany), 0.5 μL each of LightMix® Modular SARS-CoV (COVID-19) E-gene and LightMix® Modular EAV RNA Extraction Control (TIB Molbiol, Berlin, Germany) and 10 μL of eluted NA as the template. Thermal cycling was performed using a LightCycler 480 II (Roche, Vienna, Austria) under the following conditions: reverse transcription at 55°C for 20 min; denaturation at 94°C for 3 min; 45 amplification cycles of 94°C for 15 s and 58°C for 30 s (acquisition steps); and a final incubation at 40°C for 30 s. Analysis of the five positive samples was repeated three times and the results were used as reference values (Table 2). Samples S1, S2, and S4 tested positive for beta actin as a marker for the presence of human genetic material.
Figure 2: Differences of Ct values in test systems employing more than one target gene.

Sample dispatch and instructions for participants
Samples were dispatched to participant laboratories by overnight delivery at ambient temperature. The instructions provided with the samples recommended that the material either be tested immediately or be stored at 4°C for a maximum of 4 days after arrival until analysis, and that tests be conducted in the same manner as for routine samples. Individual laboratories that wanted to participate with more than one protocol had to register each test system separately and received independently shipped sample sets for each of them.

Data collection and assessment
Results of the qualitative interpretation (positive, negative, invalid) and test systems used were to be reported to the EQA provider via a web portal, email, fax or on paper in regular mail within 12 days.
Participants were asked to report target gene(s) and Ct values for both virus-specific genes and, if applicable, for human RNA used by their test systems as a quality control, along with qualitative results. A sample was classified as "Invalid" if the results for SARS-CoV-2-specific gene targets and human RNA were negative. Only qualitative results were officially assessed for certification in this round. Assignment of qualitative targets for samples followed variant (c) "from knowledge of the origin of preparation of the proficiency item(s)" in chapter 11.3.1 of the applicable standard ISO 13528:2015, "Statistical methods for use in proficiency testing by interlaboratory comparison" [21]. Reported results were collected and evaluated in customized QuoStat software (QuoData, Dresden, Germany). Individual results were judged as either correct (matching the target) or incorrect (not matching the target). In addition to individual reports, data for all test systems were made available in a general report. Statistical analysis of reported Ct values comprised basic summary statistics (mean, ranges, standard deviations and coefficients of variation) stratified by sample and/or gene target. Mann-Whitney U tests were used to compare Ct values between targets for a given sample, with a Type I error rate of α = 0.05.
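The stratified summary statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the QuoStat evaluation used in the study: the function name summarize_ct, the record layout (sample, target, Ct) and the example values are hypothetical, and the Mann-Whitney U comparison is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_ct(records):
    """Group Ct values by (sample, target) and compute basic summary statistics.

    `records` is an iterable of (sample, target, ct) tuples (assumed layout).
    """
    groups = defaultdict(list)
    for sample, target, ct in records:
        groups[(sample, target)].append(ct)

    summary = {}
    for key, values in groups.items():
        m = mean(values)
        # The sample standard deviation needs at least two values.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[key] = {
            "n": len(values),
            "mean": m,
            "min": min(values),
            "max": max(values),
            "sd": sd,
            "cv_percent": 100.0 * sd / m,  # coefficient of variation
        }
    return summary


# Hypothetical Ct values for illustration only (not data from this study).
example_records = [
    ("S7", "E", 34.0),
    ("S7", "E", 36.0),
    ("S7", "E", 38.0),
    ("S7", "N", 35.0),
    ("S7", "N", 37.0),
]
example_summary = summarize_ct(example_records)
```

Keying the result by (sample, target) mirrors the stratification "by sample and/or gene target"; stratifying by sample alone would only require a different grouping key.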

Information on participants
Results from 109 laboratories with a total of 134 individual test systems were available for evaluation. The test systems included combinations of nucleic acid extraction platforms and reagents and RT-qPCR platforms and reagents. In total, 67 different combinations of extraction and detection systems were used (Supplementary Material, Figure S1). Sample volume used varied from 2.5 to 900 μL and maximum cycle count varied between 38 and 50, as reported by participants and not verified by the authors. Ninety-one laboratories participated with one test system/protocol, and 18 laboratories used more than one test system/protocol. In particular, 14 laboratories used two, two laboratories used three, and two individual laboratories used either four or five distinct test systems, respectively. Gene-specific Ct values were reported for a total of 122 registered test systems.
IFCC's recommendation to use at least two different target genes in SARS-CoV-2 detection systems was met by 79 (65%) participant test systems (two target genes were reported for 68 (56%), and three targets were reported for 11 (9%) test systems). A total of 25 (19%) participant systems included testing for human RNA as an internal sampling control. Human RNase P was reported to be used for 15 systems, beta actin for six, beta globin for three, and VPS29 for one system. Furthermore, 20 (22%) laboratories reported that they routinely test SARS-CoV-2-positive samples for mutations (Table 2).

Analytical performance
A total of 124 out of 134 test systems (93%) reported correct results for all five positive and both negative samples (Table 2). A total of 1,006 individual Ct values were reported from 122 test systems targeting the following viral gene targets: E (250 Ct values); N (337); S (68); RdRP (112); ORF1ab (105); or "other" undisclosed gene targets (128), which may have included a single result obtained from two or more gene targets. Summary statistics of Ct values were calculated for individual targets and "other" target genes (Table 2 and Figure 1).
We calculated the mean differences of Ct values in three test systems that target more than one gene (Figure 2). For GeneXpert/Xpert Xpress SARS-CoV-2, the mean difference between Ct values for the N- and E-gene was 2.6; for Roche Cobas 6800,8800/cobas SARS-CoV-2 it was 0.1 for the E- and the ORF1ab-gene; for Simplexa Liaison MDS Realtime PCR/Simplexa COVID-19 Direct Reaction Mix it was 0.4 for the ORF1ab- and the S-gene.
The three samples from the dilution series were used to assess the linearity of Ct values (Figure 3). While we did not perform statistical analysis of these data, we note that approximately linear results were obtained from many test systems, but some were obviously non-linear.
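One way such linearity could be quantified, which was not done in this study, is a least-squares fit of Ct against the log10 dilution factor for each test system. The sketch below assumes the three dilution steps 1:1, 1:10 and 1:100; the function names and example Ct values are hypothetical. With perfect doubling per cycle, a 10-fold dilution shifts Ct by log2(10) ≈ 3.32 cycles, so a slope near 3.32 corresponds to an amplification efficiency near 100%.

```python
def fit_dilution_line(cts, log10_dilutions=(0.0, 1.0, 2.0)):
    """Least-squares line Ct = intercept + slope * log10(dilution factor)."""
    n = len(cts)
    mean_x = sum(log10_dilutions) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_dilutions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_dilutions, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope (cycles per 10-fold dilution); 1.0 = 100%."""
    return 10 ** (1.0 / slope) - 1.0


def max_abs_residual(cts, log10_dilutions=(0.0, 1.0, 2.0)):
    """Largest deviation of a reported Ct from the fitted line, in cycles."""
    slope, intercept = fit_dilution_line(cts, log10_dilutions)
    return max(
        abs(y - (intercept + slope * x)) for x, y in zip(log10_dilutions, cts)
    )


# Hypothetical Ct triplets (undiluted, 10-fold, 100-fold), for illustration only.
linear_system = [25.0, 28.3, 31.6]
nonlinear_system = [25.0, 27.0, 33.0]

linear_slope, _ = fit_dilution_line(linear_system)
linear_efficiency = amplification_efficiency(linear_slope)
nonlinear_residual = max_abs_residual(nonlinear_system)
```

With only three dilution points per system, the residual is a coarse indicator at best; any threshold for calling a system "non-linear" would have to be chosen by the evaluator.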
All test systems that included an internal control correctly reported the presence/absence of human RNA in all samples except for one participant reporting a false negative result. The sensitivity for human RNA (inferred from Ct values) varied between samples (coefficient of variation of Ct values for a given sample <16%). We found no differences in Ct values between targets for the same sample (Mann-Whitney U tests, p > 0.05) (Table 2 and Figure 1).
The sample containing B.1.1.7 variant virus (S4) was reported positive by all participants, but only 20 (22%) reported evidence of a mutation. Of these, 19 participants specifically reported S4 as either positive for the substitution in the spike protein at site N501Y and/or the deletions at amino acid positions H69/V70 or identified the sample as a lineage B.1.1.7 virus. One participant indicated they suspected variant lineage B.1.351 or P1 in Samples 3, 4, and 7. We noted that the S-gene target PCR of the test system of two participants (2× TaqPath) did not detect the lineage B.1.1.7.

Discussion
Evaluation of results of this round revealed good analytical performance of participants. There were no false positives and only a few false negatives, which were mostly in the sample with the highest mean Ct value ("weakly positive"). As this was the third EQA round, we note the improvement in the false negative ratio over the two previous rounds: in the first and the current third round the weakly positive samples had roughly similar Ct values (38 and 36, respectively) and yet the rate of false negatives fell from 40.3 to 6.7% (per 67 and 134 reported tests, respectively) [22].

Figure 1: Boxplots of five test samples (S2 and S4 are patient-derived swabs; S6, S3, and S7 are a 1:1, 1:10, 1:100 dilution series, respectively, of virus culture supernatant) in the left column show the 50% inter-quartile range and median within the box for various test systems (x-axis) targeting specific viral gene targets (subheading in italics). The plot titles indicate the sample number, the reference Ct value, and the template copy number as determined by digital droplet PCR. "Other" = results obtained from two or more gene targets in one channel and therefore not shown separately. The histogram in the right column displays the distribution of reported Ct values (y-axis is the same as in the left column boxplots) for each target, indicating the mode and total sample number.
In some cases, we could deduce the possible cause for the false negative results. One participant reported false negative results for S3 and S7 using a test system that requires 400 µL of primary sample, but the participant reported that only 30 µL was used. Each Ct value reported by this participant was 3-4 cycles higher than other participants' results, and therefore it is likely that low template input led to a false negative result for two samples. Two participants reported the interpretation "negative" for S7, and yet reported Ct values between 36.8 and 39.5. We assume these were interpreted as "negative" according to an internal laboratory cut-off value. For another five, no cause for the false negative result could be identified based on the data provided.
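The reported shift of 3-4 cycles is in line with what a simple calculation predicts for the reduced input volume. The sketch below assumes ideal amplification (template doubles every cycle) and that the amount of template scales in proportion to the sample volume used; the function name is hypothetical.

```python
import math


def expected_ct_shift(required_volume_ul, used_volume_ul):
    """Expected Ct increase when less sample than required is used.

    Assumes 100% amplification efficiency (doubling per cycle) and template
    amount proportional to input volume; both are simplifying assumptions.
    """
    return math.log2(required_volume_ul / used_volume_ul)


# 400 µL required by the test system vs. 30 µL reportedly used.
shift = expected_ct_shift(400.0, 30.0)
print(f"Expected Ct shift: {shift:.2f} cycles")
```

A roughly 13-fold reduction in input corresponds to about 3.7 cycles, which would push a weakly positive sample with a Ct near 36 toward or beyond typical maximum cycle counts.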
IFCC's recommendation to use at least two different target genes in SARS-CoV-2 detection systems was met by only about two thirds of participant test systems. We were able to analyze three test systems that utilize two gene targets for differences in Ct values between the targets. These differences may be useful for laboratory quality control: for example, if the difference deviates beyond defined limits, this may be indirect evidence of possible mutations in the target sequences. We note that one test system had a uniformly large difference in Ct values between the targets for the same sample (mean 2.6), whereas the differences between targets in the other two test systems were similar to each other (Figure 2). This may have implications for the use of Ct values as a qualitative assessment of patient status.
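A quality control rule of the kind suggested above could look like the following sketch. The expected difference and tolerance are hypothetical, laboratory-defined parameters (no limits were defined in this study), and treating a missing Ct for one target as a flag is an assumption meant to cover target gene "dropout".

```python
from typing import Optional


def delta_ct_flag(
    ct_target_a: Optional[float],
    ct_target_b: Optional[float],
    expected_delta: float,
    tolerance: float,
) -> bool:
    """Return True if a sample should be reviewed for a possible target mutation.

    A sample is flagged when exactly one target failed to amplify (dropout) or
    when the Ct difference (a minus b) deviates from the expected difference by
    more than the tolerance. All limits are hypothetical and system-specific.
    """
    if ct_target_a is None and ct_target_b is None:
        return False  # both targets negative: not a dropout pattern
    if ct_target_a is None or ct_target_b is None:
        return True  # one target missing while the other amplified
    delta = ct_target_a - ct_target_b
    return abs(delta - expected_delta) > tolerance


# Hypothetical limits for a system with a typical difference of 2.6 cycles.
EXPECTED_DELTA = 2.6
TOLERANCE = 1.5

within_limits = delta_ct_flag(30.0, 27.4, EXPECTED_DELTA, TOLERANCE)
shifted_target = delta_ct_flag(36.0, 27.4, EXPECTED_DELTA, TOLERANCE)
dropout = delta_ct_flag(None, 27.4, EXPECTED_DELTA, TOLERANCE)
```

Because the typical difference between targets varies between test systems, as the results for the three systems show, such limits would need to be established separately for each system.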
Reporting of Ct values of positive results is still requested by medical authorities and may offer additional qualitative assessment of a virus-positive result [23]. However, our results clearly demonstrate that caution should be taken when comparing and interpreting Ct values. First, we observed the well-documented property of RT-qPCR that primer-probe sets specific for different genomic targets produce different Ct values for the same sample. This is likely related to reaction efficiency and/or the relative abundance of RNA template for the different viral genes. Second, the range of Ct values reported for the same target, in the same sample, by different participants was on average approximately 10 Ct units. However, the variability of Ct values was less than in the previous round of this EQA scheme [20]. Finally, in the previous EQA scheme, the data seemed to suggest that laboratories were at least consistently over- or under-estimating the mean Ct value for a given sample. We therefore tested this hypothesis with a dilution series of a virus culture supernatant. The results revealed that many test systems reported approximately linear changes in Ct according to template concentration, but some were notably non-linear (Figure 3). Possible reasons for the high variability revealed by individual Ct values in general, and departures from expected efficiency in the dilution series, include differences in assay performance and/or in EQA material characteristics. Commutability of an EQA material describes its ability to properly mimic the behavior of real patient samples that have not been modified [24,25]. Although a formal commutability analysis could not be performed, the fact that similar results were obtained in both swab and virus culture samples suggests that assays are likely not greatly impacted by matrix effects, at least for the materials involved in this study. Although the amplification of human RNA as an internal control (e.g., as evidence of proper sampling via swab) was reported by some labs, it does not appear to be very widespread. However, if laboratories do not supervise correct sampling, including such an internal control may be helpful to identify invalid samples and prevent the reporting of false negative results. As mentioned, we noted variability in the volume of sample used for extraction, and believe this, combined with assay efficiency related to PCR reagents (primers, polymerases, buffer systems), may have had the greatest impact on the variability of individual Ct values.

Figure 3: Linearity of Ct values in S6, S3, and S7 of the most frequently used gene targets for a dilution series derived from a virus isolate grown in cell culture. Lines connect the Ct values reported by each participant from a dilution series of nCoV-19/Austria/CeMM0360/2020 [EPI_ISL_438123] (S6 undiluted, S3 = 10-fold dilution, S7 = 100-fold dilution) from various test systems for five viral gene targets (E, N, S, RdRP, and ORF1b) or "other" for systems which calculate Ct based on two gene targets.
In conclusion, this round of EQA was beneficial in simultaneously serving several interested parties. Participant laboratories have received feedback on the quality of their individual results, which had improved over prior rounds. Supervisory authorities can read information about the general analytical performance and laboratory preparedness; and here we note that, from a technical standpoint, considerations should be made concerning the use of Ct values to assess patient status. Finally, we demonstrated that EQA may be useful to IVD manufacturers (as well as diagnostic laboratories) to compare the performance of products in routine use.

Acknowledgments:
We gratefully acknowledge all laboratories that participated in this study and made special efforts to report more data than was required to participate in this EQA round.
Research funding: None declared.
Author contributions: CB: Conceptualized, conducted and analyzed this EQA study, wrote and edited the manuscript draft. JVC: Conducted and analyzed this EQA study, visualized data, critically reviewed and edited the manuscript. JJ: Analyzed data and provided technical EQA support, critically reviewed manuscript. PC: Provided scientific advice, critically reviewed and edited the manuscript. EPS: Provided scientific advice, critically reviewed the manuscript. MM: Performed and analyzed ddPCR of samples, wrote the respective part of the manuscript. HP: Performed and analyzed ddPCR of samples, wrote the respective part of the manuscript. AB: Performed and analyzed sequencing of samples, wrote the respective part of the manuscript. AL: Performed and analyzed sequencing of samples, wrote the respective part of the manuscript. AMP: Performed and analyzed sequencing of samples, wrote the respective part of the manuscript. LE: Performed and analyzed sequencing of samples, wrote the respective part of the manuscript. WH: Performed statistical evaluation and visualized data, critically reviewed manuscript. BB: Critically reviewed and edited the manuscript. VD: Provided scientific advice regarding commutability, wrote the respective part of the manuscript. MMM: Provided scientific advice to the overall study, reviewed and edited the manuscript. AG: Provided scientific advice to the overall study, reviewed and edited the manuscript. SWA: Provided sample material, conceptualized, conducted and supervised this EQA study, provided scientific advice to the study, reviewed and edited the manuscript. IG: Conceptualized, conducted and analyzed this EQA study, visualized data, wrote and edited the manuscript draft.
All authors have accepted responsibility for the entire content of this manuscript and approved its submission.