Variability of cycle threshold values in an external quality assessment scheme for detection of the SARS-CoV-2 virus genome by RT-PCR

Objectives: The qualitative results of SARS-CoV-2-specific real-time reverse transcription (RT) PCR are used for the initial diagnosis and follow-up of Covid-19 patients and asymptomatic virus carriers. However, clinical decision-making and health management policies are often additionally based on cycle threshold (Ct) values (i.e., quantitative results) to guide patient care, segregation, and discharge management of individuals testing positive. An analysis of inter-protocol variability is therefore needed to assess the comparability of these quantitative results. Methods: Ct values reported in a SARS-CoV-2 virus genome detection external quality assessment challenge were analyzed. Three positive and two negative samples were distributed to participating test laboratories. Qualitative results (positive/negative) and quantitative results (Ct values) were assessed. Results: A total of 66 laboratories participated, contributing results from 101 distinct test systems and reporting Ct values for a total of 92 different protocols. In all three positive samples, the mean Ct values for the E-, N-, S-, RdRp-, and ORF1ab-genes varied by less than two cycles. However, 7.7% of reported results deviated by more than ±4.0 (maximum 18.0) cycles from the respective individual means. These larger deviations appear to be systematic errors. Conclusions: In an attempt to use PCR diagnostics beyond the identification of infected individuals, laboratories are frequently requested to report Ct values along with a qualitative result. This study highlights the limitations of interpreting Ct values from the various SARS-CoV-2 genome detection protocols and suggests that standardization of Ct value reporting with respect to the target gene is necessary.


Introduction
Virus genome-specific real-time reverse transcription PCR (RT-qPCR) is the gold standard for the laboratory diagnosis of infections with SARS-CoV-2. Apart from a binary classification of samples into virus-positive or virus-negative, some have advocated providing a (semi)quantitative parameter for the estimation of viral loads [1]. The discussion about the usefulness of cycle threshold (Ct) values as a basis for diagnostic, prognostic, and therapeutic decisions in the management of SARS-CoV-2-infected patients is ongoing [2,3]. Recently, guidelines for the discharge management of patients have been based not only on the timeline of infection (a minimum of 10 days since the onset of first symptoms and an asymptomatic period of at least 48 h) but also on momentary Ct values (time-based criteria and/or a Ct value >30) [4,5]. These decisions, potentially significant from both a medical and an epidemiological point of view, require a high degree of reliability from the laboratory results on which they are based. Qualitative results have for the most part proven reliable, and only some test protocols fail to detect SARS-CoV-2 at lower viral loads (higher Ct values) [6,7]. However, data on the comparability of Ct values between protocols are presently not available. This prompted us to compare Ct values from different test systems. We decided to include a clearly positive sample, a weakly positive sample (i.e., near the limit of detection), and a sample with a Ct value close to the proposed discharge threshold (Ct=30).
Results obtained in external quality assessment (EQA) schemes are well suited for comparing the analytical performance of EQA participants and/or of the different protocols used by these laboratories. To examine comparability, we analyzed the Ct values reported in the second distribution of the Austrian national SARS-CoV-2 virus genome detection EQA scheme. This EQA scheme was performed in cooperation between the Center for Virology of the Medical University of Vienna and the Austrian Association for Quality Assurance and Standardization of Medical and Diagnostic Tests (ÖQUASTA).

Materials and methods
The EQA test panel consisted of five samples (900 μL each), three positive and two negative. Positive samples were obtained by 100-fold dilution of three individual SARS-CoV-2-positive oro-nasopharyngeal swab samples in sterile sodium chloride solution (1.5 mL swab sample + 148.5 mL 0.9% NaCl). The negative samples consisted of the diluent (0.9% NaCl solution) only. To verify the stability of SARS-CoV-2 genomic RNA, all samples were stored at −80°C, and one panel was tested before shipment as follows: after thawing, samples were kept at ambient temperature for two days (anticipated maximum shipping time) before further storage at 4°C for four days (maximum expected storage time before testing). Thereafter, samples were quantified with an in-house SARS-CoV-2 molecular detection assay. For this, nucleic acid (NA) was extracted from a 200 μL aliquot of the original sample using the NucliSENS easyMAG platform (bioMérieux, France). The elution volume was 50 μL. For real-time RT-PCR, each 25 μL reaction mixture contained 12.5 μL of 2× reaction buffer, 0.4 μL of a 50 mM magnesium sulfate solution, 1 μL of SuperScript III RT/Platinum Taq Mix (SuperScript III One-Step RT-PCR System, Invitrogen, Germany), 0.5 μL each of LightMix® Modular SARS-CoV (COVID19) E-gene and LightMix® Modular EAV RNA Extraction Control (TIB Molbiol, Berlin, Germany), and 10 μL of eluted NA as the template. Thermal cycling was performed on a LightCycler 480 II (Roche, Vienna, Austria) under the following conditions: reverse transcription at 55°C for 20 min; denaturation at 94°C for 3 min; 45 amplification cycles of 94°C for 15 s and 58°C for 30 s (acquisition step); and a final incubation at 40°C for 30 s.
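The dilution arithmetic can be made explicit with a short sketch. It computes the dilution factor from the stated volumes and, assuming a constant amplification efficiency (1.0 meaning perfect doubling per cycle, which is an idealization not stated in the text), the number of cycles by which such a dilution would be expected to raise the Ct value. The function names are hypothetical and chosen for illustration only.

```python
import math


def dilution_factor(sample_ml: float, diluent_ml: float) -> float:
    """Dilution factor = total volume / sample volume."""
    return (sample_ml + diluent_ml) / sample_ml


def expected_ct_shift(factor: float, efficiency: float = 1.0) -> float:
    """Cycles added by diluting the template `factor`-fold, assuming each
    PCR cycle multiplies the target by (1 + efficiency)."""
    return math.log(factor) / math.log(1.0 + efficiency)


factor = dilution_factor(1.5, 148.5)
print(factor)                                    # 100.0
print(round(expected_ct_shift(factor), 2))       # 6.64 cycles at 100% efficiency
print(round(expected_ct_shift(factor, 0.9), 2))  # 7.17 cycles at an assumed 90% efficiency
```

Under these assumptions, a 100-fold dilution shifts Ct by roughly 6.6 to 7.2 cycles, which is why the original swab Ct values are not directly comparable to the panel values.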
The Ct values of the three positive samples were determined six times, and the means and 95% confidence intervals for the E-gene of the SARS-CoV-2 genome were as follows: S1=29.47 (29.41-29.52); S3=24.37 (24.05-24.68); and S5=35.65 (34.71-36.59). Number of replicates was not determined. Samples S2 and S4 (containing sodium chloride solution only) tested negative.
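A mean with a 95% confidence interval from six replicate Ct values can be computed with the t-distribution, as sketched below. The replicate values are hypothetical and serve only to illustrate the calculation; they are not the study's raw data. The t quantile for 5 degrees of freedom is hard-coded because the Python standard library has no t-distribution; the use of a t-based interval is an assumption, as the text does not state how the intervals were derived.

```python
import math
import statistics

# 97.5th percentile of Student's t with 5 degrees of freedom (n = 6 replicates).
# Hard-coded because the standard library provides no t quantile function.
T_975_DF5 = 2.570582


def mean_ci95(cts: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) of a t-based 95% CI for exactly six replicates."""
    if len(cts) != 6:
        raise ValueError("T_975_DF5 is only valid for exactly six replicates")
    m = statistics.mean(cts)
    se = statistics.stdev(cts) / math.sqrt(len(cts))  # sample SD / sqrt(n)
    return m, m - T_975_DF5 * se, m + T_975_DF5 * se


# Hypothetical replicate Ct values (illustration only).
replicates = [29.40, 29.45, 29.52, 29.48, 29.50, 29.47]
m, lo, hi = mean_ci95(replicates)
print(f"{m:.2f} ({lo:.2f}-{hi:.2f})")
```

For other replicate numbers, the quantile would have to be replaced, for example with a value taken from a statistics library.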
Samples were distributed to participating laboratories by overnight delivery at ambient temperature. Participants were instructed either to test the material immediately or to store it at 4°C for a maximum of four days after arrival until analysis, and to perform testing in the same way as for routine samples. Laboratories intending to participate with more than one protocol received an independently shipped test panel for each test system.
Qualitative results (positive, negative) were to be submitted to the EQA provider via a web portal, email, fax, or on paper by regular mail within seven days. Individual results were scored as either correct or incorrect, and in addition to individual reports, data for all test systems were made available in a general report. Participants were also asked to report the target gene(s) and Ct values along with their qualitative results; however, only the qualitative results formed the basis for assessing participant performance in this EQA scheme. The schematic sequence of an EQA round is shown in Figure 1.

Results
Sixty-six laboratories participated with a total of 101 individual test systems. These comprised combinations of extraction platforms and reagents with RT-PCR platforms and reagents. With respect to extraction, 31 different extraction platforms, 14 manual methods, 34 commercial extraction reagents, and four in-house extraction reagents were reported by the participants. Concerning PCR, 28 different platforms, 36 commercial PCR reagents, and six in-house reagents were reported. In total, 63 different combinations of extraction and detection systems were used; these are shown in Table 1. Fifty-five laboratories participated with one test system/protocol, and 11 laboratories used more than one: nine laboratories used two, one laboratory used three, and one laboratory used five distinct test systems. For the negative samples S2 and S4, 199 correct negative results (98.5%) and two false positive results (1.0%) were reported; two participants did not report results for S2 and/or S4. A total of 91 out of 101 test systems (90.1%) reported all three positive samples correctly as positive, and 10 (9.9%) found S5 (Ct value of the E-gene 35.7) to be (false) negative.
A total of 466 individual Ct values were reported for 92 test systems: 163 for the E-gene, 159 for the N-gene, 41 for the S-gene, 40 for the RdRp-gene, 34 for the ORF1ab-gene, and 29 for other gene targets ("nd" in Figures 2 and 3, including CDC-N1, CDC-N2, CDC-N3, ORF1a, ORF1b, and 5′-UTR), which were combined into one group for statistical analysis because of the low number of results for each individual target.
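The reported counts can be cross-checked with a few lines of Python. All numbers are taken from the text; the script only recomputes the total number of Ct values, each target's share, and the detection percentages for the 101 test systems.

```python
# Number of reported Ct values per gene target, as given in the text.
ct_counts = {"E": 163, "N": 159, "S": 41, "RdRp": 40, "ORF1ab": 34, "other (nd)": 29}

total = sum(ct_counts.values())
print(total)  # 466
for target, n in ct_counts.items():
    print(f"{target}: {n} ({100 * n / total:.1f}%)")

# Qualitative detection of the positive samples across 101 test systems.
print(f"{100 * 91 / 101:.1f}% detected all positives")  # 90.1%
print(f"{100 * 10 / 101:.1f}% missed S5")               # 9.9%
```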
The variability of Ct values for individual gene targets obtained by different test systems (pseudonymized as 1-92) is shown in Figure 3. Of the 466 results, 170 (36.5%) deviated from the respective mean value by less than ±2 cycles, and 36 (7.7%) by more than ±4 cycles. The latter comprised six single outliers, each obtained by a different protocol, while the remaining 30 were obtained by nine protocols that generated between two and six outliers each. Of the 10 protocols that did not detect S5 (false negatives), four reported Ct values for the other two positive samples that deviated by more than +2.7 cycles from the respective means: test system no. 32 yielded +5.4 cycles for S1 and +2.8 cycles for S3 (E-gene); no. 39, +5.0 cycles for S1 and +4.1 cycles for S3 (E-gene); no. 55, +5.8 cycles for S1 and +7.8 cycles for S3 (N-gene); and no. 83, +5.0 cycles for S1 and +4.7 cycles for S3 (S-gene). Three other protocols that also reported S5 as false negative (no. 41, 76, and 80) reported Ct values for the other two positive samples that did not deviate by more than ±2.7 cycles from the respective means. For the remaining three test systems that did not detect S5, no Ct values were reported.
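The deviation analysis described above can be sketched as follows: Ct values are grouped by sample and gene target, each value's deviation from its group mean is computed, and the result is binned with the ±2 and ±4 cycle cut-offs used in the text. The input values are hypothetical. Whether the study's means included the outlying values is not stated; this sketch includes them, which pulls the mean toward the outlier, and a median would be a more robust alternative.

```python
import statistics
from collections import defaultdict


def classify_deviations(results):
    """Classify each Ct value by its deviation from the mean of all Ct values
    reported for the same sample and gene target.

    results: list of (protocol_id, sample, target, ct) tuples.
    Returns a list of (protocol_id, sample, target, deviation, category) tuples.
    """
    groups = defaultdict(list)
    for _, sample, target, ct in results:
        groups[(sample, target)].append(ct)
    means = {key: statistics.mean(cts) for key, cts in groups.items()}

    classified = []
    for pid, sample, target, ct in results:
        dev = ct - means[(sample, target)]
        if abs(dev) < 2.0:
            category = "within ±2"
        elif abs(dev) <= 4.0:
            category = "±2 to ±4"
        else:
            category = "outlier (>±4)"
        classified.append((pid, sample, target, round(dev, 2), category))
    return classified


# Hypothetical E-gene Ct values for sample S1 from five protocols (illustration only).
demo = [(1, "S1", "E", 29.0), (2, "S1", "E", 29.5), (3, "S1", "E", 30.0),
        (4, "S1", "E", 28.5), (5, "S1", "E", 35.0)]
for row in classify_deviations(demo):
    print(row)
```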
In general, the combined results were distributed rather uniformly about their respective means per gene target, i.e., no consistent skew above or below the mean was evident (Figure 3). Standard deviations of results for the respective gene targets are shown in Table 1. However, three systems (no. 5 for the E-gene, no. 36 for the ORF1ab-gene, and no. 88 for the E-gene) yielded results on both sides of the respective mean for the same gene target (i.e., both above and below it), of which at least one result deviated by >2 cycles; the deviation in the other direction was very small (≤1.0 cycle). Additionally, three test systems (no. 50 for the E-, N-, and RdRp-genes; no. 82 for the E-, N-, and RdRp-genes; and no. 85 for the S- and RdRp-genes) yielded results for different gene targets that were all above or all below the respective means.
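The distinction between systematic and mixed deviations can be expressed as a simple sign-consistency rule, sketched below. A protocol's deviations from the respective means are classified as consistently above, consistently below, or mixed, ignoring small deviations up to a tolerance. The 1.0-cycle tolerance is a hypothetical choice, loosely modeled on the "≤1.0 cycle" figure in the text; with it, a pattern such as +2.5 and -0.8 counts as a one-sided (systematic) deviation.

```python
def bias_pattern(deviations: list[float], tol: float = 1.0) -> str:
    """Classify one protocol's deviations (Ct minus respective mean).

    Deviations with |d| <= tol are treated as noise; `tol` is a hypothetical
    cut-off chosen for illustration.
    """
    notable = [d for d in deviations if abs(d) > tol]
    if not notable:
        return "no notable deviation"
    if all(d > 0 for d in notable):
        return "consistently above mean"
    if all(d < 0 for d in notable):
        return "consistently below mean"
    return "mixed"


print(bias_pattern([5.4, 2.8]))   # deviations reported for system no. 32 (S1, S3)
print(bias_pattern([2.5, -0.8]))  # hypothetical: small opposite deviation is ignored
print(bias_pattern([3.0, -2.5]))  # hypothetical: genuinely mixed
```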
The GeneXpert® and cobas® 6800/8800 systems each use primers for two different gene targets: the E- and N-genes in the GeneXpert® system and the E- and ORF1ab-genes in the cobas® 6800/8800 systems. Ct values obtained by the GeneXpert® for the E-gene were lower than for the N-gene (medians for S1: 27.8 vs. 30.2; for S3: 23.9 vs. 26.3; and for S5: 36.1 vs. 38.8). The cobas® system yielded slightly higher Ct values for the E-gene than for the ORF1ab-gene (medians for S1: 29.3 vs. 28.9; for S3: 25.0 vs. 24.7; and for S5: 36.6 vs. 36.0).
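The between-target differences within each dual-target system can be computed directly from the reported medians for the three positive samples S1, S3, and S5, as in the sketch below. A negative value means the E-gene Ct is lower than that of the second target. The dictionary layout and function name are illustrative choices.

```python
# Median Ct values reported in the text for the positive samples.
genexpert = {"S1": (27.8, 30.2), "S3": (23.9, 26.3), "S5": (36.1, 38.8)}  # (E, N)
cobas = {"S1": (29.3, 28.9), "S3": (25.0, 24.7), "S5": (36.6, 36.0)}      # (E, ORF1ab)


def paired_differences(medians):
    """E-gene median minus second-target median, rounded to 0.1 cycle."""
    return {sample: round(e - other, 1) for sample, (e, other) in medians.items()}


print(paired_differences(genexpert))  # {'S1': -2.4, 'S3': -2.4, 'S5': -2.7}
print(paired_differences(cobas))      # {'S1': 0.4, 'S3': 0.3, 'S5': 0.6}
```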

Discussion
Qualitative results of the present EQA challenge improved compared with the previous round [7]: more than 90% of the test systems detected all positive samples correctly, compared with 60% in the previous round. However, the false negative rate strongly depends on the limit of detection of each assay relative to the viral load of the weakly positive sample, which was higher in this challenge (Ct 35.7 for the E-gene) than in the previous one (Ct 38.5 for the E-gene).
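A difference in Ct values can be translated into an approximate ratio of target amounts, as sketched below. This assumes that both values were measured with the same assay at a constant amplification efficiency (1.0 meaning perfect doubling), which is an idealization. Under that assumption, the 2.8-cycle difference between the weakly positive samples of the two challenges corresponds to roughly a sevenfold higher viral load in the present challenge.

```python
def fold_difference(ct_low: float, ct_high: float, efficiency: float = 1.0) -> float:
    """Approximate ratio of target amounts between two samples measured with the
    same assay, assuming each cycle multiplies the target by (1 + efficiency)."""
    return (1.0 + efficiency) ** (ct_high - ct_low)


print(round(fold_difference(35.7, 38.5), 1))       # 7.0 at 100% efficiency
print(round(fold_difference(35.7, 38.5, 0.9), 1))  # 6.0 at an assumed 90% efficiency
```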
Evaluation of quantitative results showed slight intra-protocol variability, a moderate dependency of the Ct values on the gene target, and clear inter-protocol variability of Ct values, with pronounced deviations for some protocols but not for others.
High reproducibility (low variance) was observed for the 31 test systems using the GeneXpert® (1-23, Figure 3) and the cobas® 6800/8800 (24-31, Figure 3) systems. Absolute deviations were higher for other test systems (middle and right columns, Figure 3) and, for certain test protocols, fell outside a range of ±2 cycles of the respective mean value. Notably, these deviations were not random but systematic, lying consistently above or below the respective mean values depending on the protocol. This calls into question the comparability of Ct values obtained for different genes and test protocols, but it also offers a prospect for standardization, since systematic offsets, unlike random errors, can in principle be corrected by calibration. Recommendations for the standardization of nucleic acid tests have been published, and test material has been made available [8,9]. Responsibility for standardization rests with the manufacturers of IVD/CE-marked test systems or, for in-house methods, with the individual laboratories.
Two test systems produced two simultaneous Ct values from a single reaction: the E- and N-genes in the GeneXpert® system and the E- and ORF1ab-genes in the cobas® 6800/8800 system. In the GeneXpert® system, the difference in median Ct values between the two gene targets was 2.4 to 2.7 cycles for each sample, whereas in the cobas® system it was much smaller (0.3 to 0.6 cycles). It is not clear whether this reflects differences in the relative amount of available template for each gene target in the same sample or differences in reaction conditions. However, multiplexed assays such as these two could perhaps be used to standardize results between gene targets, as their reaction conditions are more tightly controlled.
In conclusion, caution should be exercised when patient management procedures are based on SARS-CoV-2 RT-PCR Ct values. In the absence of standardization, these technical values are appropriate for the correct classification of samples, but they do not meet the standard of a diagnostic test when used to estimate viral loads. For the use of Ct values in medical guidelines or government regulations, it is advisable to validate the method against well-defined external reference material, e.g., from an external quality assessment scheme.
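The practical consequence of systematic offsets for a threshold-based rule can be illustrated with a short sketch. It applies a "Ct > 30" criterion, as discussed for discharge management, to the in-house E-gene mean of sample S1 (29.47), shifted by hypothetical protocol offsets of ±2 and ±4 cycles, the bands used in the Results. The offsets are illustrative and not assigned to any specific test system.

```python
DISCHARGE_CT = 30.0  # "Ct value >30" criterion discussed in the Introduction


def meets_ct_criterion(ct: float) -> bool:
    """True if the Ct value exceeds the discharge threshold."""
    return ct > DISCHARGE_CT


reference_ct = 29.47  # in-house E-gene mean for sample S1
for offset in (-4.0, -2.0, 0.0, 2.0, 4.0):  # hypothetical systematic offsets
    ct = reference_ct + offset
    verdict = ">30" if meets_ct_criterion(ct) else "<=30"
    print(f"offset {offset:+.1f}: Ct {ct:.2f} -> {verdict}")
```

The same sample falls on either side of the threshold depending solely on the protocol offset, which is the kind of misclassification that standardization against reference material is meant to prevent.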