Is thyroglobulin a reliable biomarker of differentiated thyroid cancer in patients treated by lobectomy? A systematic review and meta-analysis

Objectives: The prognostic role of thyroglobulin in predicting recurrence in differentiated thyroid cancer (DTC) patients treated by lobectomy is controversial. This systematic review with meta-analysis aimed to update the current evidence on the reliability of circulating thyroglobulin in assessing the early response and in predicting recurrence. Methods: The methodology was registered in the PROSPERO database under the protocol number CRD42021288189. A systematic search was carried out on PubMed, Embase, Web of Science, and Scopus from September to November 2021 without time and language restrictions. The literature search strategy was based on the following keywords: Thyroglobulin AND (Lobectomy OR Hemithyroidectomy). Results: After screening 273 articles, seven studies were included in the systematic review, and six of them were included in the meta-analysis, for a total of 2,455 patients. Circulating thyroglobulin was found to be unreliable in assessing early response and predicting recurrence in patients treated by hemithyroidectomy, especially those with a low initial ATA risk classification. Conclusions: Our study does not support the use of serum thyroglobulin levels for monitoring patients with low-risk DTC treated with lobectomy, and only weak evidence supports its role in intermediate- or high-risk patients. Studies with longer follow-up, different study designs, and stringent inclusion/exclusion criteria are needed to evaluate the role of thyroglobulin in recurrence prediction.


Introduction
Papillary (PTC) and follicular (FTC) thyroid carcinomas belong to the differentiated thyroid carcinoma (DTC) family and account for many endocrine cancer cases. Traditionally, most DTC were treated by total thyroidectomy (TT) followed by radioiodine therapy and, with the resultant elimination of benign residual sources of thyroglobulin (Tg) secretion, the stage was set for using serum Tg measurement as a pivotal tool in DTC follow-up [1]. Since the publication of the American Thyroid Association (ATA) 2015 guidelines, the number of DTC patients treated by hemithyroidectomy (HT) has significantly increased [2]. Notably, the recurrence rate in patients treated with HT is not negligible because of the frequent multifocality and bilaterality of DTC [3]. However, in such cases, the presence of tumor foci can be obscured by Tg amounts produced by the remaining thyroid lobe volume under the influence of the current iodine status and TSH concentrations [4].
Since prompt detection and treatment of recurrence allows for an excellent clinical outcome, clarifying the possible role of serum Tg changes after lobectomy is crucial in managing and monitoring DTC patients. Reported recurrence rates after lobectomy are about 7% [5], ranging from 4.2 to 7.1% for low-risk cancers [3]. Unfortunately, available studies on the evolution of Tg concentrations after HT have produced conflicting results. The present study provides a systematic review and meta-analysis of the updated literature in order to obtain more robust evidence on Tg performance in the assessment of DTC patients after HT. Specifically, we aimed to address three research questions:

Research questions
- Is circulating thyroglobulin reliable to assess the early response (measured as no recurrence within 24 months) to treatment after lobectomy?
- Is circulating Tg predictive in HT as in TT without RAI?
- Is circulating thyroglobulin reliable to detect persistent/recurrent disease after lobectomy?

Methods
According to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [6], we performed a detailed systematic review of published data. The methodological approach was registered in the PROSPERO (International prospective register of systematic reviews) database under the protocol number CRD42021288189.

Eligibility criteria
All original peer-reviewed research publications were considered. Eligible studies were selected according to the following criteria: (1) observational studies or case-control studies; (2) studies carried out on patients with DTC undergoing thyroid lobectomy or total thyroidectomy without RAI. Studies involving patients with a histological diagnosis of medullary thyroid cancer or anaplastic thyroid cancer were excluded.

Search strategy and study selection
A systematic search was carried out on PubMed, Embase, Web of Science, and Scopus from September to November 2021 without time and language restrictions. The literature search strategy was based on the following keywords: Thyroglobulin AND (Lobectomy OR Hemithyroidectomy). The complete list of articles obtained through the systematic search was first screened to remove duplicates and exclude ineligible articles. Potentially relevant articles were then screened by title and abstract, and the full texts of those that met the inclusion and exclusion criteria were retrieved. Both steps (title/abstract screening and full-text assessment) were performed by two independent reviewers (LG and MLG), and any disagreement was discussed until consensus was reached. The final eligibility of each study was assessed, and the reasons for exclusion were recorded. All authors took part in the definitive article selection.
Two reviewers (LG and MLG) independently extracted data from the included studies and recorded them in a datasheet. Here too, any disagreement was resolved by consensus. The data collected included: (1) study characteristics: first author, year, country, observation period; (2) sample size and patients' characteristics (sex and age); (3) surgery description; (4) therapy after surgery; (5) TSH levels; (6) Tg measurements; (7) disease recurrence criteria; (8) Tg change criteria; (9) follow-up; (10) ATA score; (11) size of primary tumor; (12) lymph node classification; (13) tumor classification; (14) number of recurrences related to Tg levels (increasing Tg over time vs. decreasing or stable Tg over time). No numerical information was extracted from the figures reported in the study publications.

Risk of bias assessment
The authors independently assessed the risk of bias of the included studies using the National Institutes of Health (NIH) quality assessment tool for observational cohorts (https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools). The tool comprises 14 criteria; the overall assessment is rated as good, fair, or poor. Possible disagreements were resolved by discussion and consensus among all authors.

Statistical analysis
All data from eligible articles were synthesized for a systematic summary. Moreover, pooled risk ratios (RR) with 95% confidence intervals (CI) were calculated to answer the third research question. The risk ratios were pooled using a random-effects model with the DerSimonian-Laird method. Heterogeneity was assessed using Cochran's Q test and the I² statistic, according to the Cochrane Handbook for Systematic Reviews of Interventions. Heterogeneity was also investigated through the Galbraith plot, and specific analyses were performed to evaluate the impact of possible sources of heterogeneity.
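The pooling step described above can be illustrated with a minimal Python sketch. It is not the authors' Stata code: the function name `dersimonian_laird` and the input values are hypothetical, and the inputs are assumed to be per-study log risk ratios with their variances.

```python
import math

def dersimonian_laird(log_rr, var):
    """Pool log risk ratios with the DerSimonian-Laird random-effects method."""
    k = len(log_rr)
    w = [1.0 / v for v in var]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I² in percent
    w_re = [1.0 / (v + tau2) for v in var]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_rr)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }

# Hypothetical inputs for illustration only (not data from the included studies)
example = dersimonian_laird([0.0, 2.0], [0.5, 0.5])
print(example["rr"], example["ci"], example["i2"])
```

When Q does not exceed its degrees of freedom, the between-study variance is truncated at zero and the random-effects estimate coincides with the fixed-effect one.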
A cumulative meta-analysis was also performed to track the accumulation of evidence by year and by sample size. Finally, a sensitivity analysis (leave-one-out) was conducted to investigate the impact of each study on the overall pooled estimate. Moreover, fixed- and random-effects models were compared. In accordance with Cochrane guidelines, publication bias was not assessed [7]. All analyses were performed with Stata 17 (StataCorp, College Station, TX, USA).
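As a rough illustration of the leave-one-out sensitivity analysis, the sketch below re-pools the studies after omitting each one in turn. The helper names `pool_dl` and `leave_one_out`, the study labels, and the numbers are all hypothetical; the pooling helper is a compact DerSimonian-Laird estimator written for this example, not the Stata routine used in the paper.

```python
import math

def pool_dl(y, v):
    """Return the DerSimonian-Laird pooled log effect for effects y with variances v."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

def leave_one_out(labels, y, v):
    """Re-pool the studies once per study, omitting that study each time."""
    results = {}
    for i, label in enumerate(labels):
        y_rest = y[:i] + y[i + 1:]
        v_rest = v[:i] + v[i + 1:]
        results[label] = math.exp(pool_dl(y_rest, v_rest))  # pooled RR without study i
    return results

# Hypothetical log risk ratios and variances, for illustration only
labels = ["Study A", "Study B", "Study C"]
log_rr = [math.log(1.0), math.log(2.0), math.log(4.0)]
variances = [0.1, 0.1, 0.1]
for omitted, rr in leave_one_out(labels, log_rr, variances).items():
    print(f"omitting {omitted}: pooled RR = {rr:.2f}")
```

A study is considered influential when omitting it moves the pooled estimate noticeably more than omitting any of the others.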

Search results and studies characteristics
The literature search identified 551 studies (Figure 1). After removal of duplicates (n=278), 273 articles were screened by title and abstract. Seven studies met the inclusion and exclusion criteria. Five studies were excluded because the Tg trend over time was not reported [8][9][10][11][12] (see Supplemental Table S1). The included studies had a total of 2,455 patients. The details of the included studies are reported in Table 1.

Risk of bias assessment
The NIH quality assessment tool showed that four out of seven studies were rated as fair [13][14][15][16], mainly because of a lack of statistical adjustment for crucial potential confounding variables such as the initial ATA risk classification or tumor size. In addition, none of the included studies provided a sample size justification, power description, or variance and effect estimates. The quality assessment of the included studies is reported in Table S2.

Studies characteristics
A total of 2,455 patients were included. Two studies were carried out in the USA [15,16], two in Korea [13,17], and one each in China [18], Israel [19], and Italy [14]. Median follow-up ranged between a minimum of 5 years [16] and a maximum of 12 years [15], with an average of 7.5 years and significant variability among studies (from a minimum of 6 months in Colombo et al. [14] to 12 years in Vaismann et al. [15]). Patients' age ranged from approximately 20 years [15] to more than 80 years [19], with a mean age of about 45 years. The male-to-female ratio, expressed as a percentage, ranged from 6% [15] to 41% [18]; males represented 21.8% of the entire investigated population (1,920 females vs. 535 males). Two hundred and seventeen patients were treated with total thyroidectomy without RAI, while the remaining [...] [18]. Five studies included patients treated with TSH suppression therapy after surgery [13][14][15][16][18]. Two studies had a sample size lower than 100 patients [14,15], three studies had a sample size between 150 and 300 patients [16,17,19], one study included more than 600 patients [13], and one study had 1,050 patients [18].
Three studies performed Tg measurements every six months within the first five years after surgery and every 12 months thereafter, at the discretion of the attending physician [15,16,18], while the remaining four studies performed Tg measurements between 1 and 3 months after surgery for the first year and every 6-12-18 months after the first year [13,14,17,19]. Only Ritter et al. (2020) explicitly reported that the observation of Tg measurements was interrupted after six years [19]. Tg change criteria varied among studies: three studies defined an increasing (or decreasing) Tg trend as an increase (or decrease) of more than 20%, using the same assay, between two consecutive measurements at least one month apart [13,15,17]; one study defined Tg change as any change in Tg measurements over time [16]; three studies did not report how Tg change was determined [14,18,19].
Almost all included studies defined disease recurrence according to the following criteria: (1) positive cytology/histology; (2) highly suspicious lymph nodes or thyroid bed nodules on neck ultrasound; (3) findings on radioactive iodine (RAI) scans, 18-FDG-PET scans, or other cross-sectional imaging highly suspicious for metastatic disease. Only one study did not explicitly report disease recurrence criteria, but the whole work strictly adhered to the criteria mentioned above [19].
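The 20% criterion used by three of the included studies can be sketched as a simple classification rule. The function name `classify_tg_trend` is hypothetical, and the sketch assumes that both values come from the same assay, are expressed in the same unit, and were taken at least one month apart; those conditions are not checked by the code.

```python
def classify_tg_trend(previous_tg, current_tg, threshold=0.20):
    """Classify the change between two consecutive Tg measurements.

    Assumes both values come from the same assay and the same unit,
    and that the measurements were taken at least one month apart.
    """
    if previous_tg <= 0:
        raise ValueError("previous_tg must be positive to compute a relative change")
    relative_change = (current_tg - previous_tg) / previous_tg
    if relative_change > threshold:
        return "increasing"
    if relative_change < -threshold:
        return "decreasing"
    return "stable"

# Hypothetical values, for illustration only
print(classify_tg_trend(10.0, 13.0))  # relative change of +30%
print(classify_tg_trend(10.0, 11.0))  # relative change of +10%
print(classify_tg_trend(10.0, 7.0))   # relative change of -30%
```

The study that counted any change over time as a Tg change would correspond to a threshold of zero in this sketch.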
Is circulating thyroglobulin reliable to assess the early response to treatment after lobectomy?
Three studies analyzed the role of circulating Tg in assessing the early response after lobectomy [13,15,17]. [...] [13]. Recurrence within 24 months was observed in six patients (6/12: 50%): the Tg trend was stable or decreasing in four patients and increasing/stable in two patients. Park et al. (2018), in a retrospective study including 208 patients with a low initial ATA risk, observed recurrence in 19 patients (9.1%), of whom 13 had recurrence in the contralateral lobe and six in the central or lateral cervical lymph nodes (LNs). Only three of these 19 patients had a recurrence within two years of lobectomy. In all three cases, the Tg trend was decreasing.
Is circulating Tg predictive of recurrence in HT as in TT without RAI?
Only one study compared the predictive value of the Tg trend in patients treated with lobectomy or total thyroidectomy without RAI [16]. It examined the Tg trend in a cohort of 289 patients (72 treated with HT and 217 treated with TT). The Tg trend decreased or remained stable in all patients. Nevertheless, recurrence was identified in three patients treated with lobectomy (4.2% of the HT patients and 1.04% of the entire cohort) and in five patients treated with TT (2.3% of the TT patients and 1.73% of the whole cohort) over a median follow-up of three years (range: 1-7.1 years). Three patients in the TT cohort and three in the HT cohort had an intermediate initial ATA risk of recurrence. No significant serum Tg changes emerged in patients who underwent total thyroidectomy or thyroid lobectomy.
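The recurrence percentages quoted above follow directly from the reported counts; a short Python check, using only the numbers given in the text (the names `percent` and `cohort` are illustrative), is:

```python
def percent(events, total):
    """Return events as a percentage of total."""
    return 100.0 * events / total

# Counts as reported in the text: (recurrences, patients) per surgical group
cohort = {"HT": (3, 72), "TT": (5, 217)}
total_patients = sum(n for _, n in cohort.values())

for arm, (recurrences, n) in cohort.items():
    within_arm = percent(recurrences, n)
    within_cohort = percent(recurrences, total_patients)
    print(f"{arm}: {within_arm:.1f}% of the arm, {within_cohort:.2f}% of the cohort")
```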
Is circulating thyroglobulin reliable to detect persistent/recurrent disease after lobectomy?
The risk ratio was not significant for patients with a low initial ATA risk (1.47, 95% CI: 0.22-9.68), while it remained substantially elevated in studies with low- and intermediate-risk patients (18.18, 95% CI: 2.02-163.37). Only one study considered a sample that was heterogeneous with respect to ATA risk [18]: in this case, the risk ratio associated with increasing Tg was significantly high (5.31, 95% CI: 3.36-8.40) (Figure 4).
Considering studies with more than 150 patients, the risk ratio showed no relevant change (RR: 3.58, 95% CI: 0.47-27.41) (see Supplemental Figure S2A). In addition, no significant changes emerged in studies with a median follow-up longer than seven years (RR: 4.16, 95% CI: 0.23-76.82) (see Supplemental Figure S2B).

Galbraith plot for analysis of heterogeneity across studies. The green reference line (y=0) represents the "No effect" line. The red line is the regression line through the origin. The slope of this line equals the estimate of the overall effect size, which is the overall log risk-ratio. In the absence of substantial heterogeneity, we expect around 95% of the studies to lie within the 95% CI region (shaded area). Here two out of the six studies were outside the shaded region, indicating considerable heterogeneity among the effect sizes.

Finally, the cumulative meta-analysis, performed using two approaches (chronological and by sample size), showed a stabilization of the risk ratio from 2020 onwards (see Supplemental Figure S3A and S3B). The sensitivity analysis showed no relevant difference between meta-analysis approaches (random vs. fixed). However, when comparing effect size measures (RR vs. OR), we observed reduced variability with the OR (Table S3). The leave-one-out meta-analysis showed that Cho et al. (2018) and Park et al. (2018) seem to have a relatively greater influence than the other studies on the estimation of the overall effect size [13,17,19] (see Supplemental Figure S4).
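For readers unfamiliar with how a single-study risk ratio and its confidence interval are obtained, the sketch below uses the standard log-scale (Katz) approximation. The function name `risk_ratio` and the counts are hypothetical and are not taken from the included studies; the handling of zero cells in the authors' software is not described in the text, so this sketch simply refuses them.

```python
import math

def risk_ratio(events_exposed, n_exposed, events_control, n_control, z=1.96):
    """Risk ratio with a log-scale (Katz) confidence interval.

    'Exposed' here stands for patients with increasing Tg and 'control'
    for patients with stable or decreasing Tg (an assumption of this sketch).
    """
    if min(events_exposed, events_control) == 0:
        raise ValueError("zero cell: a continuity correction would be needed")
    rr = (events_exposed / n_exposed) / (events_control / n_control)
    # Standard error of log(RR)
    se = math.sqrt(1 / events_exposed - 1 / n_exposed
                   + 1 / events_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts, for illustration only
rr, lower, upper = risk_ratio(10, 100, 5, 100)
print(f"RR = {rr:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

Because the standard error depends on the reciprocals of the event counts, studies with very few recurrences produce wide intervals, and an interval that includes 1 is read as non-significant.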
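The geometry of the Galbraith plot described above can be reproduced numerically. In the sketch below, each study is placed at x = 1/SE and y = log RR/SE; the regression line through the origin then has a slope equal to the inverse-variance pooled log risk ratio, and studies more than 1.96 units from that line fall outside the 95% region. The function names and input values are hypothetical.

```python
def galbraith_points(log_rr, se):
    """Return (precision, z-score) pairs: the x and y coordinates of a Galbraith plot."""
    return [(1.0 / s, y / s) for y, s in zip(log_rr, se)]

def studies_outside_band(log_rr, se, z_crit=1.96):
    """Return the indices of studies lying outside the 95% region around the regression line."""
    points = galbraith_points(log_rr, se)
    # Least-squares slope through the origin; equals the inverse-variance pooled log effect
    slope = sum(x * z for x, z in points) / sum(x * x for x, _ in points)
    return [i for i, (x, z) in enumerate(points) if abs(z - slope * x) > z_crit]

# Hypothetical log risk ratios and standard errors, for illustration only
log_rr = [0.0, 0.0, 0.0, 3.0]
se = [0.5, 0.5, 0.5, 0.5]
print(studies_outside_band(log_rr, se))
```

With homogeneous effects, roughly 95% of studies are expected inside the band, so a sizeable share of studies outside it points to heterogeneity.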

Discussion
The prognostic value of Tg among DTC patients treated by lobectomy is debated, and interpretation criteria are lacking. In fact, while Tg is a pivotal component of DTC follow-up after total thyroid ablation [1], its interpretation is challenging after lobectomy. Small Tg concentrations produced by persistent/recurrent cancer tissue may be easily obscured in the presence of a thyroid lobe producing significant Tg amounts, considering that the remaining thyroid lobe produces about half of the Tg produced by the intact thyroid gland (i.e., 10-40 ng/mL) [4].
In addition, factors such as TSH levels, lymphocytic thyroiditis, thyroid nodules, and normal serum Tg fluctuations may generate considerable interpatient variability, resulting in overlapping Tg levels in patients with and without disease recurrence [19][20][21]. While serum Tg levels decrease spontaneously or remain stable during a median five-year follow-up after TT or near-total thyroidectomy, after lobectomy serum Tg levels increase gradually during follow-up (by an average of 10% annually), and this increase cannot be considered a specific biomarker of disease recurrence [15].
To provide robust evidence, we performed a systematic review and, where applicable, a meta-analysis of the updated literature on Tg performance in assessing recurrence risk in DTC patients after HT. As the main result of our study, we found that circulating Tg is not suitable for detecting recurrent disease in low-risk DTC patients after lobectomy, while its performance improves in patients with intermediate or high risk, although the evidence remains limited to very few studies. This is likely due to the lower pre-test probability of recurrence in low-risk compared to intermediate- and high-risk patients. Indeed, a low pre-test probability results in a small absolute change in probability after testing: even very powerful tests yield a small absolute difference for conditions that are very unlikely in an individual (such as recurrence in low-risk DTC in the absence of any other indicating sign), whereas even tests with low power can make a significant difference for highly suspected conditions (such as recurrent disease in high-risk DTC patients). Moreover, lobectomy is currently recommended in patients with low-risk intra-thyroid DTC, while a completion thyroidectomy is required when high-risk features are detected on surgical samples [22]. Consequently, only a small number of patients with intermediate- and, especially, high-risk DTC are expected to be treated with lobectomy alone.
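The pre-test probability argument above can be made concrete with Bayes' rule in odds form, where the post-test odds equal the pre-test odds multiplied by the likelihood ratio of the test result. The sketch below is illustrative only: the function name, the likelihood ratio of 5, and the pre-test probabilities of 5% and 40% are hypothetical values, not estimates from the included studies.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical scenario: the same positive test result (likelihood ratio of 5)
# applied to a low and to a high pre-test probability of recurrence
for label, pre in [("low pre-test probability", 0.05), ("high pre-test probability", 0.40)]:
    post = post_test_probability(pre, 5.0)
    print(f"{label}: {pre:.0%} -> {post:.1%} (absolute gain {post - pre:.1%})")
```

Under these assumed values, the same test result shifts the probability of recurrence far less in absolute terms when the starting probability is low, which is the mechanism invoked in the paragraph above.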
Moreover, the initial ATA risk stratification categories are used as discrete categories and not as a continuum in which the risk can vary from as low as 1-2% to as high as 40-50%, depending on the specific clinical features present in each case. This discretization could strongly affect the low- and intermediate-risk categories, for which risks often overlap and can show wide variability. For example, the low-risk category may range from 1 to 50%, while the intermediate one ranges from 10 to 40%. In terms of structural recurrence, this overlap was also observed using dynamic risk stratification: structural recurrence in the indeterminate and biochemical incomplete response groups did not differ significantly from that observed in the excellent response group [23]. All in all, our current meta-analysis confirms the limited, if any, value of serum Tg in monitoring DTC patients treated with lobectomy according to the ATA 2015 guidelines (i.e., intra-thyroid low-risk DTC).
Concerning the role of circulating Tg in predicting recurrence in HT as in TT without RAI, currently available data are limited, and a meta-analysis is precluded. Indeed, the only retrievable paper [16] found no role of serum Tg in predicting outcomes of patients treated with either lobectomy or total thyroidectomy without RAI. This, in turn, may be due to the high variability in the completeness of thyroid surgery and, again, to the prevalence of recurrences in different series. Similarly, no evidence can be obtained on the role of circulating Tg in assessing the early response in patients treated by lobectomy.
In summary, our study does not support the use of serum Tg to monitor patients with low-risk DTC treated by lobectomy, while its use should be considered in intermediate- and, especially, high-risk patients treated by lobectomy without subsequent completion thyroidectomy and radioiodine ablation. A potential role of Tg measurement after lobectomy may be the detection of rare cases with distant metastases. In such cases, high Tg values can be recorded (i.e., Tg values > 100 µg/L) [10]; however, due to the sporadic occurrence of distant metastases, no data can be retrieved from the currently available literature.
This work has several limitations. First, we retrieved only studies with a retrospective design. Second, significant heterogeneity was observed among studies in patient selection: while some studies applied stringent inclusion criteria (e.g., patients with a low initial ATA classification or with serum TSH levels stable within the reference range and not requiring thyroid hormone replacement), others included patients more likely to have persistent disease, with minimal-volume disease, or with low, intermediate, or high initial ATA risk classification. Third, different Tg change assessments were adopted in the included studies (i.e., changes over time, increase or decrease of more than 20% using two consecutive measurements, or increase of ≥20% relative to post-operative Tg values). Fourth, the type of surgery (lobectomy alone or lobectomy with ipsilateral prophylactic central neck dissection) was not clearly reported by some included studies, although a relevant difference in recurrence rate between the two techniques has been reported (lobectomy alone: 5.4%; lobectomy with ipsilateral prophylactic central neck dissection: 1.3%) [24]. Fifth, the number of recurrences was very low. Finally, the mean follow-up was 7.5 years, ranging from 5 to 12 years; since recurrences may occur even after 10 years [25], the included studies may not have captured the actual recurrence rate.
Additionally, a relevant problem in monitoring DTC patients is the presence of thyroglobulin antibodies (TgAb). TgAb are present in up to 15-25% of patients with DTC and may interfere with Tg immunoassays, resulting in falsely low or undetectable Tg results [4]. In such cases, TgAb levels can be adopted as a 'surrogate tumor marker.' Generally, a reduction in serum TgAb concentrations indicates that the patient is free from recurrent disease, whereas a clear increase prompts additional investigation to localize the foci of the disease [1]. Unfortunately, however, no data are available in the literature to inform the follow-up of patients with DTC and positive TgAb after lobectomy. Consequently, imaging procedures (i.e., ultrasound) remain pivotal in these cases.
In conclusion, the role of serum Tg monitoring is very limited in DTC patients treated by lobectomy and, especially, in those with low-risk cancers. Therefore, if Tg monitoring is carried out, the results should be interpreted carefully, considering both the corresponding TSH concentration and the ultrasonographic findings.
Research funding: None declared.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Competing interests: LG is a member of an advisory board and the speaker board of ROCHE Diagnostics (Switzerland). LC and MLG state no conflict of interest.
Informed consent: Not applicable.
Ethical approval: The local Institutional Review Board deemed the study exempt from review.