The efficacy of botulinum toxin A treatment for tension-type or cervicogenic headache: a systematic review and meta-analysis of randomized, placebo-controlled trials

Objectives: The pathogeneses of chronic tension-type headache (CTTH) and cervicogenic headache (CEH) are not well established. Peripheral activation or sensitization of myofascial nociceptors has been suggested as a potential mechanism, and injections of botulinum toxin A (BONTA) have therefore been used to treat both headache conditions. BONTA inhibits the release of acetylcholine at the neuromuscular junction and thereby inhibits contraction of skeletal muscles. If the pain is precipitated by increased tone in cervical muscles, local injections of BONTA could represent a prophylactic measure. However, the treatment is still controversial, and a thorough assessment of the current evidence is required. This review aims to assess the evidence for BONTA injection as a prophylactic treatment for CTTH and CEH by reviewing and examining the quality of placebo-controlled, randomized trials. Methods: Data sources: we searched the following databases: PubMed (including Medline), Embase, Cochrane Central Register of Controlled Trials, Cinahl, Amed, SCOPUS and Google Scholar, as well as other repository sources. Both MeSH terms and free keywords were used in the systematic search. The search covered publications from the inception of each database to November 2020. Study eligibility criteria: The review included RCTs comparing a single BONTA treatment with placebo in patients with CTTH or CEH above 18 years of age, with pain severity/relief or headache frequency as outcome. Data extraction: The following data were extracted: year of publication, country, setting, trial design, number of participants, injection procedure, BONTA dosages, and clinical outcome measures. Study appraisal: To assess validity, quality, and risk of bias, the Oxford Pain Validity Scale, the Modified Jadad Scale, the latest version of the Cochrane Collaboration's tool for assessing risk of bias (RoB 2), and the CONSORT 2010 Checklist were used.
The trials were assessed and quality-scored independently by two of the reviewers. A quantitative synthesis and meta-analyses of headache frequency and intensity were performed. Results: We extracted 16 trials, 12 on prophylactic BONTA treatment for CTTH and four on CEH. Of these, 12 trials (8 on CTTH and 4 on CEH) were included in the quantitative synthesis. Most trials found no significant difference in the primary outcome measure when BONTA treatment was compared with placebo. Three “positive” trials reported a significant difference in favor of BONTA treatment, but two of these were hampered by low validity and quality scores and high risk of bias. Conclusions: There is no clear clinical evidence supporting prophylactic treatment with BONTA for CTTH or CEH.


Introduction
Headache is one of the most costly health problems [1]. Chronic tension-type headache (CTTH) is characterized by bilateral headache on ≥15 days a month and has a 1-year period prevalence of 2.2% [2]. Cervicogenic headache (CEH) is characterized by unilateral headache; it is less prevalent, but affects about 15% of patients with chronic headache [3][4][5]. Both types of headache account for high rates of disability [4,6].
The underlying pathophysiology of these two types of headache is not well established [7,8]. Tension-type headache has for years been explained by nociceptive input from muscular tender or trigger points (MTrPs) [9], attributed to a disturbed function of the neuromuscular endplate and exaggerated muscular depolarization with contractions that compress small vessels and thereby lead to muscular ischemia [10]. This explanation has since been challenged [11,12]. Psychological conditions such as anxiety, depression [13,14] and stress [15] seem to play an important role, and experimental research indicates a sensitization of sensory pathways and disturbed pain regulation [16], possibly mediated by serotonergic [17], cholinergic [18], and inflammatory mechanisms [19,20]. The International Headache Society (IHS) defines CEH as a secondary headache disorder in which cervical spine components such as vertebrae, intervertebral discs and soft tissue elements [21,22], as well as musculoskeletal dysfunction, are believed to play an important role [23][24][25].
Both CTTH and CEH are difficult to treat. Botulinum toxin A (BONTA) is used clinically for several conditions, including migraine [26], but for CTTH and CEH its use has been based on the notion of disturbed neuromuscular function [27]. The compound blocks the release of acetylcholine at the neuromuscular junction and may partially paralyze muscles for a few months [28][29][30][31]. A systematic review and meta-analysis published in 2012 focused on BONTA treatment for episodic and chronic migraine, chronic daily headache and CTTH [32]. In this review, BONTA treatment of CTTH was not associated with fewer attacks per month. Two more recent reviews, which included low-quality studies such as non-controlled trials and prospective as well as retrospective cohort studies, nevertheless concluded that botulinum toxin seems to be effective in the management of CTTH [33,34]. The evidence for BONTA treatment of CEH is limited: a review published in 2002 [35] identified only one randomized controlled trial [36] and a case study [37], and their results were contradictory. Thus, an update of the clinical evidence on BONTA treatment for CTTH and CEH, based on high-quality studies, is needed.

Aims
This systematic review and meta-analysis are limited to placebo-controlled, randomized trials studying the efficacy and safety of BONTA as prophylactic treatment of CTTH and CEH. Our aim was to examine and rank the quality, validity, and risk of bias of the trials in order to critically assess the clinical evidence of one session with BONTA injections as a prophylactic treatment for CTTH and CEH. Ethical approval was not required for a review of selected publications.

Criteria for considering trials relevant for this review
A well-focused question based on the PICO framework was formulated to ensure that appropriate search terms were identified for finding relevant evidence for this systematic review [38,39]. The standard PICO terms were translated as follows: Type of participants (Population): adult humans with tension-type headache or cervicogenic headache. Type of interventions (Intervention): local intramuscular injections, including into muscular trigger points (MTrP), with botulinum toxin A. Control/Comparison: placebo. Type of outcome measures (Outcome): reduced frequency (days with headache) or reduced headache severity (pain intensity or pain relief). Based on these PICO terms, the search question was formulated as follows: in adult humans with tension-type headache or cervicogenic headache, are local intramuscular injections (including MTrP) with botulinum toxin A superior to placebo in reducing days with headache or headache severity (pain relief)?

Search methods for identification of studies
A systematic search strategy, developed by physicians at the University Hospital of Northern Norway (UNN), Tromsø, and at the Finnmark Hospital Trust, Alta, in collaboration with the Senior Information Specialist at the Medical Library of UiT The Arctic University of Norway and a librarian from the Unit for Applied Clinical Research, Norwegian University of Science and Technology, was used to identify relevant studies for this systematic review. The systematic searches were repeated until November 2020 in the electronic databases PubMed, Embase, Cinahl, Amed, Cochrane Central Register of Controlled Trials, and SCOPUS. In addition, reference lists of systematic reviews and included articles were manually scanned to expand the data set. The final searches were not restricted by language or publication date and were last updated on 3 November 2020. The search strategy was based on the following key concepts: tension-type headache or cervicogenic headache, botulinum toxin, and reduced headache frequency (i.e. days with headache) or reduced headache severity. An example of the search strategy as applied in PubMed, using both MeSH terms and free keywords, is as follows: ((((((tension headache) OR (Tension-Type Headache)) OR (headache disorders, secondary)) OR (cervicogenic headache)) OR (cervical)) AND (((((Botulinum Toxins) OR (botulinum)) OR (botulinum toxin)) OR (toxin)) OR (toxins))) AND (((((((reduced headache) OR (headache)) OR (reduced headache severity)) OR (severity)) OR (pain relief)) OR (reduced headache days)) OR (headache free days)) Filters: Clinical Trial, Randomized Controlled Trial, Humans.
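The PubMed string above follows a simple pattern: the synonyms within each key concept are OR-combined, and the three concept blocks are AND-combined. A minimal Python sketch of that composition is shown below. The helper names `or_block` and `build_query` are hypothetical; the output uses flat rather than nested parentheses, which is logically equivalent, and the PubMed filters (Clinical Trial, Randomized Controlled Trial, Humans) are assumed to be applied separately in the search interface.

```python
def or_block(terms):
    """OR-combine the synonyms of one key concept, wrapping each term in parentheses."""
    return "(" + " OR ".join(f"({term})" for term in terms) + ")"


def build_query(*concepts):
    """AND-combine the OR-blocks of all key concepts."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Terms copied from the PubMed example in the text.
population = ["tension headache", "Tension-Type Headache",
              "headache disorders, secondary", "cervicogenic headache", "cervical"]
intervention = ["Botulinum Toxins", "botulinum", "botulinum toxin", "toxin", "toxins"]
outcome = ["reduced headache", "headache", "reduced headache severity", "severity",
           "pain relief", "reduced headache days", "headache free days"]

query = build_query(population, intervention, outcome)
print(query)
```

Building the string from term lists makes it easy to add or drop a synonym in one concept without re-balancing the nested parentheses by hand.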
A step-by-step search strategy table is provided in Supplementary Appendix 1 among the e-components.
Google Scholar was also searched for triangulation purposes, i.e. to ensure that all relevant evidence available on the topic and on the websites searched had been identified, using the reference lists of identified studies. The primary author of one publication was contacted for further information.
We additionally searched for ongoing, recently completed and unpublished studies through other repository sources such as www.clinicaltrials.gov, https://papas.cochrane.org/, http://www.isrctn.com/ and https://www.crd.york.ac.uk/prospero/. However, none of the articles/trials identified in these registers were eligible, either because of double registration or because they were not completed and results were therefore not available.

Selection criteria for identification of eligible studies
The selection of eligible trials included in this systematic review was conducted in accordance with the PRISMA statement guidelines [40,41] and developed in close collaboration with the two librarians. The potentially eligible trials were assessed for the efficacy of BONTA. To assess this, we included only prospective, placebo-controlled trials involving patients aged ≥18 years with CTTH or CEH who received BONTA injections in selected skeletal muscles in the upper quarter of the body, and that assessed outcomes on pain severity/pain relief and the number of headache (or headache-free) days per week(s) or month. The BONTA treatment could be combined with physiotherapy as well as prophylactic and analgesic medications. However, trials including patients with neurological diseases (dystonia, torticollis) or other pain conditions (such as temporomandibular disorder or phantom limb pain) were excluded. Only trial reports available in full text were selected for inclusion. Publications registered more than once across or within databases were excluded.

Screening and data extraction for analysis
Trial headings and abstracts of potentially relevant trials were screened independently by two of the four authors (SBR and GK). Full-text copies of potentially relevant trials were retrieved and independently read and reassessed manually by the same authors. Discrepancies about eligibility were discussed until agreement was reached. The bibliographies of the retrieved eligible articles and systematic reviews were further checked for additional references.
The data extracted from the included trials comprised the following: year of publication, country of origin, setting, trial design, number of participants, subject characteristics, information about the intervention procedure, BONTA dosages, and clinical outcome measures.

Assessment of validity, quality and risk of bias
Each trial was critically assessed for validity, quality and risk of bias using validated assessment tools. The scores were compared, and discrepancies were resolved by discussion between the two authors (SBR and GK).
The Oxford Pain Validity Scale for randomized trials (OPVS) covers five items: blinding, size of each trial group, outcomes, demonstration of internal sensitivity, and data analysis [42,43]. The last main item, data analysis, includes four sub-items: definition of outcomes, data presentation, statistical testing, and handling of dropouts. With a total of eight criteria, the score ranges from 0 to 16 points [44].
The Modified Jadad Scale consists of five items derived from the Oxford quality scale [40], focusing on randomization, blinding, and withdrawals and dropouts, plus three extra items on eligibility criteria, methods to assess adverse events, and statistical analysis. The sum score ranges from 0 to 8 points. The quality scale has shown good interrater agreement, with an intraclass correlation coefficient of 0.9 [41].
Study quality was also assessed by the number of confirmative responses to the CONSORT 2010 Checklist (Consolidated Standards of Reporting Trials; CCL-25). The list focuses on how the trial was designed and analyzed and how the results were interpreted, and includes 25 items. As some items include sub-items, it provides a set of 37 yes/no responses depending on whether each item is reported or not. The checklist is designed as a guideline for reporting trials [45] and is not a quality assessment tool per se. However, the number of confirmative responses reflects several quality-related study characteristics, and several journals have adopted the checklist in their evaluation of papers in order to increase the number of high-quality reports, improve RCT interpretation, and minimize biased conclusions [46].
As the items of the different scoring systems overlap, we found a close correlation between the OPVS scores and both the Modified Jadad scores (Spearman correlation, N: 15, rho: 0.79, p<0.001) and the CCL-25 scores (Spearman correlation, N: 15, rho: 0.67, p<0.006). Risk of bias was assessed for each trial with the revised Cochrane Collaboration's tool for assessing risk of bias (RoB 2) [47,48]. RoB 2 is derived from the original Cochrane risk-of-bias tool for randomized trials (RoB 1), has versions for parallel-group and cross-over designs [48,49], and is structured into a fixed set of domains focusing on different aspects of trial design, conduct, and reporting, such as randomization, blinding, differences in baseline levels, measurement, and missing and selected outcome data (see Supplementary Table 8). The risk of bias is generated by an algorithm based on the answers to the signalling questions and is expressed as 'low' or 'high' risk of bias, or by the intermediate option 'some concerns'.
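Spearman's rho, used above to compare the scoring systems, is Pearson's correlation computed on ranks, with tied values receiving the average of the ranks they span. A minimal, dependency-free Python sketch follows; the function names and the five-trial scores are hypothetical illustrations, not data from this review. It returns rho only; a library routine such as `scipy.stats.spearmanr` would also supply the p-value.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical OPVS and Modified Jadad scores for five trials (illustration only).
opvs = [10, 14, 7, 16, 9]
jadad = [5, 7, 4, 8, 5]
print(round(spearman_rho(opvs, jadad), 3))  # 0.975
```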

Sensitivity analyses
Heterogeneity was assessed by I² statistics, p-values, Q statistics and degrees of freedom, and by calculating the influence of each trial on the standardized mean difference (SMD), excluding the trials one by one in separate analyses to investigate their influence on the overall results [50]. To explore potential publication bias, we generated funnel plots, while bias due to small trials was assessed by the Egger regression test [51]. As the trials were few and the test results rely on assumptions, fixed-effect and Mandel-Paule (also known as empirical Bayes) estimator meta-analyses were added [52]. Finally, we assessed the impact of potential covariates on the effect size using a meta-regression-based technique, with robust variance estimation in meta-regression where dependent effect sizes from the same study were included [53][54][55][56].
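The heterogeneity statistics and the leave-one-out analysis described above can be sketched as follows. This is a minimal illustration assuming inverse-variance weighting and the DerSimonian-Laird estimator for the between-trial variance τ², a common random-effects default; the Mandel-Paule estimator mentioned above would replace only the τ² step. The function names are hypothetical, and the code is not the Stata `metan` implementation used in the review.

```python
import math


def i_squared(q, df):
    """I^2 (%) = (Q - df) / Q, truncated at zero."""
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


def random_effects(effects, variances):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]                                  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if k > 1 else 0.0                 # between-trial variance
    w_re = [1 / (v + tau2) for v in variances]                      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"smd": pooled, "se": math.sqrt(1 / sum(w_re)), "Q": q, "df": df,
            "I2": i_squared(q, df), "tau2": tau2}


def leave_one_out(effects, variances):
    """Pooled SMD recomputed with each trial excluded in turn."""
    return [random_effects(effects[:i] + effects[i + 1:],
                           variances[:i] + variances[i + 1:])["smd"]
            for i in range(len(effects))]


# Hypothetical SMDs and variances for five trials (illustration only).
smds = [-0.9, -0.2, -0.1, -0.4, 0.05]
variances = [0.10, 0.04, 0.05, 0.08, 0.03]
print(random_effects(smds, variances))
print(leave_one_out(smds, variances))
```

As a consistency check, `i_squared(14.05, 6)` gives about 57.3%, which matches the I² reported for the CTTH frequency outcome in the Sensitivity analyses results if that analysis had six degrees of freedom (seven trials). The degrees of freedom are inferred here, not reported.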

Meta-analysis and other statistical analyses
The quantitative synthesis of effect sizes was carried out by a statistician within the team, using the SMD as effect size for the two common primary outcomes, pain intensity and pain frequency, both of which are recommended and used in most clinical trials on headache. The effect sizes were calculated from final frequency [57], from mean values or changes from baseline, and from standard deviations, standard errors of the mean, confidence intervals or p-values described in the manuscripts and tables or depicted in figures. Medians and ranges were converted to means and standard deviations using the method by Wan et al. [58]. The meta-analyses, with tests and plots, were generated with the statistical package Stata version 16 (Stata Corp, College Station, TX, USA) and the user-developed packages metan, metafunnel, metabias, metareg and robumeta. To achieve comparable dosages of abobotulinumtoxin A (Dysport) and onabotulinumtoxin A (Botox) for pooled analyses across the trials, we divided abobotulinumtoxin A dosages by 2.5 [59].
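The SMD calculation and the median/range conversion described above can be sketched as follows. This is a minimal illustration assuming Hedges' g (Cohen's d with a small-sample correction) and the sign convention BONTA minus placebo, so that negative values favor BONTA as in the pooled results. The document does not state which SMD variant or variance formula was used in Stata, and the function names are hypothetical. `wan_mean_sd` implements the Wan et al. estimator for the case where the minimum, median and maximum are reported.

```python
import math
from statistics import NormalDist


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """SMD (Hedges' g) of treatment minus control, with its approximate variance.
    Negative values favor treatment when lower scores are better."""
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sd_pooled                  # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)                  # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def wan_mean_sd(minimum, median, maximum, n):
    """Mean and SD from median and range (Wan et al., min/median/max case)."""
    mean = (minimum + 2 * median + maximum) / 4
    xi = 2 * NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    return mean, (maximum - minimum) / xi


# Hypothetical trial (illustration only): headache days per month after treatment.
g, var_g = hedges_g(mean_t=12.0, sd_t=6.0, n_t=30, mean_c=14.0, sd_c=6.5, n_c=30)
mean, sd = wan_mean_sd(minimum=2, median=10, maximum=28, n=40)
print(g, var_g, mean, sd)
```

The variance line is one common large-sample approximation; some texts use slightly different denominators, which changes the weights only marginally.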
Because the data were skewed, non-parametric tests were applied for the other statistical analyses. For continuous data, bivariate correlations were assessed with Spearman's test and group comparisons with the Independent-Samples Kruskal-Wallis Test. For comparisons of categorical data, the Fisher exact test was used. Significance values were adjusted by the Bonferroni correction for multiple tests. These correlations and group comparisons were performed with the statistical package IBM SPSS Statistics 26.
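The Bonferroni correction mentioned above multiplies each raw p-value by the number of tests m, capped at 1. This is equivalent to comparing the raw p-values with 0.05/m. A minimal sketch with hypothetical p-values follows; the function name is illustrative.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each raw p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from four comparisons (illustration only).
raw = [0.004, 0.02, 0.18, 0.6]
adjusted = bonferroni(raw)
significant = [p_adj < 0.05 for p_adj in adjusted]
print(adjusted, significant)
```

Note that a raw p of 0.02, nominally significant, no longer passes once four comparisons are accounted for.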

Trial design
All 16 trials were randomized and placebo-controlled (Table 1). Fourteen were described as double-blinded and one as single-blinded [70]. One trial report on patients with CEH neither described how patients were blinded nor reported any randomization [73], but the primary author confirmed by mail that the study was randomized. Fourteen trials had a parallel and two a cross-over design [66,72]. Two trials were designed with more than two arms and compared the efficacy of different BONTA dosages [59,68] (see Table 1). Two papers [66,69] also presented results from an open-label extension/long-term study; these data were not included in the review.

Sample size
The 16 trials represent a total sample of 954 participants, 823 with CTTH and 131 with CEH, of whom 571 were assigned to receive BONTA injections. Sample sizes varied across the trials from 8 to 300 patients, with a median of 36.5. Only five trials included a total sample of >40 participants [59,60,62,67,68]. In three CTTH trials more than 50 participants received BONTA [59,67,68], while in nine trials the number of participants was less than 20 [36,61,63,65,66,69-72] (see Table 1). Four trials [59,65,67,72] performed a sample size calculation, one of them [72] as a post hoc analysis. Two trials either did not achieve [67] or did not present [59] the calculated sample size.
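The required sample size grows with the inverse square of the expected SMD, which is why so few of these trials were adequately powered. A minimal sketch follows, using the common normal-approximation formula for two equal groups, n per group = 2(z₁₋α/2 + z₁₋β)²/d². The function name is hypothetical, and the included trials' own calculations may have used different outcomes, tests or dropout allowances.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Participants per arm to detect a standardized mean difference d (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(n_per_group(0.5))  # 63 per group
print(n_per_group(0.2))  # 393 per group, roughly 790 in total
```

The SMD values 0.5 and 0.2 are illustrative and not taken from any included trial. They show that a small effect pushes the requirement into the hundreds of participants per arm, far above the median trial size of 36.5.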

Dropouts
Eleven trials reported dropout rates, ranging from 0% to 28% with a median of 6% (see Table 1). Eleven of the 16

Setting
In five of the 16 trials, patients were recruited from a hospital service [60,64,69,72,73], and in two from a private practice or by advertising [62,71]. Nine trials did not provide any information on the recruitment process (see Table 1). Eight trials were, and one seemed to be, supported by a pharmaceutical company (see Table 2), and in two trials a pharmaceutical company was or seemed to be involved in the trial report [59,72].

Patient characteristics
The diagnostic criteria for inclusion varied across the trials, but all trials included patients aged 18 years or older with a diagnosis of CTTH or CEH. In most trials, the diagnosis was based on the well-defined IHS criteria published in 1988, 1997, 1998 or 2004 (see Table 1 and Supplementary Table 1). Two CEH trials, however, used the criteria from the Cervicogenic Headache International Study Group (CHISG), and one [71] included only patients with headache after a cervical whiplash injury. The sample in the latter study may not be representative of CEH patients in general. Two CTTH trials aimed to also include patients with episodic tension-type headache [61,64], and in the first one, 16 [59,66,68], and in three CEH trials [36,71,73] inclusion was restricted to patients with a headache history of at least six months. One trial [69] selected only patients in whom myofascial trigger point palpation reproduced the typical ("concordant") pain.

Treatment procedure
The injection protocols varied across the trials in terms of strategy, formulation and dosage of BONTA, number of injection sites (see Table 1), and location (see Supplementary Table 3). Eight CTTH trials [59-64, 67, 68] and two CEH trials [72,73] used a pre-determined approach in which BONTA was injected into "fixed sites", five of them with a fixed dose [60-62, 64, 69] (see Table 1). In one trial [60] the "fixed sites" injections were guided by EMG. The symptom-evaluated technique (also termed "follow the pain") was used in three CTTH trials [65,66,69] and in two CEH trials [36,71]. In two of these trials [65,66] the injections were restricted to sites with increased muscular resistance or local tenderness, while a third [69] targeted specified myofascial trigger points that triggered the characteristic headache. Finally, one trial [70] combined the two injection strategies. In these trials, there was no clear description of how trigger points were identified, and in one trial [36] the authors did not seem to differentiate between tender and trigger points. The median number of injection sites was eleven (range 4-22) among the CTTH trials and six (range 5-10) among the CEH trials.
Most CTTH trials applied injections in the neck, shoulder and pericranial musculature, but three trials [61,64,66] injected only pericranial muscles, one of which [64] injected only the frontal muscle. In two of the four CEH trials, BONTA was also injected into pericranial muscles [72,73] (see Supplementary Table 3).
The total dosages of BONTA varied from 20 to 150 U for onabotulinumtoxin A and from 150 to 500 U for abobotulinumtoxin A. The BONTA dosages varied less across the CEH trials: from 90 to 100 U for onabotulinumtoxin A and 150 U for abobotulinumtoxin A. Among trials reporting dosages per site, these varied from 2 to 40 U for onabotulinumtoxin A and from 12 to 30 U for abobotulinumtoxin A. To define the optimal dosage, Silberstein et al. [68] designed a 6-armed trial and injected BONTA dosages of 50 U, 100 U and 150 U or placebo into 10 sites in five muscles, while in two subgroups (BONTA 86Usub and 100Usub) BONTA 86 U and 100 U, respectively, were injected into three muscle groups and saline into two. Concomitant use of analgesics varied widely across the trials, from a strict policy to free allowance of concomitant medication (see Supplementary Table 2). In two trials [59,72], prophylactic medication was discontinued six and four weeks before randomization, respectively, while a third trial [68] required participants to have been on a stable dose of prophylactic headache medication for at least 3 months prior to enrollment and to stay on stable doses throughout the trial. For rescue medication, one trial [68] accepted only over-the-counter medication, two trials [59,69] only one single rescue drug, and a fourth trial [60] only if headache severity reached 3 on a 5-point scale. One trial excluded patients using analgesics more than 10 days a month [67]. Although only three trials explicitly reported no limitation on rescue medication [61-63], a total of 11 trials seemed to accept the use of analgesics, as they registered the consumption of, or days with, medication [36,59,61,62,64,65,67-70,72].
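Using the divisor of 2.5 stated in the Methods, the abobotulinumtoxin A dosages above can be placed on the onabotulinumtoxin A scale. The 150 to 500 U range then corresponds to 60 to 200 U, which overlaps with the 20 to 150 U used in the onabotulinumtoxin A trials. A minimal sketch of that arithmetic follows; the constant and function names are hypothetical.

```python
ABO_PER_ONA_UNIT = 2.5  # divisor used in this review for pooled dose comparisons [59]


def ona_equivalent(abo_units):
    """Convert abobotulinumtoxin A (Dysport) units to onabotulinumtoxin A (Botox)-equivalent units."""
    return abo_units / ABO_PER_ONA_UNIT


# Abobotulinumtoxin A totals mentioned in the text.
for abo in (150, 210, 420, 500):
    print(abo, "U abo ->", ona_equivalent(abo), "U ona-equivalent")
```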

Data sampling
All 16 trials used a headache diary to record days with and/or without headache and/or headache intensity during a pretreatment interval of 2 to 6 weeks (baseline) and throughout the follow-up, with intervals of 2 to 8 weeks (Table 1). In four trials the participants were instructed to record once [59,67], three times [69] or four times a day [62].

Primary and secondary outcome measures
Primary outcome measure: Seven of the trials [59,62,63,65,67,68,72] defined a single primary outcome, while three presented two or three primary outcomes [64,69,70]. The only CEH trial [72] defining a primary outcome measure chose moderate to severe headache frequency. Three CTTH trials [60,61,66] and three CEH trials [36,71,73] did not differentiate between primary and secondary outcomes (see Tables 1 and 2). Primary and secondary variables, as well as variables not specified as either, are listed for each trial in Supplementary Tables 4 and 5.

Efficacy
Six CTTH trials with a predefined primary endpoint [59,62,63,65,67,68] and two with no primary endpoint [60,61] found no significant difference between BONTA and placebo treatment (see Table 2). In the multi-armed trial by Silberstein et al. [68], a higher proportion of patients in four BONTA subgroups (50U, 100U, 100USub and 86U) than in the placebo group reported a ≥50% reduction in headache frequency after 90 days (p: 0.0017-0.024), but after 120 days the difference remained significant for only two of the subgroups (100U and 50U). Notably, placebo-treated patients reported a larger increase in headache-free days after 60 days than patients receiving the highest dose of BONTA (150U) (p = 0.007). Correspondingly, Rollnik et al. [61] found higher health status scores (Everyday Life-Questionnaire) among placebo-treated patients after both 1 and 3 months.
Four CTTH trials [64,66,69,70] found BONTA treatment statistically superior to placebo (see Table 2), although one of them [66] had no primary endpoint. In two of these trials [64,69], only one of two primary endpoints was significantly reduced compared with placebo. The trial by Hamdy et al. [70] reported significant group differences for three primary endpoints after 30 and 90 days: headache days per month (p≤0.005), headache severity (p≤0.007), and headache disability (p≤0.027).
In two CEH trials [36,72], one with a predefined primary endpoint, onabotulinumtoxin A was not superior to placebo. A pilot study [71] reported significant improvement from baseline in headache severity and active range of motion (ROM) 4 weeks after BONTA treatment (onabotulinumtoxin A 100 U), but did not report any statistical comparison with placebo; as a pilot study, it drew no conclusion on effectiveness. Thus, only one CEH trial [73] found BONTA superior to placebo; however, this trial report neither commented on randomization nor described the blinding process.

Quantitative synthesis and meta-analysis
Twelve of the trials were included in the quantitative synthesis and meta-analysis (Figure 2A, B), while four were excluded because of insufficient outcome data. In the random-effects meta-analyses of pain frequency and pain intensity among the CTTH trials, there was a significant difference favoring BONTA on pain intensity (SMD −0.35, 95% CI −0.70 to −0.002, p = 0.049), but not on frequency (SMD −0.34, 95% CI −0.71 to 0.02, p = 0.066). Among the CEH trials, there was a significant effect favoring BONTA on both frequency (SMD −0.74, 95% CI −1.42 to −0.06, p = 0.034) and intensity (SMD −0.38, 95% CI −0.74 to −0.03, p = 0.036).
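The pooled results above can be cross-checked by recovering each p-value from its SMD and 95% CI. This assumes normal-approximation (Wald) intervals, as is usual for inverse-variance pooling: SE = (upper − lower)/(2 × 1.96) and z = SMD/SE. The function name is hypothetical. The recomputed values should agree closely with the reported ones; small differences arise because the published SMDs and CIs are rounded.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Two-sided p-value recovered from an estimate and its Wald confidence interval."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)                  # CI half-width divided by z_crit
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Pooled results as reported in the text: (SMD, lower, upper, reported p).
pooled = {
    "CTTH intensity": (-0.35, -0.70, -0.002, 0.049),
    "CTTH frequency": (-0.34, -0.71, 0.02, 0.066),
    "CEH frequency": (-0.74, -1.42, -0.06, 0.034),
    "CEH intensity": (-0.38, -0.74, -0.03, 0.036),
}
for name, (smd, lo, hi, reported) in pooled.items():
    print(f"{name}: recomputed p = {p_from_ci(smd, lo, hi):.3f}, reported p = {reported}")
```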

Side effects
Side effects were recorded in eight out of 12 CTTH trials and in all four CEH trials (see Supplementary Table 9). Among BONTA treated patients, the rates averaged 31% with a range from 0 to 69% and among placebo treated an average of 15% with a range from 0 to 62%. The most common side effects were muscular weakness, headache, and symptoms from the injection site and neck pain, and they were generally mild to moderate and transient in nature.
Low to moderate dosages of onabotulinumtoxin A, from 20 U to 150 U [36,62,65,68-73], did not increase the occurrence of side effects. In a trial with high abobotulinumtoxin A dosages (500 units), Schulte-Mattler et al. [67] observed side effects only in the active group. Straube et al. [59] correspondingly found a higher proportion of side-effect events after abobotulinumtoxin A 420 units (27%) and 210 units (25%) than after placebo (7%). Dysphagia occurred in 7% after 420 units of abobotulinumtoxin A versus 2% after placebo, and ptosis was observed only in patients receiving 420 units of abobotulinumtoxin A (4%). The proportion reporting muscle weakness increased from 6% at 210 units to 18% at 420 units. In contrast, Silberstein et al. [68] found no consistent dose-response relationship for side effects in their 6-arm trial. In this systematic review, there was no significant correlation between BONTA dosages and the overall number of side effects.

Validity and quality of the trials and risk of bias
Standardized validity and quality tools provided a detailed assessment of the trials (see Table 2). Among 11 CTTH trials, the OPVS scores ranged from 5 to 16, with a median score of 10. Among CEH trials, the OPVS scores ranged from 7 to 14, with a median score of 10.5. Five CTTH trials [60,61,66,69,70] and two CEH trials [73] achieved low OPVS scores (<10) because of insufficient information on blinding, small study samples, dropouts, or no correction for multiple testing, and in one trial because of a single-blinded design (see Table 2). One CTTH trial [63] had a sample size too small to obtain a validity (OPVS) score. The item-specific OPVS scores are presented in Supplementary Table 6.
The Modified Jadad quality scores of the CTTH trials ranged from 4 to 7, with a median score of 5.8. For the CEH trials they ranged from 1 to 8, with a median of 6.5 (see Table 2). Separate item scores are presented in Supplementary Table 7. Low Modified Jadad scores reflected inadequate reporting of trial design; missing information on, or inappropriate, randomization, allocation concealment or blinding of the intervention; missing information on withdrawals or on how dropouts were handled; no description of the method used to assess adverse events; and, for one trial, a single-blinded design. Only five of the trials explicitly described an appropriate blinding process [62,64,68,71,72], while in two trials it was inappropriate [59,73].
In the supplementary check with the CONSORT Checklist 25 we counted a median of 21 confirmations (range 7-26) among the CTTH trials and a median of 15.5 (range 12-30) among the CEH trials (see Table 2).
Six CTTH trials [60,61,63,66,69,70] were associated with a high risk of bias, while six trials [59,62,64,65,67] raised some concerns of risk of bias. Among the four CEH trials, two [36,73] were associated with a high risk and two [71,72] with some concerns of risk of bias (see Table 2). Separate item scores of the risk of bias tool (RoB 2), describing the reasons for the risk-of-bias judgements, are presented in Supplementary Table 8.
The validity (OPVS) and quality scores (Modified Jadad score, CONSORT Checklist 25) were higher in trials with a negative primary endpoint than in trials with either a positive or no defined primary endpoint, but the statis- Only four trials (25%) used intention-to-treat analysis, seven (44%) described adequate sequence generation, four (25%) concealed allocation, and six (38%) adequate blinding (Supplementary Tables 6 and 7). With the Fisher exact test, we found a positive relationship between a negative primary outcome and blinding (p=0.02), but not between a negative primary outcome and intention-to-treat analysis (p=0.18), concealed allocation (p=0.769), adequacy of sequence generation (p=0.205), or proportion of dropouts (Independent-Samples Kruskal-Wallis Test, N: 16, p=0.200).
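The Fisher exact test above compares a 2×2 table of trials, here negative versus other primary endpoint by adequate versus inadequate blinding. A minimal sketch follows. It computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table; this is the convention documented for `scipy.stats.fisher_exact`. The function name and the counts are hypothetical, not the review's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables at most as probable as the observed one (small tolerance for rounding).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts (illustration only): rows = negative / other primary endpoint,
# columns = adequate / inadequate blinding.
print(fisher_exact_two_sided(5, 2, 1, 8))
```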
Sponsorship has been considered a potential source of bias [76]. In contrast, we found a trend towards higher validity (OPVS) and quality scores in the trials supported by the pharmaceutical industry. Only one sponsored trial [59] concluded positively despite a negative primary endpoint, and in this case the company had been involved in editing the manuscript.

… [61] and Schulte-Mattler et al. [67]) used the product of pain intensity and pain duration (AUC) as endpoint. Size of data markers is proportional to study weight.

Sensitivity analyses
The heterogeneity for the outcomes was high among CTTH and CEH trials (Figure 2), with I² values of 57.3% (Q=14.05, p=0.029) and 75.3% (Q=8.11, p=0.017), respectively, for the frequency outcomes, and I² = 59.9% (Q=15.0, p=0.021) and 30.5% (Q=4.32, p=0.229), respectively, for the intensity outcomes. One positive CTTH trial [70] and one CEH trial [73], both of particularly low study quality, had a substantial influence on the pooled standardized effect size, whereas the other trials had no strong influence on the pooled SMD. Neither the funnel plots for headache frequency and headache intensity nor Egger's regression test for small-study effects indicated publication bias (p-levels between 0.26 and 0.8). The additional fixed-effect and Mandel-Paule empirical Bayes estimator meta-analyses showed results comparable to the random-effects meta-analysis (Supplementary Table 10). The meta-regression-based analysis found no significant relationship between the primary outcome and age (p=0.331), sample size (p=0.236), study duration (p=0.484) or dosage (p=0.697), but found a significantly improved BONTA effect with an increasing proportion of women (p=0.029) when all trials were analyzed together (Supplementary Table 11). The meta-analyses seem robust against biases, but two trials with a strong positive effect influenced the pooled SMD.
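Egger's test, reported above, regresses each trial's standardized effect (SMD/SE) on its precision (1/SE). An intercept clearly different from zero suggests small-study effects. A minimal, dependency-free sketch follows. The function name and the six-trial data are hypothetical, and the sketch needs at least three trials with non-zero residuals. It stops at the t statistic: the p-value comes from a t distribution with k − 2 degrees of freedom, for example via `scipy.stats.t.sf`.

```python
def egger_test(effects, standard_errors):
    """Egger's regression test: standardized effect (y/se) regressed on precision (1/se).
    Returns the intercept, its standard error, the t statistic and the degrees of freedom."""
    x = [1 / se for se in standard_errors]                      # precision
    z = [y / se for y, se in zip(effects, standard_errors)]     # standardized effect
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    residual_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = n - 2
    se_intercept = (residual_ss / df * (1 / n + mx ** 2 / sxx)) ** 0.5
    return {"intercept": intercept, "se": se_intercept,
            "t": intercept / se_intercept, "df": df}


# Hypothetical SMDs and standard errors for six trials (illustration only).
result = egger_test([-0.9, -0.6, -0.5, -0.3, -0.2, -0.25],
                    [0.45, 0.40, 0.30, 0.25, 0.15, 0.12])
print(result)
```

With as few trials per outcome as in this review, the test has low power. A non-significant intercept is therefore weak evidence against publication bias, which is one reason the funnel plots and alternative estimators were also examined.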
In most trials, p-values below 0.05 indicated statistical significance, whereas one trial [68] set a significance level of 0.10 for subgroup interaction analyses. One trial used Tukey's studentized range test [64] and three used Bonferroni corrections [67,69,72], applying a stricter significance level for multiple comparisons. Confidence intervals were presented in three trials [59,65,72]. A number-needed-to-treat analysis was carried out in only one trial [36].
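As a sketch of why the Bonferroni-corrected trials [67,69,72] applied a stricter level: with m comparisons, each one is tested at α/m instead of α. The function name and the example p-values below are hypothetical and are not taken from any of the included trials.

```python
from typing import List, Tuple


def bonferroni(p_values: List[float], alpha: float = 0.05) -> Tuple[float, List[bool]]:
    """Bonferroni correction: each of m comparisons is tested at alpha / m."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values from three comparisons within one trial.
threshold, significant = bonferroni([0.03, 0.01, 0.20])
print(round(threshold, 4), significant)  # 0.0167 [False, True, False]
```

A p-value of 0.03, significant at the conventional 0.05 level, no longer counts as significant once three comparisons share the same α.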

Discussion
This comprehensive systematic review and meta-analysis cannot confirm a prophylactic effect of botulinum toxin injections for CTTH or CEH, nor identify any dose-response pattern [59,68]. Our findings contrast strongly with two recent reviews, which included non-randomized, prospective and retrospective cohort studies [33,34]. BONTA was not superior to placebo in eight of 12 CTTH trials [59, 61-63, 65, 67, 68], and in three of four CEH trials [36,72]. Trials with a negative primary endpoint were associated with higher validity and quality scores than trials favoring BONTA, and these "positive" trials were hampered by low quality and a high risk of bias. Most trials showed a small standardized mean difference (below 0.5; see Figure 2), suggesting no clinically meaningful effect of BONTA [77], whereas three trials had a strong positive effect (two of them with low quality scores), which explains the slightly positive pooled SMD in the meta-analyses.
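Because the argument turns on per-trial SMDs below 0.5 and on a few large effects pulling the pooled SMD upwards, the sketch below computes Hedges' g for one trial and pools several trials with an inverse-variance random-effects model, followed by a leave-one-out check. The DerSimonian-Laird τ² estimator, the 1.96 normal quantile, the sign convention (positive g favours BONTA) and all example numbers are assumptions for illustration; they are not the review's data, and the review's primary random-effects estimator is not named in this section.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (bias-corrected SMD) and its approximate sampling variance.

    Group 1 is taken as BONTA and group 2 as placebo, with outcomes coded so
    that a positive g favours BONTA (an assumed convention).
    """
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 (needs >= 2 trials)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-trial variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical trials as (g, variance); the last mimics a small trial with a large effect.
trials = [(0.10, 0.04), (-0.05, 0.05), (0.20, 0.03), (1.40, 0.12)]
print("all trials:", round(dersimonian_laird(*zip(*trials))[0], 3))

# Leave-one-out sensitivity analysis: re-pool without each trial in turn.
for i in range(len(trials)):
    rest = trials[:i] + trials[i + 1:]
    print(f"without trial {i + 1}:", round(dersimonian_laird(*zip(*rest))[0], 3))
```

Dropping the hypothetical fourth trial shows how a single small trial with a large effect can move the pooled SMD, which is the pattern the sensitivity analyses describe for [70] and [73].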
Several factors might explain why a BONTA effect was so rarely demonstrated [72]. BONTA may be ineffective for CTTH and CEH, but a pain-relieving effect of peripheral "needling", of saline or local anesthetic injections, or of concomitant treatment could also explain the small effect sizes [59]. In one trial [36] the participants additionally received physical therapy (massage and hot mud packs), and in some trials participants were allowed rescue analgesia. In the "negative" trial by Rollnik et al. [61], the inclusion of patients with episodic tension-type headache and a low headache frequency at baseline (mean 16.2, SD 10.8) might have reduced the effect size. Because of its small effect size, the trial by Padberg et al. [65] would have needed 800 participants to reach a significant group difference.
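The figure of about 800 participants for Padberg et al. [65] reflects the standard sample-size calculation for comparing two means. A minimal sketch, assuming a two-sided α of 0.05, 80% power, equal arms and the normal approximation; the section does not state the trial's effect size or the power assumed, so the SMD values below are purely illustrative and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(smd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm needed to detect a standardized mean difference `smd`
    in a two-sided, two-sample comparison (normal approximation, equal arms)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / smd**2)


for smd in (0.5, 0.3, 0.2):
    n = n_per_group(smd)
    print(f"SMD {smd}: {n} per arm, {2 * n} in total")
```

Under these assumptions an SMD of 0.2 requires 393 participants per arm (786 in total), of the same order as the 800 quoted. Halving the SMD roughly quadruples the required sample size, so trials sized for moderate effects are underpowered for effects this small.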
Standardized validity and quality tools do not evaluate the quality of the intervention/injection procedure itself. It is therefore prudent to question whether the treatment protocols were properly designed in the "negative" trials. None of the studies used an image-guided technique (X-ray with contrast agent or ultrasound), and only one applied EMG guidance to confirm the intramuscular needle position [60]. Thus, one could question whether the active component BONTA reached the neuromuscular junctions. Because the pericranial muscles are located superficially, we consider it more likely that BONTA reached the muscle layer after pericranial injections.
Different injection techniques were applied across the trials, but the "follow the pain" strategy did not give better results than "fixed site" injections. BONTA dosages varied widely across trials, and some have argued that the dosages in some trials were too low to obtain a clinically meaningful effect. The phase 3 PREEMPT studies of migraine treatment (PREEMPT = Phase III REsearch Evaluating Migraine Prophylaxis Therapy) [26] applied onabotulinumtoxin A dosages of 155 and 195 U, and 500 units of abobotulinumtoxin A have been recommended for the treatment of dystonia [78]. The findings by Straube et al. [59] could support this notion: with the high dose (420 units) of abobotulinumtoxin A, effects on headache duration and on physician's and patient's global assessments were more pronounced than with the low dose (210 units). On the other hand, the proportion of patients with muscle weakness rose from 6% at 210 units to 18% at 420 units [59], suggesting that a tolerable limit had been reached. Furthermore, the large dose-ranging trial by Silberstein et al. [68] could not confirm any dose-response of BONTA: after 60 days, placebo-treated patients actually reported a larger increase in headache-free days than patients who had received 150 units of onabotulinumtoxin A.
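The dose comparison in Straube et al. [59] can also be expressed as a number needed to harm (NNH), the reciprocal of the absolute risk difference. A minimal sketch using the two muscle-weakness proportions quoted above; the function name is hypothetical, and rounding up to the next whole patient follows common practice but is an assumption here.

```python
import math


def number_needed(p_a: float, p_b: float) -> float:
    """Number needed to treat or harm: the reciprocal of the absolute risk difference."""
    risk_difference = abs(p_a - p_b)
    return math.inf if risk_difference == 0 else 1 / risk_difference


# Muscle weakness reported by Straube et al. [59]: 18% at 420 U vs 6% at 210 U.
nnh = number_needed(0.18, 0.06)
print(round(nnh, 2), math.ceil(nnh))  # 8.33 9
```

On these figures, roughly one additional case of muscle weakness occurs for every eight to nine patients given 420 U rather than 210 U. An NNH against placebo would require the placebo rate, which is not given in this section.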
Such side effects of BONTA injections might have unblinded participants in some of the trials. In the "positive" trial by Kokoska [64], which included patients with frontal CTTH, three BONTA-treated participants reported ptosis vs none in the placebo group. These three patients (15% of all BONTA-treated participants) were probably unblinded, which may have influenced the perceived treatment effect.
The response to saline injections in some of the "negative" trials indicates a placebo effect, although regression towards the mean and the Hawthorne effect cannot be excluded [79]. Placebo responses have been shown in several headache studies [80,81] and seem to be even higher after invasive procedures [82]. If so, this suggests that blinding was adequate in these trials. Supporting this, we found the highest validity and quality scores in trials with a negative primary endpoint, in line with a previous systematic review and meta-analysis on acupuncture [42]. This pattern clearly illustrates the impact of adequate randomization and blinding in clinical trials [40,83], and it is notable that two recent systematic reviews [33,34] on BONTA treatment for CTTH included low-quality, non-randomized and non-controlled cohort studies.

Strengths and limitations of the review
This review followed the current PRISMA recommendations for systematic reviews and included a large sample (898 participants), of whom 535 received BONTA. Eight of the trials chose pain intensity and/or headache frequency as the primary endpoint. Both endpoints are clinically relevant and should be sensitive enough to demonstrate a clinically significant difference. The external validity of these findings should therefore be satisfactory.
Other researchers have suggested that repeated injections may increase and prolong the effect of BONTA [32,65,66,84,85]. This has so far not been tested in randomized, placebo-controlled trials, and we do not have sufficient data to answer this question.

Conclusion
After critically reviewing the available data from 16 selected RCTs, with a quantitative synthesis and meta-analyses including 11 of the trials, we cannot conclude that prophylactic BONTA injections are superior to placebo for tension-type headache or cervicogenic headache, irrespective of injection technique and dosage. Our findings challenge the notion that increased muscular tone or trigger points are responsible for CTTH and support previous studies indicating more complex mechanisms. For CEH, too, other mechanisms seem to play a role. To answer the question of BONTA efficacy, future reviews should be based on large, well-designed, randomized, placebo-controlled trials rather than on non-controlled studies with low validity and quality and a high risk of bias.

Clinical implications
After systematically reviewing 16 selected RCTs on BONTA injection, we cannot recommend prophylactic treatment with BONTA for tension-type headache or cervicogenic headache.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission. Competing interests: Authors state no conflict of interest, and they alone are responsible for the content and writing of the paper. Informed consent: Not necessary for this review of already published trials. Ethical approval: Not necessary for this review of already published trials.