Genome-wide Analysis of the WRKY Gene Family and its Response to Abiotic Stress in Buckwheat (Fagopyrum Tataricum)

Abstract The WRKY gene family is an ancient plant transcription factor (TF) family with a vital role in plant growth and development, especially in response to biotic and abiotic stresses. Although many researchers have studied WRKY TFs in numerous plant species, little is known of them in Tartary buckwheat (Fagopyrum tataricum). Based on the recently reported genome sequence of Tartary buckwheat, we identified 78 FtWRKY proteins that could be classified into three major groups. All 77 WRKY genes were distributed unevenly across all eight chromosomes. Exon–intron analysis and motif composition prediction revealed the complexity and diversity of FtWRKYs, indicating that WRKY TFs may be of significance in plant growth regulation and stress response. Two separate pairs of tandem duplication genes were found, but no segmental duplications were identified. Overall, most orthologous gene-pairs between Tartary and common buckwheat evolved under strong purifying selection. qRT-PCR was used to analyze differences in expression among four FtWRKYs (FtWRKY6, 74, 31, and 7) under salt, drought, cold, and heat treatments. The results revealed that all four proteins are related to abiotic stress responses, although they exhibited various expression patterns. In particular, the relative expression levels of FtWRKY6, 74, and 31 were significantly upregulated under salt stress, while the highest expression of FtWRKY7 was observed from heat treatment. This study provides comprehensive insights into the WRKY gene family in Tartary buckwheat, and can support the screening of additional candidate genes for further functional characterization of WRKYs under various stresses.


Introduction
Although most plants grow in specific environments, they experience continual changes in their external conditions, therefore, plants have developed a series of complex mechanisms to withstand stresses [1,2]. Transcription factors (TFs) are crucial proteins in the response to environmental stimuli by regulating gene expression temporally and spatially [3]. TFs, also called sequence-specific DNA-binding factors, bind to conserved cis-elements in promoter regions, thereby interacting with downstream target genes to influence transcription [4,5]. The WRKY family is one of the largest and most diverse TF families, and is related to the coordination of many physiological activities in the plant kingdom. Moreover, it has been widely studied for its important role in regulating gene expression under adverse conditions [6 -9]. According to the number of domains and features of the zinc-finger motif, the WRKY family can be divided into three distinct groups (I, II, and III) [10]. Groups I and II both exhibit a C2H2 zinc-finger motif, although group I has two WRKY domains whereas group II has only one. Group II can be further classified into five subgroups (IIa-e) based on the amino acid (aa) stretch in the zincfinger motif [11,12]. Group III also contains a single WRKY bioinformatics analyses including phylogenetic, gene structure, and motif composition analyses; determined the chromosomal locations of the genes; and calculated the Ka/Ks values of WRKY genes in Tartary buckwheat. Subsequently, the expression patterns of select FtWRKYs under salt, drought, cold, and heat treatments were analyzed. This study helps clarify the functions of WRKY proteins and provides a foundation for further comparative genomic studies in Tartary buckwheat.

Phylogenetic analysis and protein properties of FtWRKYs
Multiple alignment of the WRKY domain sequence in the 78 FtWRKY proteins was performed with ClustalW using the default settings [48], and a phylogenetic tree was constructed using the neighbor-joining (NJ) method with MEGA 6.0 software [49] with the following parameters: pairwise deletion, 1000 bootstrap replicates, and Poisson correction. To obtain an accurate classification, two members per group and highly conserved representative AtWRKY proteins were included in the tree building (Table  S2). The FtWRKY domains in each group were analyzed, and the sequence logos were produced using WebLogo online software (http://weblogo.threeplusone.com/) [8]. The properties of the proteins, including sequence length (aa length), molecular weight (MW), isoelectric point (IP), instability index (II), aliphatic index (AI), and grand domain but a C2HC zinc-finger motif [11,12]. The domain of WRKY TFs is approximately 60-aa long at the N-terminus and has a typical zinc-finger motif at the C-terminus [13]. It is generally thought that WRKY TFs can specifically interact with the W-box (TTGACT/C) found in the promoter region of many plants [14,15]. The first WRKY gene, SPF1, was isolated from sweet potato (Ipomoea batatas), and was considered to have potential negative impacts on the regulation of sucrose-induced genes [16]. Multiple studies have since been performed in different plants. For example, 72 and 64 WRKYs have been reported in the model herbaceous plants Arabidopsis thaliana and Oryza sativa, respectively [17], and many other studies (Table S1) have been performed in Triticum aestivum [18], Camellia sinensis [19], Populus trichocarpa [10], and Glycine max [20]. Such studies have demonstrated that WRKY proteins not only participate in physiological processes such as seed germination [21] and leaf senescence [22], but are also involved in the response to biotic stresses such as pathogens [23] and pests [24], as well as abiotic stresses such as drought [25], heat [26], cold [27], salinity [28], and heavy metals [8]. For instance, in transgenic tobacco, the BcWRKY46 gene enhanced tolerance to freezing, abscisic acid, salt, and dehydration stresses [29]. Meanwhile, ThWRKY7 improved cadmium tolerance under cadmium stress in combination with ThVHAc1 in woody plants [30]. In A. thaliana, AtWRKY25, 33, 46, and 54 have been demonstrated to play vital roles in the response to several types of stress [31,32]. Moreover, WRKY proteins have vital roles in the biosynthesis of secondary metabolites, such as paclitaxel and benzylisoquinoline [33,34].
Tartary buckwheat (Fagopyrum tatarium) is an annual eudicot plant belonging to the genus Fagopyrum. Tartary buckwheat and common buckwheat (Fagopyrum esculentum) are the most commonly cultivated species of this genus [35,36]. Tartary buckwheat originates in southwest China, and is also known as kuqiao for its bitter seeds. It is grown mainly in farming and ranching areas that overlap with northern China, and exhibits strong abiotic resistance to harsh eco-climatic environments [37 -39]. As a medicinal and nutrient-rich crop, Tartary buckwheat has higher flavonoid content than common buckwheat and is especially abundant in rutin, accounting for 0.8-1.7% of the dry weight of the plant [40,41]. Moreover, quercetin, anthocyanins, and other flavonoids in buckwheat have various biological activities, such as antibacterial, antioxidant, and anti-inflammatory effects [41 -44].
The genome sequence of Tartary buckwheat was recently published [42], enabling the systematic characterization of WRKY genes in this species and the study of their expression. Therefore, we performed Illumina RNA-sequencing datasets were collected from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/ sra), including different five tissues, i.e. roots, flowers, stems, and leaves (accession: SRX3974871; SRX3974872; SRX3974873; SRX3974874), and for the salt treatment, plants were disposed with 200 mM NaCl for 0, 24 hours (accession:SRX3210945; SRX3210946). Transcript expression levels were calculated in FPKM units as reads per kilobase of transcript sequence per million mapped reads. FPKM value were transformed by log2 and the heatmap was performed by HemI software [54].

Plant material, growth conditions, and treatments
The seeds of Tartary buckwheat cv. Pinku1 were provided by Dr. Bo Li from the college of agriculture, South China Agricultural University. Plants were grown in pots containing soil and vermiculite mixture (3:1) in an artificial climate chamber, with a program set to 25/22°C (day/night), 16-h photoperiod, and relative humidity of 75%. Stress treatments were initiated in 5-week-old normal seedlings, and the seedlings were disposed with following treatment as described by Zhou et al. [38] and Gao et al. [40]. For salt and drought treatments, seedlings were irrigated with 15% NaCl and 30% PEG6000, respectively. For cold and heat treatments, seedlings were transferred to 4°C and 40°C in an illuminated incubator, respectively. Whole samples were harvested after 0, 3, and 12 h, while plants at 0 h were used as the control. Samples from three biological replicates were frozen immediately in liquid nitrogen and stored at -80°C until further analysis.

Total RNA extraction and cDNA reverse transcription
Total RNA was isolated from frozen seedlings with an RNAprep Pure Plant Kit (TIANGEN, Beijing, China) and the concentration of RNA was determined using spectrophotometry. First-strand cDNA was reverse transcribed using a PrimeScript RT Master Mix (Perfect) Real Time Kit (TaKaRa, Dalian, China). All operational procedures were performed according to the manufacturer's instructions. Finally, 1 μL cDNA was diluted with 4 μL nuclease-free water before quantitative real-time (qRT)-PCR analysis.

Conserved motifs and gene structure analysis of FtWRKY genes
The conserved motifs in the FtWRKY proteins were predicted using MEME Suite (http://meme-suite.org/ tools/meme), the following parameters were employed in analysis: maximum number of motifs 10; minimum motif width 6; maximum motif width 50. The gene structure of FtWRKY was predicted by comparing the coding sequences and corresponding genomic sequences using the GSDS tool (http://gsds.cbi.pku.edu.cn) [51].

Chromosomal location of FtWRKY genes
The physical location of FtWRKYs on chromosomes was retrieved from the annotated genome and chromosome files, and genes were plotted separately onto all eight chromosomes based on the order of their physical position. Finally, an image of their physical location was created with MapInspect software [52].

Calculation of Ka/Ks of orthologous gene-pairs between FtWRKYs and FeWRKYs
The orthologous gene-pairs of WRKY between Tartary and common buckwheat were aligned using ClustalW on the basis of diverse sequence alignment tools. Alignment of the aa sequences and their corresponding original cDNA sequences were used to calculate the synonymous rate (Ks) and nonsynonymous rate (Ka) using the CODEML program in the PAML interface tool of PAL2NAL [5]. Furthermore, the evolutionary constraint (Ka/Ks) was determined. The approximate time (million years ago [Mya]) of the orthologous gene-pairs were estimated using the equation T = Ks/2λ, where the synonymous substitution rate (λ) was 1.5 × 10 -8 [53].

RNA-sequencing data analysis of FtWRKY genes
To investigate the expression patterns of FtWRKYs among different tissues as well as under salt treatment, the NJ tree indicated that the FtWRKYs could be divided into three main groups defined in previous research [11]. Figure  S1 showed the multiple sequence alignment of the FtWRKY domain among each group. Group II had the most FtWRKYs (56), followed by group I (15). Meanwhile, group III had a different type of zinc-finger compared with groups I and II, and contained only seven members. Group II could be further classified into five subgroups (IIa-e). Subgroup IIa (six members) and IIb (ten members) were in the same branch, while subgroup IId (12 members) and IIe (11 members) were derived from one clade. Subgroup IIc (17 members) was more similar to group I than the other subgroups. Notably, in group I, four FtWRKYs (FtWRKY3, 25, 66, and 59) had only one WRKY domain at the C-terminus, which may have lost or acquired a domain during evolution [58].
Based on the 78 complete aa sequences, we predicted the properties of the FtWRKYs, including the aa length, MW, IP, II, AI, and GRAVY ( According to the predicted protein stability, almost 93.59% of WRKY proteins were unstable, because their II values were greater than 40, while only five FtWRKYs were stable, among which four members were from subgroup IIc. The predicted results of the subcellular localization revealed that all FtWRKYs were localized in the nucleus.

Gene structure and motif composition of FtWRKY proteins
The analysis of the intron-exon structure of full-length cDNA ( Fig. 2B) revealed that all FtWRKYs had introns in the translated region, although the number of introns and exons varied from one to five introns and two to six exons, respectively. The majority (52.56%) of the FtWRKY genes contained two introns. Meanwhile, six genes contained five introns and eight genes contained only one intron. Most members of subgroup IIb had two introns, except FtWRKY23 and 47, and five members of group III (71.43%) contained three exons and two introns. Genes with similar structures were always clustered in the same group, which was further confirmed by the results of the FtWRKY classification ( Fig. 2A).

qRT-PCR analysis
The expression of FtWRKY6 and 74 were enhanced significantly after treatment with NaCl, and same as FtWRKY7 and 31, they shared a high identity with AtWRKY25, 33, 46, and 54 were identified in plant defense experiments, thus were selected for qRT-PCR analysis. The housekeeping gene histone3 (GenBank ID: HM628903) of Tartary buckwheat was used as an internal control [55]. Specific qRT-PCR primers (Table S3) were designed with Primer Premier 6.0 and synthesized by Sangon (Guangzhou, China). The SYBR Premix Ex Taq Kit (TaKaRa, China) was used for the qRT-PCR reaction, and the reactions were performed using a Roche LightCyler 480 system (Roche, Basel, Switzerland). The qRT-PCR reactions were performed in a total volume of 20 μL, including 10 µL SYBR Premix Ex Taq, 6 µL ddH 2 O, 2 µL diluted cDNA, and 1 µL each forward and reverse primer. The qPCR program was as follows: initiation with a 3 min denaturation period at 95°C, followed by 40 cycles of 95°C for 15 s, 60°C for 30 s, and 72°C for 20 s. Three biological replicates and three technical replicates were included in the qRT-PCR analysis. Finally, gene expression was calculated using the 2 -∆∆ c method [56], and the means and standard deviation of three biological replicates were calculated. The significance was statistically analyzed by t text, and revealed by asterisks (* p<0.05, **p<0.01).

Identification and classification of FtWRKYs
In the F. tatarium genome sequence, 74 protein sequences were detected that contained the conserved complete WRKY domain based on the HMMER and SMART analyses. Moreover, four additional proteins were identified that were missing the zinc-finger motif. These sequences were included for the subsequent analysis because they have been identified as annotated WRKYs in Arabidopsis based on BLAST, and this phenomenon has also been reported in Capsicum annuum [57]. Interestingly, one gene (FtPinG0001732900.01) showed two alternative messenger RNA splicing, therefore, 77 genes were detected in total. The 78 identified proteins were named FtWRKY1-77 according to the distribution (from top to bottom) of the corresponding genes on chromosomes (Chr.) 1-8 ( Table 1).
The phylogenetic relationship of all FtWRKYs and 14 typical AtWRKYs was constructed based on multiple sequence alignment of their WRKY domains (Fig. 1) The domain at the C-terminus. These three motifs were also found in 15 of the 17 proteins of subgroup IIc, because FtWRKY54 lacked the zinc-finger-like motif and FtWRKY10 did not contain motif 4. Similarly, in subgroup IId, motifs 1, 2, and 7 were found in 11 of the 12 members, except the one member without a zinc finger (FtWRKY59). This indicated that the motifs were selectively distributed among the groups. Motif 8 was found in all members of subgroup IIa and nine of the ten members of subgroup IIb. Moreover, all members in subgroup IIb contained motif 5. Motif 10 was mostly found in subgroups IIa and IIb, while motif 9 was only identified in FtWRKY39, 54, and 61. Overall, the variety and complexity of these motifs suggests that these proteins likely have additional functions [59].
Ten motifs were identified in Tartary buckwheat using MEME software. Table S4 presents detailed information of these motifs and Figure 2C

Duplication and evolution of FtWRKY
Gene duplication plays a vital role in the enlargement of gene families [60]. Chromosome regions shorter than 100 kb containing more than two genes with a similarity greater than 40% are considered to be caused by a tandem duplication event [61]. We found two pairs of tandem duplicates, FtWRKY29 and 30 and FtWRKY50 and 51,

Expression patterns of FtWRKY in different tissue and under salt treatment
The heatmap of 78 FtWRKYs was constructed by log2 transformed FPKM values (Fig.4). FtWRKY37, 50, 54 and 55 were barely expressed in any of the selected tissues. Other 74 of 78 (94.9%) FtWRKYs were expressed at least in one tissue. Some genes, like FtWRKY8, 70 and 74, were highly expressed in all the tissues, suggesting these FtWRKYs may play an important role in the plant development of Tartary buckwheat. Additionally, some genes showed high expression simultaneously in flower, leaf and root, such as FtWRKY6, 43 and 51, while most of the genes showed lower expression levels in stem. Interestingly, a small number of genes presented tissue-specific expression profiling, for example, FtWRKY14 was only particularly expressed in root, and it may play key role in this tissue.
A total of 71 FtWRKYs (91.03%) were detected as being expressed under salt stress, indicating that most FtWRKYs genes were related to dealing with environment change. In total, 50 genes were upregulated, among these FtWRKY74 and 6 were most significantly followed by FtWRKY43 and 56.

Expression analysis of selected FtWRKYs under abiotic stress
TFs typically contain various types of DNA-binding domains and can improve expression at the transcription level under different abiotic stresses. Because the members of the same subgroups exhibit functional similarities [13], we selected four FtWRKYs (6, 7, 31, and 74) for an abiotic stress expression study in seedlings (Fig. 5), which were clustered in same groups with AtWRKY25, 33, 46, and 54. Under salt stress, all FtWRKY genes were upregulated during the first 3 h and then decreased, while FtWRKY6, 74, and 31 were significantly up-regulated. The highest expression (13-fold increase) was detected after 3 h in FtWRKY31, followed by FtWRKY74. Three FtWRKY genes (FtWRKY74, 31, and 7) were significantly induced by drought conditions, and maintained a highly increased expression of at least 2.67-fold until 24 h. Meanwhile, FtWRKY6 showed an almost 5-fold increase within 3 h, but decreased a few hours later. Under cold treatment, the relative expression of FtWRKY74 increased by more than 4.5-fold within 3 h, while the expression of FtWRKY6 and 7 was significantly upregulated at the 12 h. Meanwhile, FtWRKY31 fluctuated slightly with an upregulation of less than 1.3-fold after 3 h and downregulation below that of the control after 12 h. All genes exhibited similar expression patterns under heat treatment, with no clear changes within 3 h and significantly upregulation of at least 2.95-fold within 12 h. FtWRKY74 showed the highest expression (almost 10-fold increase), whereas FtWRKY7 showed only a small change.
All selected FtWRKY genes responded to the abiotic stress treatments. FtWRKY6 and 74 showed rapid and significant upregulation with all treatments within 3 h. Under heat treatment, all genes were induced after 3 h. Except for FtWRKY31 under cold treatment within 12 h and FtWRKY6 under heat treatment within 3 h, the expression levels of all genes under the various stresses showed greater fold-changes than the control. Within 3 h, the average fold-change with salt treatment was 9.24-fold, which was the most significant change, but only 1.09-fold with heat treatment. Meanwhile, the average fold-changes were 3.24 and 2.34 under drought and cold conditions, respectively. By contrast, after 12 h, the heat condition showed the highest average fold change (5.26-fold), while the salt treatment induced a fold-change of only 2.66. In addition, the drought and cold treatments caused foldchanges of 3.41 and 2.85 within 12 h, respectively.

Discussion
The WRKY family is one of the largest TF families in higher plants, and its members play an essential role in many physiological processes. In this study, a total of 78 FtWRKYs were identified, and all proteins presented clear differences, suggesting a high degree of complexity among FtWRKYs. The 78 FtWRKYs could be divided into three main groups, which was consistent with previous studies [6,8]. Comparing with other species (Table S1), the number in Tartary buckwheat (78 FtWRKYs) was closest to that of Arabidopsis (72 AtWRKYs) [17]. Therefore, they likely underwent similar evolutionary patterns. Interestingly, Group II was always the largest group among the species (e.g., 70.5% in Tartary buckwheat and 69.1% in soybean), suggesting that group II may have undergone significant expansion during the course of evolution and that group II mainly accounts for the diversity of the WRKY family among species.
Despite the high conservation of the WRKYGQK sequence in the WRKY family, there were still some interesting cases. The WRKYGKK sequence present in FtWRKY15 and 76 was similar to those identified in many species, such as tomato, apple, and peach [62 -64], this kind may have lost their ability to combine with the W-box [28]. Meanwhile, the variant type of WRKYDQK was only observed in FtWRKY50, and has also been reported in carrot [65]. Similar to other plants, the properties of FtWRKY proteins showed differences among groups and individuals [11].
From the phylogenetic tree, we identified at least one FtWRKY and AtWRKY in each subgroup, illustrating that the differentiation time of the WRKY family was earlier than the divergence time of Tartary buckwheat and A. thaliana. The FtWRKY members of the specific subgroups likely shared closely related motif compositions and functional similarities, which was supported by the subsequent gene structure analysis. Meanwhile, genes containing six exons were clustered in groups I and II, indicating that groups I and II constituted the ancestral genes.
Gene duplication plays a vital role in the enlargement of gene families [60]. Although a whole-genome duplication event occurred in Tartary buckwheat [42], only two pairs of tandem duplicates were found in the FtWRKY family, FtWRKY29 and 30 (90% similarity) and FtWRKY 50 and 51 (44% similarity). Both pairs of tandem duplicates were found in group II. By contrast, most duplication events in Arabidopsis and rice are found in group III. Surprisingly, no segmental duplications were identified in the FtWRKYs, therefore, duplication likely had a limited contribution to the expansion of the WRKY family in Tartary buckwheat. At the same time, a total of 63 orthologous gene-pairs between Tartary buckwheat and common buckwheat were identified, and the Ka/ Ks ratios of 62 to 63 pairs (98.41%) were < 1, indicative of strong purifying selection acting on these genes.
Like the expression pattern of WRKY in other species, different genes showed incongruous expression patterns, suggesting they have various function and diversity regulatory mechanisms [12,66]. According to the RNA-seq data of Tartary buckwheat, some genes were not expressed in any tissues, the reason may be some of them are pseudogenes, or they were expressed in other tissues we failed to collect. FtWRKY74 and 6 showed the highest expression level after salt treatment, which revealed these genes can improve the tolerance under osmotic stress, therefore were selected as the candidate genes to develop qRT-PCR analysis.
Substantial evidence has shown that WRKY TFs can improve the stress tolerance of plants by modulating their molecular and physiological metabolism [8,20]. In Arabidopsis, at least 26 AtWRKY genes have been demonstrated to participate in abiotic stress responses [67]. For example, overexpression of AtWRKY25 and 33 was reported under heat and salt treatment [32]. The expression pattern results suggested that the four assessed FtWRKYs were involved in the regulation of abiotic stress responses. Under heat treatment, all four genes were upregulated within 12 h, indicating that the WRKY TFs participated in a complex cross-regulation network under such stresses. FtWRKY6 and 74 simultaneously responded to salt and drought treatments, suggesting that they can co-regulate more than one adverse condition, possibly based on synergistic or antagonistic mechanisms. Overall, the results indicate that the selected four FtWRKY genes play an important role in the establishment of salt, drought, cold, and heat tolerance.
Furthermore, it is indicative of the selection method of responsive genes in our study is efficient. RNAsequencing analysis can help us to find the genes with higher expression under stress, while sequence alignment suggest the genes clustered with the reported stress resistance genes have similar functions.

Conclusions
We identified 78 FtWRKY proteins (77 genes) from the F. tataricum genome sequence, which could be divided into three groups. Although the proteins were diverse, the members in same groups and subgroups exhibited similar properties, such as gene structure and motif composition. The results suggested that duplication events contributed little to the expansion and evolution of FtWRKYs, and most FtWRKYs experienced purifying selection during the course of evolution. Finally, the expression analysis indicated that all four studied FtWRKYs (6, 74, 31, and 7) were regulated by abiotic stresses. In particular, FtWRKY6 and 74 were highly responsive to both salt and drought treatment. Based on results above, these four selected genes show potential for transgenic applications in Tartary buckwheat. This study is the solid foundation for expression analyses of additional FtWRKYs and related mechanistic studies, and can prove the fundamental theory to clone specific functional genes.