The complete mitogenome sequence of the coral lily (Lilium pumilum) and the Lanzhou lily (Lilium davidii) in China

Abstract Background The mitogenomes of higher plants are conserved. This study was performed to complete the mitogenome of two China Lilium species (Lilium pumilum Redouté and Lilium davidii var. unicolor (Hoog) cotton). Methods Genomic DNA was separately extracted from the leaves of L. pumilum and L. davidii in triplicate and used for sequencing. The mitogenome of Allium cepa was used as a reference. Genome assembly, annotation and phylogenetic tree were analyzed. Results The mitogenome of L. pumilum and L. davidii was 988,986 bp and 924,401 bp in length, respectively. There were 22 core protein-coding genes (including atp1, atp4, atp6, atp9, ccmB, ccmC, ccmFc, ccmFN1, ccmFN2, cob, cox3, matR, mttB, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7 and nad9), one open reading frame and one ribosomal protein-coding gene (rps12) in the mitogenomes. Compared with the A. cepa mitogenome, the coding sequence of the 24 genes and intergenic spacers in L. pumilum and L. davidii mitogenome contained 1,621 and 1,617 variable sites, respectively. In the phylogenetic tree, L. pumilum and L. davidii were distinct from A. cepa (NC_030100). Conclusions L. pumilum and L. davidii mitogenomes have far distances from other plants. This study provided additional information on the species resources of China Lilium.


Introduction
The mitogenomes of higher plants are usually highly conserved and highly variable across the plant taxa [1]. Mitogenome is usually transmitted to the progeny from the maternal parent [2]. Mitogenome of higher plants ranges from 200 to 2,400 kb due to the expansion and the reduction of the intergenic region [3,4]. The constant recombination and great variation in the spacer regions in mitochondrial DNA (mtDNA) promote the evolution of higher plants. During evolution, most of the genes in plant mtDNA have been lost [5]. Compared to the chloroplast, the plant mitogenome has lower mutation rates [6]. Accordingly, the mitogenome plays a specific role in the evolution of plants. There are many studies sequencing the complete mitogenomes of higher plants and describing their genome features [3].
The genus Lilium (Liliaceae) is a very important economic plant worldwide [7,8]. Lilium species are perennial herbaceous plants with ornamental and notable medicinal value [9]. In the research and development of traditional Chinese medicine (TCM), the pharmaceutical characteristics of Lilium species, especially the extracts including polysaccharides, kaempferol and jatropham, have been widely studied [10,11]. For instance, the extract from Lilium species showed antidiabetic, antioxidant, antiapoptotic and health-protective properties [10,12,13]. These properties promoted the research and the exploration of bioactive and potential pharmaceutical activities in Lilium in the TCM field.
The genus Lilium is composed of approximately 115 species [8,14], and about 55 species are centrally distributed in China [15]. Some of the Lilium species in China have not been well identified yet. The coral lily Lilium pumilum Redouté and the Lanzhou lily Lilium davidii var. unicolor (Hoog) cotton are two Chinese species with ornamental and economic values [16,17]. Both L. pumilum and L. davidii are the important original materials of TCM [16,18]. L. pumilum, which has beautiful bright red flowers, is an endangered species in China [19]. However, the genetic evolution of these two species has not been reported until now.
This study was performed to examine the complete mitogenome of the two Lilium species. The mitogenomes of L. pumilum and L. davidii were sequenced using Illumina HiSeq X Ten sequencing platform. The variations in the mitogenomes were identified by referring to the reference mitogenome sequence of the closely related Allium cepa (GenBank: KU318712.1, NC_030100.1). The phylogenetic tree analysis was performed to identify the evolution distance of these two species from other plants with completed mitogenome sequences. This study will enrich the Chinese germplasm bank of lily.

Plant materials and sequencing
The leaf samples of L. pumilum and L. davidii were obtained from Shaanxi Engineering and Technological Research Center for Conversation and Utilization of Regional Biological Resources, Yan'an, China. Both species were wild type and uncultivated plant genetic resources in China. Leaves were immersed into liquid nitrogen and stored at −80°C before use. Triplicate genomic DNA samples were extracted from leaves of each species using the cetyltrimethylammonium bromide (CTAB) method [20].
Following the genomic DNA quantity measurement (Qubit 3.0; Invitrogen, USA), samples were subjected to fragmentation, library construction (TruSeq™ DNA Sample Prep Kit; Illumina Inc, USA), PCR amplification and sequencing on the platform of Illumina HiSeq X Ten (pair-end 2 × 150 bp). Single-cell sequencing was performed on the PacBio RSII platform with 8-10 kb fragments.

Genome assembly and annotation
The Illumina raw data were processed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/), followed by alignment with the PacBio RS sequencing reads using BlasR [21] and data assembly using SPAdes-3.10.1 program [22]. Long sequences assembled (>500 bp) with high alignment ratio were considered as candidates. Mt contigs were screened from these assembled candidates by blasting against the plant mitogenomes in NCBI Organelle Genome Resources (https://www. ncbi.nlm.nih.gov/genome/organelle/). The Illumina clean data were then compared back to the assembled mt contigs, followed by modification (bug fix) using GapCloser (v1.12; http://soap.genomics.org.cn/soapdenovo.html). The mitogenomic data of the two species sequenced were finally obtained.
The tRNA sequences in the obtained mitogenomes were recognized using tRNAscan-SE (http://lowelab. ucsc.edu/tRNAscan-SE/). The protein-coding genes and rRNA-coding genes in the mitogenomes were identified by aligning against the closely related A. cepa mitogenome sequence (GenBank: KU318712.1; NC_030100), with the threshold of alignment p < e value 1e −5 . The exon and intron regions of genes were manually checked, and the gene sets in the mitogenome were identified. The variable sites in the coding sequence (CDS) of proteincoding genes and rRNA genes and intergenic spacers, as well as the structural variations (>100 bp) in the complete mitogenome sequence of the two Lilium species, were identified by comparison with A. cepa mitogenome (NC_030100) using the MUMmer and LASTZ alignment tools with default parameters.

General features of the two mitogenomes
The Illumina sequencing generated 36,455,269 clean reads in L. pumilum and 59,258,050 clean reads in L. davidii with an averaged GC content of 45.06 and 44.87%, respectively. Only contigs >500 bp were included for the annotation of mitogenome. Accordingly, the mitogenome of L. pumilum was 988,986 bp in length with 45.06% GC content, and the mitogenome of L. davidii was 924,401 bp in length with 44.87% GC content ( Table 1). These genomes were longer than 773,279 bp in Vitis vinifera [5], 490,520 bp in O. sativa [24], 316,363 bp in A. cepa [25], 678,653 bp in C. nucifera [26] and 452,526 bp in T. aestivum cv. Chinese Yumai [27], and smaller than 1,106,521 bp in Schisandra sphenanthera (Austrobaileyales) [28]. The obtained mitogenomes (988,986 and 924,401 bp) were of median size compared to these plants, but were relatively larger than the aforementioned monocotyledons. This difference might due to the expansion of the intergenic region [27,29]. The annotation of variable sites in the mitogenome would be of great importance.

Annotation of variable sites in the mitogenomes
There were 1,621 and 1,617 variable sites in the CDS of the 23 annotated genes and the intergenic spacers between genes in the mitogenome of L. pumilum and L. davidii in comparison to A. cepa mitogenome, respectively ( Table 1). The number of indels in the complete mitogenome sequence of L. pumilum (n = 45) was higher than that of L. davidii (n = 34). In addition, the variable sites in the mitogenomes in L. pumilum and L. davidii aligned onto 22 core protein-coding genes (including atp1, atp4, atp6, atp9, ccmB, ccmC, ccmFc, ccmFN1, ccmFN2, cob, cox3, matR, mttB, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7 and nad9), one open reading frame (ORF; orf725) and one ribosomal protein-coding gene (rps12) in A. cepa ( Table 2). There were 733 and 484 variable sites in gene CDS and intergenic spacers in L. pumilum and L. davidii, respectively ( Table 2). tRNA genes were not identified. The number of annotated protein-coding genes in the complete mitogenome sequence of L. pumilum and L. davidii was significantly lower than that in other reported plants, including 35 genes in T. aestivum cv. Chinese Yumai [27], 37 genes in V. vinifera [5], 41 genes in S. sphenanthera [28] and 72 genes in C. nucifera L. (coconut palm) [26], but was identical to the 23 protein-coding genes in the mitogenome of A. cepa [25]. Our previous research showed that the mitochondrial genome size of different plants is very different from others and the reference mitogenome of A. cepa due to the expansion and reduction of the intergenic region [27,29]. The narrowed gene number in L. pumilum and L. davidii might be due to the small number of known genes in the reference mitogenome in A. cepa. However, the larger mitogenome size in L. pumilum and L. davidii compared with the reference mitogenome of A. cepa might suggest that there was a large percent of mitogenomic regions with a number of unknown genes, which might be of great values for the function and evolution analysis of both L. pumilum and L. davidii species. N50 and N90, the length for which the collection of all contigs of that length or longer contains at least 50 and 90% of the total of the lengths of the contigs, respectively.  Target and query length, the length of SV in the target and query sequences; L. in TS, length of corresponding SV in target sequence; L. in QS, length of corresponding SV in query sequence.
Of the protein-coding genes, atp1, matR and ccmFN1 had the highest number of variable sites (110, 89 and 52, respectively) in the CDS regions in reference to A. cepa ( Table 2). In addition, 96, 70, 42, 40 and 40 variable sites were identified in the intergenic spacers around atp1, nad6, nad5, ccmC and nad9, respectively. These results might reveal that these genes are of great value for the evolution of L. pumilum and L. davidii species [2]. By contrast, a small number of variable sites were found in the CDS of nad1, nad3, nad7 and cox3, and in the intergenic spacers around them ( Table 2). These results might suggest their smaller contribution to the evolution of the two mitogenomes. The fact that there were no variable sites in the CDS of cox3, atp9 and nad3 genes showed that the CDS regions are well-conserved regions between the two species sequenced and A. cepa [30]. In addition, this conservation might govern the conservative evolution of mitogenome during the evolution of these genes [31].

Structural variations in the mitogenomes
The identified structural variations in the mitogenomes of L. pumilum and L. davidii are listed in Table 3. There were 14 variations in both mitogenomes with a length range of   Table 3). All the variations in L. pumilum were circular types, and half variations in L. davidii were circular types. One linear structural variation happened in ccmFc, atp1 and nad1 genes, which had a large number of variable sites (Tables 2 and 3). These variations might be of important values in the evolutionary history of L. pumilum and L. davidii [32,33].

Evolutionary analysis of the two Lilium species
The phylogenetic tree was constructed based on the concatenated amino acid sequences of the 23 protein-coding genes in the two Lilium species and other 27 plants. Figure 1 shows the phylogenetic evolutionary position of L. pumilum and L. davidii. The two Lilium species had close relationships with the Brassicales and Poales clades, while A. cepa showed close relationships with Poales clades and C. nucifera (Figure 1). Both L. pumilum and L. davidii were distant from A. cepa (NC_030100; Figure 1). These distinct genetic distances between L. pumilum or L. davidii and the reference mitogenome of A. cepa might be due to a high number of variable sites in the mitogenomes of L. pumilum and L. davidii species.
A previous phylogenetic study based on 79 chloroplast genes showed that L. longiflorum (Liliaceae) and Alstroemeria aurea Graham (Alstroemeriaceae) were clustered into the Asparagales + Commelinids clade, showing the close genetic relationships between them [34]. İkinci et al. [35] reported that there were remote evolutionary distances among different lily species from different regions in European and other countries based on the internal transcribed spacer regions of nuclear ribosomal DNA. Similar results of Japan lily species were reported by Nishikawa et al. [36]. We suppose that the more number of the annotated protein-coding genes in the mitogenome of the two lily species, the more detailed information about lily's evolution analysis will be obtained.

Conclusions
In summary, we obtained the complete mitogenome sequences of L. pumilum (988,986 bp)

Conflict of interest:
The authors state no conflict of interest.