Genetic evaluation and germplasm identification analysis on ITS2, trnL-F, and psbA-trnH of alfalfa varieties germplasm resources

Abstract In this study, the genetic diversity and germplasm identity of 28 alfalfa cultivars and germplasm materials were evaluated by analyzing their internal transcribed spacer 2 (ITS2), trnL-F, and psbA-trnH sequences, to provide a reference for research on the genetic diversity and identification of alfalfa varieties. The average lengths of the sorted ITS2, trnL-F, and psbA-trnH sequence fragments were 455.7 bp, 230.3 bp, and 345.6 bp, respectively. In the preliminary experiment, the ITS2 sequence was too conserved to reflect differences among cultivars or among individuals within cultivars. The trnL-F and psbA-trnH sequences differed little within cultivars but differed clearly among cultivars. Alfalfa cultivars were divided into four groups by sequence similarity clustering. The trnL-F and psbA-trnH sequences showed apparent differences among alfalfa cultivars, suggesting that the conserved chloroplast sequences evolved independently. Compared with the trnL-F sequence, the psbA-trnH sequence has more abundant variation sites and better reflects the differences between cultivars. Therefore, the psbA-trnH sequence can be used to identify different alfalfa cultivars and to establish DNA sequence fingerprints.


Introduction
Alfalfa (Medicago sativa L.) originated in Asia Minor, the Caucasus, Iran, Turkmenistan, and Central Asia [1] and is one of the most important leguminous forages in the world owing to its high yield, high quality, and strong adaptability. M. sativa L. is the main object of breeding research, and alfalfa variety resources are the most important basic production materials for animal husbandry and ecological improvement. By 2020, 113 national-level alfalfa varieties had been approved and registered in China [2]. Alfalfa and hybrid alfalfa are the main varieties formed by breeding. Compared with countries with developed animal husbandry, such as the United States, Canada, New Zealand, and Australia, the number of varieties bred in China is small, and the breeding efficiency is low. Genetic research on and identification of alfalfa germplasm resources therefore have important theoretical and practical significance for alfalfa breeding and the development of the alfalfa industry.
Genetic diversity is the basis of species diversity and ecosystem diversity. Its study measures the degree of difference at the molecular level and reflects the genetic variation within species and between different populations [3]. Research on alfalfa genetics and genetic diversity in China mainly focuses on population morphology, agronomic traits, resistance and physiological characteristics, molecular markers, and DNA sequence analysis. Genetic diversity and identification of cultivar groups based on conserved sequences, i.e., chloroplast and ribosomal genome sequences, have been reported in alfalfa. The chloroplast genome is a closed, double-stranded circular molecule, generally 115-210 kb [4]. Chloroplast genes are relatively conserved, and the chloroplast genome is small, rich in DNA information, present in high copy number, and has a high mutation rate in its noncoding regions. The psbA-trnH sequence is a noncoding intergenic spacer of the chloroplast genome, located between the psbA and trnH genes. The sequence is easy to amplify and sequence. It contains a large amount of information and shows stable maternal inheritance, so it can be applied efficiently to compare different groups within a species. The trnL-trnF (trnL-F) sequence comprises a chloroplast RNA-coding gene region and a noncoding region [5]. These sequence fragments are simple, easy to amplify, fast-evolving, little influenced by the external environment, and prone to base mutation and variation between different populations. They are widely used in phylogenetic research at the interspecific and subspecific levels. The chloroplast trnL-F and psbA-trnH sequences and the nuclear internal transcribed spacer (ITS) sequence marker systems are mainly used in plant phylogeny research, species genetic diversity, and the identification of medicinal plants in China.
These marker systems have been used to identify plant-derived Chinese medicinal materials, for example, in the analysis of Datureae plants and of Bupleurum marginatum var. based on the internal transcribed spacer 2 (ITS2) barcode [6,7], but there are few reports in the herbage field, where work mainly concentrates on the phylogeny of forage and Poaceae materials. In relevant studies, phylogenetic analysis has been conducted on the materials of the genus Alkali, the genera Astragali radix, Medicago, and their relatives [8][9][10][11][12][13]. Ribosomal DNA ITS sequences are a family of genes encoding ribosomal RNA in the nucleus of plant cells and are nuclear gene fragments [6]. The sequence of the ribosome coding region is generally highly conserved, and there are few reports on the conserved ribosome sequences of alfalfa in China. In this study, 28 alfalfa varieties and germplasm materials were taken as the research objects, and their ITS2, trnL-F, and psbA-trnH sequences were analyzed for genetic evaluation and germplasm identification. The genetic structure characteristics and genetic diversity of alfalfa variety germplasm resources in different populations were analyzed. This research provides a theoretical reference for screening excellent alfalfa germplasm, establishing an alfalfa variety identification system, and exploring and utilizing alfalfa variety germplasm resources. Furthermore, it fills a gap in research on the genetic and variety identification characteristics of alfalfa in China based on trnL-F, psbA-trnH, and other conserved sequences.

Experiment material
The test materials were 28 alfalfa varieties (materials) from different countries and regions (Table 1). The seeds of the experimental germplasm resources were obtained from the National Medium-term Forage Germplasm Resource Bank of the Grassland Research Institute, Chinese Academy of Agricultural Sciences. The test materials were planted as single plants in the Agricultural and Animal Husbandry Interlaced Area Experimental Demonstration Base of the Grassland Research Institute of the Chinese Academy of Agricultural Sciences in 2012. In 2018, the alfalfa was sampled at the branching stage: 10 fresh, young leaf samples were collected from each material, placed in an ice box, and brought back to the laboratory. The samples were then stored in a −80°C ultra-low temperature refrigerator for future use.
The number of materials tested differed among the sequences (Table 1). Two materials were selected for the ITS2 sequence research: Medicago sativa L. cv. Aohan and Medicago varia Martin. cv. Caoyuan No. 2. Aohan and Caoyuan No. 2 belong to two species, so their gene sequences would be expected to differ considerably; however, in the preliminary identification test, the DNA sequences of the two materials showed little difference (Figure 1). The ITS2 sequence was therefore considered unsuitable for analyzing genetic diversity among alfalfa varieties or species, and the other materials were not tested with ITS2 in the follow-up experiments. According to the availability of materials during the research, 24 alfalfa materials were selected for the psbA-trnH sequence, and 23 materials were selected for the trnL-F sequence.

DNA extraction
DNA was extracted from single-leaf material with a DNA extraction kit (CTAB Plant genome extraction kit, BLKW, Beijing). DNA concentration and purity were checked by 1.0% agarose gel electrophoresis and a trace UV/Vis spectrophotometer, respectively.

Primer and polymerase chain reaction (PCR) amplification procedure
The primers used in the experiment are shown in Table 2.
The research results of NCBI number are presented in
The 30 μL PCR reaction system for the trnL-F and psbA-trnH sequences comprised 1.5 μL of diluted DNA solution, 15 μL of PCR Mix, 10.5 μL of ddH2O, 1.5 μL of primer F, and 1.5 μL of primer R. The components were mixed and briefly centrifuged before use. The amplification program, with optimized reaction conditions, was as follows: pre-denaturation at 94°C for 3 min; 30 cycles of denaturation at 94°C for 30 s, annealing at 49°C for 30 s, and extension at 72°C for 90 s; a final extension at 72°C for 7 min; and storage at 4°C until use.
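As a quick check on the reaction system and cycling program described above, the following Python sketch sums the component volumes, scales them into a master mix, and estimates the nominal run time. The function names, the 10% pipetting overage, and the choice to add template DNA to each tube separately are illustrative assumptions, not part of the original protocol.

```python
# Component volumes (in microliters) for one 30 uL reaction, as listed in the text.
REACTION_UL = {
    "diluted DNA": 1.5,
    "PCR Mix": 15.0,
    "ddH2O": 10.5,
    "primer F": 1.5,
    "primer R": 1.5,
}


def master_mix(n_reactions, overage=0.10):
    """Scale all components except template DNA for n reactions.

    The overage (extra fraction to cover pipetting loss) is a hypothetical
    default; template DNA is assumed to be added to each tube separately.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if name != "diluted DNA"
    }


def program_minutes(cycles=30):
    """Nominal hold time of the program, ignoring ramping and the 4 C hold."""
    pre_denaturation = 3 * 60        # 94 C, 3 min
    per_cycle = 30 + 30 + 90         # denaturation, annealing, extension (s)
    final_extension = 7 * 60         # 72 C, 7 min
    return (pre_denaturation + cycles * per_cycle + final_extension) / 60


total_volume = sum(REACTION_UL.values())
print(f"Total reaction volume: {total_volume} uL")
print(f"Nominal program time: {program_minutes()} min")
print("Master mix for 24 reactions:", master_mix(24))
```

The component volumes add up to the stated 30 μL, and the listed hold times give a nominal program length of 85 min before ramping time is counted.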
The amplified products were subjected to 1.0% agarose gel electrophoresis, stained with nucleic acid dyes, and observed and photographed with a gel imaging system.

PCR product sequencing
The PCR amplification products were purified and used for the sequencing reaction. The ITS2, trnL-F, and psbA-trnH sequences of all product samples were determined by Shenzhen Huada Gene Technology Co., Ltd. Sequencing was performed by direct sequencing of PCR products, and each sample was sequenced in forward and reverse directions to ensure the sequencing accuracy.
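The idea behind bidirectional sequencing can be sketched in a few lines of Python: the reverse read is reverse-complemented and compared with the forward read, and positions where the two disagree are flagged. This is a simplified illustration that assumes both reads have already been trimmed to the same region and have equal length; the function names are hypothetical, and real assembly tools also use alignment and base-quality scores.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(forward, reverse):
    """Merge a forward read with its reverse read.

    Assumes the two reads cover exactly the same region (equal length).
    Positions where the reads disagree are reported as N.
    """
    reverse_as_forward = reverse_complement(reverse).upper()
    forward = forward.upper()
    if len(forward) != len(reverse_as_forward):
        raise ValueError("reads must be trimmed to the same length")
    return "".join(
        f if f == r else "N" for f, r in zip(forward, reverse_as_forward)
    )


# Toy reads (hypothetical, not data from this study).
print(consensus("ATGCA", "TGCAT"))  # reads agree at every position
print(consensus("ATGCA", "TCCAT"))  # reads disagree at one position
```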

Data analysis
The DNA sequence fragments obtained by sequencing were first evaluated for sequencing quality. Forward and reverse sequencing results were analyzed with CodonCode Aligner, SeqMan, and DNAMAN. CLUSTALX 2.0 software was then used for sequence alignment. The genetic distances of the aligned sequences were calculated with MEGA 7.0 software, and the molecular phylogenetic tree was established by the Kimura method. The confidence of each branch of the phylogenetic tree was tested by bootstrap (1,000 replicates), and gaps were always treated as missing.
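For readers unfamiliar with the distance calculation, the sketch below computes a pairwise distance between two aligned sequences in Python. The text names only "the Kimura method"; the sketch assumes this means the Kimura two-parameter (K2P) model, d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitions and transversions. Skipping every site with a gap or ambiguous base in either sequence is one reading of "gaps treated as missing". The function name is hypothetical, and this is not MEGA's implementation.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    Raises ValueError if the sequences are too divergent for the formula.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: treated as missing
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (hypothetical sequences): one transition in ten compared sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Bootstrap support would then be obtained by resampling alignment columns with replacement and repeating the distance and tree calculation for each replicate.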

Results and analysis

ITS2 sequence polymorphism analysis of alfalfa
Aohan and Caoyuan No. 2 were selected as research materials, and 18 forward and reverse ITS2 sequences were obtained by PCR amplification, recovery, and sequencing. The sequencing results of the nine genotypes ITS2 with unidirectional sequences are shown in Figure 1.

trnL-F sequence polymorphism analysis of alfalfa
A total of 115 forward and reverse trnL-F sequences were obtained by PCR amplification, recovery, and sequencing of 23 alfalfa germplasm materials. After primer sequences and hybrid and repetitive sequence fragments were removed, unidirectional trnL-F sequences of the 23 alfalfa materials (genotypes) were sorted out. The results are shown in

Cluster analysis of trnL-F and psbA-trnH sequence in alfalfa
The trnL-F sequences of 23 alfalfa germplasms were clustered by similarity (Figure 4). The trnL-F sequence similarity was highest between Japan 90 and Xinjiang Daye and between Zhaodong and Zhongmu No. 2, at 100% in both cases. Tumu No. 1 had the lowest trnL-F sequence similarity to the other alfalfa varieties, at 94%.
The trnL-F sequence similarity between M. falcata and most materials is low, ranging from 94 to 95%. The test materials were clustered using a trnL-F sequence similarity of 96% as the classification standard. Junggar, M. falcata, Tumu No. 1, Gannong No. 3, and Soviet Union 6220 were grouped into one category, and the other varieties were grouped into another.
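Grouping by a similarity threshold, as with the 96% standard above, can be illustrated with a short Python sketch. The text does not state which clustering algorithm produced Figure 4, so the sketch substitutes a simple single-linkage rule: two sequences fall in the same group if they are connected by a chain of pairs whose identity is at least the threshold. The function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring gapped sites."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def group_by_threshold(seqs, threshold=96.0):
    """Single-linkage grouping: join any pair with identity >= threshold."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Toy 25-bp alignment (hypothetical): v2 differs from v1 at 1 site,
# v3 differs from v1 at 4 sites.
base = "ACGT" * 6 + "A"
toy = {"v1": base, "v2": "T" + base[1:], "v3": base[:20] + "TTTTT"}
print(percent_identity(toy["v1"], toy["v2"]))
print(group_by_threshold(toy))
```

With a 96% threshold, v1 and v2 fall into one group and v3 forms its own; a lower threshold would merge all three.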
In the clustering by trnL-F sequence similarity, M. falcata did not reach the outgroup (homogeneity) level relative to the other test materials. This result reflects that the trnL-F sequence is relatively conserved in alfalfa species such as M. falcata L. and M. sativa L. Comparison of the M. falcata trnL-F sequence with the other trnL-F sequences showed that ATTT at the 100 bp site and AT at the 116-117 bp site are specific bases and can be used as the basis for distinguishing M. falcata L. from M. sativa L. resources.
The psbA-trnH sequences of 24 alfalfa germplasms were clustered by similarity (Figure 5). Analysis suggested that the homology among the alfalfa The psbA-trnH sequence similarity between Junggar alfalfa and the other alfalfa is 95%. With a psbA-trnH sequence similarity of 96% as the classification criterion, all test materials except Junggar alfalfa can be grouped into one category. The psbA-trnH sequence similarity between alfalfa and most alfalfa materials can reach 100%. The average length of the psbA-trnH sequence is 345.6 bp. The sequence is rich in polymorphisms such as single-nucleotide variation sites, parsimony-informative sites, and insertion and deletion fragments. The psbA-trnH sequencing results can better identify alfalfa variety resources and can be applied to alfalfa variety identification.
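The notion of specific bases, such as those reported above for M. falcata in the trnL-F alignment, can be expressed as a simple search for alignment positions where one sequence carries a base found in none of the others. The Python sketch below is illustrative only; the function name and the toy alignment are hypothetical and do not reproduce the study's data.

```python
def diagnostic_sites(target, others):
    """Return (1-based position, base) pairs where the target sequence has a
    base that occurs in none of the other aligned sequences."""
    if any(len(seq) != len(target) for seq in others):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i, base in enumerate(target.upper()):
        if base == "-":
            continue  # a gap in the target is not counted as a specific base
        column = {seq[i].upper() for seq in others}
        if base not in column:
            sites.append((i + 1, base))
    return sites


# Toy alignment (hypothetical): the target differs from all others at site 4.
print(diagnostic_sites("ACGTT", ["ACGAT", "ACGAT", "ACGAT"]))
```

Sites found this way are only candidates; whether they are stable markers depends on how many individuals of each species are sampled.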

Sequence comparison of trnL-F and psbA-trnH in alfalfa and DNA barcode formation analysis
The trnL-F sequences of 23 alfalfa varieties (materials) and the psbA-trnH sequences of 24 alfalfa varieties (materials) show sequence differences. The corresponding sequences can form DNA sequence barcodes to identify alfalfa varieties. According to Figures 2 and 3, the psbA-trnH sequence variation sites are abundant for all sequenced alfalfa materials. According to statistics, the trnL-F sequences contain 67 variation sites and the psbA-trnH sequences contain 84. Table 3 shows the results of sorting and aligning the sequencing fragments within single varieties: the psbA-trnH variation sites of the materials within varieties range from 0 to 5, with an average of 0.79, and the trnL-F variation sites range from 0 to 14, averaging 5.09. Within varieties, the average G + C content of the trnL-F sequences is 32.4%. The highest single-nucleotide variation rates among varieties are shown in Zhongmu No. 2 and Canadian alfalfa, both with 1.46%. The average G + C content of the psbA-trnH sequence fragment is 18.3%. The highest intravariety variation rate is that of Czech 26-1 alfalfa, at 4.78%. Based on the comparison of alfalfa DNA sequences between and within varieties, the psbA-trnH sequences are richer in polymorphisms between varieties, while their variation rate within varieties is relatively low, so they form a clearer DNA barcode for identifying alfalfa varieties.
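The statistics discussed above (variation sites, parsimony-informative sites, variation rate, and G + C content) can be computed from an alignment as in the following Python sketch. It assumes that the variation rate is the number of variable sites divided by the alignment length, which the text does not define explicitly; the function names and toy alignment are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def site_summary(alignment):
    """Return (variable sites, parsimony-informative sites, variation rate %).

    A site is variable if at least two different bases occur in its column,
    and parsimony-informative if at least two bases each occur at least
    twice. Gaps and ambiguous bases are ignored.
    """
    length = len(alignment[0])
    if length == 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be non-empty and equally long")
    variable = informative = 0
    for i in range(length):
        counts = Counter(
            seq[i].upper() for seq in alignment if seq[i].upper() in "ACGT"
        )
        if len(counts) > 1:
            variable += 1
            if sum(count >= 2 for count in counts.values()) >= 2:
                informative += 1
    return variable, informative, 100.0 * variable / length


# Toy alignment (hypothetical, not data from this study).
toy_alignment = ["AAGT", "AAGT", "ACGA", "ACGA"]
print(site_summary(toy_alignment))
print(gc_content("GGCCAATT"))
```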

Discussion
With the innovation and progress of identification technology, fingerprinting and DNA barcoding have become core research fields of identification technology. They are widely used in biological research and are considered the most efficient and accurate technical system for identifying biogenetic characteristics from genes and biological functional tissues. The quality of alfalfa cultivar resources largely determines yield potential and resistance. Developing an accurate, reliable, rapid, and simple variety identification method is important for verifying the authenticity of alfalfa varieties, strengthening the construction of the seed quality standard system, and improving the quality of alfalfa seeds. Currently, the identification of alfalfa varieties mainly relies on conventional identification methods. The corresponding fingerprint technology is affected by factors such as the detection object, technology, and equipment, and few standardized detection procedures have been established, so standardized test procedures need to be strengthened. Fingerprinting and DNA barcoding are rapid, accurate, and little affected by environmental pressures when identifying the genetic variation and genetic composition of alfalfa varieties. Although much research has been done on DNA fingerprints, the identification of alfalfa varieties is still a difficult problem. This article mainly explored the use of ITS2, trnL-F, and psbA-trnH markers to identify alfalfa varieties. We used only 28 genotypes, but the technology we established grouped these alfalfa varieties well. This preliminary verification suggests that the approach is of value for the identification of alfalfa varieties and is worth further exploration and promotion.
Ribosomal DNA ITS is a family of genes encoding ribosomal RNA in the nucleus of plant cells [14]. Ribosomal coding region sequences are generally highly conserved, similar to chloroplast conserved sequences. The corresponding spacer sequences carry a large amount of mutation information, mutate rapidly, are simple and easy to amplify, evolve quickly, are little influenced by the external environment, and are free of functional restrictions. Currently, they are widely used in the study of genetic evolution and phylogeny at the interspecific and subspecific levels. In reference [15], six Leguminosae forages were tested, and ITS could not be used as a DNA barcoding candidate sequence owing to low amplification efficiency. In reference [16], 156 alfalfa populations were tested, and the ITS fragments showed a high ability to delimit species, except for a few close relatives. The results of the former are preferred. Comparison of the different studies shows that the selected primer fragments differ and give different results, so the screening and identification of primers are critical. Comprehensive analysis showed that the ITS2 primers used in this research need to be further optimized and should be screened across many species to determine their DNA barcoding potential for distinguishing alfalfa varieties.
The chloroplast trnL-F and psbA-trnH sequences are chloroplast spacer fragments with a high amplification success rate and a short target fragment length [17]. The chloroplast trnL-F and psbA-trnH sequences and the nuclear ITS sequence marker systems are mainly used in plant phylogeny research, species genetic diversity, and the identification of medicinal plants in China. Among these, the systematic classification of medicinal plants and the identification of Chinese medicinal materials are the most widespread applications, and the corresponding technology is becoming mature. Currently, DNA barcodes and fingerprints based on conserved chloroplast sequences have been initially established for Rosaceae, Rutaceae, Honeysuckle, Ginseng, Cistanche, Euphorbia, Shegan, Chonglou, and other families, genera, and species [18][19][20][21][22][23][24]. The genetic diversity and molecular phylogenetic classification of medicinal plant germplasm resources have mainly been studied in the families, genera, and species of sage, Schisandra, Trillium, Ophiopogon japonicus, Crow garlic, tulip, and onion [25][26][27][28]. The chloroplast trnL-F and psbA-trnH sequences and the nuclear ITS sequence marker systems are rarely reported in forages in China; the work comes mainly from Sichuan Agricultural University and Lanzhou University, with more phylogenetic studies in Gramineae materials. In reference [7], ITS sequences and psbA-trnH sequences were used for phylogenetic analysis on 37 materials of 6 genera, including Elymus. In reference [8], the psbA-trnH sequence was used to study the genetic diversity of 87 populations of M. japonica. In reference [9], ndhF, psbA-trnH, and trnL-F gene sequences were adopted to study the developmental phylogeny of 67 materials of the genus Gossypium and its relatives. In reference [6], ITS sequences were used to conduct phylogenetic analysis on 41 materials of the genus Laysporium and its relatives.
Applications in legumes and alfalfa mainly include the classification of pathogen populations, legume phylogenetic relationships, and species delimitation [29][30][31]. In reference [16], samples representing 21 naturally distributed species in China were collected, and the chloroplast genomes of 75 individuals representing 20 species were assembled. The results showed that 18 species are well delimited, the exceptions being Medicago sativa, Medicago falcata, and Melilotus officinalis. The primers used in this study verified the feasibility of DNA barcoding for identifying alfalfa species and formed alfalfa variety identification barcodes using psbA-trnH and trnL-F. In the comparison across the 23 materials used in the study, psbA-trnH sequence identification was clearer and more accurate, and it distinguished different alfalfa varieties better. Studies on the genetic diversity and variety identification of alfalfa based on morphological characteristics, agronomic traits, functional proteins, isozymes, and molecular markers each give a one-sided analysis. Conserved gene sequences can effectively compensate for these shortcomings. To develop new and effective methods for studying alfalfa genetic diversity, further in-depth research can be conducted to make the research methods and technical means more suitable for the study of alfalfa genetic diversity and variety identification.

Conclusion
In this study, the widely used ribosomal ITS2 and chloroplast trnL-F and psbA-trnH sequences were used to analyze the genetic diversity and variety identification of alfalfa varieties. Sequencing of the alfalfa ITS2, trnL-F, and psbA-trnH sequences yielded fragments with average lengths of 455.7 bp, 230.3 bp, and 345.6 bp, respectively. Among them, the candidate fragments of the ITS2 sequence were highly conserved both within and among varieties. The trnL-F and psbA-trnH sequences differed little within varieties but differed clearly among varieties. According to the similarity clustering of the trnL-F and psbA-trnH sequences, the tested alfalfa varieties were clustered into four categories. The similarities of the trnL-F and psbA-trnH sequences differed widely among alfalfa varieties, indicating that the conserved chloroplast sequences in each variety population evolved independently. The comparison of the trnL-F and psbA-trnH sequences of alfalfa varieties showed that the psbA-trnH sequence had more variant sites and richer sequence polymorphisms. Thus, this sequence is more suitable for forming a clear DNA barcode for alfalfa variety identification. For alfalfa variety identification and genetic diversity analysis, the ITS2 sequence was too conserved: the initial sequencing results of the preliminary experiment could not reflect the differences between varieties or between individuals within varieties. The polymorphisms of the chloroplast sequences are abundant, better reflect the differences between varieties, and are more suitable for identifying different alfalfa varieties.
Funding information: This work was supported by the Inner Mongolia autonomous region seed industry demonstration projects of major science and technology innovation (high quality alfalfa varieties breeding and industrial demonstration 2022JBGS0020).
Author contributions: Wang Y. and Wang M.J. conceived and designed the research. Xu C.B. and Tong L.G. conducted experiments. Zhang X.M. analyzed the data. Wang Y. wrote the paper. All authors read and approved the manuscript.