Discovery of exercise-related genes and pathway analysis based on comparative genomes of Mongolian originated Abaga and Wushen horse

Abstract The Mongolian horses have excellent endurance and stress resistance to adapt to the cold and harsh plateau conditions. Intraspecific genetic diversity is mainly embodied in various genetic advantages of different branches of the Mongolian horse. Since people pay progressive attention to the athletic performance of horse, we expect to guide the exercise-oriented breeding of horses through genomics research. We obtained the clean data of 630,535,376,400 bp through the entire genome second-generation sequencing for the whole blood of four Abaga horses and ten Wushen horses. Based on the data analysis of single nucleotide polymorphism, we severally detected that 479 and 943 positively selected genes, particularly exercise related, were mainly enriched on equine chromosome 4 in Abaga horses and Wushen horses, which implied that chromosome 4 may be associated with the evolution of the Mongolian horse and athletic performance. Four hundred and forty genes of positive selection were enriched in 12 exercise-related pathways and narrowed in 21 exercise-related genes in Abaga horse, which were distinguished from Wushen horse. So, we speculated that the Abaga horse may have oriented genes for the motorial mechanism and 21 exercise-related genes also provided a molecular genetic basis for exercise-directed breeding of the Mongolian horse.


Introduction
As an ancient breed, the Mongolian horse has gone through a long breeding period [1]. With a view to research tendentiousness, researchers pay more attention to traits of Thoroughbred horse [2][3][4] and Quarter horse [5,6], but not the Mongolian horse and its diverse sub-branch. The preeminent endurance and stress-resistance of Mongolia horses are important factors for them to well adapt to the cold and harsh plateau environment [7]. Natural factors may have enormous impacts on evolution owing to the rough domestication of the Mongolian horse [8]. Due to various geographic conditions and human necessities, the Mongolian horse gradually formed several specific traits. Some horses which adapt to the desert climate have larger feet, for instance, some horses which adjust to a mountain road with rocks have supple body and hard hoofs; in addition, the features of a horse which accommodate to the grassland climate have tall physique and are good at running [9]. Living in the Xilin Gol grassland of Inner Mongolia, the Abaga horse belongs to the steppe horse and speeds up to 1,600 m every 91.47 s [1]. Wushen horse, which is small build and has broad-flat horseshoe, as symbol of the desert horse in the south of Maowusu desert of Ordos City in Inner Mongolia, can hoof steadily in the desert, albeit not fast at a running speed of 13-15 km/h [10].
In Mongolia, the herdsmen depend on horses for reasons that they are the indispensable sources of pastoral rations, such as meat and dairy products, and used to be one of the means of transport by herders [8]. Furthermore, the Mongolian horse was an essential and distinguished war horse in history [11]. For the Naadam of traditional festivals in Mongolia, horse racing is one of the entertaining activities for herds and is regarded as the second most popular sporting event after wrestling [12]. So, the running speed of Mongolian horses has been one of the focus of attention. Despite not being the fastest horse in the world, people still endeavor to improve the running speed of the Mongolian horse through unremitting consideration and breeding.
To discuss the genetic variation between Abaga horse and Wushen horse in Mongolian horse strains, we planned to analyze data of the entire genome with second-generation sequencing technology to seek out exercise-related single nucleotide polymorphism (SNP) locus of Mongolian horse and offer a reference for identification and improvement of Mongolian horse varieties.

Experimental animals and sample preparation
We selected four healthy Abaga horses (two female and two male) of 1-year-old in the Inner Mongolia Abaga County and ten good-conditioned Wushen horses (eight female and two male) at age of 4-6 in the Inner Mongolia Ordos City Wushen County. We collected the jugular blood of animals. Adhering to the manufacturer's instructions for the extraction of DNA from the whole blood, the genome was extracted by the AxyPrep blood genomic DNA kit. Then we used the NanoDrop 1000 spectrophotometer and polyacrylamide gel electrophoresis to detect the concentration and integrity of the genome. The con-

Data quality control and comparison to reference genome
The raw data of 639,723,611,100 bp were sequenced from 14 samples. The inferior quality reads which have sequencing adapter, higher than 10% of N (base of uncertainty) content or inferior mass base (Q ≤ 5) content of higher than 50% were filtered out by in-house Perl/Python scripts to achieve clean data of 630,535,376,400 bp. The Q20, Q30, error rate, GC content, and other information on these data were counted by in-house Perl/Python scripts. The sequencing reads were mapped to reference genomes (Ensembl release 82) by BWAmem (bwa-0.7.8) [13], and polymerase chain reaction (PCR) and optical repetition of results were removed by using Picard [14]. Statistics of mapping rate, average depth, and coverage of the data after comparison were computed by in-house Perl/Python scripts.

SNP calling and annotation
SNP calling was performed using the GATK HaplotypeCaller (v3.5) [15]. To evaluate the reliability of the detected SNP sites and filter inferior quality SNP, we used SAMTools for SNP detection [16]. Simultaneously, the dbSNP database of 5,019,393 SNPs and 670K chip site information were downloaded. The data were used as the training set, and the detected SNPs were evaluated and filtered by using the GATK VQSR process. The standard for the retention of the final site is the tranche value of 99 (Ti/Tv = 2.02). Finally, the SNPs of equine population were filtered: GQ > 10, MAF > 0.05, and call rate > 0.9. The variants after filtering were annotated by ANNOVAR (v2016-02-01) [17].

Selective sweep analysis
To identify potential selective sweeps between Abaga horse (fast) and Wushen horse (slow), Pi log2(slow/fast) and F-statistics (FST) were calculated together using VCFtools with a 20 kb sliding window and a step size of 10 kb. Windows that contained less than ten SNPs were excluded from further analysis. The windows that were simultaneously (1) in the top 5% of Z-transformed FST values and (2) in the bottom 5% Pi log2(slow/fast) were considered to be candidate selective regions in Abaga horse. The same applies to the Wushen horse.

Statistics and advanced analysis of positively selected and candidate genes
We annotated the positively selected genes via gene ontology (GO; GOseq) to further screen out the major enriched functions [18]. The pathways which included these selected genes were enriched by Kyoto Encyclopedia of Genes and Genomes (KEGG; KOBAS) [19]. Many positively selected genes which are in the Abaga horse were analyzed further without the overlapped genes between the Abaga horse and Wushen horse.

Related-clean data stated
We performed the entire genome second-generation sequencing for the whole blood of four Abaga horses and ten Wushen horses with the Illumina HiSeq X Ten sequencing platform. The clean data of 630,535,376,400 bp (effective rate of data: 98.56%, error rate of data: 0.03%, mean of Q20: 94.95%, and mean of Q30: 89.80%) were sequenced by filtration and 41.95G as the mean of clean data was generated in each sample ( Table 1). Then, the data were mapped to the reference genome (Ensembl release 82) via BWAmem [13].
Without PCR and optical repetition, the successful mapping rate of data was 98.36%. For the 14 samples, the average sequencing depth was 16.75 × coverage and the average cover degree was 99.55% on reference sequences ( Table 2).

Distribution of positive selection genes on chromosomes
Based on the data of SNP following SNP calling (Figure 1), we obtained the genes of significant genetic differences by using FST between Abaga horses and Wushen horses and narrowed the above genes down to 479 and 943 positively selected genes combined with SNP polymorphism analysis in Abaga horses and Wushen horses, respectively ( Figure 2). We discovered that these selected genes were mainly distributed on chromosomes 4, 7, and 10 in Abaga horses, and on chromosomes 1, 4, 8, and 16 in Wushen horses with a analysis of genes distribution on the chromosome ( Figure 3).

GO, KEGG pathways, and exerciserelated genes
The By pathways enrichment analysis of KEGG with the positively selected 479 genes in Abaga horse and 943 genes in Wushen horse, the enriched pathways (P ≤ 0.05) of Abaga horse included propanoate metabolism, viral myocarditis, phototransduction, PI3K-Akt signaling pathway, glycerolipid metabolism, morphine addiction, and mRNA surveillance pathway in which the pathway with the largest number of enriched genes (13 genes) was PI3K-Akt signaling pathway. Besides that, the enriched pathways (P ≤ 0.05) of the Wushen horse contained base excision repair, glutamatergic synapse, endometrial cancer, glycolysis/gluconeogenesis, propanoate metabolism, and ABC transporters ( Figure 5).

Discussion
Many genes of the positively selected 479 and 943 genes were enriched on chromosome 4, and the enrichment quantity of the positively selected genes was secondary by the chromosome enrichment analysis both in Abaga horse and Wushen horse. In the statement of Schröder et al. [29], athletic performance-related genes were significantly enriched on chromosomes 4 and 12 of horses, which coincided with the different traits of running speed in our exploring direction. Possibly, we will take equine chromosome 4 as the exercise-related emphasis of scientific research.
The athletic ability of the horse may be influenced not only by physiology but also by thought and motive. According to the previous studies, equine exercise-related genes included DRD1-5, SLC6A4, and BDNF, the three gene functions were related to many neurological processes, involving motivation, pleasure, cognition, memory, learning, fine motor control, modulation of neuroendocrine signaling, the adaptive ability to control emotions, supporting the survival of existing neurons, encouraging the growth, and differentiation of new neurons and synapses [64][65][66][67]. The GO analysis results of the Abaga horse were also preferentially enriched in neuronal composition, neurotransmission, and brain cell migration. These genes may allow the Abaga horse to quickly observe and distinguish the surrounding during moving at high speed, and timely rectify the status to respond to various circumstances.
In this research, 13 positively selected genes were enriched in the PI3K-Akt signaling pathway. As intracellular basal signaling pathways, the PI3K-Akt signaling pathway involves lots of vital movement, such as exercise-induced physiologic hypertrophy [21,23,24,26] and protecting mitochondria of skeletal muscle by aerobic endurance training [27], further explaining the excellent athletic performance of Abaga horse. Besides the PI3K-Akt signaling pathway, we also found many exercise-related pathways that were metabolic pathways, arachidonic acid metabolism, regulation of actin cytoskeleton, fatty acid metabolism, Ras signaling pathway, MAPK signaling pathway, Hippo signaling pathway, valine, leucine, and isoleucine degradation, cardiac muscle contraction, NF-kappa B signaling pathway, and insulin signaling pathway [20,22,25,[28][29][30][31][32] (Table 3). But, in our study, the enriched genes of positive selection were different from the previously studied genes in the above exerciserelated pathways ( Table 3), which indicated species-specific genes of positive selection in Abaga horse compared with other species (human, rat, mouse, leopard, Thoroughbred horse, etc.).
Counting on exercise-related genes of previous studies, the equine athletic performance is related to glucose metabolism, stress immune response, angiogenesis and muscle supply, insulin signal transduction, fat substrate application, muscle strength, and the formation of bones and cartilage with growth [2][3][4]20,68]. We picked up exercise-related genes as candidate genes in positively selected genes, and further, presented enriched KEGG pathways and functions with the selected exercise-related genes ( Figure 6). HTR2B (encoding 5-hydroxytryptamine receptor 2B) has been identified in the genome of Quarter horses of the racing line [6] and is associated with impulsive behavior [35] and vasoconstriction [34]. A recent research shows that HTR2B are specific markers in agerelated osteoarthritis and involved in apoptosis and inflammation of osteoarthritis synovial cells [69]. In zebrafish, vascular endothelial cadherin (encoded by CDH5) can promote elongation of the endothelial cell interface during angiogenesis [36]. KCNQ1 (encoding KvLQT1, a potassium channel protein) is related to exercise, and mutation of KCNQ1 and KCNE1 can cause susceptibility of sudden cardiac death for a horse [37][38][39]. Mena (encoded by ENAH) which is located in the Z line that the borders of the sarcomere, VASP, and αII-Spectrin assemble cardiac multi-protein complexes to regulate cytoplasmic actin networks [40,41]. PIH1D1 (encoding the components of the apoptotic regulatory complex R2TP) is relevant to muscle mass [42,43]. Because E3 ubiquitin-protein ligase SMURF1 (encoded by SMURF1) functions as negative regulator of myostatin pathway activity and myostatin is a negative regulator of skeletal muscle mass, up-regulated expression of SMURF1 may link to skeletal muscle growth following prolonged training [44]. UNC13C is connected with the differentiation of myoblast while integral myotubes originate in myoblast differentiation and raise the distinct muscle fiber types to build the complex skeletal muscle architecture for body movement, postural behavior, and breathing [45,46]. ATP1A3 encodes subunit alpha-3 of sodium/potassium-transporting ATPase, which increased the expression and may be conducive to decreasing fatigue after training [47,48]. ATP1A3 gene mutations can result in the rapid-onset dystonia-parkinsonism [70]. PPARD (encoding peroxisome proliferatoractivated receptor delta) participates in regulation of energy metabolism, cell proliferation, and differentiation, protection in stress conditions such as oxidative stress and inflammation, and other important life activities [57]. The antecedent studies have shown that the Arabian horse will change the expression of PPARD and other genes of PPAR signaling pathway genes in skeletal muscle during exercise, and improve the coefficient of utilization of fatty acids by energy conversion [58]. The up-regulated PPARD is also found after exercise in the Thoroughbred horse [3]. So, we speculated that positively selected PPARD improved athletic ability by a similar mechanism in Abaga horse. Besides the counterregulatory hormone of insulin, GCG (encoding glucagon) is deemed to be involved in adipose metabolism and energy balance [35]. Transcription factor 7-like 2 (encoded by TCF7L2) not only affects the metabolism of adipocytes by DNA methylation but also activates the corresponding target genes through the Wnt signaling pathway to specifically inhibit glucagon synthesis in enteroendocrine cells [55]. GALNT13 may be involved in metabolic and energy pathways [56].
Exercise has a great influence on the composition of developing horse joints, the thickness of the hyaline cartilage of the adult horse, the calcified cartilage, and the subchondral bone [71,72]. We found several genes associated with skeleton and cartilage development among candidate genes of the Abaga horse. PPP2R5B and PPP6R3 are closely related to femur strength in rats and bone mineral density in humans, respectively [49,50]. PTPRE encodes receptor-type tyrosine-protein phosphatase epsilon which is a positive regulator of osteoclast function [51]. RHOBTB1 is involved in osteoclast-mediated bone absorption activity [52]. REA (LRRFIP1, RCAN1, and RHOBTB1) and IF (TRIP12, HSPE1, and MAP2K6) have an important role to play in muscle cell degradation, development, and motility from Nelore cattle [73]. Chondrogenesis demands transformation of chondrocytes from a simple mesenchymal condensation to cells with a highly enriched extracellular matrix (ECM) in the developing skeleton in which SCFD1 plays an important role in the secretion of ECM protein during chondrogenesis [53,54]. So far there are no studies on the association between these genes and the motor function of horses, but these skeleton-and cartilage-related genes provide new inspiration for the correlational research between ossature and exercise.
After exercise, the equine stress reaction will involve inflammation, cell signaling, and immune interactions [4]. Cell activation is the first step in the proliferation of immune cells, and CD69 is first detected in cell surface glycoproteins after activation [74]. The low-to moderateintensity aerobic trekking induces activation of CD69 T cells and promotes anti-stress effects on the oxidative balance and the high-altitude-induced injury of the immune responses among women [60]. EIF4G3 encodes eukaryotic translation initiation factor 4 gamma 3 which is indispensable for triggering protein synthesis and is thought to be involved in exercise stress-induced response in horses [59,75]. We hypothesized that these genes may be involved in the ability of Abaga horses to enhance certain disease resistance through exercise, but more data and experiments are needed to verify.
GRM1 encodes metabotropic glutamate receptor 1, whose deficiency can lead to serious deficits in motor coordination and spatial learning in mice [61,62]. The effectors of the Hippo signal pathway regulate several motor-related genes and adaptations while VGLL4 is Hippo-signal-related to body height [30]. These exercise-related genes were positively selected in the Abaga horse, indicating that the Abaga horse has exerciserelated genetic potential compared with the Wushen horse.

Conclusion
We generated and analyzed the genomic data of the Abaga horse and Wushen horse by sequencing. We uncovered that chromosome 4 may be associated with the evolution of athletic performance in the Mongolian horse. The positively selected genes of the Abaga horse were enriched in exercise-related pathways, which suggest that the Abaga horse may have an exclusively physiological mechanism for the motorial process. Moreover, 21 exercise-related genes were detected. These findings provided a molecular genetic basis for exercise-directed breeding of Mongolian horses.