Revealing distinct DNA methylation patterns in hepatic carcinoma through high-throughput sequencing

Objectives: To study the relationship between DNA methylation and tumour development and provide experimental evidence for the personalized diagnosis and treat-ment of hepatic carcinoma. Methods: The DNA of hepatic carcinoma tissue (Ca group) and adjacent normal tissue (T group) were extracted using the phenol-chloroform method and then treated with bisul ﬁ te. Twenty-ﬁ ve genes including 45 subtypes were ampli ﬁ ed by PCR. The PCR products were sequenced via the Illumina 450k methylation array assay. The changes of methylated DNA performance were analysed through principal component analysis (PCA). Cluster analysis was used to evaluate the classi ﬁ cation of methylated DNA regions. Haplotype abundance variation was tested for methylation di ﬀ erences. Statistical analysis was performed using the chi-square ( χ 2 ) test or Fisher ’ s exact test. Results: Sequencing discoveries indicated CG-type methylation pervading all amplicons. However, CHG-type and CHH-type methylations were con ﬁ ned to only four amplicons (or nine subtypes). The methylation ratios of three speci ﬁ c amplicons (DAB2IP, PRDM14-1, Rab31-1) out of 45 amplicon subtypes in the Ca group signi ﬁ cantly increased (over 10 %) compared to the T group (p<0.05). Nineteen amplicons demonstrated minor distinction (methylation pattern variations between 1 and 10 %), with the remaining 23 amplicons showing only minimal disparities (under 1 %). PCA and cluster analysis unveiled a marked di ﬀ erence in methylation levels between cancerous and healthy tissues (p<0.05). Conclusions: The changes in haplotypes and methylation sites could serve as a biomarker for the clinical diagnosis of hepatic carcinoma. Methylation patterns might play an important role in the occurrence and development of he-patic carcinoma.


Introduction
Liver dysfunction can precipitate a variety of diseases and complications.Hepatic carcinoma is one of the most common malignant tumours worldwide.The incidence rate of liver cancer continues to rise, and the survival rate is low because it is almost always diagnosed in the late stage [1].Early diagnosis is very important for the treatment and prognosis of liver cancer.Recent studies have shown that DNA methylation is deregulated during the development of hepatocellular carcinoma (HCC) by significantly affecting cell differentiation, proliferation, and function [2].Differential methylation may drive the pathogenesis of liver diseases [3].
Among the various studied epigenetic marks, DNA methylation at CpG dinucleotide cytosines is the most prevalent, and it can be triggered by a variety of environmental factors.The ease of laboratory investigation into DNA methylation makes it suitable for use in epigenomewide association studies and as a potential epigenetic biomarker for predicting complex diseases [4,5].Epigenetic changes in the blood mirror age-related DNA methylation changes are seen in various human tissues, including the brain, skeletal muscle, adipose tissue, and pancreatic islets [6,7].DNA methylation, which occurs in tandem with environmental exposure and other biological events, is frequently associated with an array of diseases, including cancer and autoimmune disorders.Clinical phenotypes exhibit strong correlations with DNA methylation patterns, underscoring their potential as biomarkers for disease diagnosis and treatment guidance [8].
Gaining insights into the genetic and epigenetic profiles of tumours is crucial for understanding carcinogenesis mechanisms and developing therapeutic strategies.With next-generation sequencing, recent advancements in genomic research have enabled one-base-pair resolution analysis of DNA methylation, unveiling more intricate operational mechanisms [9].These include DNA methylation's role in gene expression suppression or activation, transcription factor binding, splicing, and nucleosome positioning.In this study, we probed the gene expression and DNA methylation status of genes in human liver cancer using The Cancer Genome Atlas database, aiming to uncover common regulatory patterns in human tumorigenesis.

DNA samples preparation and bisulfite modification
Following the approval of the appropriate ethics committee, we collected a total of 33 samples of post-surgery from a certified hospital.Twenty-six samples were divided into 13 liver cancer tissues (Ca group) and 13 adjacent normal tissues (T group).All tissues were promptly frozen in liquid nitrogen and then stored at −80 °C.We utilised the phenol-chloroform method to extract genomic DNA (gDNA) from the samples.The integrity of the gDNA was confirmed through 1 % agarose gel electrophoresis, which exhibited clear and consistent electrophoresis bands devoid of noticeable degradation or RNA contamination.The concentration and quality of the gDNA were assessed by using the NanoDrop 2000 spectrophotometer (NanoDrop Technologies, DE, USA).The gDNA were treated by bisulfite modification using the QiagenEpi-Tect Bisulfite Kit (QIAGEN, Hilden, Germany) according to the manufacturer's guidelines.The methylated DNA was purified using EpiTect Spin Columns in accordance with the gene list (Supplemental Table S1).The untreated gDNA were used as control samples.

Methylation-specific PCR (MS-PCR)
We designed MS-PCR primers based on these methylation sites.Primers for both methylated and unmethylated states were generated by Sangon Biotech (Shanghai, China) using MethPrimer software, as detailed in Supplemental Table S2.The PCR reaction mix (total volume: 20 μL) comprised KAPA2G Robust Hot Start ReadyMix, a forward primer, a reverse primer, bisulfite-treated gDNA, and double-distilled water.The target was amplified from the PCR reaction mix after thermal cycling.
Concentration detection of equal volumes of PCR product mixtures was conducted following purification with Backman Agtampure XP magnetic beads, and subsequent testing was performed.

DNA methylation sequencing
We integrated specific tag sequences compatible with the Illumina platform into the DNA library using a multi-primer panel for PCR.The library was subsequently subjected to high-throughput sequencing with the Illumina 450k methylation array (Illumina, San Diego, CA, USA), which comprised 485,577 probes that covered 99 % of the 21,231 RefSeq genes, generating FastQ data in dual-end sequencing mode, with each read measuring either 2×150 bp or 2×250 bp.BeadChip images were captured using Illumina iScan.High-quality samples were ensured by utilising control probes for staining, hybridization, extension, and specificity, along with the high bisulfite conversion efficiency (signal intensity>4,000).The raw data exported from GenomeStudio was analysed using Bioconductor and the lumi/methylumi packages, and the transformation from β-values to M-values was executed.BISMARK compared the transformed available data with the transformed target area, yielding the average methylation level of C bases.The final methylation level was calculated using the following formula: Methylation level=100×reads supporting methylation/all reads.
The combination of gDNA bisulfite modification and highthroughput sequencing allowed for the construction of a single base precision methylation map.The most significant advantage of Illumina products is the resolution of a single CpG site.While other methods locate methylation in a certain region, Illumina analysis can accurately determine the methylation level of an individual CpG site.

Statistical analysis
Statistical analysis was conducted using SPSS21 software (SPSS, Chicago, USA).The volumes of the three gene methylation states (methylated, partially methylated, and unmethylated) were compared using the Chisquare test.To investigate the relationship between gene expression/ methylation and clinicopathological parameters, either the chi-square (χ 2 ) test or Fisher's exact test was utilized.The Chi-squared test was used to compare CpG sites with significant genomic distributions.p<0.05 indicated a statistically significant difference.

Coverage depth across genes
DNA from the Ca and T groups, post-bisulfite conversion, was utilized to generate PCR amplicons for each gene under investigation.The average length of the amplicons was approximately 279.4 base pairs, with an average coverage of 36.8CpG dinucleotides per amplicon.
A total of 11,368,632 sequencing reads were accurately aligned.The T group consistently exhibited a higher quantity of reads compared to the Ca group (Table 1).Despite the higher read count for the T samples, discrepancies in the The phrase "PCR product" is indicative of tumour-associated genes and subtype.The term "Length" refers to the anticipated length of the PCR product measured in base pairs (bp).The symbol "#CpG" represents the count of potential methylation sites accessible for analysis within each PCR product.The abbreviations "Ca" and "T" are shorthand for hepatic carcinoma tissue and neighbouring normal tissues, respectively.The data outlined in the "Ca Reads" and "T Reads" columns represent the quantities of short Illumina sequencing reads that were aligned to each CpG island's reference sequences, and these were employed in subsequent examinations.All of these CpG islands were harnessed in the ensuing analyses.The "chr" represent the localization of genes on chromosomes.
read counts for each gene within the T samples were negligible, following a similar pattern, with the sole exception of sample T14 (Figure 1).The distribution of each gene within the Ca group was similarly uniform.Figure 1 displays the depth of coverage for eight PCR products representing genes in samples.These samples include four T and four Ca samples.The trace patterns were nearly identical for all samples, with minor alterations in coverage depth at each position (Figure 1A and B).There were differences in the absolute coverage depth along the entire length of the amplicon for four out of 25 sequencing and alignment PCR products.One sample (T14) displayed a notable deviation in its read distribution, with prominent gaps in coverage.Consequently, this genomic locus was excluded from further investigation.While there were differences in the absolute depth of coverage among the samples, the contours of the read distribution traces for specific PCR products were nearly identical across both the T and Ca samples (Figure 1C and D).The average coverage depth across all genes within amplicons was identified as 30,548 for the T group and 29,346 for the Ca group, with a maximum of 3,675 at the ARID1A amplicon and a minimum of 65 at the PTEN amplicon.

Methylation levels in hepatic carcinoma
Through the meticulous application of a single nucleotide polymorphism identification technique, we pinpointed each methylation site, quantified the sequencing depth, and assessed the methylation ratio spanning 25 amplicons (or 45 subtypes) for a single sample.Data analysis outcomes revealed variations in methylation ratios among the 25 amplicons between two groups of samples.
Our analysis included three distinct methylation patterns (specifically CG, CGH, and CHH), for which we calculated their respective methylation ratios within the amplicons of both the Ca and T categories.Sequencing discoveries showed CG-type methylation pervading all amplicons.However, CHG and CHH methylation were confined to only four amplicons (or nine subtypes).We observed substantial differences (>10 %) within the methylation ratios of three specific amplicons (DAB2IP, Prdm14-1, Rab31-1) when contrasting Ca and T across the 45 amplicon subtypes (Figure 2A).Compared to the T group, the methylation ratios of DAB2IP, Prdm14-1, and Rab31-1 in the Ca group increased notably (p<0.05).Conversely, 19 amplicons demonstrated minor distinction (methylation pattern variations between 1 and 10 %), with the remaining 23 amplicons showing only trivial disparities (<1 %).
The methylation ratios of 20 amplicons in the T group increased relative to the Ca group, in contrast to a down-regulation in 25 amplicons.Notably, among all 25 downregulated amplicons, the difference in methylation rates between the Ca and T groups was less than 4 %, with most being less than 1 %.Three amplicons were detected with no significant changes in methylation ratios, while the other six amplicons showed minimal differences.Among these nine amplicons, three had an increased methylation ratio in the Ca group, while six saw a reduction (Figure 2B).In the CHH methylation pattern, two amplicons showed an insignificant difference in methylation ratios between the Ca and T groups, with seven amplicons showing slight variations.Only a single amplicon exhibited an up-regulated methylation ratio in the Ca group, with the rest being downregulated (Figure 2C).

PCA and cluster examination
PCA was performed on all samples based on their DNA methylation profile to gain deeper insights into the variations in their methylated DNA performance.It was observed that all Ca samples, in contrast to the T samples, exhibited a similar pattern of methylation status alteration, indicating potential physiological transformations in the T group (Figure 3).Further investigation was conducted using cluster analysis grounded on the methylation levels of CpG loci in the amplicon to appraise the classification potential of the identified differentially methylated DNA regions.The samples were bifurcated into two clusters: one consisting of T tissues and the other of Ca tissues (Figure 4).Intriguingly, the methylation level in Ca samples was considerably elevated compared to the T samples (p<0.05),suggesting that Ca samples were more prone to methylation than the T samples.

Haplotype and the variation in haplotype abundance
In this study, amplicon was employed as a unit to scrutinize CpG haplotype with high-throughput sequencing facilitating multidimensional methylation analysis.There were 10 specific haplotypes in two Ca samples (Table 2).The abundance of minority haplotypes demonstrated significant variations in the two groups using diverse statistical models.For instance, only one out of the eight haplotypes of the ARID1A_1 sequence manifested a significant disparity by the t-test evaluation (p<0.05)(Table 3).

Discussion
Assessment of DNA methylation DNA methylation analysis emerges as a reliable technique for facilitating early diagnosis, given its ability to detect modifications before the onset of mRNA and protein alterations [10].The repertoire of techniques for DNA methylation analysis includes sodium bisulfite modification, enzyme digestion, and affinity-based methods, supplemented by next-generation sequencing, locus-specific, gelbased, and array-based approaches [11].Sodium bisulfite modification is often employed to assess the CpG methylation status, as it transforms unmethylated cytosine to uracil [12,13].With its potential for quantitative evaluation, exceptional sensitivity, and compatibility with diverse samples, this method has established itself as a gold standard for DNA methylation analysis [14].However, the bisulfite treatment process poses restrictions due to DNA denaturation and degradation, thereby limiting the quantity of available DNA.Although this technology comes with certain drawbacks, such as the need for selecting a large number of clones during cloning, complex operations, and high costs, it still provides a more precise detection of genome-wide methylation modifications and offers superior sensitivity and accuracy compared to traditional methylation sequencing techniques.This has led to its widespread application in exploring biological development and disease mechanisms.
With advancements in sequencing technologies, nextgeneration sequencing facilitates high-throughput analysis and can swiftly generate copious sequence data [15].Currently, most high-throughput sequencing studies focus on comprehensive epigenome analysis across the entire genome [16,17], which can lead to substantial costs.However, obtaining accurate data on the proportion of methylated to non-methylated cytosine is a complex endeavour due to the vast size of the mammalian genome, and securing sufficient coverage depth for each CpG site can present a significant challenge.Absolute DNA methylation assays can provide quantitative measurements of DNA methylation levels at the resolution of individual CpG sites.

Epigenetic deviations in hepatocellular carcinoma
The precise molecular pathways leading to liver cancer remain elusive, yet the accumulation of genetic and epigenetic aberrations is widely acknowledged as a key factor in the initiation of the disease [18].Gene expression alterations, independent of genetic sequence changes, can manifest through various epigenetic mechanisms, such as DNA methylation, microRNAs, non-coding RNAs, histone modifications, and shifts in nucleosome positioning [19][20][21][22][23]. Specifically, anomalous DNA methylation has garnered substantial attention in research related to HCC and its precancerous lesions [22].
In our research, through detailed research literature, it was found that the expression of 25 genes is closely related to the occurrence and development of liver cancer, but the mechanism is not fully elucidated [25][26][27][28][29].The sequence specifics and chromosomal locations of the 25 cancer-associated genes are detailed in Table 1.Therefore, we investigated these genes to reveal the potential role of promoter region

Analysis of differences in methylation ratio
DNA methylation lacks a definitive threshold, raising questions about the physiological implications of these minor deviations.Nevertheless, even slight changes in DNA methylation patterns can potentially impact both cell and organism function [24,30].Therefore, shifts in methylation patterns are noteworthy, as those surpassing 10 % are likely to induce physiological changes.Research has shown that when methylation increases by 10 %, it triggers phenotypic changes in living yellow Aguti mice [31].Some research indicated that CDKN2A promoter methylation was associated with an enhanced HCC risk and played a crucial role in the process of HCC, with a potential value to being a triage marker for HCC [32].
Given the evident changes in the methylation ratio of DAB2IP, PRDM14-1, and Rab31-1, it was speculated that the methylation abnormalities of these three amplicons may be associated with the growth and development of HCC.The hDAB2IP gene is a potential tumour suppressor gene that has two variants: hDAB2IPA and hDAB2IPB.The hDAB2IPA was the predominant isoform, being expressed in most of the examined tissues.The expression of hDA-B2IPA was silenced or down-regulated in liver cancer cell lines.The methylation of the hDAB2IPA promoter in HCC was confirmed in an additional 53 pairs of patient samples.More than 80 % of HCC samples showed hDAB2IPA promoter methylation, compared to 11.5 % in the corresponding adjacent normal tissue [33].Our results were congruent with previous reports.
PRDM14 is a member of the PR domain-containing (PRDM) family.In adults, PRDM14 has low expression in human tissues.Overexpression of PRDM14 enhances cancer cells growth and reduced cancer cells sensitivity to chemotherapeutic agents.PRDM14 might be a potential therapeutic molecular target for tumour treatment [34].The study of PRDM14 methylation in liver cancer has not been previously reported.
Rab31 is part of the Ras superfamily of small GTP-binding proteins, which regulates vesicle transport from the Golgi apparatus to early and late endosomes.Rab31 was aberrantly upregulated in HCC tissues compared to adjacent liver tissues.These data suggest that Rab31 promotes HCC cell growth by suppressing cell apoptosis [35].The study of Rab31 methylation in liver cancer has not been previously reported.Further research is required on the relationship between these genes' methylation and liver cancer.

Haplotype analysis
A spectrum of single nucleotide polymorphisms has been recognized as potential drivers of unique phenotypic differences and disease predisposition; an observation substantiated by genome-wide association studies (GWASs).Additionally, it was inferred that haplotype-dependent allele-specific methylation (hap-ASM) could function as a potential determinant of disease vulnerability, although the criteria defining tissue correlation in this context remain ambiguous.Six single nucleotide polymorphisms of TLR4 in the 5′-untranslated region and intron were associated with risk of HCC.In haplotype analysis, one haplotype (GCCCTTAG) of TLR4 was significantly associated with a decrease in the occurrence of HCC [36].Multivariate logistic regression indicated that with no G-C-T haplotype as a reference, the haplotype of G-C-T from these loci was associated with a lower risk for HCC under the recessive model [37].Two major susceptible haplotypes were identified in familial HBV-related HCC.The results indicate different genetic susceptibility between familial and sporadic HBV-related HCC [38].
The allele-specific methylation decides the methylation state of haplotypes in genetically associated regions and can establish epigenetic distinctions with potential functional consequences.To contrast the risk and non-risk haplotypes as determined by associations, DNA methylation data can be assessed via haplotype-specific methylation.Interestingly, a discrepancy in prevalence between the Ca and T samples could potentially sway tumour growth.Due to the fact that the polymorphism of a single locus in a gene often cannot reveal its true association with diseases, haplotype analysis for multiple loci has become an effective means of searching for complex disease genes.Our results indicate that haplotype analysis targeting multiple loci is an effective means of identifying the genetic mechanisms underlying the occurrence of liver cancer.The difference in abundance of a few haplotypes between the Ca and T samples might affect tumour growth.

Conclusions
High-throughput sequencing has emerged as an instrumental tool for examining specific DNA methylation patterns in liver cancer tissues and adjacent healthy tissues.Notably, the expense of high-throughput sequencing is on par with standard bisulfite sequencing.In our current investigation, this technique offered an average coverage depth of 30,548 for the healthy tissues (T samples) and 29,346 for the liver cancer tissues (Ca samples).This enabled us to scrutinize the methylation levels at each distinct site in genetically identical samples and ensured a sufficient data quantity for haplotype, PCA, and cluster analysis.
By contrasting the methylation patterns and conducting haplotype analysis, we pinpointed significant methylation sites and differing potential regulatory units between the two groups.The disparities in haplotype and methylation sites between liver cancer and neighbouring healthy tissues could potentially serve as useful references for biomarker research and development in clinical diagnostics.
Furthermore, PCA and cluster analysis revealed a marked difference in methylation levels between cancerous and healthy tissues, indicating distinct characteristics.This observed distinction in methylation patterns between liver cancer and adjacent healthy tissues could have implications for comprehending the progression and development of liver cancer.
Research ethics: Not applicable.Informed consent: Not applicable.Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.Competing interests: The authors state no conflict of interest.
Research funding: This study was supported by the Major science and technology innovation projects in Xinxiang City (ZG1403).Data availability: The raw data can be obtained on request from the corresponding author.

Figure 1 :
Figure 1: Depth of coverage yielded through high-throughput bisulfite sequencing.(A, B) Exemplary line charts illustrating absolute depth of coverage (Y-axis) vs. genes (X-axis) across the complete amplicon length for eight PCR product samples.These samples include 4 T and 4 Ca samples.The trace patterns are nearly identical for all samples, with minor alterations in coverage depth at each position.(C, D) Exemplary line charts demonstrating absolute depth of coverage (Y-axis) vs. samples (X-axis) along the entire amplicon length for four of the 25 sequenced and aligned PCR products.

Figure 2 :
Figure 2: Examination of methylation ratio differences.(A) Comparative study of DNA methylation patterns (CG) in Ca group and T group tissues.The chart displays the percentages of methylated cytosines at each potential methylation site for each of the 25 unique PCR products.(B) Comparative study of DNA methylation patterns (CHG) in Ca group and T group tissues.The chart portrays the percentages of methylated cytosines at each potential methylation site for each of the 9 unique PCR products.(C) Comparative study of DNA methylation patterns (CHH) in Ca group and T group tissues.The chart demonstrates the percentages of methylated cytosines at each potential methylation site for each of the 9 unique PCR products.Deep red signifies high methylation levels; yellow indicates low methylation levels.

Figure 3 :
Figure 3: Principle component analysis.PCA visually categorizes sample groups using colour codes.The X and Y axes denote the indicators that most accurately represent the true composition of the samples.

Figure 4 :
Figure 4: Cluster analysis.Each cell in the heatmap represents the relative methylation level of CpG sites for the corresponding row of samples, with the colour gradient depicting the relative methylation levels of the sites.Blue and red denote low and high methylation levels, respectively.The arrangement order of rows suggests methylation level similarity.Neighbouring rows indicate greater overall similarity in sample methylation levels.The tree on the left systematically exhibits the degree of similarity.

Table  :
PCR products, number of Illumina reads aligned for each CpG island and the position of genes on chromosomes.

Table  :
Haplotype type information in Ca group.ATCATCGATCTGCTACGCTTTAT, the corresponding methylated haplotype of an amplicon configured to align with the read would be: CTCT."Depth" is the term used to denote the quantity of sequences that substantiate this haplotype.This chart selectively features  haplotypes and  samples.

Table  :
Abundance differences in haplotype.