HBCR_DMR: A Hybrid Method Based on Beta-Binomial Bayesian Hierarchical Model and Combination of Ranking Method to Detect Differential Methylation Regions in Bisulfite Sequencing Data

Yassi, Maryam; Shams Davodly, Ehsan; Hajebi Khaniki, Saeedeh; Kerachian, Mohammad Amin

doi:10.3390/jpm14040361


by Maryam Yassi ¹,²,³, Ehsan Shams Davodly ¹, Saeedeh Hajebi Khaniki ⁴ and Mohammad Amin Kerachian ¹,⁵,⁶,⁷,*

¹ Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad 9184156815, Iran
² Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
³ Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9054, New Zealand
⁴ Student Research Committee, Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad 9177948564, Iran
⁵ Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad 9177948564, Iran
⁶ Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad 9177948564, Iran
⁷ Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
* Author to whom correspondence should be addressed.

J. Pers. Med. 2024, 14(4), 361; https://doi.org/10.3390/jpm14040361

Submission received: 10 September 2023 / Revised: 20 October 2023 / Accepted: 9 January 2024 / Published: 29 March 2024

(This article belongs to the Special Issue Application of Deep and Machine Learning in Personalized Medicine and Individualized Bioinstruments)


Abstract

DNA methylation is a key epigenetic modification involved in gene regulation, contributing to both physiological and pathological conditions. Understanding its role requires a precise comparison of DNA methylation patterns between sample groups that represent distinct statuses. Analysis of differentially methylated regions (DMRs) using computational approaches can help uncover the relationships between these phenomena. This paper describes HBCR_DMR, a hybrid model that combines a beta-binomial Bayesian hierarchical model with a combination of ranking methods. In the first stage, we model the true methylation proportions of the CpG sites (CpGs) within the replicates using a beta-binomial distribution parameterized by a group mean and a dispersion parameter. In the second stage, we select discriminative CpG sites based on their methylation status using multiple ranking techniques. Finally, we combine the ranked lists of differentially methylated CpG sites through a voting system. Our analyses, encompassing simulations and real data, show a sensitivity of 0.72, a specificity of 0.89, and an F1 score of 0.76, with an overall accuracy of 0.82 and an AUC of 0.94. These findings indicate that HBCR_DMR can reliably distinguish differentially methylated regions and is a useful tool for DNA methylation analysis.

Keywords: DNA methylation; epigenetic; differentially methylated region; beta-binomial Bayesian hierarchical model; ranking method

1. Introduction

Epigenetics is a research area that offers insight into the activation or suppression of genes within living cells, revealing the how, where, and when of these processes. DNA methylation has been extensively researched and is well understood as an epigenetic mechanism that plays a crucial role in various processes [1], including cell development and differentiation. DNA methylation patterns, characterized by either hypo- or hypermethylation, have been identified in human tumor cells, offering valuable insights into the development and progression of complex diseases [2].

Methylation is a process in which methyl groups are added to DNA cytosine (C) molecules, typically occurring at CpG sites. Methylation of promoter regions is commonly associated with gene expression suppression, whereas methylation within gene bodies is generally linked to increased gene expression [3].

In brief, when DNA is treated with bisulfite, unmethylated cytosines are converted to uracil (U), while methylated cytosines remain unchanged. Sequencing of bisulfite-treated DNA and aligning the sequenced reads to a reference genome allows for the quantification of methylation levels at each cytosine. Methylation can occur in three different sequence contexts: CpG, CHG, and CHH (where H corresponds to A, T, or C). Additionally, CHG and CHH methylation has been reported on rare occasions [4]. In this discussion, we will focus solely on the methylation of individual cytosine nucleotides.
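The read-counting idea described above can be illustrated with a small Python sketch. It assumes reads are already aligned to the forward strand without gaps, which real conversion-aware aligners do not require; the function name `methylation_calls` and the toy sequences are hypothetical and are not part of HBCR_DMR.

```python
def methylation_calls(reference, reads):
    """Count methylated and total observations at each reference cytosine.

    reference: forward-strand reference sequence.
    reads: (start, sequence) pairs, assumed aligned without gaps (simplification).
    Returns {position: (methylated_reads, covering_reads)}.
    """
    counts = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative
            if base not in ("C", "T"):
                continue  # sequencing error or mismatch: ignore
            methylated, total = counts.get(pos, (0, 0))
            # A retained C was protected from conversion (methylated);
            # a T reflects an unmethylated C converted to U and read as T.
            counts[pos] = (methylated + (base == "C"), total + 1)
    return counts


calls = methylation_calls("ACGTCGA", [(0, "ACGTTGA"), (0, "ACGTCGA"), (1, "TGTTG")])
levels = {pos: m / n for pos, (m, n) in calls.items()}
```

The methylation level at a site is then the methylated count divided by the covering count, which is the quantity modeled in Section 2.3.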

Whole-genome bisulfite sequencing (WGBS) enables the precise measurement of DNA methylation across the entire genome [5]. However, alternative DNA methylation sequencing methods have been developed to cost-effectively cover variable regions of DNA methylation. These methods often employ a reduced representation of bisulfite sequencing, focusing on specific restriction sites, such as Reduced Representation Bisulfite Sequencing (RRBS) [6].

One of the most reliable and widely adopted approaches for measuring DNA methylation is the SureSelectXT Human Methyl-seq method. This platform evaluates 84 megabases (MB) of the genome, encompassing 3.7 million CpGs, 19.6 megabases of CpG islands, 9.8 megabases of cancer- and tissue-specific DMRs, 37 megabases of GENCODE promoters, 48 megabases of enhancers, CpG island shores/shelves within ±4 kilobases, and DNase I hypersensitive sites [7].

The epigenetic differences between sample groups are typically described by differentially methylated cytosines (DMCs) and differentially methylated regions (DMRs). DNA methylation sequencing data pass through three pre-processing steps before DMRs are detected. Firstly, the reads are assessed with a Quality Control (QC) tool, which provides global and graphical summaries of methylation sequencing read quality and is typically applied both before and after alignment (Wingett and Andrews 2018).

Secondly, the unprocessed sequencing reads undergo cleaning through Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/, accessed on 8 September 2023), which involves the removal of sequencing adapters (specifically the Illumina universal adapter), elimination of low-quality bases (those with Q < 67 in Illumina) at the 3′ end, and handling of ambiguous bases in both reads. Thirdly, these initial bisulfite sequencing data are transformed into a count of methylated reads and covered reads of cytosines (comprising both unmethylated and methylated reads) by aligning them to the human reference genome.

Conversion-aware aligners such as BSMAP [8], Bismark (Krueger and Andrews 2011), MethylCoder [9], BRAT-BW [10], Last [11], BS-Seeker2 [12], Bison [13], bwa-meth, WALT [14], VaiBS [15], BiSpark [16], BS-Seeker3 [17], and gemBS [18] are used to align the sequenced fragments of bisulfite-treated DNA. The methylation status of a CpG site is recorded by counting the methylated and unmethylated reads that span the site. According to a comprehensive review article [19], differential methylation finders fall into seven categories based on their primary concepts and features: logistic regression-based approaches such as methylKit [20] and eDMR [21]; smoothing-based approaches such as BSmooth [22], BiSeq [23], and HOME [24]; beta-binomial-based approaches such as DSS [25], MOABS [26], RADMeth [27], methylSig [28], DSS-signal [29], MACAU [30], DSS-general [30], and GetisDMR [31]; hidden Markov model-based approaches such as ComMet [21], HMM-Fisher [32], HMM-DM [33], and DMCHMM [34]; entropy-based approaches such as QDMR [35], CpG_MPs [36], and SMART [37]; mixed statistical test-based approaches such as COHCAP [38], DMAP [39], and swDMR [40]; and binary segmentation-based approaches such as metilene [41] and MethCP [42].

In the current paper, we introduce a novel DMR finder designed for the detection of DMRs in bisulfite sequencing data. The HBCR_DMR method is founded on a hybrid approach that combines two statistical methods: a beta-binomial Bayesian hierarchical model and a combination of ranking techniques. Within the HBCR_DMR method, we assess the variation across CpG methylation proportions using the beta-binomial model. Additionally, for DMR detection, we employ a combination of ranking methods to select discriminative CpG sites.

HBCR_DMR is versatile and can be employed with a range of methylation sequencing platforms, such as WGBS, RRBS, and target-capture methods. In this investigation, we employ HBCR_DMR for the examination of both simulated data (RRBS) and authentic data obtained from colorectal cancer samples (SureSelectXT Human Methyl-seq analysis).

2. Method

Our proposed method consists of six main stages: (1) CpG clustering, (2) mean and variation assessment using a beta-binomial hierarchical model, (3) ranking methods for selecting discriminative CpG sites based on methylation status, (4) combination of ranking methods, (5) definition of DMR boundaries, and (6) annotation/visualization. We identified the discriminating DMRs in simulation and real datasets and compared the DMRs found by HBCR_DMR with those found by other methods. Figure 1 illustrates the flowchart of our proposed method. Our method is open source software and is available on GitHub (https://github.com/Genetics-Research-Laboratory-RROC/HBCR-DMR, accessed on 8 September 2023).

2.1. Data

To assess our method on simulation data, we utilized the “RRBSdata” R package, which includes twelve samples divided into two groups: six controls and six cases. The RRBSdata package encompasses a total of 7,986,265 CpGs, including 24,698 CpG islands. Additionally, it incorporates 10,000 simulated DMRs sourced from a previously published RRBS dataset with the accession number GSE42119 [43].

Furthermore, for real data analysis, we employed the SureSelectXT Human Methyl-Seq approach with a 101-base read length. This method generated 57–76 million Illumina sequencing reads per sample from a dataset comprising six colon adenocarcinomas and six control samples (normal colon tissues). Between 88.5% and 89.8% of these reads were successfully mapped to either strand of the human genome (GRCh37/hg19). On average, each CpG was sequenced between 19X and 24X per sample (Supplementary Materials File S1).

2.2. CpG Clusters

Every CpG cluster signifies a region of the genome abundant in CpG sites. In each of these clusters, we filter out extraneous data noise from the complete genome’s DNA methylation dataset. This not only enhances data quality but also streamlines computational processing time by concentrating on defined and scrutinized genomic regions.

Discovering the start and end points of each CpG cluster consists of two steps:

(1): CpG sites covered in at least 75% of all samples are designated as validated CpGs. A CpG site covered in fewer than 75% of the samples is categorized as “noise” and removed from the DNA methylation dataset [23].
(2): A CpG cluster is defined as a set of validated CpGs in which the maximum distance between neighboring CpG sites is less than 100 base pairs.
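The two steps above can be sketched in Python as follows. This is a minimal illustration, not the HBCR_DMR implementation: it assumes that "distance" means the gap between neighboring validated CpGs on one chromosome, and the function names and input layout are hypothetical.

```python
def validated_cpgs(coverage, n_samples, min_fraction=0.75):
    """Keep CpG positions covered in at least `min_fraction` of the samples.

    coverage: {position: set of sample ids in which the CpG is covered}.
    """
    return sorted(
        pos for pos, samples in coverage.items()
        if len(samples) / n_samples >= min_fraction
    )


def cluster_cpgs(positions, max_gap=100):
    """Group sorted positions so that neighboring CpGs are < max_gap bp apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)   # extends the current cluster
        else:
            clusters.append([pos])     # starts a new cluster
    return clusters


# Toy example with 12 samples: position 180 is covered in only 8 of 12 (67%).
coverage = {
    100: set(range(12)),
    150: set(range(9)),
    180: set(range(8)),
    400: set(range(12)),
}
kept = validated_cpgs(coverage, n_samples=12)
clusters = cluster_cpgs(kept)
```

Whether single-CpG clusters are retained downstream is not specified in the text; the sketch keeps them.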

2.3. Beta-Binomial Hierarchical Model

In our approach, we estimate both variation and mean values using a beta-binomial hierarchical model [25]. This model’s prior distribution is based on the entire genome, considering both methylated and unmethylated states. The true methylation proportions of CpGs within the replicates are represented by a beta distribution, parameterized by a group mean and a dispersion parameter. The beta distribution accounts for biological variability, while the binomial distribution captures sampling variability. To quantify variation in CpG methylation proportions around the group mean, we employ the dispersion parameter $\varphi_{ij}$, which is estimated through an empirical Bayes method. The statistical formulas and notation are presented below:

$$X_{ijk} \mid P_{ijk}, N_{ijk} \sim \mathrm{Binomial}(N_{ijk}, P_{ijk}) \tag{1}$$

$$P_{ijk} \sim \mathrm{Beta}(\mu_{ij}, \varphi_{ij}), \qquad \mu = \frac{\alpha}{\alpha + \beta}, \qquad \varphi = \frac{1}{\alpha + \beta + 1} \tag{2}$$

$$\varphi_{ij} \sim \text{log-normal}(m_{0j}, r_{0j}^{2}) \tag{3}$$

$$\hat{\mu}_{ij} = \frac{\sum_{k} X_{ijk}}{\sum_{k} N_{ijk}}, \qquad \widehat{\mathrm{var}}_{ij} = \left(\frac{1}{\sum_{k} N_{ijk}}\right)^{2} \sum_{k} \left\{ N_{ijk}\, \mu_{ij} (1 - \mu_{ij}) \left[ 1 + (N_{ijk} - 1)\, \varphi_{ij} \right] \right\} \tag{4}$$

In Equation (1), for the i-th CpG site, j-th group, and k-th replicate, $X_{ijk}$ is the count of reads indicating methylation, $N_{ijk}$ is the total number of reads covering this position, and $P_{ijk}$ is the true underlying methylation proportion.

In Equation (2), the beta distribution is parameterized by a mean $\mu_{ij}$ and a dispersion $\varphi_{ij}$ rather than by the traditional shape parameters $(\alpha, \beta)$; the two parameterizations are related through $\mu = \alpha / (\alpha + \beta)$ and $\varphi = 1 / (\alpha + \beta + 1)$. In Equation (3), we assume that the dispersion parameters can be described by a log-normal distribution with a mean of $-3.39$ and a standard deviation of $1.08$. Equation (4) shows how the mean methylation levels $\hat{\mu}_{ij}$ are estimated; $\varphi_{ij}$ is obtained by maximizing the conditional posterior likelihood. Consequently, $\widehat{\mathrm{var}}_{ij}$ represents the estimated variance for the i-th CpG site in the j-th group. For each CpG site within a specific condition (e.g., cases or controls), separate mean methylation levels $\hat{\mu}_{ij}$ and variances $\widehat{\mathrm{var}}_{ij}$ are estimated using the beta-binomial hierarchical model.
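As a rough illustration of Equations (2) and (4), the Python sketch below converts between the (mean, dispersion) and (α, β) parameterizations and evaluates the plug-in mean and variance for one CpG site in one group. The dispersion is passed in as a known value; the empirical Bayes estimation of the dispersion is not reproduced here, and the function names are hypothetical.

```python
def beta_shape_parameters(mu, phi):
    """Invert Eq. (2): recover (alpha, beta) from mean mu and dispersion phi."""
    total = 1.0 / phi - 1.0          # alpha + beta, since phi = 1 / (alpha + beta + 1)
    return mu * total, (1.0 - mu) * total


def group_mean_and_variance(meth_counts, total_counts, phi):
    """Evaluate Eq. (4) for one CpG site in one group.

    meth_counts: methylated read counts X_ijk over replicates k.
    total_counts: covering read counts N_ijk over replicates k.
    phi: dispersion for this site and group (assumed already estimated).
    """
    n_sum = sum(total_counts)
    mu = sum(meth_counts) / n_sum
    var = sum(
        n * mu * (1.0 - mu) * (1.0 + (n - 1) * phi) for n in total_counts
    ) / n_sum ** 2
    return mu, var


alpha, beta = beta_shape_parameters(mu=0.25, phi=0.2)
mu_hat, var_hat = group_mean_and_variance([5, 10], [10, 10], phi=0.0)
```

With a dispersion of zero the variance reduces to the binomial value μ(1 − μ)/ΣN, so any positive dispersion inflates the variance to reflect biological variability between replicates.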

2.4. Ranking Method

Detecting differentially methylated regions (DMRs) in bisulfite sequencing data poses a significant challenge, primarily centered on the selection of CpG sites based on their methylation status. In DMRFusion [44], CpG sites are chosen individually, without considering the interrelationship between features, using rankings based on the relative methylation levels in the cancer and control groups. These rankings determine the usefulness of each CpG site for DMR detection and rely on selection methods such as information gain, between versus within class scatter ratio, Fisher ratio, Z-score, and Welch’s t-test. HBCR_DMR uses three of these ranking methods (Fisher ratio, Z-score, and Welch’s t-test), which are based on the normal distribution. As detailed in the previous section, we estimate the mean methylation levels $\hat{\mu}_{ij}$ and variances $\widehat{\mathrm{var}}_{ij}$ for each CpG site using the beta-binomial hierarchical model for both the case and control groups. These estimates are then used in the Fisher ratio, Z-score, and Welch’s t-test ranking methods.

Table 1 provides an overview of the ranking methods, where $x_i$ represents the relative methylation value of the i-th sample, $\bar{x}_i$ is the average value, and $\sigma_{x_i}$ is the sample standard deviation. C denotes the class label, and $n_1$ and $n_2$ are the numbers of samples belonging to specific features within the corresponding class label. Additionally, $s_w$ and $s_b$ correspond to the within-class and between-class scatter matrices, respectively.
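To make the ranking statistics concrete, the sketch below computes two of them for a single CpG site from per-group means and variances. The formulas are the common textbook forms and are an assumption here; the exact definitions used by HBCR_DMR are those of Table 1, which may differ in detail. The textbook two-sample Z statistic has the same value as Welch's t and differs only in its reference distribution, so it is not repeated.

```python
import math


def fisher_ratio(mean_1, mean_2, var_1, var_2):
    """Squared mean difference relative to the summed within-group variances."""
    return (mean_1 - mean_2) ** 2 / (var_1 + var_2)


def welch_t(mean_1, mean_2, var_1, var_2, n_1, n_2):
    """Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    a, b = var_1 / n_1, var_2 / n_2
    t_stat = (mean_1 - mean_2) / math.sqrt(a + b)
    dof = (a + b) ** 2 / (a ** 2 / (n_1 - 1) + b ** 2 / (n_2 - 1))
    return t_stat, dof


# Hypothetical CpG: mean methylation 0.8 in cases and 0.2 in controls,
# sample variance 0.01 in each group, six replicates per group.
ratio = fisher_ratio(0.8, 0.2, 0.01, 0.01)
t_stat, dof = welch_t(0.8, 0.2, 0.01, 0.01, 6, 6)
```

Larger absolute values of either statistic indicate a CpG site that separates the two groups more clearly, which is what the ranking step exploits.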

2.5. Combination of Ranking Methods

The output of each method consists of a ranked list of differentially methylated CpGs across the genome, and the ensemble method is used to merge the output of each ranking method. In this case, the combination is based on a voting system of the actual ranking. If we denote “m” as the number of objects and “n” as the number of preference lists, then “n” represents the number of ranking methods, and “m” signifies the number of CpGs.

For each CpG site, we define a rank vector r = (r₁, …, r_n), where each r_j is the normalized rank, within the range [0, 1], of the CpG site in the j-th ranking list. In this voting system, the voters are the ranking functions (n), and the candidates are the complete set of CpG sites (m) in the genome. Each CpG site receives a score from each ranking method within its cluster.

In the voting process, if more than 70% of the voters cast their votes in favor of a CpG site with a score exceeding the predefined empirical threshold (set at 0.04), the CpG site qualifies as a candidate for a differentially methylated region (DMR) within the genome. To choose this threshold, we first evaluated values ranging from 0 to 1 on simulation data, weighing the trade-off between sensitivity and specificity.
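The voting rule can be sketched in Python as below. The normalization direction is an assumption (the largest raw score receives a normalized rank of 1.0), as is the reading that a voter supports a CpG site when its normalized rank exceeds the threshold; function names are hypothetical. Note that with three voters, "more than 70%" means all three must agree.

```python
def normalized_ranks(scores):
    """Map raw scores to ranks in (0, 1]; the largest score gets 1.0."""
    m = len(scores)
    order = sorted(range(m), key=lambda i: scores[i])
    ranks = [0.0] * m
    for position, index in enumerate(order, start=1):
        ranks[index] = position / m
    return ranks


def vote(score_lists, threshold=0.04, min_support=0.70):
    """Return indices of CpGs supported by more than `min_support` of voters."""
    rank_lists = [normalized_ranks(scores) for scores in score_lists]
    n_voters = len(rank_lists)
    selected = []
    for i in range(len(score_lists[0])):
        votes = sum(1 for ranks in rank_lists if ranks[i] > threshold)
        if votes / n_voters > min_support:
            selected.append(i)
    return selected


# Toy scores for four CpGs from three ranking methods (hypothetical values).
fisher_scores = [9.0, 1.0, 4.0, 0.5]
z_scores = [3.0, 0.2, 2.5, 0.1]
welch_scores = [2.8, 2.9, 0.3, 0.2]
# A high threshold is used only to make the small example informative.
candidates = vote([fisher_scores, z_scores, welch_scores], threshold=0.5)
```

In this toy run only the first CpG is ranked above the threshold by all three methods, so it is the only candidate.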

2.6. Definition of DMR Boundaries

The outcome of the ranking method combination is a list of differentially methylated CpGs (with a p-value ≤ 0.05), which serve as the building blocks of candidate DMRs. We define DMRs as regions comprising significant adjacent CpGs within a single CpG cluster. The p-values of neighboring CpG sites within a DMR are then combined using Fisher’s method in the metap R package. The start and end positions of a DMR are set where the methylation difference shifts from positive to negative or vice versa. This rule ensures that, within a DMR, all CpGs exhibit either hypomethylation or hypermethylation. Furthermore, DMRs are ranked based on the Fisher ratio, methylation fold change [case/control], and absolute methylation difference [case − control].
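A minimal Python sketch of these two operations follows. The first function is Fisher's method itself (the statistic −2 Σ ln p follows a chi-square distribution with 2k degrees of freedom, whose tail has a closed form for even degrees of freedom); it is written from the textbook definition and is not a call into the metap package. The second function splits one cluster into runs of adjacent significant CpGs with the same sign of methylation difference; its name and input layout are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k independent p-values (each > 0) with Fisher's method."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2, with X = -2 * sum(ln p)
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    # Chi-square survival function for 2k degrees of freedom.
    return math.exp(-half_stat) * series


def split_dmrs(positions, diffs, significant):
    """Split one cluster into same-direction runs of adjacent significant CpGs."""
    runs, current = [], []
    for pos, diff, sig in zip(positions, diffs, significant):
        same_direction = bool(current) and (diff > 0) == (current[-1][1] > 0)
        if sig and (not current or same_direction):
            current.append((pos, diff))
        else:
            if current:
                runs.append(current)
            current = [(pos, diff)] if sig else []
    if current:
        runs.append(current)
    return [
        (run[0][0], run[-1][0], "hyper" if run[0][1] > 0 else "hypo")
        for run in runs
    ]


combined = fisher_combined_pvalue([0.05, 0.05])
dmrs = split_dmrs(
    positions=[10, 20, 30, 40, 50],
    diffs=[0.3, 0.2, -0.4, -0.1, 0.2],
    significant=[True, True, True, False, True],
)
```

A sign change or a non-significant CpG closes the current region, so every reported region is uniformly hyper- or hypomethylated.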

2.7. Annotation and Visualization

All DMRs are annotated using the UCSC Genome Browser (version hg19) in different classes of genomic loci, including CpG islands, shores, and shelves, as well as promoters, gene bodies, transcription start sites, and intergenic regions. Moreover, for each DMR, two statistical criteria, the p-value and the false discovery rate (FDR), are calculated on the simulation and real datasets. The highly relevant DMRs (p-value and FDR ≤ 0.05) for the specific cancer type are illustrated as a heat map in order to assess the DMRs detected in the real data analysis in Section 4.2.

3. Evaluation Criteria

To assess the sensitivity and specificity of the five DMR-finding methods, namely Methylkit/eDMR [21], BiSeq [23], DSS [25], DMRFusion [44], and HBCR_DMR, we treated DMRs with a p-value less than 0.05 and containing five or more CpGs as significant DMRs in the simulation data. Table 2 reports the true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), sensitivity, specificity, accuracy, area under the receiver operating characteristic (ROC) curve (AUC), positive predictive value (PPV), negative predictive value (NPV), Matthews correlation coefficient (MCC), F1 score (F1), and elapsed time used to evaluate the performance of these methods. These measures are defined as follows:

TP: A region declared a significant DMR by a method that overlaps a simulated DMR in the RRBS data.

FP: A region declared a significant DMR by a method that does not overlap any simulated DMR in the RRBS data.

FN: A region declared a non-DMR by a method that overlaps a simulated DMR in the RRBS data.

TN: A region declared a non-DMR by a method that overlaps a simulated non-DMR in the RRBS data.

Sensitivity (recall): The sensitivity is the proportion of actual positives (DMRs) that are correctly identified and is estimated as follows:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{5}$$

Specificity: The specificity is the proportion of actual negatives (non-DMRs) that are correctly identified and is estimated as follows:

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{6}$$

Accuracy: The accuracy is the proportion of all regions, DMRs and non-DMRs, that are correctly classified and is estimated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TN + TP + FN + FP} \tag{7}$$

AUC: A DMR finder method is considered favorable if it provides a high sensitivity to detect DMRs while maintaining a low false positive rate (1 − specificity). We evaluated the trade-off between sensitivity and specificity by calculating ROC curves based on the region-wise p-values. The AUC is the area under the ROC curve, and higher AUC values indicate better performance for a method.

PPV: The PPV is the probability that regions with a DMR detection truly have methylation changes, using the formula:

$$\mathrm{PPV} = \frac{TP}{TP + FP} \tag{8}$$

NPV: The NPV is the probability that regions with a non-DMR detection truly do not have methylation changes, using the formula:

$$\mathrm{NPV} = \frac{TN}{TN + FN} \tag{9}$$

MCC: This is calculated directly from the confusion matrix in order to evaluate binary classifications, using the formula:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{10}$$

MCC values of +1, 0, and −1 correspond to perfect prediction, prediction no better than random, and total disagreement between predicted and actual status, respectively.

F1: This is the harmonic mean of precision and recall, which is estimated as follows:

$$F1 = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{11}$$

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{12}$$

The F1 score can be a better measure when a balance between precision and recall is needed. F1 ranges from 0 (worst prediction) to 1 (perfect precision and recall).

Elapsed time: The elapsed time for each DMR finder method was recorded on an Ubuntu 18.04.03 LTS operating system with 32 GB of 2133 MHz DDR4 RAM and an Intel Core i7 6700 3.4 GHz CPU.
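The confusion-matrix measures defined in this section can be computed together, as in the Python sketch below. The counts in the example are hypothetical and chosen only for illustration; they are not the values from Table 2.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute the confusion-matrix measures of Equations (5) to (12)."""
    sensitivity = tp / (tp + fn)              # recall, Eq. (5)
    specificity = tn / (tn + fp)              # Eq. (6)
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. (7)
    ppv = tp / (tp + fp)                      # precision, Eqs. (8) and (12)
    npv = tn / (tn + fn)                      # Eq. (9)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )                                         # Eq. (10)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # Eq. (11)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "mcc": mcc,
        "f1": f1,
    }


# Hypothetical counts: 100 true DMRs and 100 true non-DMRs.
metrics = classification_metrics(tp=72, fp=11, fn=28, tn=89)
```

The function assumes every denominator is non-zero, which holds whenever each method declares at least one DMR and one non-DMR and both classes are present in the truth set.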

4. Results

4.1. Simulation Data Analysis

Because the true differential methylation status of CpGs is unknown in real data, simulation data are needed to evaluate the performance of different methods in a situation where the true DMRs are known. So, in order to assess HBCR_DMR’s ability to identify TP DMRs, we used simulation data comparing HBCR_DMR with four other methods: Methylkit/eDMR, BiSeq, DSS, and DMRFusion. Data were generated using the “RRBSdata” package. For all methods, we evaluated different measurements on simulation data that are shown in Table 2.

The results showed that Methylkit/eDMR, BiSeq, DSS, DMRFusion, and HBCR_DMR identified 1249, 6484, 4718, 6271, and 7111 significant DMRs, respectively. Table 2 indicates that the HBCR_DMR method has more TP DMRs than the other methods. Considering sensitivity and specificity, HBCR_DMR and BiSeq outperform the other methods. The specificity values for Methylkit/eDMR and DSS are both 0.99, while their sensitivity values are relatively poor. The accuracy measures for BiSeq and HBCR_DMR are 0.85 and 0.82, respectively, while DSS, DMRFusion, and Methylkit/eDMR have accuracy values of 0.78, 0.67, and 0.64, respectively. Thus, the highest accuracy values are achieved by BiSeq and HBCR_DMR. In terms of the AUC and F1 score as performance metrics, HBCR_DMR achieves an AUC of 0.94 and an F1 score of 0.76, which is second only to BiSeq, which has an AUC of 0.97 and an F1 score of 0.78.

The maximum value of PPV is 0.99, as measured in BiSeq, Methylkit/eDMR, and DSS. However, HBCR_DMR has an NPV value of 0.83, which is the highest value among the five methods. The highest MCC values are 0.71 and 0.62, calculated for the BiSeq and HBCR_DMR methods, respectively. The elapsed time was measured on chromosome 1 for each method. The elapsed times (in seconds) for Methylkit/eDMR, BiSeq, DSS, DMRFusion, and HBCR_DMR are 41, 20,072, 196, 1740, and 1347, respectively. HBCR_DMR therefore ran more than 14 times faster than BiSeq (Table 2).

We evaluated the performance of all methods using the AUC. Figure 2 displays the ROC curves. The sensitivity and specificity for BiSeq are 0.65 and 0.99, respectively, and for HBCR_DMR, they are 0.72 and 0.89, respectively. BiSeq has a higher specificity value than HBCR_DMR, while our proposed method exhibits higher sensitivity compared to BiSeq. Therefore, higher AUC values are assigned to BiSeq and HBCR_DMR.
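For readers who want to reproduce an AUC from region-wise p-values, the sketch below uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen true DMR has a smaller p-value than a randomly chosen non-DMR, with ties counted as one half. This is a generic illustration with hypothetical inputs, not the procedure used to produce Figure 2.

```python
def auc_from_pvalues(pvalues, is_true_dmr):
    """AUC where a smaller p-value means a stronger DMR call."""
    positives = [p for p, truth in zip(pvalues, is_true_dmr) if truth]
    negatives = [p for p, truth in zip(pvalues, is_true_dmr) if not truth]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p < q:
                wins += 1.0      # true DMR ranked ahead of the non-DMR
            elif p == q:
                wins += 0.5      # tie
    return wins / (len(positives) * len(negatives))


# Hypothetical region-wise p-values and truth labels.
auc = auc_from_pvalues(
    [0.001, 0.02, 0.03, 0.4, 0.6],
    [True, True, False, True, False],
)
```

The pairwise loop is quadratic in the number of regions, which is fine for an illustration; a sort-based rank sum would be preferable for genome-scale data.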

Figure 3 illustrates the overlap of significant DMRs identified by the five methods. The overlap of the detected DMRs from different methods with the TP DMRs in the simulation data is visualized using Venn diagrams. We utilized the “makeVennDiagram” function in the ChIPpeakAnno R package to create the Venn diagrams. HBCR_DMR detects more TP DMRs than the other methods. Specifically, this method identifies 82%, 33%, 11%, and 8% more TP DMRs than Methylkit/eDMR, DSS, DMRFusion, and BiSeq, respectively. The number of common TP DMRs among all methods is 757.

4.2. Real Data Analysis

We applied our proposed method to the SureSelectXT Human Methyl-Seq dataset from our previous study on colorectal cancer (CRC) and normal colon tissue [45]. This dataset comprises six colorectal adenocarcinoma and six control samples. In the current study, our aim was to compare the results of our HBCR_DMR method with those of other methods. By comparing the CRC and normal sample groups, we detected several thousand hyper- and hypomethylated DMRs. In total, we identified 7325 hyper DMRs and 10,879 hypo DMRs, each with a length of more than 200 bp, the highest Fisher ratio score between these two groups, and a p-value and an FDR less than 0.05 (Supplementary Materials File S2). Furthermore, we converted the genomic coordinates of the hyper- and hypomethylated DMRs from the hg19 reference genome to the hg38 reference genome (Supplementary Materials File S3).

Figure 4A,C reveal that the majority of identified DMRs in both the hypermethylation and hypomethylation categories are predominantly situated within intergenic regions, accounting for 89% and 90%, respectively. In the case of hyper DMRs, 67% of them are annotated within CpG islands, whereas only 21% of the hypomethylated DMRs are found in these regions. Notably, a significant portion of hyper DMRs is located in CGI shores, followed by exons, promoters, and CGI shelves, in that order.

Given that the detected DMRs span more than 200 base pairs, some of them extend across multiple genomic regions, encompassing multiple annotation features. Figure 4B,D provide a visual representation of the expanded annotations for the detected hypermethylation and hypomethylation DMRs. Importantly, a substantial portion of DMRs initially located in intergenic regions extends into intronic regions in both the hypermethylation and hypomethylation categories, as illustrated in Figure 4.

We assessed DNA methylation changes based on DMRs among six CRC samples (T20, T45, T67, T31, T65, T35) and six control samples (N4, N7, N8, N10, N14, N16). In Figure 5, the hyper- and hypomethylation DMR regions are depicted in a heatmap at an FDR of 0.01. A distinct pattern of DNA methylation changes is evident between the CRC and control samples.

From a biological perspective, hypermethylation regions play a key role in the occurrence of CRC. Our candidate DMRs are selected based on the top hypermethylation DMRs, with a methylation fold change > 20 and an absolute methylation difference > 0.1. Table 3 presents the top five DMRs, located in the SFMBT2, SOX5, ZNF43, AGBL4, and SOX5 genes. Furthermore, their annotation details and a visualization of the methylation changes in the DMRs with the highest difference in methylation between the CRC and control samples are provided in Supplementary Materials File S4 and File S5, respectively. Additionally, the average methylation fold change and absolute methylation difference between the CRC and control samples in these candidate regions are 25.08 and 0.24, respectively.

Table 4 compares our proposed approach with previous tools. The number of significant regions and the Type I error rate with a p-value and FDR < 0.05 are as follows: 6944 (0.3), 5637 (0.056), 18,065 (0.065), 15,362 (0.042), and 18,204 (0.028) for the Methylkit/eDMR, BiSeq, DSS, DMRFusion, and HBCR_DMR methods, respectively. Thus, the Type I error rate for BiSeq, DMRFusion, and HBCR_DMR is lower than or approximately 0.05.

5. Discussion

DNA methylation has an important role in carcinogenesis [50]. Thus, CpG regions with different methylation levels, known as DMRs, are of great importance. Statistical analysis of genome-wide bisulfite sequencing with multiple biological samples is challenging due to heterogeneous read coverage, varying methylation levels, a relatively small sample size, and a large number of CpGs in the genome.

Here, we have developed a novel DMR detection tool based on a hybrid of a beta-binomial hierarchical model and a combination of ranking methods. The major advantages of our method are as follows: First, it is suitable for small sample sizes in DNA methylation sequencing. Second, it takes into account biological variability, sampling variability, and variation across the methylation proportion of the CpG sites. Finally, it considers the diversity among ranking methods, different outputs, and result stability.

In the present study, we compared four popular DMR analysis methods, namely Methylkit/eDMR, BiSeq, DSS, and DMRFusion, with HBCR_DMR using simulation data and a real methylation dataset. We evaluated the performance of these methods on the simulation data based on various metrics, including TP, FP, FN, TN, sensitivity, specificity, accuracy, AUC, PPV, NPV, MCC, F1, and elapsed time.

According to our simulation results, HBCR_DMR outperforms the other methods in its ability to identify TP DMRs. The BiSeq method excels in terms of specificity, accuracy, AUC, PPV, MCC, and F1. However, it identifies fewer TP DMRs than HBCR_DMR, and its elapsed time is more than 14 times longer than that of our proposed method.

Regarding specificity, Methylkit/eDMR, BiSeq, and DSS perform well, while the sensitivity of Methylkit/eDMR and DSS is less satisfactory. We observed a trade-off between sensitivity and specificity, along with a reasonable elapsed time, and the highest value of NPV measure in HBCR_DMR.

The proposed method identifies 18,204 differentially methylated regions with a Type I error rate of 0.028. These regions are divided into two groups: hypermethylation (7325 regions) and hypomethylation (10,879 regions) based on real data. Among the 11,336 significantly detected DMRs (with a p-value < 0.01), 2205 known genes are identified. The Type I error rates for the hypermethylation and hypomethylation regions are 0.01 and 0.04, respectively (see Supplementary Materials File S2). Thus, the method can detect a large number of DMRs with an approximate Type I error of 0.028.

We recommend the use of the HBCR_DMR, BiSeq, or DMRFusion methods, which performed well across a wide range of DMRs based on the simulation and real dataset results. Notably, HBCR_DMR is more effective at identifying TP DMRs than the alternative methods, while also showing the lowest Type I error rate within this study.

6. Conclusions

The HBCR_DMR method is a hybrid approach that merges a beta-binomial Bayesian hierarchical model with a combination of ranking techniques. It serves as a DMR discovery tool and is particularly suitable for DNA methylation sequencing studies with a limited sample size. It accounts for biological variability, sampling variability, and the variation within CpG site methylation proportions. Moreover, it addresses the disparities among ranking methods, accommodating divergent outputs while maintaining result stability. Notably, the HBCR_DMR method identifies more TP DMRs than the alternative approaches, while also showing the lowest Type I error rate among the methods compared. In addition, HBCR_DMR can be applied across various methylation sequencing platforms, including WGBS, RRBS, and target-capture methods such as SureSelectXT Human Methyl-Seq.

Supplementary Materials

The following supporting information can be downloaded at: https://mdpi.longhoe.net/article/10.3390/jpm14040361/s1.

Author Contributions

Conceptualization, M.Y.; methodology, M.Y.; software, E.S.D.; validation, E.S.D.; formal analysis, S.H.K.; investigation, S.H.K.; resources, M.A.K.; data curation, M.Y.; writing—original draft preparation, M.Y.; writing—review and editing, M.Y. and M.A.K.; visualization, E.S.D.; supervision, M.Y. and S.H.K.; project administration, M.A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this research.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

Our sincere thanks go to the Reza Radiotherapy and Oncology Center and Ali Javadmanesh for their support in CRC research programs.

Conflicts of Interest

The authors declare no conflict of interest.

References

Deaton, A.M.; Bird, A. CpG islands and the regulation of transcription. Genes Dev. 2011, 25, 1010–1022.
Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat. Rev. Genet. 2007, 8, 286–298.
Yang, X.; Han, H.; De Carvalho, D.D.; Lay, F.D.; Jones, P.A.; Liang, G. Gene body methylation can alter gene expression and is a therapeutic target in cancer. Cancer Cell 2014, 26, 577–590.
Lister, R.; Pelizzola, M.; Dowen, R.H.; Hawkins, R.D.; Hon, G.; Tonti-Filippini, J.; Nery, J.R.; Lee, L.; Ye, Z.; Ngo, Q.M.; et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009, 462, 315–322.
Cokus, S.J.; Feng, S.; Zhang, X.; Chen, Z.; Merriman, B.; Haudenschild, C.D.; Pradhan, S.; Nelson, S.F.; Pellegrini, M.; Jacobsen, S.E. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 2008, 452, 215–219.
Meissner, A.; Gnirke, A.; Bell, G.W.; Ramsahoye, B.; Lander, E.S.; Jaenisch, R. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res. 2005, 33, 5868–5877.
Soto, J.; Rodriguez-Antolin, C.; Vallespin, E.; De Castro Carpeno, J.; De Caceres, I.I. The impact of next-generation sequencing on the DNA methylation–based translational cancer research. Transl. Res. 2016, 169, 1–18.e1.
Xi, Y.; Li, W. BSMAP: Whole genome bisulfite sequence MAPping program. BMC Bioinform. 2009, 10, 232.
Frith, M.C.; Mori, R.; Asai, K. A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Res. 2012, 40, e100.
Guo, W.; Fiziev, P.; Yan, W.; Cokus, S.; Sun, X.; Zhang, M.Q.; Chen, P.Y.; Pellegrini, M. BS-Seeker2: A versatile aligning pipeline for bisulfite sequencing data. BMC Genom. 2013, 14, 774.
Ryan, D.P.; Ehninger, D. Bison: Bisulfite alignment on nodes of a cluster. BMC Bioinform. 2014, 15, 337.
Chen, H.; Smith, A.D.; Chen, T. WALT: Fast and accurate read mapping for bisulfite sequencing. Bioinformatics 2016, 32, 3507–3509.
Li, M.; Huang, P.; Yan, X.; Wang, J.; Pan, Y.; Wu, F.X. VAliBS: A visual aligner for bisulfite sequences. BMC Bioinform. 2017, 18, 410.
Soe, S.; Park, Y.; Chae, H. BiSpark: A Spark-based highly scalable aligner for bisulfite sequencing data. BMC Bioinform. 2018, 19, 472.
Huang, K.Y.Y.; Huang, Y.J.; Chen, P.Y. BS-Seeker3: Ultrafast pipeline for bisulfite sequencing. BMC Bioinform. 2018, 19, 111.
Merkel, A.; Fernandez-Callejo, M.; Casals, E.; Marco-Sola, S.; Schuyler, R.; Gut, I.G.; Heath, S.C. gemBS: High throughput processing for DNA methylation data from bisulfite sequencing. Bioinformatics 2019, 35, 737–742.
Shafi, A.; Mitrea, C.; Nguyen, T.; Draghici, S. A survey of the approaches for identifying differential methylation using bisulfite sequencing data. Brief Bioinform. 2018, 19, 737–753.
Akalin, A.; Kormaksson, M.; Li, S.; Garrett-Bakelman, F.E.; Figueroa, M.E.; Melnick, A.; Mason, C.E. methylKit: A comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 2012, 13, R87.
Li, S.; Garrett-Bakelman, F.E.; Akalin, A.; Zumbo, P.; Levine, R.; To, B.L.; Lewis, I.D.; Brown, A.L.; D’Andrea, R.J.; Melnick, A.; et al. An optimized algorithm for detecting and annotating regional differential methylation. BMC Bioinform. 2013, 14 (Suppl. S5), S10.
Hansen, K.D.; Langmead, B.; Irizarry, R.A. BSmooth: From whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 2012, 13, R83.
Hebestreit, K.; Dugas, M.; Klein, H.U. Detection of significantly differentially methylated regions in targeted bisulfite sequencing data. Bioinformatics 2013, 29, 1647–1653.
Srivastava, A.; Karpievitch, Y.V.; Eichten, S.R.; Borevitz, J.O.; Lister, R. HOME: A histogram based machine learning approach for effective identification of differentially methylated regions. BMC Bioinform. 2019, 20, 253.
Feng, H.; Conneely, K.N.; Wu, H. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data. Nucleic Acids Res. 2014, 42, e69.
Sun, D.; Xi, Y.; Rodriguez, B.; Park, H.J.; Tong, P.; Meong, M.; Goodell, M.A.; Li, W. MOABS: Model based analysis of bisulfite sequencing data. Genome Biol. 2014, 15, R38.
Dolzhenko, E.; Smith, A.D. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments. BMC Bioinform. 2014, 15, 215.
Park, Y.; Figueroa, M.E.; Rozek, L.S.; Sartor, M.A. MethylSig: A whole genome DNA methylation analysis pipeline. Bioinformatics 2014, 30, 2414–2422.
Wu, H.; Xu, T.; Feng, H.; Chen, L.; Li, B.; Yao, B.; Qin, Z.; Jin, P.; Conneely, K.N. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Res. 2015, 43, e141.
Park, Y.; Wu, H. Differential methylation analysis for BS-seq data under general experimental design. Bioinformatics 2016, 32, 1446–1453.
Wen, Y.; Chen, F.; Zhang, Q.; Zhuang, Y.; Li, Z. Detection of differentially methylated regions in whole genome bisulfite sequencing data using local Getis-Ord statistics. Bioinformatics 2016, 32, 3396–3404.
Sun, S.; Yu, X. HMM-Fisher: Identifying differential methylation using a hidden Markov model and Fisher’s exact test. Stat. Appl. Genet. Mol. Biol. 2016, 15, 55–67.
Yu, X.; Sun, S. HMM-DM: Identifying differentially methylated regions using a hidden Markov model. Stat. Appl. Genet. Mol. Biol. 2016, 15, 69–81.
Shokoohi, F.; Stephens, D.A.; Bourque, G.; Pastinen, T.; Greenwood, C.M.T.; Labbe, A. A hidden Markov model for identifying differentially methylated sites in bisulfite sequencing data. Biometrics 2019, 75, 210–221.
Zhang, Y.; Liu, H.; Lv, J.; Xiao, X.; Zhu, J.; Liu, X.; Su, J.; Li, X.; Wu, Q.; Wang, F.; et al. QDMR: A quantitative method for identification of differentially methylated regions by entropy. Nucleic Acids Res. 2011, 39, e58.
Su, J.; Yan, H.; Wei, Y.; Liu, H.; Liu, H.; Wang, F.; Lv, J.; Wu, Q.; Zhang, Y. CpG_MPs: Identification of CpG methylation patterns of genomic regions from high-throughput bisulfite sequencing data. Nucleic Acids Res. 2013, 41, e4.
Liu, H.; Liu, X.; Zhang, S.; Lv, J.; Li, S.; Shang, S.; Jia, S.; Wei, Y.; Wang, F.; Su, J.; et al. Systematic identification and annotation of human methylation marks based on bisulfite sequencing methylomes reveals distinct roles of cell type-specific hypomethylation in the regulation of cell identity genes. Nucleic Acids Res. 2016, 44, 75–94.
Warden, C.D.; Lee, H.; Tompkins, J.D.; Li, X.; Wang, C.; Riggs, A.D.; Yu, H.; Jove, R.; Yuan, Y.C. COHCAP: An integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis. Nucleic Acids Res. 2013, 41, e117.
Stockwell, P.A.; Chatterjee, A.; Rodger, E.J.; Morison, I.M. DMAP: Differential methylation analysis package for RRBS and WGBS data. Bioinformatics 2014, 30, 1814–1822.
Wang, Z.; Li, X.; Jiang, Y.; Shao, Q.; Liu, Q.; Chen, B.; Huang, D. swDMR: A Sliding Window Approach to Identify Differentially Methylated Regions Based on Whole Genome Bisulfite Sequencing. PLoS ONE 2015, 10, e0132866.
Juhling, F.; Kretzmer, H.; Bernhart, S.H.; Otto, C.; Stadler, P.F.; Hoffmann, S. metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Res. 2016, 26, 256–262.
Gong, B.; Purdom, E. MethCP: Differentially Methylated Region Detection with Change Point Models. In Proceedings of the International Conference on Research in Computational Molecular Biology, Washington, DC, USA, 5–8 May 2019; pp. 68–84.
Schoofs, T.; Rohde, C.; Hebestreit, K.; Klein, H.U.; Gollner, S.; Schulze, I.; Lerdrup, M.; Dietrich, N.; Agrawal-Singh, S.; Witten, A.; et al. DNA methylation changes are a late event in acute promyelocytic leukemia and coincide with loss of transcription factor binding. Blood 2013, 121, 178–187.
Yassi, M.; Shams Davodly, E.; Mojtabanezhad Shariatpanahi, A.; Heidari, M.; Dayyani, M.; Heravi-Moussavi, A.; Moattar, M.H.; Kerachian, M.A. DMRFusion: A differentially methylated region detection tool based on the ranked fusion method. Genomics 2018, 110, 366–374.
Kerachian, M.A.; Javadmanesh, A.; Azghandi, M.; Shariatpanahi, A.M.; Yassi, M.; Davodly, E.S.; Talebi, A.; Khadangi, F.; Soltani, G.; Hayatbakhsh, A.; et al. Crosstalk between DNA methylation and gene expression in colorectal cancer, a potential plasma biomarker for tracing this tumor. Sci. Rep. 2020, 10, 2813.
Hussain, S.; Sun, M.; Min, Z.; Guo, Y.; Xu, J.; Mushtaq, N.; Heng, L.; Huang, H.; Zhao, Y.; Yuan, Y.; et al. Down-regulated in OA cartilage, SFMBT2 contributes to NF-κB-mediated ECM degradation. J. Cell. Mol. Med. 2018, 22, 5753–5758.
Wu, K.; Zhao, Z.; Liu, K.; Zhang, J.; Li, G.; Wang, L. Long noncoding RNA lnc-sox5 modulates CRC tumorigenesis by unbalancing tumor microenvironment. Cell Cycle 2017, 16, 1295–1301.
Cassandri, M.; Smirnov, A.; Novelli, F.; Pitolli, C.; Agostini, M.; Malewicz, M.; Melino, G.; Raschellà, G. Zinc-finger proteins in health and disease. Cell Death Discov. 2017, 3, 17071.
Rogowski, K.; van Dijk, J.; Magiera, M.M.; Bosc, C.; Deloulme, J.C.; Bosson, A.; Peris, L.; Gold, N.D.; Lacroix, B.; Bosch Grau, M.; et al. A family of protein-deglutamylating enzymes associated with neurodegeneration. Cell 2010, 143, 564–578.
Shariatpanahi, A.M.; Yassi, M.; Nouraie, M.; Sahebkar, A.; Varshoee Tabrizi, F.; Kerachian, M.A. The importance of stool DNA methylation in colorectal cancer diagnosis: A meta-analysis. PLoS ONE 2018, 13, e0200735.

Figure 1. The flowchart of the proposed approach.

Figure 2. ROC curves compare sensitivity and specificity of Methylkit/eDMR, BiSeq, DSS, DMRFusion, and HBCR_DMR methods on simulation dataset.

Figure 3. Overlap of TP DMRs for Methylkit/eDMR, BiSeq, DSS, DMRFusion, and HBCR_DMR methods.

Figure 4. Statistical information of hyper/hypomethylation DMR annotation. (A) Hypermethylation DMRs, (B) Expansion of the detected hyper DMRs, (C) Hypomethylation DMRs, (D) Expansion of the detected hypo DMRs.

Figure 5. Heat map representation of DNA Human Methyl-seq data between CRC and normal samples on whole genes. Each column represents one sample and each row represents CpGs methylation status in hyper/hypomethylation DMRs identified by HBCR_DMR.

Table 1. Ranking methods.

Ranking Method	Criterion
Information gain	$S = \mathrm{Info}(X) - \mathrm{Info}_{x}(X)$, with $\mathrm{Info}(X) = -\sum_{i=1}^{k} P(c_{i}, X) \log P(c_{i}, X)$ and $\mathrm{Info}_{x}(X) = \sum_{i=1}^{V} \frac{|V_{i}|}{|X|} \, \mathrm{Info}(V_{i})$
where k = number of classes; V = number of distinct values of a CpG x; $V_{i}$ = the set of instances whose value at CpG x equals $x_{i}$; $|V_{i}|$ = number of samples in $V_{i}$; $|X| = n$; P = probability density function; $c_{i}$ = the corresponding class label (i = 1…k)
Between versus Within Class Scatter Ratio	$s_{w} = \sum_{i=1}^{c} \sum_{j=1}^{n_{c}} (x_{j} - \mu_{i}) (x_{j} - \mu_{i})^{T}$, $s_{b} = \sum_{i=1}^{c} (\mu_{i} - \mu) (\mu_{i} - \mu)^{T}$, $S = \frac{s_{b}}{s_{w}}$
where $x_{j}$ = relative methylation value of a CpG x in the jth sample; $\mu_{i}$ = average relative methylation of a CpG x across all samples in the ith class (i = 1…c); $\mu$ = overall average relative methylation of a CpG x across all classes; T = matrix transpose; C = class label feature; $s_{w}$ = within-class scatter; $s_{b}$ = between-class scatter; $n_{c}$ = number of samples in the ith class (i = 1…c)
Fisher ratio	$FR(x) = \frac{(\bar{x}_{c_{1}} - \bar{x}_{c_{2}})^{2}}{\sigma_{x_{c_{1}}}^{2} + \sigma_{x_{c_{2}}}^{2}}$
where $\bar{x}_{c_{1}}$, $\bar{x}_{c_{2}}$ = mean relative methylation of a CpG x across all samples in the ith class (i = 1, 2); $\sigma_{x_{c_{i}}}$ = standard deviation of relative methylation of a CpG x across all samples in the ith class (i = 1, 2)
Z-score	$S = \frac{\bar{x}_{c_{1}} - \bar{x}_{c_{2}}}{\sigma_{x}}$
where $\sigma_{x}$ = standard deviation of relative methylation of a CpG x across all samples in both classes
Welch’s t-test	$S = \frac{\bar{x}_{c_{1}} - \bar{x}_{c_{2}}}{\sqrt{\frac{\sigma_{x_{c_{1}}}^{2}}{n_{c_{1}}} + \frac{\sigma_{x_{c_{2}}}^{2}}{n_{c_{2}}}}}$
where $n_{c_{i}}$ = number of samples in the ith class (i = 1, 2)
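To make three of the criteria in Table 1 concrete, the following minimal sketch scores a single CpG site from its relative methylation values in two classes. It is an illustration only, not the authors' implementation: the function names and the example values are hypothetical, sample (n − 1) variances are assumed, and the Welch statistic is written in its standard form with variances in the denominator.

```python
import math
import statistics


def fisher_ratio(a, b):
    """Squared difference of class means over the sum of class variances."""
    diff = statistics.mean(a) - statistics.mean(b)
    return diff ** 2 / (statistics.variance(a) + statistics.variance(b))


def z_score(a, b):
    """Difference of class means over the std. dev. of all samples together."""
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / statistics.stdev(list(a) + list(b))


def welch_t(a, b):
    """Welch's t statistic: unequal variances, per-class sample sizes."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return diff / se


# Hypothetical relative methylation values of one CpG site in two classes.
tumor = [0.80, 0.70, 0.90, 0.80]
normal = [0.20, 0.30, 0.10, 0.20]

scores = {
    "fisher_ratio": fisher_ratio(tumor, normal),
    "z_score": z_score(tumor, normal),
    "welch_t": welch_t(tumor, normal),
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Each criterion yields one score per CpG site; sorting sites by a score gives the per-method ranking list that a combination step would then merge.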

Table 2. TP, FP, FN, TN, Sensitivity, Specificity, Accuracy, AUC, PPV, NPV, MCC, F1, and Elapsed time (in seconds) for the different DMR detection tools based on simulation datasets.

Method	TP	FP	FN	TN	Sensitivity ¹	Specificity	Accuracy	AUC	PPV ²	NPV ³	MCC ⁴	F1 Score	Time ⁵ (Second)
Methylkit/eDMR	1249	3	8751	14,695	0.12	0.99	0.64	0.88	0.99	0.63	0.28	0.22	41
BiSeq	6484	89	3493	14,632	0.65	0.99	0.85	0.97	0.99	0.81	0.71	0.78	20,072
DSS	4718	15	5279	14,686	0.48	0.99	0.78	0.81	0.99	0.73	0.58	0.64	196
DMRFusion	6271	4495	3554	10,378	0.64	0.69	0.67	0.91	0.58	0.74	0.33	0.61	1740
HBCR_DMR	7111	1674	2759	13,154	0.72	0.89	0.82	0.94	0.81	0.83	0.62	0.76	1347

¹ Sensitivity = Recall. ² PPV: Positive predictive value. ³ NPV: Negative predictive value. ⁴ MCC: Matthews correlation coefficient. ⁵ Time (Second): Elapsed time is calculated for Chr 1.
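The derived columns of Table 2 follow from the four confusion-matrix counts by the standard definitions. The sketch below recomputes them for the HBCR_DMR row; the function name `confusion_metrics` is hypothetical, and AUC and elapsed time are omitted because they cannot be derived from the counts alone.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Counts taken from the HBCR_DMR row of Table 2.
hbcr = confusion_metrics(tp=7111, fp=1674, fn=2759, tn=13154)
for name, value in hbcr.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these values agree with the HBCR_DMR row as reported. The same function can be applied to the other rows to compare methods on a common footing.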

Table 3. Information of the highest difference in methylation for hypermethylation DMRs.

Chr	[Start-End]	Gene Symbol	Function	Fisher Ratio	Fold Change	Absolute Methylation Difference	p Value	Q Value
10	7452243-7452499	SFMBT2	SFMBT2 encodes a protein involved in sequence-specific DNA binding, histone binding, and miRNA interaction [46].	1.77	38.24	0.28	1.09 × 10⁻¹⁴	6.11 × 10⁻¹⁴
12	24715833-24716098	SOX5	SOX5 is linked to colorectal cancer progression through an unbalanced tumor microenvironment [47].	0.71	22.8	0.22	1.04 × 10⁻⁷	3.28 × 10⁻⁷
19	22034731-22034990	ZNF43	Zinc finger protein 43 is involved in gene regulation and development [48].	2.1	21.5	0.31	1.47 × 10⁻¹³	8.93 × 10⁻¹³
1	49242758-49243000	AGBL4	AGBL4 is associated with tubulin binding and metallocarboxypeptidase activity [49].	0.81	21.44	0.19	3.92 × 10⁻⁵	7.84 × 10⁻⁵
12	24715169-24715370	SOX5	SOX5 is linked to colorectal cancer progression through an unbalanced tumor microenvironment [47].	0.75	21.42	0.19	6.91 × 10⁻⁸	2.24 × 10⁻⁷

Table 4. Comparison of the number of significant regions (p-value and FDR < 0.05) and the Type I error rate between previous tools and HBCR_DMR for DNA Human Methyl-seq data from CRC and normal samples.

Method	Number of DMRs	Total Type I Error Rate
Methylkit/eDMR	6944	0.3
BiSeq	5637	0.056
DSS	18,065	0.065
DMRFusion	15,362	0.042
HBCR_DMR	18,204	0.028

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
