Predicting DNA Methylation State of CpG Islands Using Machine Learning

Document Type: Original Article


1 Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology

2 College of Computing and Information Technology (CCIT), Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt

3 Misr University for Science and Technology

4 Computer Engineering Department, Arab Academy for Science, Technology & Maritime Transport, Cairo, Egypt


DNA methylation is the primary and best-understood epigenetic mechanism influencing human health, and it is an essential regulator of gene transcription. Aberrant methylation may contribute to several diseases, including Parkinson's disease, cardiovascular disease, chronic kidney disease, cancer, and Alzheimer's disease. Because methylation is highly sensitive to lifestyle and environmental changes such as pollution, it is difficult to predict, and researchers in bioinformatics have therefore focused on building models to predict it. Recent advances in methylation sequencing methods allow methylated sites to be identified across the whole genome. In this work, computational methods are used to predict the methylation status of every CpG and non-CpG locus in the genome, using Illumina 450K array data from human embryonic stem cells and the 250 bp region surrounding each CpG site. Three classifiers are compared: logistic regression, support vector machine (SVM), and random forest. Of the three, random forest performed best, predicting methylation status with an accuracy of 99.9%. Incorporating more features is expected to yield higher prediction performance and wider detection coverage of methylated CpG loci.
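The pipeline summarized above (extract features from a 250 bp window around each CpG site, then compare logistic regression, SVM, and random forest on methylation status) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the features used, so the feature set here (GC content, CpG observed/expected ratio, and the 16 dinucleotide frequencies) is hypothetical, as are the helper names `window_around`, `window_features`, `compare_classifiers`, and `toy_dataset`. The classifiers come from scikit-learn with assumed hyperparameters, and the data are purely synthetic toy sequences, so the scores it produces say nothing about the 99.9% accuracy reported for the real Illumina 450K data.

```python
import itertools
import random
from collections import Counter
from typing import List

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

WINDOW = 250  # bp region centred on each CpG site, as described in the abstract
DINUCS = ["".join(p) for p in itertools.product("ACGT", repeat=2)]  # 16 dinucleotides


def window_around(seq: str, cpg_pos: int, width: int = WINDOW) -> str:
    """Hypothetical helper: the `width`-bp slice of `seq` centred on `cpg_pos`, clipped at the ends."""
    start = max(0, cpg_pos - width // 2)
    return seq[start:start + width]


def window_features(window: str) -> List[float]:
    """Hypothetical feature set: GC content, CpG observed/expected ratio, dinucleotide frequencies."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    gc = (c + g) / n if n else 0.0
    pairs = Counter(w[i:i + 2] for i in range(n - 1))  # overlapping dinucleotide counts
    # CpG observed/expected = (#CG * length) / (#C * #G)
    obs_exp = pairs["CG"] * n / (c * g) if c and g else 0.0
    di = [pairs[k] / max(n - 1, 1) for k in DINUCS]
    return [gc, obs_exp] + di


def compare_classifiers(X, y, folds: int = 5, seed: int = 0) -> dict:
    """Mean cross-validated accuracy of the three classifier families named in the abstract."""
    models = {
        # Hyperparameters below are assumptions, not taken from the paper.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean())
        for name, model in models.items()
    }


def toy_dataset(n_sites: int = 200, seed: int = 0):
    """Purely synthetic sequences with an arbitrary toy labelling rule (not real methylation data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_sites):
        cg_rich = i % 2 == 0  # alternate CpG-rich and CpG-poor backgrounds for balanced classes
        alphabet = "CGCGAT" if cg_rich else "AATTCG"
        left = "".join(rng.choice(alphabet) for _ in range(200))
        right = "".join(rng.choice(alphabet) for _ in range(198))
        seq = left + "CG" + right  # the CpG site of interest sits at index len(left)
        X.append(window_features(window_around(seq, len(left))))
        y.append(0 if cg_rich else 1)  # toy label: 1 = "methylated", 0 = "unmethylated"
    return np.array(X), np.array(y)


if __name__ == "__main__":
    X, y = toy_dataset()
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Scaling is placed inside a pipeline for the linear and kernel models so that it is refit on each training fold, which avoids leaking test-fold statistics; the random forest is left unscaled because tree splits are insensitive to feature scale. On real data, `toy_dataset` would be replaced by sequences and 450K-derived methylation labels.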

