MMG4: Recognition of G4-Forming Sequences Based on Markov Model
Summary
This summary is machine-generated. A new Markov model (MM)-based tool, MMG4, accurately recognizes G-quadruplex (G4) forming sequences. MMG4 outperforms traditional methods and machine learning models, offering a faster and more effective approach for G4 sequence identification.
Area Of Science
- Genomics
- Bioinformatics
- Computational Biology
Background
- G-quadruplexes (G4s) are crucial nucleic acid structures with significant biological roles.
- Current experimental methods for recognizing G4 sequences, such as circular dichroism and nuclear magnetic resonance, are time-consuming and expensive.
Purpose Of The Study
- To develop a fast and accurate computational model for G-quadruplex forming sequence recognition.
- To investigate the factors influencing G4 sequence recognition accuracy and optimize model performance.
Main Methods
- Development of MMG4, a novel G4 recognition model utilizing a Markov model (MM).
- Analysis of sequence length impact and exploration of sequence features for machine learning models (Random Forest, Support Vector Machine, Back-Propagation Neural Network).
- Validation of model robustness and generalization using an independent testing dataset.
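The general idea of scoring a sequence with a Markov model can be sketched as follows. This is a minimal illustration, not the MMG4 implementation: it assumes a first-order chain over A, C, G, T, add-one smoothing, and a log-odds score between a "G4" model and a "non-G4" model. The function names (train_transitions, log_odds) and the toy training sequences are hypothetical, and the sketch leaves out the Random Forest stage entirely.

```python
import math

BASES = "ACGT"


def train_transitions(seqs, pseudocount=1.0):
    """Estimate P(next base | current base) from sequences, with smoothing."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            # Skip pairs containing symbols outside A/C/G/T (e.g. N).
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in BASES}
    return probs


def log_odds(seq, pos_model, neg_model):
    """Sum of log(P_pos / P_neg) over adjacent base pairs; > 0 favors pos_model."""
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in pos_model and b in pos_model[a]:
            score += math.log(pos_model[a][b] / neg_model[a][b])
    return score


# Toy, made-up training data (assumption: not taken from the study).
positives = ["GGGTTAGGGTTAGGGTTAGGG", "GGGAGGGTGGGAGGG"]
negatives = ["ATCATCGATCTAGCTAGCTAA", "TTACATCATTCAGTACAT"]

pos_model = train_transitions(positives)
neg_model = train_transitions(negatives)

g_rich_score = log_odds("GGGTGGGTGGGTGGG", pos_model, neg_model)
at_rich_score = log_odds("ATCATCATTACAT", pos_model, neg_model)
print(g_rich_score, at_rich_score)
```

In a setup like this, the per-sequence score (or the individual transition probabilities) could serve as input features for a downstream classifier; how MMG4 itself combines the two stages is not described in this summary.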
Main Results
- Recognition accuracy was high in central sequence regions and lower at the sequence ends, a pattern attributed to base transfer probabilities and structural content.
- The optimal recognition interval, [910-1049], yielded an accuracy of 85.95%.
- The MM-based approach significantly outperformed other machine learning models, with MMG4 (MM + Random Forest) achieving an area under the ROC curve (AUC) of 0.93 and an area under the precision-recall curve (AUPRC) of 0.9.
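For readers unfamiliar with the two metrics reported above, the sketch below shows one common way to compute them from classifier scores. It is an illustration only: the labels and scores are made up, AUC is computed as the fraction of positive/negative pairs ranked correctly, and AUPRC is approximated by average precision, which may differ from the estimator used in the study. The function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Made-up example: 1 = G4-forming, 0 = not G4-forming.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(auc, ap)
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based formulation would be preferable for large test sets.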
Conclusions
- The Markov model-based approach effectively captures adjacent nucleotide correlations crucial for G4 recognition.
- MMG4 offers a significant advancement in G4 sequence identification, providing a robust and generalizable tool.
- The study highlights the potential of computational methods for accelerating biological sequence analysis.

