Distinctive sequence features in protein coding, genic non-coding, and intergenic human DNA

R Guigó¹, J W Fickett

¹Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, NM 87545, USA.

Journal of Molecular Biology

October 13, 1995

Summary

This summary is machine-generated.

Sequence statistics in the human genome differ between intergenic and genic DNA. Base composition, specifically C+G content, explains most of these differences, which in turn affects the performance of gene-finding algorithms.

Area of Science:

Genomics
Bioinformatics
Molecular Biology

Background:

Sequence statistics are crucial for identifying protein-coding regions within genomes.
Human genome mapping yields large sets of randomly selected clone sequences, which can be compared with known gene-containing (genic) sequences.

Purpose of the Study:

To investigate the behavior of sequence statistics indicative of protein-coding function in human genomic DNA.
To compare these statistics between randomly selected clone sequences (primarily intergenic DNA) and known genic sequences.
To explore the compositional differences between intergenic and non-coding genic DNA.

Main Methods:

Comparative analysis of sequence statistics in randomly selected human clone sequences and known genic sequences.
Examination of sequence statistics in simulated DNA with varying C+G content.

Statistical analysis to correlate sequence statistic behavior with base composition (C+G content).
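
The simulation step above can be illustrated with a minimal sketch. The summary does not describe how the simulated DNA was generated, so this example assumes the simplest possible model: each base is drawn independently, with C and G sharing the target C+G fraction equally and A and T sharing the remainder. The function names `simulate_dna` and `gc_fraction` are hypothetical and chosen only for illustration.

```python
import random


def simulate_dna(length, gc_content, seed=None):
    """Draw a random sequence with the requested expected C+G fraction.

    Assumption: bases are independent and identically distributed, which
    is a simplification and not necessarily the model used in the study.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    rng = random.Random(seed)
    at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    gc = gc_content / 2.0          # probability of C, and of G
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def gc_fraction(seq):
    """Fraction of A/C/G/T bases that are C or G (other symbols are ignored)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "CG" for base in counted) / len(counted)


if __name__ == "__main__":
    for target in (0.3, 0.5, 0.7):
        seq = simulate_dna(100_000, target, seed=1)
        print(f"target C+G = {target:.2f}, observed C+G = {gc_fraction(seq):.3f}")
```

Any sequence statistic of interest could then be computed on sequences simulated at several C+G levels to see how strongly it tracks base composition alone.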

Main Results:

Significant differences in sequence statistics were observed between intergenic and genic DNA.
Further differences were noted between intergenic DNA and the non-coding fraction of genic DNA, suggesting that these are two distinct classes of non-coding DNA.
C+G content was identified as a major factor influencing sequence statistics, explaining most observed differences.
A+T-rich intergenic DNA appears to be at compositional equilibrium, whereas C+G-rich non-coding genic DNA deviates significantly from it.
A substantial portion of variation in coding statistics used by gene-finding algorithms is attributable to C+G content, not protein-coding function.
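
To see why a coding statistic can depend on C+G content alone, consider open reading frame length as an illustrative example (the summary does not list which statistics were studied, so this choice is an assumption). The three stop codons TAA, TAG, and TGA are A+T-rich, so in random DNA with independent bases they become rarer as C+G content rises, and stop-free stretches grow longer even though nothing is coding. The sketch below computes this analytically; all function names are hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def base_probs(gc_content):
    """Base probabilities for an independent-bases model with P(C) = P(G) and P(A) = P(T)."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def stop_codon_probability(gc_content):
    """Probability that a random codon is a stop codon under the model above."""
    p = base_probs(gc_content)
    return sum(p[c[0]] * p[c[1]] * p[c[2]] for c in STOP_CODONS)


def expected_codons_to_stop(gc_content):
    """Mean number of codons read up to and including the first stop codon.

    Codons are independent in this model, so the count is geometrically
    distributed with mean 1 / P(stop).
    """
    p_stop = stop_codon_probability(gc_content)
    return float("inf") if p_stop == 0.0 else 1.0 / p_stop


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(
            f"C+G = {gc:.1f}: P(stop) = {stop_codon_probability(gc):.4f}, "
            f"mean codons to first stop = {expected_codons_to_stop(gc):.1f}"
        )
```

Under this model the mean distance to the first stop codon rises from roughly 13 codons at 30% C+G to roughly 21 at 50% and roughly 52 at 70%, so a statistic based on open reading frame length would score C+G-rich non-coding DNA as more "coding-like" purely because of its composition.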

Conclusions:

Intergenic and non-coding genic DNA represent distinct classes with differing base compositions.
Base composition, particularly C+G content, significantly influences sequence statistics used in gene identification.
The performance of gene-finding algorithms could be improved by distinguishing the effects of base composition from those of actual protein-coding function.