Distinctive sequence features in protein coding, genic non-coding, and intergenic human DNA

R Guigó¹, J W Fickett

  • ¹Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, NM 87545, USA.

Journal of Molecular Biology | October 13, 1995
Summary
This summary is machine-generated.

Human genome sequence statistics differ between intergenic and genic DNA. Base composition, specifically C+G content, explains most of these differences, impacting gene-finding algorithm performance.

Area of Science:

  • Genomics
  • Bioinformatics
  • Molecular Biology

Background:

  • Sequence statistics are crucial for identifying protein-coding regions within genomes.
  • Human genome mapping involves analyzing large sets of randomly selected DNA sequences alongside known genic sequences.

Purpose of the Study:

  • To investigate the behavior of sequence statistics indicative of protein-coding function in human genomic DNA.
  • To compare these statistics between randomly selected clone sequences (primarily intergenic DNA) and known genic sequences.
  • To explore the compositional differences between intergenic and non-coding genic DNA.

Main Methods:

  • Comparative analysis of sequence statistics in randomly selected human clone sequences and known genic sequences.
  • Examination of sequence statistics in simulated DNA with varying C+G content.
  • Statistical analysis to correlate sequence statistic behavior with base composition (C+G content).
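
The simulated-DNA step in the methods above can be illustrated with a minimal Python sketch. It is not the authors' code. The in-frame stop-codon fraction stands in for the coding statistics examined in the study, and the function names (`gc_content`, `simulate_dna`, `stop_codon_fraction`, `expected_stop_fraction`) and the independent-bases model are assumptions made for illustration. Because the three stop codons (TAA, TAG, TGA) are A+T-rich, their frequency in random DNA falls as C+G content rises, so this statistic shifts with base composition alone.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Return the fraction of bases in seq that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CG" for base in seq) / len(seq)


def simulate_dna(length, gc_fraction, rng):
    """Draw independent bases with the requested expected C+G fraction.

    Assumption: A and T are equally likely, as are C and G.
    """
    at = (1 - gc_fraction) / 2
    gc = gc_fraction / 2
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def stop_codon_fraction(seq, frame=0):
    """Return the fraction of complete codons in the given frame that are stops."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)


def expected_stop_fraction(gc_fraction):
    """Stop-codon probability under the independent-bases model above."""
    at = (1 - gc_fraction) / 2   # P(A) = P(T)
    gc = gc_fraction / 2         # P(C) = P(G)
    # P(TAA) = at^3, and P(TAG) = P(TGA) = at^2 * gc
    return at ** 3 + 2 * at ** 2 * gc


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    for target in (0.3, 0.5, 0.7):
        seq = simulate_dna(30_000, target, rng)
        print(
            f"target C+G={target:.1f}  observed C+G={gc_content(seq):.3f}  "
            f"stop fraction={stop_codon_fraction(seq):.4f}  "
            f"model={expected_stop_fraction(target):.4f}"
        )
```

Under this model the expected stop-codon fraction is about 8.0% at 30% C+G and about 1.9% at 70% C+G, even though neither simulated sequence encodes anything.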

Main Results:

  • Significant differences in sequence statistics were observed between intergenic and genic DNA.
  • Further differences were noted between intergenic DNA and the non-coding fraction of genic DNA, suggesting distinct classes.
  • C+G content was identified as a major factor influencing sequence statistics, explaining most observed differences.
  • A+T-rich intergenic DNA aligns with compositional equilibrium, while C+G-rich non-coding genic DNA deviates significantly.
  • A substantial portion of variation in coding statistics used by gene-finding algorithms is attributable to C+G content, not protein-coding function.

Conclusions:

  • Intergenic and non-coding genic DNA represent distinct classes with differing base compositions.
  • Base composition, particularly C+G content, significantly influences sequence statistics used in gene identification.
  • Gene-finding algorithm performance can be enhanced by differentiating the impact of base composition from actual protein-coding function.
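
One way to picture the last conclusion is to score a statistic against what base composition alone would predict. The Python sketch below is a hypothetical illustration, not the method used in the study: `composition_adjusted_stop_score` is an invented name, the in-frame stop-codon count is a stand-in for a real coding statistic, and the null model (independent bases drawn at the sequence's own frequencies, with a binomial approximation for the variance) is an assumption.

```python
import math

STOP_CODONS = ("TAA", "TAG", "TGA")


def base_frequencies(seq):
    """Return the observed frequency of each of A, C, G, T in seq."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {base: n / total for base, n in counts.items()}


def composition_adjusted_stop_score(seq, frame=0):
    """Z-score of the in-frame stop-codon count against a composition-matched null.

    Negative values mean fewer stops than the sequence's own base
    composition predicts, which is what an open reading frame looks like.
    """
    seq = seq.upper()
    freqs = base_frequencies(seq)
    # Probability that a random codon is a stop, given this base composition.
    p_stop = sum(freqs[c[0]] * freqs[c[1]] * freqs[c[2]] for c in STOP_CODONS)
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    n = len(codons)
    observed = sum(codon in STOP_CODONS for codon in codons)
    expected = n * p_stop
    variance = n * p_stop * (1 - p_stop)  # binomial approximation
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)


if __name__ == "__main__":
    # Toy sequence: frame 0 has no stop codons, frame 1 contains TAG repeatedly.
    toy = "GATAAGTTAGCTGAA" * 40
    print(composition_adjusted_stop_score(toy, frame=0))
    print(composition_adjusted_stop_score(toy, frame=1))
```

Because the expectation is computed from each sequence's own base frequencies, a C+G-rich non-coding region is not rewarded simply for having few stop codons. Only a deficit beyond what its composition predicts lowers the score.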