Combining multiple data sets in a likelihood analysis: which models are the best?

Tal Pupko¹, Dorothée Huchon, Ying Cao

¹The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan. tal@ism.ac.jp

Molecular Biology and Evolution

November 26, 2002

Summary

This summary is machine-generated.

Combining several genes in a maximum likelihood phylogenetic analysis requires choosing a statistical model for how branch lengths and substitution rates differ among genes. The study found that separate or proportional branch length models, together with an individual gamma parameter for among-site rate variation in each gene, best fit the molecular data, and that the choice of model can change the inferred tree topology.

Area of Science:

Evolutionary Biology
Bioinformatics
Computational Biology

Background:

Phylogenetic analyses traditionally used single gene sequences.
The growing availability of sequence data from many genes calls for multi-gene analyses.
Combining multiple molecular datasets requires robust statistical methods.

Purpose of the Study:

To compare statistical models for combining different genes in phylogenetic analyses.
To evaluate the likelihood of tree topologies using various branch length and rate variation models.
To determine the impact of model choice on maximum likelihood phylogenetic inference.

Main Methods:

Compared three branch length models: concatenate (one set of branch lengths shared by all genes), proportional (branch lengths proportional across genes, scaled by a gene-specific rate), and separate (independent branch lengths for each gene).
Compared three models of among-site rate variation: homogeneous (no rate variation), a single gamma parameter shared by all genes, and a separate gamma parameter for each gene.
Analyzed two nuclear and one mitochondrial amino acid data sets.
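The three branch length models and three rate-variation models form a 3 × 3 grid of combined models that differ in how many free parameters they estimate. The Python sketch below counts those parameters and scores a fitted model with the Akaike information criterion (AIC). It is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, the tree is assumed to be unrooted and fully bifurcating (2n - 3 branches for n taxa), the amino acid substitution matrix is assumed to be a fixed empirical one with no estimated parameters, and AIC is used here as one standard way to compare models, not necessarily the criterion the study applied.

```python
from itertools import product
from typing import List

# Hypothetical helpers illustrating the model grid; not code from the study.
# Assumes an unrooted, fully bifurcating tree and a fixed empirical amino acid
# substitution matrix, so only branch lengths and gamma shapes are free.

BRANCH_MODELS = ("concatenate", "proportional", "separate")
RATE_MODELS = ("homogeneous", "single_gamma", "separate_gamma")


def n_branches(n_taxa: int) -> int:
    """Number of branches in an unrooted bifurcating tree with n_taxa leaves."""
    return 2 * n_taxa - 3


def proportional_lengths(shared: List[float], multipliers: List[float]) -> List[List[float]]:
    """Per-gene branch lengths under the proportional model: gene g uses r_g * b."""
    return [[r * b for b in shared] for r in multipliers]


def branch_params(model: str, n_taxa: int, n_genes: int) -> int:
    b = n_branches(n_taxa)
    if model == "concatenate":
        # One set of branch lengths shared by every gene.
        return b
    if model == "proportional":
        # Shared lengths plus one multiplier per gene; one multiplier is fixed
        # to 1 because lengths and multipliers are only identifiable up to a
        # common scale.
        return b + (n_genes - 1)
    if model == "separate":
        # Independent branch lengths for every gene.
        return n_genes * b
    raise ValueError(f"unknown branch length model: {model}")


def rate_params(model: str, n_genes: int) -> int:
    if model == "homogeneous":
        return 0  # all sites evolve at the same rate
    if model == "single_gamma":
        return 1  # one gamma shape parameter shared by all genes
    if model == "separate_gamma":
        return n_genes  # one gamma shape parameter per gene
    raise ValueError(f"unknown rate model: {model}")


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * k - 2 * log_likelihood


if __name__ == "__main__":
    # Illustrative sizes only: 10 taxa, 3 genes.
    for bm, rm in product(BRANCH_MODELS, RATE_MODELS):
        k = branch_params(bm, 10, 3) + rate_params(rm, 3)
        print(f"{bm:>12} + {rm:<15} free parameters: {k}")
```

Under this counting, the separate model's parameter count grows linearly with the number of genes, so any gain in log-likelihood it achieves has to be weighed against its extra parameters.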

Main Results:

Either the separate or the proportional branch length model fit best, depending on the data set.
A model with one gamma parameter for each gene was optimal for among-site rate variation in all three data sets.
The choice of model could change the resulting maximum likelihood tree topology.
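A gamma shape parameter (alpha) controls how unevenly substitution rates are spread across sites: a small alpha means a few fast sites and many slowly evolving ones, while a large alpha means nearly equal rates. Likelihood programs usually approximate the continuous gamma with a few discrete rate categories. The Python sketch below computes such category rates with the median-based discrete gamma approximation, one common choice; which variant the study used is not stated here. Under a separate-gamma model each gene would get its own alpha and hence its own set of category rates. The function names and the example alpha values are hypothetical and chosen for illustration only.

```python
import math
from typing import List

# Hypothetical helpers; a sketch of the median-based discrete gamma, not code
# from the study. Rates follow a gamma distribution with shape alpha and
# rate alpha, so the mean rate is 1.


def gamma_cdf_mean_one(x: float, alpha: float) -> float:
    """P(rate <= x) for Gamma(shape=alpha, rate=alpha), via the series for the
    regularized lower incomplete gamma function P(alpha, alpha * x)."""
    if x <= 0.0:
        return 0.0
    z = alpha * x
    term = 1.0 / alpha  # n = 0 term of sum_n z^n / (alpha (alpha+1) ... (alpha+n))
    total = term
    for n in range(1, 100_000):
        term *= z / (alpha + n)
        total += term
        if term < total * 1e-16:
            break
    log_prefactor = alpha * math.log(z) - z - math.lgamma(alpha)
    return min(1.0, math.exp(log_prefactor) * total)


def gamma_quantile_mean_one(p: float, alpha: float) -> float:
    """Invert the CDF by bisection (adequate for moderate alpha values)."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_mean_one(hi, alpha) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf_mean_one(mid, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 4) -> List[float]:
    """k equal-probability rate categories, each represented by its median,
    rescaled so the average rate is exactly 1."""
    medians = [gamma_quantile_mean_one((2 * i + 1) / (2 * k), alpha) for i in range(k)]
    mean = sum(medians) / k
    return [m / mean for m in medians]


if __name__ == "__main__":
    # Illustrative alpha values only, not estimates from the study.
    for gene, alpha in {"gene_1": 0.3, "gene_2": 1.5}.items():
        print(gene, [round(r, 3) for r in discrete_gamma_rates(alpha)])
```

With a small alpha the slowest category has a rate near zero and the fastest lies well above 1, while a larger alpha pulls all categories toward 1; fitting one alpha per gene lets genes with different degrees of rate heterogeneity keep their own pattern instead of sharing an averaged one.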

Conclusions:

The selection of appropriate statistical models for combining molecular data is critical for accurate phylogenetic reconstruction.
Specific models for branch length estimation and among-site rate variation improve the reliability of evolutionary trees.
Understanding how the model affects the result is essential for interpreting phylogenetic analyses, particularly for difficult problems such as the mammalian phylogeny.