Related Concept Videos

Evolutionary Relationships through Genome Comparisons

Genome comparison is one of the most effective ways to infer evolutionary relationships between organisms. Its basic principle is that if two species share a common feature, that feature is likely encoded by DNA sequence conserved in both species. The advent of genome sequencing technologies in the late 20th century allowed scientists to study how domains are conserved between species and helped them deduce evolutionary relationships across diverse...

Proteomics

A proteome is the entire set of proteins that a cell type produces. Knowledge of genomes helps in studying proteomes, because genes code for mRNAs and mRNAs encode proteins. Although mRNA analysis is a step in the right direction, not all mRNAs are translated into proteins.
Proteomics is the study of the function of proteomes. It involves the large-scale, systematic study of the proteome, the complement of proteins expressed by a genome. Scientist Mark Wilkins coined the term...

Gene Evolution - Fast or Slow?

The genomes of eukaryotes are punctuated by long stretches of sequence that do not code for proteins or RNAs. Although some of these regions contain crucial regulatory sequences, most of this DNA has no known function. These regions typically change fastest over evolutionary time, because little or no selection pressure acts to preserve their sequences.
In contrast, regions which code...

Multi-species Conserved Sequences

Next-generation sequencing technologies have produced large genomic databases covering a variety of animals and plants. Since the Human Genome Project was completed, scientists have studied the genomes of primates, other mammals, and phylogenetically distant organisms. Such large-scale studies have provided new insights into the evolutionary relationships between organisms.
Although genomes vary greatly from one species to another, a few sequences are highly conserved. Such conserved...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Evolutionary trajectories, early diversification, and species-specific amplification of the metazoan inhibitor of apoptosis (IAP) repertoire.

Molecular biology and evolution·2026

Same author

A novel glutathione transferase harboring an FMN redox cofactor.

The FEBS journal·2026

Same author

Unraveling the Link Between Thermal Adaptation and Latent Allostery in Malate Dehydrogenase From Methanococcales.

Journal of molecular biology·2025

Same author

Genomic characterisation of novel extremophile lineages from the thalassohaline lake Dziani Dzaha expands the metabolic repertoire of the PVC superphylum.

Environmental microbiome·2025

Same author

Identifying falsified COVID-19 vaccines by analysing vaccine vial label and excipient profiles using MALDI-ToF mass spectrometry.

NPJ vaccines·2025

Same author

Allostery and Evolution: A Molecular Journey Through the Structural and Dynamical Landscape of an Enzyme Super Family.

Molecular biology and evolution·2025

Same journal

OpenIMC: an open-source platform for analyzing single-cell and spatial proteomics by imaging mass cytometry.

BMC bioinformatics·2026

Same journal

NAP: an open source pipeline for cross-domain microbiome profiling using Nanopore sequencing-derived amplicon data.

BMC bioinformatics·2026

Same journal

SurvGME: an R package for survival analysis with graphical and measurement error models.

BMC bioinformatics·2026

Same journal

SimMapNet: a Bayesian framework for gene regulatory network inference using gene ontology similarities as external hint.

BMC bioinformatics·2026

Same journal

Dual channel drug-drug interactions extraction based on cross attention.

BMC bioinformatics·2026

Same journal

FeSseqdb: a curated sequence-level database and interpretable machine learning framework for identifying iron-sulfur proteins.

BMC bioinformatics·2026

Related Experiment Video

Updated: May 12, 2025

Mass Spectrometry-Based Proteomics Analyses Using the OpenProt Database to Unveil Novel Proteins Translated from Non-Canonical Open Reading Frames

Published on: April 11, 2019

Multi-proteins similarity-based sampling to select representative genomes from large databases.

Rémi-Vinh Coudert¹,², Jean-Philippe Charrier², Frédéric Jauffrit²

¹Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, 69622, Villeurbanne, France.

BMC Bioinformatics, May 6, 2025

Summary

This summary is machine-generated.

A new method, Multiple-Protein Similarity-based Sampling (MPS-Sampling), efficiently selects representative bacterial genomes from large datasets. The selected genomes preserve the taxonomic and phylogenetic diversity of the full dataset without relying on taxonomic annotations or phylogenetic tree inference, addressing key challenges in genomic data analysis.

Keywords:

Dice index; GTDB; Genome collection; Genome dereplication; Prokaryotes; RiboDB

More Related Videos

An Integrated Approach for Microprotein Identification and Sequence Analysis

Published on: July 12, 2022

Creating and Applying a Reference to Facilitate the Discussion and Classification of Proteins in a Diverse Group

Published on: August 16, 2017

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Genome databases are expanding rapidly, leading to high redundancy and variable data quality.
Selecting representative subsets of genomes is crucial for genomic studies, but current methods are often biased and too slow for large datasets.

Purpose of the Study:

To introduce MPS-Sampling (Multiple-Protein Similarity-based Sampling), a novel method for fast, scalable, and efficient genome sampling.
To address the limitations of existing biased and time-consuming sampling approaches for large genomic datasets.

Main Methods:

MPS-Sampling uses homologous protein families to group similar genomes.
It employs a two-step clustering process to delineate homogeneous genome groups.
Within each group, representative genomes are selected according to user-defined criteria.
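
The summary does not say how protein similarity or the two clustering steps are computed. The Python sketch below shows one way such a pipeline could look, under loudly hypothetical assumptions: each genome is represented by one sequence per marker protein family; protein similarity is the Dice index (listed among the keywords) on 3-mer sets; step one is a cheap single-linkage pass on a single "anchor" family; step two re-clusters each coarse group using the mean Dice index over all shared families; and the user-defined criterion is a per-genome quality score. The names `sample_representatives`, `anchor`, `loose`, and `strict`, the thresholds, and the toy sequences are illustrative only and are not part of MPS-Sampling.

```python
# Hypothetical sketch of two-step, similarity-based genome sampling.
# Assumptions (NOT taken from the MPS-Sampling paper): one sequence per
# marker family per genome, Dice index on 3-mer sets, single-linkage
# clustering, and a user-supplied score as the selection criterion.
from itertools import combinations
from typing import Callable


def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def dice(a: set[str], b: set[str]) -> float:
    """Dice index 2|A & B| / (|A| + |B|); 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def genome_similarity(g1: dict[str, str], g2: dict[str, str], k: int = 3) -> float:
    """Mean Dice index over the marker families present in both genomes."""
    shared = g1.keys() & g2.keys()
    if not shared:
        return 0.0
    return sum(dice(kmers(g1[f], k), kmers(g2[f], k)) for f in shared) / len(shared)


def single_linkage(ids: list[str], sim: Callable[[str, str], float],
                   threshold: float) -> list[list[str]]:
    """Link every pair with sim >= threshold; return the connected groups."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def sample_representatives(genomes: dict[str, dict[str, str]],
                           score: Callable[[str], float], anchor: str,
                           loose: float = 0.5, strict: float = 0.8,
                           k: int = 3) -> list[str]:
    """Coarse groups on one anchor family, then fine groups on all families."""
    def coarse_sim(a: str, b: str) -> float:
        fa, fb = genomes[a].get(anchor), genomes[b].get(anchor)
        if fa is None or fb is None:
            return 0.0
        return dice(kmers(fa, k), kmers(fb, k))

    def fine_sim(a: str, b: str) -> float:
        return genome_similarity(genomes[a], genomes[b], k)

    reps = []
    for coarse in single_linkage(sorted(genomes), coarse_sim, loose):
        # The more expensive multi-family comparison runs only inside a coarse group.
        for fine in single_linkage(coarse, fine_sim, strict):
            reps.append(max(fine, key=score))  # user-defined criterion
    return sorted(reps)


# Toy usage with invented sequences and quality scores.
genomes = {
    "gA": {"rpsA": "MKTAYIAKQR", "rplB": "MAVKKFKPTS"},
    "gB": {"rpsA": "MKTAYIAKQR", "rplB": "MAVKKFKPTS"},
    "gC": {"rpsA": "WWHHEEPPCC", "rplB": "QQNNDDGGYY"},
}
quality = {"gA": 0.9, "gB": 0.95, "gC": 0.7}
print(sample_representatives(genomes, quality.get, anchor="rpsA"))  # prints ['gB', 'gC']
```

Here gA and gB are identical, so they collapse into one group and the higher-scoring gB is kept, while the unrelated gC forms its own group. Restricting the multi-family comparison to members of the same coarse group is what limits the expensive pairwise work in this sketch; the published tool may organize its two steps differently.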

Main Results:

MPS-Sampling was applied to 178,203 bacterial genomes using 48 ribosomal protein families.
It generated representative genome sets whose sizes ranged from 0.3% to 32.17% of the full dataset.
Selected genomes demonstrated strong taxonomic and phylogenetic representativeness of the complete dataset.
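
Reading the reported percentages as the fraction of the 178,203 input genomes retained in a representative set (an interpretation of the summary, not a count it states), the corresponding set sizes follow from simple arithmetic. The helper name `retained` is illustrative.

```python
# Estimate representative-set sizes from the reported percentages.
# Assumption: each percentage is the share of the full dataset that is kept.
TOTAL_GENOMES = 178_203  # bacterial genomes in the reported dataset


def retained(fraction_percent: float, total: int = TOTAL_GENOMES) -> int:
    """Approximate number of genomes kept for a given retained percentage."""
    return round(total * fraction_percent / 100)


for pct in (0.3, 32.17):
    print(f"{pct}% of {TOTAL_GENOMES:,} is about {retained(pct):,} genomes")
```

Under that reading, the smallest and largest sets would hold roughly 535 and 57,328 genomes, respectively.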

Conclusions:

MPS-Sampling offers an efficient, fast, and scalable solution for sampling large genome collections within practical computational limits.
The method avoids biases associated with taxonomic information and phylogenetic tree inference.
MPS-Sampling meets the growing demand for reliable genome sampling tools in bioinformatics.