



Multi-proteins similarity-based sampling to select representative genomes from large databases.

Rémi-Vinh Coudert1,2, Jean-Philippe Charrier2, Frédéric Jauffrit2

  • 1Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, 69622, Villeurbanne, France.

BMC Bioinformatics | May 6, 2025
Summary
This summary is machine-generated.

A new method, Multiple-Protein Similarity-based Sampling (MPS-Sampling), efficiently selects representative bacterial genomes from large datasets. The selected sets preserve taxonomic and phylogenetic diversity without relying on taxonomic information or phylogenetic tree inference, both of which can introduce bias.

Keywords:
Dice index, GTDB, Genome collection, Genome dereplication, Prokaryotes, RiboDB


Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Genome databases are expanding rapidly, leading to high redundancy and variable data quality.
  • Selecting representative genome subsets is crucial for genomic studies, but current methods are often biased and too slow for large datasets.

Purpose of the Study:

  • To introduce MPS-Sampling (Multiple-Protein Similarity-based Sampling), a novel method for fast, scalable, and efficient genome sampling.
  • To address the limitations of existing biased and time-consuming sampling approaches for large genomic datasets.

Main Methods:

  • MPS-Sampling utilizes homologous protein families to group similar genomes.
  • It employs a two-step clustering process to delineate homogeneous genome groups.
  • Representative genomes are selected based on user-defined criteria within these groups.
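
The grouping and selection steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Dice index appears in the article keywords, but how it is applied here is a guess, and the greedy single-pass grouping, the threshold value, the toy genome profiles, the quality scores, and every function name (`dice`, `genome_similarity`, `greedy_groups`, `pick_representatives`) are assumptions introduced for illustration. The two-step clustering is collapsed into a single step for brevity.

```python
from typing import Callable, Dict, List

# Hypothetical input format: for each genome, the cluster label of its
# sequence in each protein family. Family names and labels are toy values.
Profile = Dict[str, str]


def dice(a: set, b: set) -> float:
    """Dice index of two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def genome_similarity(p: Profile, q: Profile) -> float:
    """Compare two genomes as sets of (family, label) pairs (an assumption)."""
    return dice(set(p.items()), set(q.items()))


def greedy_groups(profiles: Dict[str, Profile], threshold: float) -> List[List[str]]:
    """Assign each genome to the first group whose seed is similar enough.

    This greedy pass stands in for the two-step clustering; it is a
    simplification, not the published procedure.
    """
    groups: List[List[str]] = []
    for name in sorted(profiles):
        for group in groups:
            seed = group[0]
            if genome_similarity(profiles[name], profiles[seed]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([name])
    return groups


def pick_representatives(groups: List[List[str]],
                         score: Callable[[str], float]) -> List[str]:
    """Select one genome per group using a user-defined scoring criterion."""
    return [max(group, key=score) for group in groups]


# Toy data: three genomes described over three protein families.
profiles = {
    "g1": {"uL2": "a", "uL3": "a", "uS2": "a"},
    "g2": {"uL2": "a", "uL3": "a", "uS2": "b"},
    "g3": {"uL2": "c", "uL3": "c", "uS2": "c"},
}
quality = {"g1": 0.90, "g2": 0.99, "g3": 0.95}  # hypothetical quality scores

groups = greedy_groups(profiles, threshold=0.6)
representatives = pick_representatives(groups, score=quality.__getitem__)
print(groups)
print(representatives)
```

In this sketch the user-defined criterion is an ordinary scoring function, so swapping in a different preference (for example, assembly completeness) would not require changing the grouping code.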

Main Results:

  • MPS-Sampling was applied to 178,203 bacterial genomes using 48 ribosomal protein families.
  • It generated representative genome sets ranging from 0.3% to 32.17% of the complete dataset, a substantial reduction in size.
  • Selected genomes demonstrated strong taxonomic and phylogenetic representativeness of the complete dataset.
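
To give a sense of scale, the reported percentages can be converted into approximate genome counts. The short Python calculation below assumes that each percentage expresses the size of a representative set as a fraction of the 178,203 input genomes, which this summary does not state explicitly. The resulting counts are derived here and are not figures reported in the source; `set_size` is a hypothetical helper.

```python
TOTAL_GENOMES = 178_203  # bacterial genomes in the dataset, per the summary


def set_size(percent_of_total: float, total: int = TOTAL_GENOMES) -> int:
    """Approximate number of genomes in a set covering the given percentage.

    Assumes the percentage is the retained fraction of the full dataset.
    """
    return round(total * percent_of_total / 100)


smallest = set_size(0.3)    # lower bound of the reported range
largest = set_size(32.17)   # upper bound of the reported range
print(smallest, largest)
```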

Conclusions:

  • MPS-Sampling offers an efficient, fast, and scalable solution for sampling large genome collections within practical computational limits.
  • The method avoids biases associated with taxonomic information and phylogenetic tree inference.
  • MPS-Sampling meets the growing demand for reliable genome sampling tools in bioinformatics.