
Discrete profile comparison using information bottleneck.

Sean O'Rourke¹, Gal Chechik, Robin Friedman

  • ¹ Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Dr., San Diego, CA 92093, USA. seano@cs.ucsd.edu

BMC Bioinformatics | May 26, 2006
Summary
This summary is machine-generated.

We developed a method that encodes protein profile information in a discrete alphabet, so that large-scale protein database searches can run at close to the speed of sequence comparison. The encoding preserves most of the information in probabilistic profiles and retains most of their sensitivity for distant homolog detection.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Protein Sequence Analysis

Background:

  • Protein sequence homologs provide crucial biological insights.
  • Amino acid profiles offer richer information than individual sequences but are computationally intensive for large datasets.
  • Current profile comparison methods are too slow for extensive database searches.

Purpose of the Study:

  • To develop a method for mapping probabilistic protein profiles to a discrete alphabet.
  • To preserve essential information during this mapping process.
  • To enable efficient and sensitive large-scale protein homolog searches.
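
For reference, "preserving essential information" can be stated with the standard Information Bottleneck objective. The mapping of symbols onto this study is an assumption made here for illustration: $X$ is a profile column, $Y$ is the amino acid observed at that position, and $T$ is the discrete letter assigned to the column.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y),
\qquad
I(T;Y) \;=\; \sum_{t,\,y} p(t,y)\,\log \frac{p(t,y)}{p(t)\,p(y)}
```

Here $\beta > 0$ trades compression of $X$ against the information that $T$ keeps about $Y$. Under this reading, the ratio $I(T;Y)/I(X;Y)$ is one natural measure of how much amino acid information survives discretization, although the summary does not state the exact definition used.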

Main Methods:

  • Used the Information Bottleneck (IB) approach to choose an encoding that preserves as much profile information as possible.
  • Mapped probabilistic profiles to an 80-character discrete alphabet.

  • Evaluated the sensitivity and speed of distant homolog searches using the discrete encoding.
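
The mapping step can be illustrated with a minimal sketch. This is not the authors' procedure: it is a k-means-style hard clustering of profile columns under KL divergence, which is related to IB in its deterministic limit when all columns are weighted equally. The function names (`discretize_profiles`, `encode`), the toy data, and the alphabet size used in the usage note are all hypothetical.

```python
import numpy as np

def kl_rows_to_centroids(P, C, eps=1e-12):
    """Return KL(P[i] || C[k]) for every row i and centroid k, shape (n, k)."""
    P = np.clip(P, eps, None)
    C = np.clip(C, eps, None)
    # KL(p || c) = sum_a p_a log p_a  -  sum_a p_a log c_a
    return (P * np.log(P)).sum(axis=1, keepdims=True) - P @ np.log(C).T

def discretize_profiles(P, n_letters, n_iter=50, seed=0):
    """Hard-cluster profile columns (rows of P) into n_letters groups.

    P is an (n_columns, n_amino_acids) array whose rows sum to 1.
    Returns (labels, centroids). Illustrative simplification only.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    # Initialise centroids from randomly chosen profile columns.
    C = P[rng.choice(n, size=n_letters, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        new_labels = kl_rows_to_centroids(P, C).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(n_letters):
            members = P[labels == k]
            if len(members):
                # The arithmetic mean minimises the summed KL(p || c).
                C[k] = members.mean(axis=0)
    return labels, C

def encode(P, C):
    """Map each profile column to the index of its nearest centroid."""
    return kl_rows_to_centroids(P, C).argmin(axis=1)
```

As a usage example, `labels, C = discretize_profiles(np.random.default_rng(1).dirichlet(np.ones(20), size=200), n_letters=8)` assigns each of 200 random 20-dimensional columns one of 8 letters. A profile then becomes an ordinary string over that alphabet, which is what allows sequence-style comparison.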

Main Results:

  • The 80-character IB alphabet retains nearly 90% of the amino acid occurrence information in the profiles.
  • For distant homolog search, the discrete IB encoding achieves 88% of the sensitivity of profile comparison.
  • The method runs about 30 times faster than profile comparison, at a speed comparable to simple sequence comparison.
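
To make the "information retained" figure concrete, the sketch below computes the ratio I(T;Y)/I(X;Y) for a toy example. It assumes each profile column is equally likely and that the retained fraction is defined as this ratio; the summary does not confirm either assumption, and the names `mutual_information` and `retained_fraction` are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits for a joint probability table joint[a, b]."""
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal over rows
    pb = joint.sum(axis=0, keepdims=True)  # marginal over columns
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (pa @ pb)[mask]
    return float((joint[mask] * np.log2(ratio)).sum())

def retained_fraction(P, labels, n_letters):
    """Fraction of I(X;Y) kept after mapping columns X to letters T.

    P[i] is the amino acid distribution of column i (rows sum to 1);
    labels[i] is the letter assigned to column i. Assumes uniform p(x).
    """
    n, n_aa = P.shape
    joint_xy = P / n                       # p(x, y) = p(x) * p(y | x)
    joint_ty = np.zeros((n_letters, n_aa))
    np.add.at(joint_ty, labels, joint_xy)  # pool columns sharing a letter
    return mutual_information(joint_ty) / mutual_information(joint_xy)

# Toy data: four columns over a two-symbol "amino acid" alphabet.
P = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
good = np.array([0, 0, 1, 1])  # groups identical columns together
bad = np.array([0, 1, 0, 1])   # groups opposite columns together
print(retained_fraction(P, good, 2), retained_fraction(P, bad, 2))
```

Grouping identical columns loses no information, while grouping opposite columns averages them into an uninformative letter and loses all of it. A real alphabet falls between these extremes.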

Conclusions:

  • Discrete IB encoding makes profile-based comparison roughly 30 times faster.
  • Large-scale database queries that were previously infeasible with profiles become computationally tractable.
  • The method retains most of the sensitivity of profile comparison for distant homologs at close to the speed of sequence comparison, broadening the applicability of profile information.