
CoMSA: compression of protein multiple sequence alignment files.

Sebastian Deorowicz¹, Joanna Walczyszyn¹, Agnieszka Debudaj-Grabysz¹

  • ¹ Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland.

Bioinformatics (Oxford, England), July 17, 2018
Summary
This summary is machine-generated.

A new compression algorithm, CoMSA, significantly reduces the size of multiple sequence alignment files for protein families. It achieves a compression ratio up to six times better than commonly used tools such as gzip, easing the storage and transfer of large bioinformatics datasets.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Compression

Background:

  • Bioinformatics databases are rapidly expanding, generating massive datasets.
  • Multiple sequence alignments of protein families, such as those in Pfam, are particularly large, posing storage and transfer challenges.
  • Current data compression methods are insufficient for these large-scale bioinformatics datasets.

Purpose of the Study:

  • To develop a novel compression algorithm specifically designed for aligned biological sequence data.
  • To improve the efficiency of storing and transferring large bioinformatics datasets, particularly multiple sequence alignments.
  • To evaluate the performance of the proposed algorithm against existing compression tools.

Main Methods:

  • A novel compression algorithm, CoMSA, was developed.
  • CoMSA is based on a generalized positional Burrows-Wheeler transform adapted for non-binary alphabets.
  • The algorithm supports FASTA and Stockholm file formats.
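
To make the idea of a positional Burrows-Wheeler transform over a non-binary alphabet more concrete, the following is a minimal sketch, not CoMSA's actual implementation. It assumes the alignment is given as equal-length strings, and the function names `pbwt_forward` and `pbwt_inverse` are hypothetical. For each column, the rows are visited in an order determined by the preceding columns, and that order is then updated with a stable sort on the current symbol (the binary transform only needs a two-way partition here). Rows with similar preceding context end up adjacent, which tends to produce longer runs of identical symbols in each output column. The modeling and entropy-coding stages that a real compressor would apply afterwards are omitted.

```python
def pbwt_forward(rows):
    """Permute each alignment column by the order induced by earlier columns.

    rows: list of equal-length strings (one aligned sequence per row).
    Returns one string per column.
    """
    n_cols = len(rows[0]) if rows else 0
    order = list(range(len(rows)))  # column 0 is read in input order
    columns = []
    for k in range(n_cols):
        col = [rows[i][k] for i in order]
        columns.append("".join(col))
        # Stable sort by the current symbol: ties keep their previous order,
        # so rows stay grouped by their recent context.
        order = [i for _, i in sorted(zip(col, order), key=lambda p: p[0])]
    return columns


def pbwt_inverse(columns, n_rows):
    """Rebuild the original rows from the permuted columns."""
    order = list(range(n_rows))
    rows = [[] for _ in range(n_rows)]
    for col in columns:
        # The decoder knows the same order as the encoder at this column.
        for sym, i in zip(col, order):
            rows[i].append(sym)
        order = [i for _, i in sorted(zip(col, order), key=lambda p: p[0])]
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    alignment = ["MK-LV", "MKALV", "MR-LV", "MKALI"]  # toy data, not from Pfam
    permuted = pbwt_forward(alignment)
    print(permuted)
    print(pbwt_inverse(permuted, len(alignment)) == alignment)
```

The transform is reversible without storing the permutation, because the decoder can recompute the row order from the columns it has already decoded.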

Main Results:

  • CoMSA achieves a compression ratio up to six times better than commonly used compressors such as gzip.
  • Experiments analyzed the impact of protein family size on the compression ratio.
  • The algorithm effectively compresses large multiple sequence alignment files.
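
For readers who want to reproduce a gzip baseline on their own alignment files, the sketch below shows one common way to compute a compression ratio (original size divided by compressed size) using Python's standard `gzip` module. The helper name `gzip_ratio` and the toy input are assumptions for illustration; the study's exact measurement setup is not described in this summary.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Return original size divided by gzip-compressed size."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    compressed = gzip.compress(data)
    return len(data) / len(compressed)


if __name__ == "__main__":
    # Toy, highly repetitive input standing in for an alignment file.
    sample = b"MKALV-MKALI\n" * 1000
    print(f"gzip ratio: {gzip_ratio(sample):.1f}")
```

Under this definition, a "six-fold better" result means a specialized compressor's ratio is about six times the gzip ratio on the same file.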

Conclusions:

  • CoMSA offers a significant improvement in data compression for bioinformatics sequence alignments.
  • The developed algorithm addresses the challenge of managing large-scale biological data.
  • CoMSA provides a practical solution for efficient storage and transfer of protein family alignments.