Protein sequence comparison based on K-string dictionary.

Chenglong Yu (1), Rong L He, Stephen S-T Yau

  • (1) Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, IL 60607-7045, USA.

Gene | August 14, 2013
Summary
This summary is machine-generated.

This study introduces a K-string dictionary to reduce memory usage for protein sequence comparisons. This novel approach enables efficient, accurate gene tree construction using lower-dimensional protein representations.

Keywords:
Cardinality; Frequency vector; K-string; MSA; NADH dehydrogenase 1; ND1; SVD; Sequence comparison; Singular Value Decomposition; multiple sequence alignment

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • K-string-based protein sequence comparison methods demand significant computational memory.
  • The dimension of a protein's K-string vector grows exponentially with K: over the 20 standard amino acids there are 20^K possible K-strings, so dense vector representations quickly become impractical.
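
To make the exponential growth concrete, the following minimal Python sketch compares the number of possible K-strings (20^K for the 20 standard amino acids) with the number of distinct K-strings that actually occur in one sequence. The sequence `toy` and the helper names `kstring_counts` and `full_dimension` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kstring_counts(seq: str, k: int) -> Counter:
    """Count every overlapping K-string (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def full_dimension(k: int) -> int:
    """Number of possible K-strings, i.e. the length of a dense vector."""
    return len(AMINO_ACIDS) ** k


# Made-up toy sequence, used for illustration only.
toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

for k in (1, 2, 3, 4, 5):
    observed = len(kstring_counts(toy, k))
    # A sequence of length n contains at most n - k + 1 distinct K-strings,
    # so almost every entry of a dense 20^K vector is zero once K grows.
    print(f"K={k}: possible={full_dimension(k)}, observed={observed}")
```

For K = 5 a dense vector already has 20^5 = 3,200,000 entries per protein, while a single sequence of length n can contribute at most n - K + 1 non-zero entries.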

Purpose of the Study:

  • To introduce a novel "K-string dictionary" concept to address high-dimensional protein vector representation.
  • To reduce the computer memory requirements for K-string-based protein sequence analysis.

Main Methods:

  • Development of the "K-string dictionary" concept.
  • Application of Singular Value Decomposition (SVD) for analyzing protein datasets.
  • Use of lower-dimensional K-string-based frequency or probability vectors to represent proteins.
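
The methods listed above can be sketched roughly as follows. This summary does not give the paper's exact definitions, so the sketch assumes that the dictionary is the set of K-strings occurring in at least one sequence of the dataset, that each protein is represented by its relative K-string frequencies over that dictionary, and that SVD is used to project those vectors to a lower rank. All function names and the toy sequences are hypothetical.

```python
import numpy as np


def kstrings(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_dictionary(seqs, k):
    """Assumed definition: K-strings that occur in at least one sequence."""
    return sorted({s for seq in seqs for s in kstrings(seq, k)})


def frequency_matrix(seqs, k):
    """Rows are proteins, columns are dictionary entries, values are relative frequencies."""
    if any(len(seq) < k for seq in seqs):
        raise ValueError("every sequence must have length >= k")
    dictionary = build_dictionary(seqs, k)
    index = {s: j for j, s in enumerate(dictionary)}
    mat = np.zeros((len(seqs), len(dictionary)))
    for i, seq in enumerate(seqs):
        for s in kstrings(seq, k):
            mat[i, index[s]] += 1
        mat[i] /= mat[i].sum()  # counts -> relative frequencies
    return dictionary, mat


def svd_reduce(mat, rank):
    """Coordinates of each protein in the top-`rank` singular directions."""
    u, s, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :rank] * s[:rank]


def pairwise_distances(points):
    """Euclidean distance matrix between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Made-up toy sequences, used for illustration only.
seqs = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQLSFVKSHFTRQ",
    "GSHMLEDPVVLQRRDWENPGVT",
]
dictionary, mat = frequency_matrix(seqs, 3)
reduced = svd_reduce(mat, 2)
dist = pairwise_distances(reduced)
print(len(dictionary), "dictionary entries instead of", 20 ** 3)
print(dist.round(3))
```

A distance matrix such as `dist` could then be passed to a standard tree-building method; which method the authors used is not stated in this summary.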

Main Results:

  • Significantly reduced computer memory requirements for protein sequence comparison.
  • Achieved accurate gene tree construction through improved protein vector representation.
  • Demonstrated the efficacy of the K-string dictionary in handling high-dimensional data.

Conclusions:

  • The K-string dictionary offers an efficient solution for memory-intensive protein sequence analysis.
  • This method enhances the accuracy of gene tree reconstruction.
  • The proposed approach has significant implications for large-scale genomic and proteomic studies.