Protein sequence classification using feature hashing.

Cornelia Caragea¹, Adrian Silvescu, Prasenjit Mitra

¹Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA. ccaragea@ist.psu.edu.

Proteome Science

July 5, 2012

Summary

This summary is machine-generated.

Feature hashing effectively reduces dimensionality in protein sequence classification. This method handles large datasets generated by next-generation sequencing, making data mining more feasible.

Area of Science:

Bioinformatics
Computational Biology
Machine Learning

Background:

Next-generation sequencing has sharply increased the rate at which protein sequence data are acquired.
The k-gram representation commonly used for protein sequence classification yields prohibitively high-dimensional input spaces for large k, which can make data mining algorithms intractable.
Dimensionality reduction is essential for efficient protein sequence analysis and machine learning.
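
To make the dimensionality problem concrete, the following is a minimal sketch of a bag-of-k-grams count and of how the number of possible k-grams grows with k. It assumes an alphabet of the 20 standard amino acids; the function names and the toy sequence are illustrative and do not come from the paper.

```python
from collections import Counter

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kgram_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-gram (substring of length k) in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def feature_space_size(k: int, alphabet_size: int = len(AMINO_ACIDS)) -> int:
    """Number of distinct k-grams that can occur over the alphabet."""
    return alphabet_size ** k


if __name__ == "__main__":
    toy_sequence = "MKVLAAGMKVL"  # made-up sequence, for illustration only
    print(kgram_counts(toy_sequence, 3))
    for k in range(1, 6):
        print(f"k={k}: {feature_space_size(k)} possible k-grams")
```

Under this assumption the feature space has 20^k dimensions, so it already reaches 8,000 at k = 3 and 3.2 million at k = 5, while any single sequence populates only a handful of them.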

Purpose of the Study:

To evaluate the efficacy of feature hashing for protein sequence classification.
To compare feature hashing against the conventional bag-of-k-grams method.
To address the challenges posed by high-dimensional data in bioinformatics.

Main Methods:

Feature hashing was applied to reduce high-dimensional protein sequence data into a lower-dimensional space.
A hash function mapped each k-gram to a hash key; several k-grams can land on the same key, and their counts are aggregated.
Performance was benchmarked against the standard bag-of-k-grams technique.
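
The hashing step described above can be sketched as follows. This is an illustrative reading of "map k-grams to hash keys and aggregate counts", not the authors' implementation: the choice of MD5 as the hash function, the bucket count, and the names `hash_kgram` and `hashed_features` are all assumptions.

```python
import hashlib
from typing import List


def hash_kgram(kgram: str, n_buckets: int) -> int:
    """Map a k-gram to a bucket index in [0, n_buckets).

    MD5 is an arbitrary choice here (an assumption); it is used only
    because it gives the same result on every run.
    """
    digest = hashlib.md5(kgram.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hashed_features(sequence: str, k: int, n_buckets: int) -> List[int]:
    """Build a fixed-length count vector by hashing each overlapping k-gram.

    K-grams that collide on the same bucket have their counts summed.
    """
    vector = [0] * n_buckets
    for i in range(len(sequence) - k + 1):
        vector[hash_kgram(sequence[i:i + k], n_buckets)] += 1
    return vector


if __name__ == "__main__":
    toy_sequence = "MKVLAAGMKVL"  # made-up sequence, for illustration only
    print(hashed_features(toy_sequence, k=3, n_buckets=16))
```

The vector length is fixed by `n_buckets` regardless of k, which is what keeps the input space small. A stable hash is used instead of Python's built-in `hash()` because string hashes are randomized per process by default, which would make features inconsistent between training and prediction runs.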

Main Results:

Feature hashing demonstrated effectiveness in reducing dimensionality for protein sequence classification tasks.
The method offers a viable lower-dimensional alternative to the bag-of-k-grams representation.
This technique enhances the feasibility of applying data mining algorithms to large protein sequence datasets.

Conclusions:

Feature hashing is a practical and effective dimensionality reduction technique for protein sequence classification.
This approach mitigates computational challenges associated with large-scale biological data.
The study highlights feature hashing's potential to improve machine learning performance in bioinformatics.