Optimizing high performance computing workflow for protein functional annotation.

Larissa Stanberry¹, Bhanu Rekepalli², Yuan Liu²

¹Bioinformatics & High-Throughput Analysis Laboratory and High-Throughput Analysis Core, Seattle Children's Research Institute (SCRI), DELSA Global, Seattle, WA 98101, USA.

Concurrency and Computation: Practice & Experience

October 15, 2014

Summary

This summary is machine-generated.

Annotating vast amounts of protein data is challenging. This study introduces an optimized automated workflow that uses high-performance computing for efficient large-scale protein annotation, with classification specificity and sensitivity of at least 80%.

Keywords:

BLAST; COG; HSPp-BLAST; PS; PSI-BLAST; XSEDE; computational bioinformatics; data-enabled life sciences; petascale; protein annotation; protein sequence universe; science gateways; sequence similarity

Area of Science:

Genomics and Bioinformatics
Computational Biology
Molecular Biology

Background:

The rapid expansion of genomic sequencing generates massive amounts of protein data, overwhelming manual annotation efforts.
Existing automated protein annotation methods face limitations due to high computational costs.
Accurate functional annotation of newly sequenced genomes is crucial for biological discovery.

Purpose of the Study:

To develop and optimize an automated workflow for large-scale protein annotation.
To address the challenge of efficiently annotating more than a million newly sequenced bacterial proteins.
To provide a scalable solution for the growing volume of genomic data.

Main Methods:

Implementation of an optimized automated workflow leveraging high-performance computing architectures.
Utilization of a low-complexity classification algorithm to assign proteins into clusters of orthologous groups.
Classification based on the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), with specificity and sensitivity of at least 80%.
Employment of highly scalable parallel applications for sequence alignment and classification.
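
The summary does not spell out the decision rule used to place a protein into a cluster of orthologous groups (COG), so the following Python sketch is only an illustration under stated assumptions: PSI-BLAST hits arrive as (subject ID, e-value) pairs, each subject has a known COG label, and the query is assigned to the COG with the most hits below an e-value cutoff. The names `assign_cog`, `partition`, `cog_of`, and `max_evalue`, and the majority-vote rule itself, are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Hit = Tuple[str, float]  # (subject protein ID, e-value); assumed input shape


def assign_cog(
    hits: Iterable[Hit],
    cog_of: Dict[str, str],
    max_evalue: float = 1e-3,  # assumed cutoff, not a value from the paper
) -> Optional[str]:
    """Assign a query to the COG that collects the most significant hits.

    Hypothetical majority-vote rule: ties are broken by the lowest e-value
    seen for the COG, then by COG name so the result is deterministic.
    """
    votes: Counter = Counter()
    best: Dict[str, float] = {}
    for subject, evalue in hits:
        cog = cog_of.get(subject)
        if cog is None or evalue > max_evalue:
            continue  # skip unlabeled subjects and weak hits
        votes[cog] += 1
        best[cog] = min(evalue, best.get(cog, evalue))
    if not votes:
        return None  # no significant labeled hit: leave unclassified
    return min(votes, key=lambda c: (-votes[c], best[c], c))


def partition(items: List[str], n_workers: int) -> List[List[str]]:
    """Split query IDs round-robin so each worker gets a similar share."""
    return [items[i::n_workers] for i in range(n_workers)]


if __name__ == "__main__":
    labels = {"p1": "COG0001", "p2": "COG0001", "p3": "COG0002"}
    example_hits = [("p1", 1e-30), ("p2", 1e-10), ("p3", 1e-50)]
    print(assign_cog(example_hits, labels))
    print(partition(["q1", "q2", "q3", "q4", "q5"], 2))
```

Because each query is classified independently of the others, the query set can be split across workers with no communication between them, which is one reason this kind of workload suits large parallel machines.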

Main Results:

The workflow successfully processed 1,200,000 newly sequenced bacterial proteins using the Extreme Science and Engineering Discovery Environment (XSEDE) supercomputers.
The automated approach demonstrated high specificity and sensitivity in protein classification.
The optimized workflow significantly enhances the efficiency of large-scale protein annotation.
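
For readers unfamiliar with the two metrics, the sketch below computes sensitivity and specificity for one COG in a one-vs-rest fashion from true and predicted labels, using the standard definitions. How the authors evaluated their classifier is not described in this summary, so the function names, the label lists, and the way the 80% floor is checked are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    target: str,
) -> Tuple[Optional[float], Optional[float]]:
    """One-vs-rest sensitivity and specificity for a single COG label.

    Returns None for a metric whose denominator is zero.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == target and pred == target:
            tp += 1
        elif truth == target:
            fn += 1
        elif pred == target:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None  # TP / (TP + FN)
    specificity = tn / (tn + fp) if (tn + fp) else None  # TN / (TN + FP)
    return sensitivity, specificity


def meets_floor(value: Optional[float], floor: float = 0.80) -> bool:
    """Check a metric against the 80% floor quoted in the summary."""
    return value is not None and value >= floor


if __name__ == "__main__":
    truth = ["A", "A", "B", "B", "B"]
    guess = ["A", "B", "B", "B", "A"]
    sens, spec = sensitivity_specificity(truth, guess, "A")
    print(sens, spec, meets_floor(sens), meets_floor(spec))
```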

Conclusions:

The proposed automated workflow provides an efficient and scalable solution for the functional annotation of big genome data.
This approach overcomes the limitations of manual curation and computationally expensive existing methods.
The workflow is essential for keeping pace with the rapid expansion of the protein sequence universe and enabling future biological research.