Enhancing missense variant pathogenicity prediction with protein language models using VariPred.

Weining Lin¹, Jude Wells², Zeyuan Wang³

¹Division of Biosciences, Institute of Structural and Molecular Biology, University College London, London, UK.

Scientific Reports | April 7, 2024

Summary

This summary is machine-generated.

VariPred is a computational tool that predicts the pathogenicity of missense variants from the protein sequence alone. It builds on a pre-trained protein language model, avoids hand-crafted feature engineering, and performs comparably to or better than existing methods.

Area of Science:

Genomics and Bioinformatics
Computational Biology
Molecular Genetics

Background:

Predicting the pathogenicity of genetic variants is crucial for understanding disease mechanisms and clinical impact.
Traditional methods rely on hand-crafted features, often requiring complex data preprocessing like structural or evolutionary analyses.
The advent of deep learning and large protein language models offers new avenues for variant pathogenicity prediction.

Purpose of the Study:

To introduce VariPred, a novel framework for predicting genetic variant pathogenicity.
To leverage pre-trained protein language models for end-to-end variant impact prediction.
To demonstrate that VariPred matches or outperforms existing state-of-the-art methods using only protein sequence data.

Main Methods:

Developed VariPred, an end-to-end deep learning model utilizing a pre-trained protein language model (ESM-1b).
Input requirement is limited to the protein sequence, eliminating the need for structural or multiple sequence alignment features.
Evaluated VariPred's performance on six established variant impact prediction benchmarks.
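
Because the only input is the protein sequence, preparing a variant for a sequence-based model can be as small as producing the wild-type and substituted sequences. The following is a minimal sketch of that step, not VariPred's actual code: the one-letter label format (for example "R4W"), the function names, and the choice to return a wild-type/mutant pair are all assumptions made for illustration. How the sequences are then passed to ESM-1b is not described in this summary and is left out.

```python
import re
from typing import NamedTuple, Tuple

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


class MissenseVariant(NamedTuple):
    ref: str       # wild-type residue
    position: int  # 1-based position in the protein sequence
    alt: str       # substituted residue


def parse_variant(label: str) -> MissenseVariant:
    """Parse a one-letter label such as 'R4W' (hypothetical input format)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in AMINO_ACIDS or alt not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue in {label!r}")
    if ref == alt:
        raise ValueError(f"{label!r} does not change the residue")
    if position < 1:
        raise ValueError("positions are 1-based")
    return MissenseVariant(ref, position, alt)


def apply_variant(sequence: str, variant: MissenseVariant) -> str:
    """Return the sequence with the substitution applied.

    The reference residue is checked first, so a label that was written
    against a different isoform fails loudly instead of silently.
    """
    index = variant.position - 1
    if index >= len(sequence):
        raise ValueError("variant position lies beyond the sequence end")
    if sequence[index] != variant.ref:
        raise ValueError(
            f"expected {variant.ref} at position {variant.position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + variant.alt + sequence[index + 1:]


def build_model_input(sequence: str, label: str) -> Tuple[str, str]:
    """Return (wild-type, mutant) sequences for a sequence-only predictor."""
    variant = parse_variant(label)
    return sequence, apply_variant(sequence, variant)
```

For a toy sequence, `build_model_input("MKTRAYIAK", "R4W")` returns the original sequence together with a copy in which the arginine at position 4 is replaced by tryptophan. No structure file or multiple sequence alignment is needed at any point, which is the simplification the method emphasizes.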

Main Results:

VariPred demonstrated comparable or superior performance against established predictors such as 3Cnet, PolyPhen-2, REVEL, MetaLR, FATHMM, and ESM variant.
The model achieved robust classification accuracy across multiple benchmarks.
The simplified input requirement (protein sequence only) streamlines the prediction process.
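
Comparing predictors on a benchmark comes down to scoring binary pathogenic/benign calls against known labels. This summary does not say which metric was reported, so the sketch below uses the Matthews correlation coefficient purely as an illustrative assumption; it is a common choice when the two classes are unevenly represented. The function name and the 0/1 label encoding are hypothetical.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels.

    Labels are assumed to be 1 (pathogenic) or 0 (benign).
    Returns 0.0 when the coefficient is undefined, i.e. when any
    row or column of the confusion matrix is empty.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

The coefficient ranges from -1 (every call wrong) through 0 (no better than chance) to 1 (every call right), so the same function can be applied to each predictor's calls on each benchmark to obtain directly comparable numbers.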

Conclusions:

VariPred offers a powerful and efficient new tool for predicting variant pathogenicity.
The framework highlights the potential of protein language models in genomic variant interpretation.
This sequence-based approach simplifies pathogenicity prediction, making it more accessible for researchers.