Related Concept Videos

Genome Annotation and Assembly

The genome refers to all of the genetic material in an organism. It can range from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of putting DNA sequencing data back together in the correct order to create a close representation of the original genome. It is followed by genome annotation: the identification of functional elements on the newly assembled genome.
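To make "putting the sequencing data back together in the correct order" concrete, here is a toy greedy overlap assembler. It is a teaching sketch only, not the method used in any JoVE protocol: it assumes error-free reads, ignores reads contained inside other reads, and uses a hypothetical four-read example. At each step it merges the two reads whose suffix and prefix overlap the most.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Jump to the next place where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if the whole remaining suffix of a is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns a single contig if everything could be joined, otherwise
    the remaining unjoined contigs.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # no overlaps left: stop with several contigs
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])
    return contigs


# Hypothetical reads sampled from the sequence ATTAGACCTGCCGGAATAC
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can collapse repeated regions incorrectly; real assemblers typically build overlap graphs or de Bruijn graphs instead.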

Conservation of Protein Domains Over Different Proteins

Protein domains are small structurally independent units that are part of a single amino acid chain. Although these domains are often structurally independent, they may rely on synergistic effects to perform their functions as part of a larger protein. Protein domains may be conserved within the same organism, as well as across different organisms.
A limited set of protein domains often duplicate and recombine during evolution. These domains can be organized in different combinations to...

Conservation of Protein Domains

Protein Networks

An organism can have thousands of different proteins, and these proteins must cooperate to ensure the health of an organism. Proteins bind to other proteins and form complexes to carry out their functions. Many proteins interact with multiple other proteins creating a complex network of protein interactions.
These interactions can be represented through maps depicting protein-protein interaction networks, represented as nodes and edges. Nodes are circles that are representative of a protein,...
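The node-and-edge picture maps directly onto a graph data structure. The sketch below is an illustration under simple assumptions: interactions are undirected, the protein names P1 to P5 are hypothetical, and "hub" is used in its common sense of a protein with many interaction partners. Each protein is a node, and each interaction adds an edge in both directions.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected protein-protein interaction graph as adjacency sets."""
    graph = defaultdict(set)
    for p1, p2 in interactions:
        graph[p1].add(p2)  # edge P1 -> P2
        graph[p2].add(p1)  # and P2 -> P1, since binding is mutual
    return graph


def hubs(graph, min_degree):
    """Return proteins with at least `min_degree` partners, sorted by name."""
    return sorted(p for p, partners in graph.items() if len(partners) >= min_degree)


# Hypothetical interaction list
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
net = build_network(interactions)
print(sorted(net["P1"]))  # ['P2', 'P3', 'P4']
print(hubs(net, 3))       # ['P1']
```

Adjacency sets keep neighbour lookups fast and silently ignore duplicate interaction records.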

Protein and Protein Structure

Proteins are one of the most abundant organic molecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, or membranes; or they may be toxins or enzymes. Their structures, like their functions, vary greatly. They are all, however, amino acid polymers arranged in a linear sequence.
A protein's shape is critical to its function. For example, an enzyme...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

The missing link in FAIR data policy: biodata resources in life sciences.

Scientific data·2026

Same author

Unlocking the potential of PubMed Central supplementary data files.

Bioinformatics advances·2025

Same author

Manuscript Classification to Support the Analysis of Biases in Publication Opportunities.

Studies in health technology and informatics·2025

Same author

A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications.

GigaScience·2025

Same author

New and revised gene ontology biological process terms describe multiorganism interactions critical for understanding microbial pathogenesis and sequences of concern.

Journal of biomedical semantics·2025

Same author

A compendium of human gene functions derived from evolutionary modelling.

Nature·2025

Same journal

Development of CypriSSR: a genome-wide, chromosome-level microsatellite database for multiple cyprinidae species.

Database : the journal of biological databases and curation·2026

Same journal

KitBase Expanded: An Integrated Genomic and Phenotypic Resource for 3,268 Fast-Neutron-Irradiated Rice Mutants.

Database : the journal of biological databases and curation·2026

Same journal

PhaLP 2.0: extending the community-oriented phage lysin database with a SUBLYME pipeline for metagenomic discovery.

Database : the journal of biological databases and curation·2026

Same journal

A similarity metric, rubric, and unified hierarchy for biomedical publication types and study designs.

Database : the journal of biological databases and curation·2026

Same journal

GUTAID: a curated database linking gut microbial antigens to autoimmune mechanisms.

Database : the journal of biological databases and curation·2026

Same journal

Rosetta Statements: simplifying FAIR knowledge graph construction with a user-centred approach.

Database : the journal of biological databases and curation·2026

Related Experiment Video

Updated: Apr 3, 2026

A Virtual Machine Platform for Non-Computer Professionals for Using Deep Learning to Classify Biological Sequences of Metagenomic Data

Published on: September 25, 2021

Deep Question Answering for protein annotation.

Julien Gobeill¹, Arnaud Gaudinat², Emilie Pasche³

¹BiTeM group, University of Applied Sciences-HEG, Library and Information Sciences, SIBTex group, Swiss Institute of Bioinformatics, julien.gobeill@hesge.ch.

Database : the Journal of Biological Databases and Curation · September 19, 2015

Summary

This summary is machine-generated.

Biomedical question-answering systems struggle with complex genomics queries. A deep question-answering (QA) approach that uses GOCat, a supervised Gene Ontology (GO) classifier, roughly doubles (+100%) both answer recall and precision for complex answers such as protein functional descriptions.

More Related Videos

An Integrated Approach for Microprotein Identification and Sequence Analysis

Published on: July 12, 2022

Author Spotlight: Advancing Alzheimer's Research – Exploring Early Detection and Multi-Omics Approaches

Published on: December 15, 2023

Area of Science:

Bioinformatics
Genomics
Computational Biology

Background:

Biomedical professionals face information overload from the extensive biomedical literature.
Standard search engines and question-answering (QA) systems struggle to retrieve specific information efficiently, especially for complex genomics questions.
Existing QA systems often fail to extract answers requiring Gene Ontology (GO) concepts.

Purpose of the Study:

To evaluate dictionary-based classifiers and a supervised classifier (GOCat) for extracting GO concepts from biomedical literature.
To investigate the effectiveness of a deep QA approach that leverages curated biological data to infer answers that are not explicitly stated in the text.
To address the limitations of current QA systems in handling complex genomics-related queries.
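A dictionary-based classifier, in its simplest form, looks for GO term labels written verbatim in the text. The sketch below is a hypothetical minimal version, not one of the two dictionary-based classifiers evaluated in the study: it uses a three-entry excerpt of GO labels and case-insensitive whole-phrase matching. A sentence that paraphrases a concept ("binds DNA") yields nothing, which illustrates why such approaches can miss answers.

```python
import re

# Tiny illustrative excerpt of GO labels; a real dictionary would cover the whole ontology.
GO_DICTIONARY = {
    "dna binding": "GO:0003677",
    "nucleus": "GO:0005634",
    "apoptotic process": "GO:0006915",
}


def dictionary_tag(text, dictionary=GO_DICTIONARY):
    """Return the set of GO IDs whose label appears as a whole phrase in `text`."""
    lowered = text.lower()
    found = set()
    for label, go_id in dictionary.items():
        # \b boundaries stop 'nucleus' from matching inside 'pronucleus'.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.add(go_id)
    return found


print(sorted(dictionary_tag("The protein localizes to the nucleus and shows DNA binding activity.")))
# ['GO:0003677', 'GO:0005634']
print(dictionary_tag("This protein binds DNA."))  # set()
```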

Main Methods:

Comparison of two dictionary-based classifiers against a Gene Ontology (GO) supervised classifier (GOCat).
Using the GOA (Gene Ontology Annotation) database to retrieve the GO concepts that curators assigned to similar abstracts.
Implementing a deep QA approach that adds a classification step and exploits curated data.
Testing on complex genomics questions, using the 100 abstracts retrieved for each question.
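The idea of reusing curator annotations of similar abstracts can be sketched as a k-nearest-neighbour vote, followed by a QA step that sums GO scores over all abstracts retrieved for a question. This is an assumption-laden illustration, not GOCat's actual implementation: the bag-of-words cosine similarity, the value of `k`, the function names, and the three curated records are all hypothetical, and two retrieved abstracts stand in for the 100 used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for GOA: (abstract text, GO IDs assigned by curators)
CURATED = [
    ("transcription factor binds DNA in the nucleus", ["GO:0003677", "GO:0005634"]),
    ("caspase activation triggers apoptotic cell death", ["GO:0006915"]),
    ("nuclear protein binds DNA and regulates transcription", ["GO:0003677"]),
]


def bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_go_classify(abstract, curated, k=2):
    """Score GO IDs by the similarity of the k most similar curated abstracts carrying them."""
    query = bag_of_words(abstract)
    scored = [(cosine(query, bag_of_words(text)), go_ids) for text, go_ids in curated]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scores = Counter()
    for sim, go_ids in scored[:k]:
        if sim <= 0:
            continue  # an unrelated neighbour should not vote
        for go_id in go_ids:
            scores[go_id] += sim
    return scores


def answer_question(retrieved_abstracts, curated, k=2, top_n=3):
    """Classify every retrieved abstract, then rank GO IDs by their summed scores."""
    totals = Counter()
    for abstract in retrieved_abstracts:
        totals.update(knn_go_classify(abstract, curated, k))
    return [go_id for go_id, _ in totals.most_common(top_n)]


retrieved = [
    "the protein binds DNA and regulates transcription in the nucleus",
    "apoptotic cell death follows caspase activation",
]
print(answer_question(retrieved, CURATED))  # ['GO:0003677', 'GO:0006915', 'GO:0005634']
```

Summing over all retrieved abstracts lets a GO concept rise in the ranking even if no single abstract states it verbatim, which is the intuition behind inferring answers from curated data.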

Main Results:

Dictionary-based and redundancy-based QA approaches are relatively ineffective for complex genomics questions.
The deep QA approach using GOCat significantly improves both the quantity and quality of extracted answers.
A +100% improvement in both recall and precision (both metrics roughly doubled) was observed when using GOCat for complex answers such as protein functional descriptions.
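For readers less used to these metrics, the sketch below shows how precision, recall, and a relative improvement such as "+100%" are computed. The answer labels t1 to t7 and both prediction sets are invented for illustration and are not data from the study; they simply show a case where both metrics go from 0.25 to 0.5.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / expected answers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def relative_improvement(new, baseline):
    """(new - baseline) / baseline; 1.0 means the metric doubled (+100%)."""
    return (new - baseline) / baseline


GOLD = {"t1", "t2", "t3", "t4"}  # hypothetical expected answers
baseline = precision_recall(["t1", "t5", "t6", "t7"], GOLD)  # (0.25, 0.25)
improved = precision_recall(["t1", "t2", "t6", "t7"], GOLD)  # (0.5, 0.5)
print(relative_improvement(improved[0], baseline[0]))        # 1.0
```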

Conclusions:

Standard QA methods are insufficient for complex genomics questions requiring Gene Ontology (GO) concepts.
A deep QA approach built on GOCat effectively uses curated biological data to infer answers.
Supervised classification with curated data offers a substantial improvement in QA system performance for biomedical literature.