Large-scale event extraction from literature with multi-level gene normalization.

Sofie Van Landeghem¹, Jari Björne, Chih-Hsuan Wei

¹Department of Plant Systems Biology, VIB, Gent, Belgium.

April 25, 2013

Summary

This summary is machine-generated.

This study introduces an automated text mining system for life sciences, linking biological concepts to database identifiers across millions of articles. The resulting comprehensive dataset aids database curation and pathway analysis.

Area of Science:

Biomedical Informatics
Computational Biology
Text Mining

Background:

Automated text mining is crucial for life sciences, aiding database curation, knowledge summarization, and information retrieval.
Scaling text mining tools to millions of articles and linking analyses to biomolecular databases (e.g., UniProt, KEGG) is essential for comprehensive coverage.

Purpose of the Study:

To develop and evaluate a text mining strategy that normalizes biological concepts in text to database identifiers.
To create a large-scale, publicly available resource of biomolecular events and gene/protein mentions from biomedical literature.

Main Methods:

Combined and improved two state-of-the-art text mining components for normalization and event extraction.
Processed 21.9 million PubMed abstracts and 460,000 PubMed Central open access full-text articles.
Mapped biological concepts to identifiers at varying granularity levels.
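As a rough illustration of what mapping a mention to identifiers "at varying granularity levels" could look like, the sketch below resolves a gene mention at three levels: a canonical symbol, a gene family, and a species-specific gene. These three levels, the lookup tables, the identifiers (`GENE:0001`, `FAM:p53`, and so on), and the function names are all assumptions made for this example; the summary does not describe the study's actual components or data model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables with made-up identifiers. A real system would
# draw on curated resources rather than hard-coded dictionaries.
SYMBOL_TO_GENE = {
    ("esr1", "human"): "GENE:0001",
    ("esr1", "mouse"): "GENE:0002",
    ("tp53", "human"): "GENE:0003",
}
GENE_TO_FAMILY = {
    "GENE:0001": "FAM:estrogen_receptor",
    "GENE:0002": "FAM:estrogen_receptor",
    "GENE:0003": "FAM:p53",
}


@dataclass(frozen=True)
class Normalized:
    mention: str               # text as it appeared in the article
    canonical: str             # coarsest level: cleaned-up symbol
    family_id: Optional[str]   # intermediate level: gene family
    gene_id: Optional[str]     # finest level: species-specific gene


def canonical_form(mention: str) -> str:
    """Lowercase the mention and drop punctuation and whitespace."""
    return "".join(ch for ch in mention.lower() if ch.isalnum())


def normalize(mention: str, species: Optional[str] = None) -> Normalized:
    """Resolve a mention as finely as the available context allows."""
    canon = canonical_form(mention)
    gene_id = SYMBOL_TO_GENE.get((canon, species)) if species else None
    if gene_id is not None:
        family = GENE_TO_FAMILY.get(gene_id)
    else:
        # Without a species, fall back to the family level, but only when
        # every candidate gene for this symbol agrees on a single family.
        families = {
            GENE_TO_FAMILY.get(gene)
            for (symbol, _), gene in SYMBOL_TO_GENE.items()
            if symbol == canon
        }
        families.discard(None)
        family = families.pop() if len(families) == 1 else None
    return Normalized(mention, canon, family, gene_id)


if __name__ == "__main__":
    print(normalize("ESR-1", species="human"))
    print(normalize("Esr1"))  # no species: family-level match only
```

The point of the fallback is that a mention with no identifiable species can still be linked at a coarser level instead of being discarded, which is one plausible reason to keep several granularity levels side by side.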

Main Results:

Generated a dataset of 40 million biomolecular events involving 76 million gene/protein mentions across 5,032 species.
Linked mentions to 122,000 distinct genes.
Demonstrated promising results for database and pathway curation.

Conclusions:

The developed text mining approach and resulting dataset offer significant value for life science research and database curation.
The software components are open-source, and the dataset is freely accessible via API and bulk download, promoting further bioinformatic analyses.