MEHC-Curation: A Python Framework for High-Quality Molecular Data Set Curation.

Trong-Chinh Pham¹, Nhat-Anh Nguyen-Dang², Thanh-Hoang Nguyen-Vo³,⁴

  • ¹School of Biotechnology, International University - VNU HCMC, Quarter 33, Linh Xuan Ward, Ho Chi Minh City 700000, Vietnam.

Journal of Chemical Information and Modeling | February 26, 2026
Summary
This summary is machine-generated.

MEHC-curation is a new Python framework that simplifies molecular data curation for quantitative structure-activity relationship (QSAR) modeling and drug discovery. It improves data set quality by validating, cleaning, and normalizing chemical structures supplied as SMILES strings, making complex data preparation accessible to researchers without specialized expertise.
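
Of the three steps, validation is the easiest to illustrate without a cheminformatics toolkit. The sketch below is a hypothetical, syntax-only check; the name `check_smiles_syntax` is illustrative and not part of MEHC-curation. It only confirms that branches, bracket atoms, and ring-closure labels are balanced. It does not check valence, aromaticity, or element symbols, which a real validator would handle by fully parsing the string with a toolkit such as RDKit.

```python
from typing import Optional


def check_smiles_syntax(smiles: str) -> Optional[str]:
    """Return None if `smiles` passes basic syntax checks, else a reason string.

    Hypothetical helper. Syntax only: valence, aromaticity, and element
    symbols are NOT checked.
    """
    if not smiles or smiles.strip() != smiles:
        return "empty string or surrounding whitespace"
    depth = 0           # nesting level of '(' ... ')' branches
    open_rings = set()  # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.find("]", i + 1)
            if end == -1:
                return "unclosed bracket atom"
            # Digits inside [...] are isotopes, H counts, or charges,
            # not ring-closure labels, so skip the whole bracket atom.
            i = end + 1
            continue
        if ch == "]":
            return "unmatched ']'"
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unmatched ')'"
        elif ch.isdigit() or ch == "%":
            if ch == "%":
                # Two-digit ring labels are written as %nn.
                label = smiles[i + 1:i + 3]
                if len(label) != 2 or not label.isdigit():
                    return "malformed %nn ring label"
                i += 2
            else:
                label = ch
            # A label's first use opens a ring and its second use closes it,
            # so toggling set membership tracks which rings are still open.
            open_rings ^= {label}
        i += 1
    if depth != 0:
        return "unclosed branch '('"
    if open_rings:
        return f"unclosed ring label(s): {sorted(open_rings)}"
    return None
```

For example, `check_smiles_syntax("c1ccccc1")` returns `None`, while the truncated `check_smiles_syntax("c1ccccc")` returns `"unclosed ring label(s): ['1']"`. Returning a reason string rather than a bare boolean is what makes per-record error tracking possible later in a pipeline.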

Area of Science:

  • Computational Chemistry
  • Cheminformatics
  • Drug Discovery

Background:

  • High-quality molecular data is essential for reliable quantitative structure-activity relationship (QSAR) modeling and drug discovery.
  • Existing molecular databases often contain errors such as invalid structures and duplicate entries, which degrade model performance and reproducibility.
  • Current data curation tools require substantial domain expertise and complex procedures, which makes them difficult for novice and nonexpert users.

Purpose of the Study:

  • To develop a user-friendly Python framework, MEHC-curation, that simplifies molecular data set curation for researchers of all expertise levels.
  • To provide an accessible tool for curating chemical structures (SMILES strings), thereby lowering barriers to entry in QSAR modeling and drug discovery.
  • To integrate seamlessly into existing drug discovery and QSAR workflows, enhancing data quality and reproducibility.

Main Methods:

  • Developed MEHC-curation, a Python framework implementing a three-stage pipeline: Validation, Cleaning, and Normalization.
  • Built duplicate removal and comprehensive error tracking into the curation process.
  • Focused on simplifying the curation of SMILES strings to make the process straightforward and efficient.
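
A minimal sketch of how a Validation, Cleaning, and Normalization pipeline with duplicate removal and error tracking could be wired together. Everything here is an assumption for illustration, not MEHC-curation's actual API: `curate`, `CurationReport`, and the three stage functions are hypothetical. Validation only compares parenthesis counts, cleaning keeps the longest dot-separated fragment as a crude salt stripper, and normalization is an identity function standing in for toolkit canonicalization.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stage signature: takes a SMILES string and returns
# (possibly modified SMILES, error reason or None if the record passes).
Stage = Callable[[str], Tuple[str, Optional[str]]]


@dataclass
class CurationReport:
    kept: List[str] = field(default_factory=list)
    # (input index, stage name, reason) for every rejected or duplicate record
    errors: List[Tuple[int, str, str]] = field(default_factory=list)


def validate(smiles: str) -> Tuple[str, Optional[str]]:
    # Placeholder check; a real validator would parse with a toolkit.
    if not smiles or smiles.count("(") != smiles.count(")"):
        return smiles, "invalid SMILES"
    return smiles, None


def clean(smiles: str) -> Tuple[str, Optional[str]]:
    # Crude salt/solvent stripping: keep the longest dot-separated fragment.
    return max(smiles.split("."), key=len), None


def normalize(smiles: str) -> Tuple[str, Optional[str]]:
    # Identity placeholder where toolkit canonicalization would go.
    return smiles, None


def curate(records: List[str], stages: List[Tuple[str, Stage]]) -> CurationReport:
    report = CurationReport()
    seen = set()
    for idx, raw in enumerate(records):
        smiles = raw.strip()
        failed = False
        for name, stage in stages:
            smiles, reason = stage(smiles)
            if reason is not None:
                # Log where and why the record was dropped, then stop.
                report.errors.append((idx, name, reason))
                failed = True
                break
        if failed:
            continue
        # Deduplicate on the output of the final stage.
        if smiles in seen:
            report.errors.append((idx, "deduplicate", "duplicate of an earlier record"))
            continue
        seen.add(smiles)
        report.kept.append(smiles)
    return report


if __name__ == "__main__":
    raw = ["CCO", "CC(=O", " CCO ", "CC(=O)O.[Na+]", "CC(=O)O"]
    steps = [("validate", validate), ("clean", clean), ("normalize", normalize)]
    result = curate(raw, steps)
    print(result.kept)  # ['CCO', 'CC(=O)O']
    for err in result.errors:
        print(err)
```

Deduplication runs on the normalized string, so with the identity placeholder `OCC` and `CCO` still count as different molecules; a canonicalizing normalization step is what lets duplicates written in different SMILES forms be caught. Recording the input index and stage for each rejection keeps the curation auditable.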

Main Results:

  • MEHC-curation successfully simplifies the intricate process of molecular data curation.
  • The framework ensures high-quality molecular datasets by addressing common inaccuracies such as invalid structures and duplicates.
  • The tool is designed for ease of use, requiring no specialized expertise, thus democratizing data curation.

Conclusions:

  • MEHC-curation provides an accessible and efficient solution for molecular data curation, crucial for QSAR modeling and drug discovery.
  • The framework empowers researchers, including those new to the field, to generate reliable datasets.
  • By simplifying data preparation, MEHC-curation facilitates improved model performance and reproducibility in computational chemistry and drug discovery research.