



Comparing SMILES and SELFIES tokenization for enhanced chemical language modeling.

Miguelangel Leon¹, Yuriy Perezhohin¹, Fernando Peres¹

  • ¹NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, 1070-312, Lisbon, Portugal.

Scientific Reports | October 24, 2024
Summary
This summary is machine-generated.

Atom Pair Encoding (APE), a novel tokenization method, improves the performance of chemical language models on biophysics and physiology classification tasks when paired with SMILES representations. More effective tokenization of this kind could support drug discovery and materials science applications.


Area of Science:

  • Computational chemistry
  • Bioinformatics
  • Machine learning in life sciences

Background:

  • Life sciences research is resource-intensive, often relying on trial and error.
  • Machine learning, particularly Deep Learning, is accelerating scientific discovery.
  • Natural Language Processing (NLP) is being explored for chemical language representation.

Purpose of the Study:

  • To evaluate NLP tokenization methods for chemical language representations (SMILES, SELFIES).
  • To assess the impact of tokenization on BERT-based models for biophysics and physiology classification.
  • To compare Byte Pair Encoding (BPE) with a novel Atom Pair Encoding (APE) approach.
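
To make the difference between tokenization granularities concrete, here is a minimal sketch that splits a SMILES string either into raw characters or into atom-level units. The regular expression and the `atom_tokens` name are illustrative assumptions that cover only a common subset of SMILES syntax; they are not the tokenizer used in the study.

```python
import re

# Hypothetical atom-level pattern for a common subset of SMILES (an assumption,
# not the study's tokenizer). Order matters: "Cl" and "Br" are tried before "C" and "B".
ATOM_PATTERN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"                # two-letter organic-subset elements
    r"|[BCNOPSFI]"           # one-letter organic-subset elements
    r"|[bcnops]"             # aromatic atoms
    r"|%\d{2}"               # two-digit ring closures
    r"|\d"                   # single-digit ring closures
    r"|[=#$:/\\\-+().~*@]"   # bonds, branches, and other symbols
)


def atom_tokens(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level units, rejecting unparsed input."""
    tokens = ATOM_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    smiles = "CC(=O)Cl"  # acetyl chloride
    print(list(smiles))         # ['C', 'C', '(', '=', 'O', ')', 'C', 'l']
    print(atom_tokens(smiles))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

Character-level splitting breaks chlorine into `C` and `l`, which reads as a carbon followed by a meaningless symbol, whereas the atom-level split keeps `Cl` as a single unit.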

Main Methods:

  • Utilized BERT-based models with SMILES and SELFIES representations.
  • Applied Byte Pair Encoding (BPE) and Atom Pair Encoding (APE) tokenization techniques.
  • Evaluated model performance on HIV, toxicology, and blood-brain barrier penetration datasets using ROC-AUC.
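
For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name and the toy labels and scores are illustrative assumptions, not data from the study, and a library routine would normally be used on real datasets.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise (Mann-Whitney) ROC-AUC for binary labels in {0, 1}."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: one positive (0.4) is ranked below one negative (0.8).
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.1]
    print(roc_auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes, which is why the metric suits imbalanced classification datasets.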

Main Results:

  • Atom Pair Encoding (APE) significantly outperformed Byte Pair Encoding (BPE).
  • APE with SMILES representations showed superior performance in classification tasks.
  • APE better preserved the integrity of chemical elements and the contextual relationships between them.
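
One way to picture the element-integrity point is to run the same greedy pair-merging step over two starting segmentations of a toy corpus. This is a sketch under a stated assumption: that APE merges frequent adjacent pairs in the manner of BPE but starts from atom-level units. The summary does not spell out the algorithm, so the function names, the tie-breaking rule, and the three-molecule corpus are all hypothetical.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the most frequent adjacent token pair (ties broken alphabetically)."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return min(counts, key=lambda pair: (-counts[pair], pair))


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(sequences, num_merges):
    """Greedily learn up to `num_merges` merges; return them with the final segmentation."""
    sequences = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences


if __name__ == "__main__":
    # Toy corpus (hypothetical): CCCl, CCCC, CCBr.
    char_start = [list("CCCl"), list("CCCC"), list("CCBr")]
    atom_start = [["C", "C", "C", "Cl"], ["C", "C", "C", "C"], ["C", "C", "Br"]]

    _, char_result = learn_merges(char_start, num_merges=1)
    _, atom_result = learn_merges(atom_start, num_merges=1)
    print(char_result)  # [['CC', 'C', 'l'], ['CC', 'CC'], ['CC', 'B', 'r']]
    print(atom_result)  # [['CC', 'C', 'Cl'], ['CC', 'CC'], ['CC', 'Br']]
```

With character-level starting units, `Cl` and `Br` begin split and must be reassembled by later merges, so fragments such as `l` and `r` can appear as tokens. With atom-level starting units, every token at every step is a concatenation of whole atoms.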

Conclusions:

  • Tokenization is critical for processing chemical language effectively.
  • Atom Pair Encoding (APE) offers a more robust method for chemical NLP tasks.
  • Refined tokenization techniques can advance drug discovery and materials science.