



Undersampling techniques for large datasets.

Lexin Chen1,2, Ramon Alain Miranda Quintana1,2

  • 1Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA.

bioRxiv: the preprint server for biology | September 26, 2025
Summary
This summary is machine-generated.

DNA-Encoded Library (DEL) technology produces vast collections of small molecules for drug discovery screening. This study addresses class imbalance in DEL data by evaluating undersampling techniques for the inactive majority class, with the goal of improving machine learning model training.

Keywords:
algorithms, cluster chemistry, molecular simulation


Area of Science:

  • Medicinal Chemistry
  • Chemoinformatics
  • Machine Learning

Background:

  • DNA-Encoded Libraries (DELs) enable rapid synthesis and screening of billions of small molecules.
  • Machine learning (ML) models benefit from DEL binding data for drug discovery.
  • Class imbalance, with far more inactive than active compounds, poses a significant challenge for ML model training in DELs.
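
To illustrate why class imbalance is a problem for model training and evaluation, the following minimal sketch uses made-up toy numbers (10 actives among 1,000 compounds, not taken from the study). It shows that a trivial classifier that labels every compound inactive still reaches high accuracy while finding no actives at all.

```python
# Toy illustration only: the counts below are assumptions, not DEL data.
labels = [1] * 10 + [0] * 990          # 1 = active, 0 = inactive
predictions = [0] * len(labels)        # trivial model: "everything is inactive"

# Fraction of compounds labelled correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Fraction of true actives that the model recovers.
true_positives = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

High accuracy with zero recall is the failure mode that rebalancing the training set is meant to avoid.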

Purpose of the Study:

  • To investigate and benchmark various undersampling strategies for the majority (inactive) class in DEL datasets.
  • To assess the impact of these strategies on the performance of ML models trained on imbalanced DEL data.

Main Methods:

  • Exploration of different undersampling techniques for the majority class.
  • Benchmarking undersampling strategies against random selection.
  • Prototyping and evaluation on two distinct DEL datasets.
  • Testing with three different machine learning models.
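
The random-selection baseline mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library. The function name `random_undersample`, the `ratio` parameter, and the toy labels are assumptions introduced here, not part of the study or of the DELight package.

```python
import random
from typing import List, Sequence


def random_undersample(labels: Sequence[int], ratio: float = 1.0, seed: int = 0) -> List[int]:
    """Return sorted indices that keep every active (label 1) and a random
    subset of the inactives (label 0).

    `ratio` is a hypothetical knob: target number of inactives per active.
    """
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    # Never request more inactives than exist.
    n_keep = min(len(inactives), int(round(ratio * len(actives))))
    rng = random.Random(seed)  # seeded for reproducibility
    kept_inactives = rng.sample(inactives, n_keep)
    return sorted(actives + kept_inactives)


# Toy data: 5 actives among 100 compounds (illustrative only).
labels = [1] * 5 + [0] * 95
kept = random_undersample(labels, ratio=1.0, seed=42)
balanced = [labels[i] for i in kept]
print(len(kept), sum(balanced))
```

Only the majority class is subsampled. All actives are retained because they are the scarce signal the model needs to learn.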

Main Results:

  • The 'max_sim' undersampling strategy demonstrated superior performance across evaluated metrics.
  • Comparative analysis showed significant improvements over random selection in handling class imbalance.
  • The developed pipeline was successfully implemented within the DELight package.
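
The summary does not define how 'max_sim' selects compounds. Purely as an illustrative assumption, one similarity-driven undersampler could rank each inactive by its maximum Tanimoto similarity to any active and keep the top-ranked ones. The sketch below implements that hypothetical reading with fingerprints represented as sets of on-bit indices. The function names and toy fingerprints are invented for illustration and should not be read as the authors' method or the DELight API.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # set of on-bit indices (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def undersample_by_max_similarity(
    actives: Sequence[Fingerprint],
    inactives: Sequence[Fingerprint],
    n_keep: int,
) -> List[int]:
    """Hypothetical selector: keep the n_keep inactives whose highest
    similarity to any active is largest. Returns sorted inactive indices."""
    if not actives:
        raise ValueError("at least one active fingerprint is required")
    scores = [max(tanimoto(fp, act) for act in actives) for fp in inactives]
    ranked = sorted(range(len(inactives)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy fingerprints (illustrative only).
actives = [frozenset({1, 2, 3})]
inactives = [
    frozenset({1, 2, 3, 4}),  # similarity 3/4
    frozenset({7, 8}),        # similarity 0
    frozenset({1, 2}),        # similarity 2/3
    frozenset({3, 9}),        # similarity 1/4
]
kept = undersample_by_max_similarity(actives, inactives, n_keep=2)
print(kept)
```

Under this assumed reading, the retained inactives are the ones closest to the actives in fingerprint space. Whether that matches the published strategy should be checked against the preprint itself.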

Conclusions:

  • Undersampling strategies, particularly 'max_sim', are effective in mitigating class imbalance in DEL datasets.
  • Improved ML model training using balanced DEL data can enhance hit identification and drug discovery efforts.
  • The DELight package provides a practical tool for applying these strategies in DEL-based drug discovery.