ccbmlib - a Python package for modeling Tanimoto similarity value distributions.

Martin Vogt¹, Jürgen Bajorath¹

¹Department of Life Science Informatics, B-IT, University of Bonn, Endenicher Allee 19c, Bonn, NRW, 53115, Germany.

March 18, 2020

Summary

This summary is machine-generated.

The ccbmlib Python package models the distributions of Tanimoto coefficient values for RDKit fingerprints. These models make it possible to assess the statistical significance of similarity values and to estimate compound rankings in molecular similarity searches.

Keywords:

Bernoulli model; Tanimoto coefficient; fingerprints; p-value; similarity value distributions

Area of Science:

Computational chemistry
Cheminformatics
Statistical modeling

Background:

Molecular similarity is central to drug discovery and cheminformatics.
Tanimoto coefficients are widely used to quantify molecular similarity.
Different molecular fingerprints can yield different Tanimoto values for the same pair of compounds, so raw values are not directly comparable across fingerprint types.
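For reference, the Tanimoto coefficient of two binary fingerprints is the number of features set in both, divided by the number of features set in either: Tc = c / (a + b - c). The minimal Python sketch below represents each fingerprint as a set of on-bit indices. The function name `tanimoto` and the choice to return 0.0 for two empty fingerprints are illustrative assumptions. They are not taken from ccbmlib or RDKit.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumed convention for two all-zero fingerprints
    common = len(a & b)  # c: features set in both fingerprints
    return common / (len(a) + len(b) - common)  # c / (a + b - c)


fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 2 / (4 + 3 - 2) = 0.4
```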

Purpose of the Study:

To introduce the ccbmlib Python package for modeling similarity distributions.
To enable statistical assessment of Tanimoto coefficients across diverse fingerprint types.
To evaluate how molecular similarity is represented by different fingerprint methods.

Main Methods:

Utilizing the ccbmlib Python package to model Tanimoto coefficient distributions.
Applying p-values for quantitative comparison of similarity scores.
Modeling conditional similarity distributions for ranked similarity searches.
Statistical analysis of feature distributions and correlations in fingerprints.
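A simplified simulation can illustrate how a Bernoulli-type null model turns a Tanimoto value into a p-value. This sketch makes three assumptions:

1. Each fingerprint feature is set independently with a fixed probability, such as its frequency in a reference database.
2. The feature correlations that the study also analyzes are ignored.
3. The distribution is estimated by Monte Carlo sampling rather than by ccbmlib's own models.

All function names and feature probabilities below are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def random_fingerprint(probs, rng):
    """Hypothetical Bernoulli model: feature i is set independently with probability probs[i]."""
    return {i for i, p in enumerate(probs) if rng.random() < p}


def simulate_null_tc(probs, n_pairs=10000, seed=0):
    """Tc values for pairs of independent random fingerprints drawn from the model."""
    rng = random.Random(seed)
    return [tanimoto(random_fingerprint(probs, rng), random_fingerprint(probs, rng))
            for _ in range(n_pairs)]


def empirical_p_value(observed_tc, null_tcs):
    """Fraction of simulated pairs at least as similar as the observed pair (+1 smoothing)."""
    exceed = sum(1 for t in null_tcs if t >= observed_tc)
    return (exceed + 1) / (len(null_tcs) + 1)


probs = [0.1] * 200  # hypothetical: 200 features, each set in 10% of compounds
null = simulate_null_tc(probs, n_pairs=2000)
print(empirical_p_value(0.6, null))  # expected to be small under this model
```

Because a p-value is on the same scale for any fingerprint, it can compare scores from fingerprints whose raw Tc ranges differ. The `+1` smoothing keeps the estimate from reaching exactly zero when no simulated pair exceeds the observed value.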

Main Results:

Developed accurate models for 11 RDKit fingerprints using ChEMBL data.
Achieved high accuracy, with differences of 1% or less in Tanimoto coefficient values in the high-similarity range.
Demonstrated the utility of p-values for comparing diverse fingerprint representations.
Conditional significance scores effectively estimate compound rankings in similarity searches.
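Conditional significance can be sketched the same way. Here the reference compound's fingerprint is held fixed, and only the database compound is drawn at random, so the resulting Tc distribution depends on the reference. The same simplifying assumptions apply: independent Bernoulli features, Monte Carlo sampling, and hypothetical names and probabilities. Under them, multiplying the conditional p-value by the number of other database compounds gives a rough estimate of how many would score at least as high. That count, in turn, estimates the rank of a hit. This is an illustration of the idea, not the ranking procedure used in the study.

```python
import random


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def random_fingerprint(probs, rng):
    """Hypothetical Bernoulli model: feature i is set independently with probability probs[i]."""
    return {i for i, p in enumerate(probs) if rng.random() < p}


def conditional_p_value(reference, observed_tc, probs, n_samples=5000, seed=0):
    """Estimate P(Tc(reference, X) >= observed_tc) with X drawn from the model.

    The reference is fixed, so the estimate is conditional on that compound."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples)
               if tanimoto(reference, random_fingerprint(probs, rng)) >= observed_tc)
    return hits / n_samples


def estimated_rank(reference, observed_tc, probs, database_size, **kwargs):
    """Rough rank: 1 + expected number of the other database_size compounds scoring as high."""
    return 1 + database_size * conditional_p_value(reference, observed_tc, probs, **kwargs)


probs = [0.1] * 200        # hypothetical feature probabilities
reference = set(range(20))  # hypothetical reference fingerprint
print(estimated_rank(reference, 0.3, probs, database_size=100000))
```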

Conclusions:

The ccbmlib package provides robust statistical tools for analyzing molecular similarity.
It facilitates quantitative comparisons and ranking estimations across various fingerprint types.
The models show high fidelity in representing molecular similarity, aiding cheminformatics applications.