



ccbmlib - a Python package for modeling Tanimoto similarity value distributions.

Martin Vogt¹, Jürgen Bajorath¹

  • ¹ Department of Life Science Informatics, B-IT, University of Bonn, Endenicher Allee 19c, Bonn, NRW, 53115, Germany.

F1000Research | March 18, 2020
Summary
This summary is machine-generated.

The ccbmlib Python package models Tanimoto coefficient distributions for RDKit fingerprints. It assesses statistical significance and conditional rankings for molecular similarity searches.

Keywords:
Bernoulli model, Tanimoto coefficient, fingerprints, p-value, similarity value distributions


Area of Science:

  • Computational chemistry
  • Cheminformatics
  • Statistical modeling

Background:

  • Molecular similarity is crucial in drug discovery and chemical informatics.
  • Tanimoto coefficients are widely used to quantify molecular similarity.
  • Different molecular fingerprints can yield varying similarity results.
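
For orientation, the Tanimoto coefficient of two binary fingerprints is the number of features they share divided by the number of features present in either one. The following is a minimal sketch in plain Python that represents each fingerprint as a set of on-bit indices. The function name and the example bit sets are illustrative assumptions and are not part of ccbmlib or RDKit.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints, for illustration only.
fp_query = {1, 4, 7, 9, 12}
fp_candidate = {1, 4, 8, 9, 15, 20}

# 3 shared bits out of 8 distinct bits in total.
tc = tanimoto(fp_query, fp_candidate)
print(f"Tc = {tc:.3f}")
```

Because the value depends on which features a fingerprint encodes and how many bits are typically set, the same pair of molecules can score quite differently under different fingerprint types.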

Purpose of the Study:

  • To introduce the ccbmlib Python package for modeling similarity distributions.
  • To enable statistical assessment of Tanimoto coefficients across diverse fingerprint types.
  • To evaluate how molecular similarity is represented by different fingerprint methods.

Main Methods:

  • Utilizing the ccbmlib Python package to model Tanimoto coefficient distributions.
  • Applying p-values for quantitative comparison of similarity scores.
  • Modeling conditional similarity distributions for ranked similarity searches.
  • Statistical analysis of feature distributions and correlations in fingerprints.
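
The idea of attaching a p-value to a Tanimoto coefficient can be illustrated with a deliberately simplified Monte Carlo sketch. It assumes every fingerprint bit is set independently with a fixed probability (an independent Bernoulli model), simulates random fingerprint pairs, and reports the fraction of simulated values that reach an observed coefficient. This is not the ccbmlib API and not the package's actual model, which also accounts for feature correlations; all function names and bit probabilities below are hypothetical.

```python
import random
from bisect import bisect_left


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def random_fingerprint(bit_probs, rng):
    """Draw a fingerprint in which bit i is set independently with probability bit_probs[i]."""
    return {i for i, p in enumerate(bit_probs) if rng.random() < p}


def simulate_tc_distribution(bit_probs, n_pairs, seed=0):
    """Sorted Tanimoto values of n_pairs random fingerprint pairs (background sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return sorted(
        tanimoto(random_fingerprint(bit_probs, rng), random_fingerprint(bit_probs, rng))
        for _ in range(n_pairs)
    )


def empirical_p_value(sorted_tcs, observed_tc):
    """Fraction of background Tanimoto values that are >= observed_tc."""
    n_at_least = len(sorted_tcs) - bisect_left(sorted_tcs, observed_tc)
    return n_at_least / len(sorted_tcs)


# Hypothetical per-bit frequencies: many rare features and a few common ones.
bit_probs = [0.05] * 64 + [0.30] * 16
background = simulate_tc_distribution(bit_probs, n_pairs=5000)

for observed in (0.2, 0.5, 0.8):
    print(f"Tc = {observed:.1f}  p = {empirical_p_value(background, observed):.4f}")
```

Running the same procedure with a different set of bit frequencies gives a different background distribution, which is why p-values, rather than raw coefficients, offer a common scale for comparing fingerprint types.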

Main Results:

  • Developed accurate models for 11 RDKit fingerprints using ChEMBL data.
  • Achieved high accuracy with differences of 1% or less in Tanimoto coefficients for high similarity.
  • Demonstrated the utility of p-values for comparing diverse fingerprint representations.
  • Conditional significance scores effectively estimate compound rankings in similarity searches.
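
One way to see how a significance score could translate into a ranking estimate is the following back-of-the-envelope sketch. It assumes the conditional p-value is read as the probability that a randomly chosen database compound is at least as similar to the reference compound as the hit in question, so that about p times N of the N database compounds are expected to score at least as high. This interpretation and the function below are assumptions made for illustration, not a description of how ccbmlib computes its scores.

```python
def expected_rank(conditional_p_value: float, database_size: int) -> float:
    """Rough rank estimate for a hit, given its conditional p-value and the database size."""
    if not 0.0 <= conditional_p_value <= 1.0:
        raise ValueError("conditional_p_value must lie in [0, 1]")
    if database_size < 1:
        raise ValueError("database_size must be at least 1")
    # Expected number of compounds scoring at least as high; a rank is never below 1.
    return max(1.0, conditional_p_value * database_size)


# Hypothetical numbers: a p-value of 1e-4 in a database of 1.5 million compounds.
rank = expected_rank(1e-4, 1_500_000)
print(f"estimated rank: about {rank:.0f}")
```

With these hypothetical inputs the hit would be expected to rank at about position 150.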

Conclusions:

  • The ccbmlib package provides robust statistical tools for analyzing molecular similarity.
  • It facilitates quantitative comparisons and ranking estimations across various fingerprint types.
  • The models show high fidelity in representing molecular similarity, aiding cheminformatics applications.