
Non-specific filtering of beta-distributed data.

Xinhui Wang, Peter W Laird, Toshinori Hinoue

  • Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N Soto Street, Suite 202W, Los Angeles 90089-9239 California, USA. kims@usc.edu.

BMC Bioinformatics | June 20, 2014
Summary
This summary is machine-generated.

We developed a new filter for DNA methylation data that improves cancer subtype detection by addressing the bias of standard deviation filtering. The approach uses a variance-stabilizing transformation and offers an alternative for feature selection in high-dimensional molecular studies.

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Non-specific feature selection is crucial for dimension reduction in high-dimensional molecular data analysis.
  • Standard deviation filtering in DNA methylation studies can bias probe selection due to the mean-variance relationship of Beta-distributed data.
  • This bias can impact the effectiveness of subsequent cluster analysis.
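The mean-variance relationship noted in the Background can be made concrete. For a Beta distribution with mean m and precision a + b, the variance is m(1 - m) / (a + b + 1), so at a fixed precision the standard deviation peaks at m = 0.5 and shrinks toward 0 and 1. The sketch below only illustrates that property; the helper name `beta_sd` and the precision value of 10 are assumptions chosen for the example, not values from the study.

```python
import math

def beta_sd(mean, precision):
    """SD of a Beta distribution given its mean and precision (a + b).

    Hypothetical helper for illustration; uses Var = m(1 - m) / (a + b + 1).
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if precision <= 0.0:
        raise ValueError("precision must be positive")
    return math.sqrt(mean * (1.0 - mean) / (precision + 1.0))

# At the same precision, a probe with mean near 0.5 has a larger SD
# than a probe with mean near 0 or 1.
for m in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"mean={m:.2f}  sd={beta_sd(m, precision=10.0):.4f}")
```

Under this relationship, a filter that ranks probes by raw standard deviation will tend to favor probes with intermediate mean methylation, which is the bias the study sets out to address.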

Purpose of the Study:

  • To explore the impact of standard deviation filtering bias on DNA methylation data clustering.
  • To develop and evaluate novel filter methods that overcome this bias.
  • To improve the identification of biological subtypes, particularly cancer phenotypes, using cluster analysis.

Main Methods:

  • Comparison of 11 non-specific filters across eight Infinium HumanMethylation datasets.
  • Development of a novel filter statistic using a variance-stabilizing transformation for Beta-distributed data.
  • Evaluation of filter performance in detecting cancer subtypes and distinguishing normal tissue subgroups.
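To illustrate how a variance-stabilizing transformation can change which probes a filter selects, the sketch below ranks probes by standard deviation before and after an arcsine square-root transform. This is an assumption for illustration: the arcsine square root is a classical variance-stabilizing transformation for proportions, but this summary does not state which transformation or filter statistic the authors used. The function names, the row-per-probe layout, and the beta values are all invented for the example.

```python
import math
import statistics

def arcsine_sqrt(beta_value):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(beta_value))

def rank_probes(beta_matrix, transform=None, top_k=2):
    """Return indices of the top_k probes ranked by SD across samples.

    beta_matrix: list of rows, one row of beta values per probe
                 (hypothetical layout for this sketch).
    transform:   optional function applied to each value before the SD.
    """
    scores = []
    for idx, row in enumerate(beta_matrix):
        values = [transform(v) for v in row] if transform else list(row)
        scores.append((statistics.stdev(values), idx))
    # Largest SD first; ties broken by probe index for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:top_k]]

# Toy beta values (made up): probe 0 varies around 0.5, probe 1 is mostly
# unmethylated with a few partially methylated samples, probe 2 is nearly constant.
toy_betas = [
    [0.40, 0.50, 0.60, 0.45, 0.55],
    [0.01, 0.02, 0.15, 0.01, 0.12],
    [0.90, 0.91, 0.90, 0.92, 0.91],
]

print("raw SD filter:        ", rank_probes(toy_betas, top_k=1))
print("transformed SD filter:", rank_probes(toy_betas, transform=arcsine_sqrt, top_k=1))
```

On this toy input the raw filter ranks the probe centered near 0.5 first, while the transformed filter ranks the mostly unmethylated probe first, because the transform stretches differences near 0 and 1.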

Main Results:

  • The novel filter, using a variance-stabilizing transformation, outperformed standard deviation filters in detecting cancer subtypes with specific methylation patterns (CpG island methylator phenotype).
  • Standard deviation filters remained effective for distinguishing normal tissue subgroups.
  • Different filters prioritized features from distinct genomic contexts (e.g., CpG island promoters vs. intergenic regions), with some overlap in discovered sample subsets.

Conclusions:

  • Two distinct filter statistics, which prioritize different feature characteristics, are each effective at identifying cancer/non-cancer clusters and specific cancer methylation phenotypes.
  • Both novel and standard filters are valuable for discovery-driven cluster analysis.
  • The authors recommend applying both filter types to new datasets and evaluating the overlap in selected features and resulting clusters for a comprehensive analysis.
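The overlap evaluation recommended above could be approached along the following lines. This is a minimal sketch, not the authors' procedure: the helper names, the use of the Jaccard index for feature overlap, the cross-tabulation of cluster labels, and the probe identifiers are all assumptions made for illustration.

```python
from collections import Counter

def feature_overlap(selected_a, selected_b):
    """Return the shared features and the Jaccard index of two selections."""
    a, b = set(selected_a), set(selected_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard

def cluster_crosstab(labels_a, labels_b):
    """Count samples for each pair of cluster labels from two analyses."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same samples")
    return Counter(zip(labels_a, labels_b))

# Hypothetical probe identifiers selected by two different filters.
sd_filter_probes = ["probe_1", "probe_2", "probe_3"]
vst_filter_probes = ["probe_2", "probe_3", "probe_4"]
shared, jaccard = feature_overlap(sd_filter_probes, vst_filter_probes)
print(shared, jaccard)

# Hypothetical cluster assignments for the same four samples.
print(cluster_crosstab([0, 0, 1, 1], [1, 1, 0, 1]))
```

A low feature overlap combined with a concentrated cross-tabulation would suggest that the two filters reach similar sample groupings from different probes.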