Related Concept Videos

Data: Types and Distribution

In biostatistics, data are the observations collected for analysis. There are two main types: parametric and non-parametric. Parametric data, which include continuous (e.g., weight) and discrete numerical data (e.g., number of tablets), assume a particular distribution pattern, often the normal distribution. Non-parametric data do not adhere to a specific distribution and typically comprise nominal (e.g., gender) and ordinal categorical data (e.g., pain scale ratings).
Distributions in...

Probability Distributions

The probability of a random variable x is the likelihood of its occurrence. A probability distribution represents the probabilities of a random variable using a formula, graph, or table. There are two types of probability distribution: discrete and continuous.
A discrete probability distribution is a probability distribution of discrete random variables. It can be categorized into binomial probability distribution and Poisson...

Binomial Probability Distribution

A binomial distribution is a probability distribution for a procedure with a fixed number of trials, where each trial can have only two outcomes.
The outcomes of a binomial experiment fit a binomial probability distribution. A statistical experiment can be classified as a binomial experiment if the following conditions are met:
There are a fixed number of trials. Think of trials as repetitions of an experiment. The letter n denotes the number of trials.
There are only two possible outcomes,...

Student t Distribution

The population standard deviation is rarely known in many day-to-day examples of statistics. When the sample sizes are large, it is easy to estimate the population standard deviation using a confidence interval, which provides results close enough to the original value. However, statisticians ran into problems when the sample size was small. A small sample size caused inaccuracies in the confidence interval.
The Student t distribution was developed by William S. Gosset (1876–1937) of the...

Poisson Probability Distribution

A Poisson probability distribution is a discrete probability distribution. It gives the probability of a number of events occurring in a fixed interval of time or space if these events happen at a known average rate and independently of the time since the last event. For example, a book editor might be interested in the number of words spelled incorrectly in a particular book. It might be that, on average, there are five words spelled incorrectly in 100 pages. The interval is 100 pages.
The...

Distributions to Estimate Population Parameter

The accurate values of population parameters such as population proportion, population mean, and population standard deviation (or variance) are usually unknown. These are fixed values that can only be estimated from the data collected from the samples. The estimates of each of these parameters are sample proportion, the sample mean, and sample standard deviation (or variance). To obtain the values of these sample statistics, data are required that have particular distribution and central...

Non-specific filtering of beta-distributed data.

Xinhui Wang, Peter W Laird, Toshinori Hinoue

Department of Preventive Medicine, USC Keck School of Medicine, University of Southern California, 2001 N Soto Street, Suite 202W, Los Angeles, California 90089-9239, USA. kims@usc.edu

BMC Bioinformatics

June 20, 2014

Summary

This summary is machine-generated.

We developed a new DNA methylation data filter that improves cancer subtype detection by addressing biases in standard methods. This novel approach, utilizing a variance-stabilizing transformation, offers a valuable alternative for feature selection in high-dimensional molecular studies.

Area of Science:

Genomics
Bioinformatics
Computational Biology

Background:

Non-specific feature selection is crucial for dimension reduction in high-dimensional molecular data analysis.
Standard deviation filtering in DNA methylation studies can bias probe selection due to the mean-variance relationship of Beta-distributed data.
This bias can impact the effectiveness of subsequent cluster analysis.
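
The mean-variance relationship mentioned above follows from the standard moments of the Beta distribution. As a textbook sketch (not a result specific to this study), for a methylation value X ~ Beta(a, b), where a and b are the usual shape parameters:

```latex
\mu = \mathbb{E}[X] = \frac{a}{a+b},
\qquad
\operatorname{Var}(X) = \frac{ab}{(a+b)^{2}(a+b+1)} = \frac{\mu(1-\mu)}{a+b+1}
```

For a fixed value of a + b, the variance is largest at μ = 0.5 and shrinks toward zero as μ approaches 0 or 1, which is why a standard deviation filter tends to favor probes with intermediate mean methylation.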

Purpose of the Study:

To explore the impact of standard deviation filtering bias on DNA methylation data clustering.
To develop and evaluate novel filter methods that overcome this bias.
To improve the identification of biological subtypes, particularly cancer phenotypes, using cluster analysis.

Main Methods:

Comparison of 11 non-specific filters across eight Infinium HumanMethylation datasets.
Development of a novel filter statistic using a variance-stabilizing transformation for Beta-distributed data.
Evaluation of filter performance in detecting cancer subtypes and distinguishing normal tissue subgroups.
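
A minimal simulation sketch of why a variance-stabilizing transformation can help, under stated assumptions. The arcsine square-root transform 2·arcsin(√x) is used here because it is a standard textbook choice for data whose variance is proportional to μ(1−μ); the summary does not spell out the authors' exact filter statistic, so this is a stand-in and not their method. The function names, the precision value of 50, and the simulated probes are all hypothetical.

```python
import math
import random
import statistics


def arcsine_vst(x):
    """Arcsine square-root transform for a value in [0, 1] (assumed stand-in VST)."""
    return 2.0 * math.asin(math.sqrt(x))


def simulate_probe_sds(mean, precision=50.0, n_samples=2000, seed=0):
    """Simulate one Beta-distributed probe and return (raw SD, transformed SD).

    `precision` plays the role of a + b, so every simulated probe shares the
    same dispersion and differs only in its mean methylation level.
    """
    rng = random.Random(seed)
    a = mean * precision
    b = (1.0 - mean) * precision
    values = [rng.betavariate(a, b) for _ in range(n_samples)]
    raw_sd = statistics.stdev(values)
    vst_sd = statistics.stdev([arcsine_vst(v) for v in values])
    return raw_sd, vst_sd


if __name__ == "__main__":
    # Hypothetical mean methylation levels spanning the unit interval.
    for mean in (0.1, 0.3, 0.5, 0.7, 0.9):
        raw_sd, vst_sd = simulate_probe_sds(mean)
        print(f"mean={mean:.1f}  raw SD={raw_sd:.3f}  transformed SD={vst_sd:.3f}")
```

With a shared precision, the raw standard deviation changes with the mean while the transformed standard deviation stays roughly constant, so a ranking on the transformed scale is less tied to mean methylation.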

Main Results:

The novel filter, using a variance-stabilizing transformation, outperformed standard deviation filters in detecting cancer subtypes with specific methylation patterns (CpG island methylator phenotype).
Standard deviation filters remained effective for distinguishing normal tissue subgroups.
Different filters prioritized features from distinct genomic contexts (e.g., CpG island promoters vs. intergenic regions), with some overlap in discovered sample subsets.

Conclusions:

Two distinct filter statistics, prioritizing different feature characteristics, demonstrate efficacy in identifying cancer/non-cancer clusters and specific cancer methylation phenotypes.
Both novel and standard filters are valuable for discovery-driven cluster analysis.
The authors recommend applying both filter types to new datasets and comparing the selected features and the resulting clusters for overlap.
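
One way to check feature overlap between two filters is sketched below, as an illustration only. The probe names and beta values are toy numbers invented for this example, not data from the study; the arcsine square-root transform again stands in for a variance-stabilizing statistic, and the Jaccard index is an assumed choice of overlap measure.

```python
import math
import statistics


def top_k_features(scores, k):
    """Return the set of feature names with the k largest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy beta values (hypothetical): one list of samples per probe.
probes = {
    "probe_mid": [0.40, 0.55, 0.50, 0.60, 0.45, 0.52],
    "probe_low": [0.02, 0.03, 0.02, 0.12, 0.02, 0.10],
    "probe_high": [0.97, 0.98, 0.96, 0.97, 0.98, 0.97],
    "probe_flat": [0.50, 0.51, 0.50, 0.49, 0.50, 0.51],
}

# Filter 1: standard deviation of the raw beta values.
sd_scores = {name: statistics.stdev(vals) for name, vals in probes.items()}

# Filter 2: standard deviation after the assumed arcsine square-root transform.
vst_scores = {
    name: statistics.stdev([2.0 * math.asin(math.sqrt(v)) for v in vals])
    for name, vals in probes.items()
}

if __name__ == "__main__":
    for k in (1, 2):
        sd_top = top_k_features(sd_scores, k)
        vst_top = top_k_features(vst_scores, k)
        print(k, sorted(sd_top), sorted(vst_top), jaccard(sd_top, vst_top))
```

In this toy set the two filters pick different top probes at k = 1 and agree at k = 2, which is the kind of partial overlap worth inspecting before clustering on either feature set.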