Robust data imputation.

Karlien Vanden Branden¹, Sabine Verboven

¹Joint Research Centre, TP 361, 21020 Ispra VA, Italy.

Computational Biology and Chemistry

September 6, 2008

Summary

This summary is machine-generated.

This study introduces a new robust imputation method for bioinformatics data that is designed to cope with outliers in gene expression data. The method improves imputation accuracy and doubles as a data-cleaning step, supporting more reliable statistical analysis.

Area of Science:

Bioinformatics
Statistical Genetics
Computational Biology

Background:

Missing data imputation is crucial in bioinformatics, particularly for gene expression data.
Existing single imputation methods often lack robustness and are sensitive to outliers.
Outliers in gene expression data can negatively impact imputed values and subsequent analyses.
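
The sensitivity described above can be illustrated with a minimal sketch. It is not the method proposed in the paper. It only contrasts mean imputation with median imputation (a simple robust stand-in) on a made-up expression vector containing one gross outlier. The data values and the helper name `impute` are assumptions for illustration.

```python
import math
import statistics


def impute(values, estimator):
    """Replace NaN entries with estimator(observed values)."""
    observed = [v for v in values if not math.isnan(v)]
    fill = estimator(observed)
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical expression levels for one gene: 95.0 is a gross outlier,
# NaN marks a missing measurement.
expression = [2.1, 1.9, 2.0, float("nan"), 2.2, 95.0]

mean_filled = impute(expression, statistics.mean)
median_filled = impute(expression, statistics.median)

print(mean_filled[3])    # pulled far from the bulk of the data by the outlier
print(median_filled[3])  # stays near the typical values
```

A single outlying value is enough to drag the mean-based fill well away from the other measurements, while the median-based fill is unaffected.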

Purpose of the Study:

To evaluate the performance of existing imputation techniques in the presence of outliers.
To introduce a novel robust imputation method designed to handle outliers effectively.
To demonstrate the benefits of the proposed method, including data cleaning and extension to multiple imputation.

Main Methods:

A simulation study was conducted to test various imputation techniques on gene expression data containing outliers.
A new robust imputation method was developed and implemented.
The method was extended to a multiple imputation approach.
A classification example was used to illustrate the method's performance.

Main Results:

Existing imputation methods showed a lack of robustness when outliers were present in gene expression data.
The newly developed robust imputation method effectively handled outliers, improving imputation accuracy.
The robust imputation procedure also demonstrated data cleaning capabilities.
The multiple imputation extension of the robust method effectively addressed the uncertainty of imputed values.
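
How multiple imputation expresses the uncertainty of imputed values can be sketched with Rubin's rules, a standard way to pool estimates across several imputed datasets. Whether the paper pools its results this way is an assumption. The function name `pool_rubin` and all numbers below are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, total


# Hypothetical results from m = 3 imputed datasets
estimates = [2.0, 2.2, 2.4]
variances = [0.10, 0.12, 0.11]

pooled, total = pool_rubin(estimates, variances)
print(pooled, total)
```

The between-imputation term grows when the imputed datasets disagree, which is how the extra uncertainty introduced by imputation enters the final variance.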

Conclusions:

Robust imputation methods are essential for accurate analysis of gene expression data containing outliers.
The proposed robust imputation method offers improved performance over existing techniques.
The method provides a valuable tool for data cleaning and reliable statistical inference in bioinformatics.
The multiple imputation extension enhances the handling of imputation uncertainty.