Search research articles

ABOUT JoVE

Overview Leadership Blog JoVE Help Center

AUTHORS

Publishing Process Editorial Board Scope & Policies Peer Review FAQ Submit

LIBRARIANS

Testimonials Subscriptions Access Resources Library Advisory Board FAQ

RESEARCH

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments Archive

EDUCATION

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual Faculty Resource Center Faculty Site

Terms & Conditions of Use

Related Concept Videos

Quantitative Analysis

Quantitative Analysis

Quantitative analysis is a technique for measuring the amount of specific constituents in a sample. When the sample's composition is unknown, qualitative analysis is performed first to identify its components, which ensures that the correct substances are measured during the quantitative phase.
In quantitative analysis, two key measurements are made: the sample quantity and a property proportional to the amount of the analyte (the substance being analyzed). This forms the basis of the...

Detection of Gross Error: The Q Test

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...

Quartile

Quartile

Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first, find the median or second quartile. The first quartile, Q1, is the middle value of the lower half of the data, and the third quartile, Q3, is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:
1; 1; 2; 2; 4; 6; 6.8; 7.2; 8; 8.3; 9; 10; 10; 11.5
The median or second quartile is seven. The lower half of the...

Statistical Analysis: Overview

Statistical Analysis: Overview

When we take repeated measurements on the same or replicated samples, we will observe inconsistencies in the magnitude. These inconsistencies are called errors. To categorize and characterize these results and their errors, the researcher can use statistical analysis to determine the quality of the measurements and/or suitability of the methods.
One of the most commonly used statistical quantifiers is the mean, which is the ratio between the sum of the numerical values of all results and the...

Modified Boxplots

Modified Boxplots

A standard box and whisker plot informs us about the spread of the data in a given sample. One can identify the minimum value, maximum value, first quartile value, second quartile or median value, and third quartile.
However, the box plot does not tell the reader about outliers - values that lie far from the center of the data. We can modify the standard box and whisker plot to identify the outliers and visualize the actual spread of the data in a sample.
Initially, we calculate the adjusted...

Quantifying and Rejecting Outliers: The Grubbs Test

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by

Same author

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis.

Journal of machine learning research : JMLR·2026

Same author

Retinol-guided liposomal platform for targeted pirfenidone delivery in hepatic fibrosis.

Journal of nanobiotechnology·2026

Same author

Combinatorial optimization enhanced by shallow quantum circuits with 104 superconducting qubits.

National science review·2026

Same author

Medicare Insurance Type and Broad Genomic Profiling in Metastatic Cancer.

JAMA network open·2026

Same author

Doubly Robust Estimators of the Restricted Mean Time in Favor Estimands in Individual- and Cluster-Randomized Trials.

Statistics in medicine·2026

Same author

JOINT IDENTIFICATION OF SPATIALLY VARIABLE GENES VIA A NETWORK-ASSISTED BAYESIAN REGULARIZATION APPROACH.

The annals of applied statistics·2026

Same journal

Optimal Weighted Tests for Replication Studies and the 'Two-Trials Rule' With Multiple Hypotheses.

Statistics in medicine·2026

Same journal

Identifiable Copula-Double-Cox Models: A Fully Parametric Framework for Dependent Right-Censored Survival Data.

Statistics in medicine·2026

Same journal

Moving From Individualized Risk-Based Prevention to Benefit-Based Prevention: Estimating Individualized Life-Years Gained From Prevention Services as a Basis for Eligibility.

Statistics in medicine·2026

Same journal

A Mixture of Distributed Lag Non-Linear Models to Account for Spatially Heterogeneous Exposure-Lag-Response Associations.

Statistics in medicine·2026

Same journal

Practical Considerations for Gaussian Process Modeling for Causal Inference in Quasi-Experimental Studies With Panel Data.

Statistics in medicine·2026

Same journal

Covariate Adjustment for Wilcoxon Two Sample Statistic and Test.

Statistics in medicine·2026

See all related articles

Search research articles

Related Experiment Video

Updated: Aug 29, 2025

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Lq-based robust analytics on ultrahigh and high dimensional data.

Jiachen Chen¹, Ruofan Bie², Yichen Qin³

¹Department of Biostatistics, Boston University, Boston, MA, USA.

Statistics in Medicine

|September 13, 2022

Summary

This summary is machine-generated.

This study introduces a novel feature selection method robust to outliers and model misspecification in high-dimensional regression. The minimum Lq-likelihood estimation (MLqE) framework improves variable identification and parameter accuracy.

Keywords:

Lq entropy SKCM contamination screening variable selection

More Related Videos

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Published on: February 23, 2019

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size LEfSe in Microbiome Data

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size LEfSe in Microbiome Data

Published on: May 16, 2022

Related Experiment Videos

Last Updated: Aug 29, 2025

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Cloud-Based Phrase Mining and Analysis of User-Defined Phrase-Category Association in Biomedical Publications

Published on: February 23, 2019

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size LEfSe in Microbiome Data

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size LEfSe in Microbiome Data

Published on: May 16, 2022

Area of Science:

Statistics
Data Science
Bioinformatics

Background:

High-dimensional data is prevalent in fields like omics and finance.
Data contamination, including outliers and model misspecification, poses significant challenges in regression analysis.

Purpose of the Study:

To develop a robust feature screening and selection framework for ultrahigh and high-dimensional data.
To address both dimension issues and data contamination (outliers and model misspecification).

Main Methods:

Proposed a framework based on minimum Lq-likelihood estimation (MLqE).
MLqE is designed to be robust to outliers and account for model misspecification.
Framework evaluated through numerical analysis and real-world data application.

Main Results:

The proposed MLqE framework demonstrated robustness under various contamination scenarios.
Real data analysis on skin cutaneous melanoma data showed superior performance.
Outperformed traditional methods in variable identification effectiveness and parameter estimation accuracy.

Conclusions:

The MLqE-based feature selection framework offers a powerful solution for analyzing contaminated high-dimensional data.
This method enhances reliability in regression analysis across diverse scientific fields.