Lq-based robust analytics on ultrahigh and high dimensional data.

Jiachen Chen1, Ruofan Bie2, Yichen Qin3

  • 1Department of Biostatistics, Boston University, Boston, MA, USA.

Statistics in Medicine | September 13, 2022
Summary
This summary is machine-generated.

This study introduces a feature selection method for high-dimensional regression that is robust to outliers and model misspecification. The method is built on minimum Lq-likelihood estimation (MLqE) and improves both variable identification and parameter estimation accuracy.

Keywords:
Lq entropy, SKCM, contamination, screening, variable selection

Area of Science:

  • Statistics
  • Data Science
  • Bioinformatics

Background:

  • High-dimensional data is prevalent in fields like omics and finance.
  • Data contamination, including outliers and model misspecification, poses significant challenges in regression analysis.

Purpose of the Study:

  • To develop a robust feature screening and selection framework for ultrahigh and high-dimensional data.
  • To address high dimensionality and data contamination (outliers and model misspecification) together.

Main Methods:

  • A framework based on minimum Lq-likelihood estimation (MLqE) is proposed.
  • MLqE is designed to be robust to outliers and to account for model misspecification.
  • The framework was evaluated through numerical analysis and an application to real-world data.
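
To make the idea behind MLqE concrete, here is a minimal sketch on a toy problem, not the paper's regression or screening procedure. It assumes the standard Lq function, Lq(u) = (u^(1-q) - 1)/(1-q) for q ≠ 1 with log(u) as the limit at q = 1, and estimates the mean of a normal sample whose standard deviation is treated as known. The function names, the value q = 0.8, and the data are illustrative assumptions. For q < 1, each observation is weighted by its density raised to the power 1 - q, so points far from the bulk of the data contribute very little.

```python
import math

def lq(u, q):
    """Lq function; reduces to log(u) when q == 1."""
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def mlqe_normal_mean(xs, q=0.8, sigma=1.0, tol=1e-10, max_iter=500):
    """Toy MLqE of a normal mean with sigma assumed known (illustrative only).

    Solves sum_i f(x_i; mu)**(1 - q) * (x_i - mu) = 0 by fixed-point iteration.
    """
    mu = sorted(xs)[len(xs) // 2]  # start from a middle order statistic
    for _ in range(max_iter):
        # Weights are proportional to f(x_i; mu)**(1 - q); the normalizing
        # constant of the density cancels in the weighted average below.
        w = [math.exp(-(1.0 - q) * (x - mu) ** 2 / (2.0 * sigma ** 2)) for x in xs]
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical sample: five values near 0 plus one gross outlier at 10.
data = [-0.4, -0.1, 0.0, 0.2, 0.3, 10.0]
mle = sum(data) / len(data)            # ordinary sample mean, pulled toward 10
robust = mlqe_normal_mean(data, q=0.8)  # stays close to the bulk of the data
print(f"sample mean = {mle:.3f}, MLqE (q=0.8) = {robust:.3f}")
```

Setting q = 1 makes every weight equal to one, which recovers the ordinary maximum likelihood estimate (the sample mean). Smaller q downweights outliers more aggressively at the cost of some efficiency on clean data.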

Main Results:

  • The proposed MLqE framework demonstrated robustness under various contamination scenarios.
  • In a real-data analysis of skin cutaneous melanoma (SKCM) data, the framework showed superior performance.
  • It outperformed traditional methods in variable identification effectiveness and parameter estimation accuracy.

Conclusions:

  • The MLqE-based feature selection framework offers a powerful solution for analyzing contaminated high-dimensional data.
  • This method enhances reliability in regression analysis across diverse scientific fields.