Sparse robust discriminant analysis for high-dimensional and heavy-tailed data.

Weijian Huang1, Qing Mai2, Jing Zeng1

  • 1Faculty of Business for Science & Technology, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China.

Biometrics | February 26, 2026
Summary
This summary is machine-generated.

This study introduces a robust classifier for high-dimensional medical data, accommodating both light-tailed and heavy-tailed distributions. The method improves prediction accuracy, especially for imbalanced datasets, outperforming existing techniques.

Keywords:
discriminant analysis, heavy-tailedness, high-dimensional classification, imbalanced data, variable selection

Area of Science:

  • Medical data analysis
  • Statistical learning
  • Bioinformatics

Background:

  • Large-scale medical data (gene expression, MRI) are prevalent.
  • Existing sparse discriminant analysis methods assume light-tailed predictors, an assumption that is often violated in practice.
  • Robustness is needed for heavy-tailed medical data.

Purpose of the Study:

  • Propose a robust classifier using an elliptically contoured discriminant analysis (EDA) model.
  • Accommodate both light-tailed and heavy-tailed data distributions.
  • Improve prediction accuracy on imbalanced medical datasets.

Main Methods:

  • Developed a robust classifier under the EDA model.
  • Identified the intrinsic dimension-reduction subspace for optimal prediction.
  • Proposed a high-dimensional classifier using subspace projection.
  • Used the balanced rate to assess prediction accuracy on imbalanced data.
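
The summary does not define "balanced rate". One common reading is balanced accuracy, the unweighted mean of per-class recall, and the sketch below works under that assumption only; it may differ from the exact criterion used in the paper. The function names and the toy labels are hypothetical and chosen for illustration. It shows why plain accuracy can look good on imbalanced data while a class-balanced measure does not.

```python
from collections import defaultdict


def plain_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed meaning of 'balanced rate').

    Each class contributes equally, regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced data: nine samples of class 0, one sample of class 1.
y_true = [0] * 9 + [1]
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 10

print(plain_accuracy(y_true, y_pred))     # high, despite missing every class-1 sample
print(balanced_accuracy(y_true, y_pred))  # penalized for zero recall on class 1
```

In this toy case the majority-class classifier is right on nine of ten samples, yet its recall on the minority class is zero, so the class-balanced score is much lower than the plain accuracy.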

Main Results:

  • The proposed EDA-based classifier accommodates heavy-tailed data.
  • Achieved superior prediction accuracy, particularly for imbalanced datasets.
  • Demonstrated consistency in subspace estimation, variable selection, and prediction accuracy.
  • Empirical results on synthetic and real medical data (lung cancer, leukemia) show superiority over state-of-the-art methods.

Conclusions:

  • The proposed robust EDA classifier effectively handles high-dimensional, potentially heavy-tailed medical data.
  • Subspace identification and projection offer a powerful approach for robust classification.
  • The method provides a more accurate and reliable tool for medical data analysis.