Feature Selection for Ridge Regression with Provable Guarantees.

Saurabh Paul1, Petros Drineas2

  • 1Global Risk Sciences, Paypal, San Jose, CA 95112, U.S.A. saupaul@paypal.com.

Neural Computation | February 19, 2016
Summary
This summary is machine-generated.

The paper introduces two unsupervised feature selection methods, one for regularized least-squares classification and one for ridge regression. Both come with theoretical guarantees and, in the reported experiments, outperform existing methods.

Area of Science:

  • Machine Learning
  • Statistical Learning Theory

Background:

  • Feature selection is crucial for efficient and generalizable machine learning models.
  • Regularized least-squares classification and ridge regression are widely used for regression and classification tasks.
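
For readers unfamiliar with the two models named above, the following is a minimal sketch of ridge regression in its dual form, with regularized least-squares classification treated as ridge regression on ±1 labels. The function names (`ridge_fit_dual`, `rlsc_predict`), the data layout (rows are samples, columns are features), and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ridge_fit_dual(X, y, lam):
    """Ridge regression weights via the dual form (hypothetical helper).

    X: (n_samples, n_features), y: (n_samples,), lam > 0.
    Solves an n_samples x n_samples system, which is cheap when
    there are far more features than samples.
    """
    n = X.shape[0]
    K = X @ X.T                                   # Gram matrix of the samples
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return X.T @ alpha                            # weights in feature space

def rlsc_predict(X, w):
    """Classify by the sign of the ridge prediction (labels in {-1, +1})."""
    return np.where(X @ w >= 0, 1, -1)

# Synthetic example: 20 samples, 200 features (assumed sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))
w_true = rng.standard_normal(200)
y = np.sign(X @ w_true)

w = ridge_fit_dual(X, y, lam=0.1)
train_acc = np.mean(rlsc_predict(X, w) == y)
```

The dual form is used here because feature selection matters most when features greatly outnumber samples; the primal solution `(X.T @ X + lam * I)^{-1} X.T @ y` gives the same weights.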

Purpose of the Study:

  • To introduce novel unsupervised feature selection methods for regularized least-squares classification and ridge regression.
  • To provide theoretical guarantees on the generalization performance of the selected features.
  • To compare the proposed methods with existing feature selection techniques.

Main Methods:

  • Single-set spectral sparsification: a deterministic, sampling-based feature selection method for regularized least-squares classification.
  • Leverage-score sampling: an unsupervised, randomized feature selection method for ridge regression.
  • Derivation of risk bounds in the fixed design setting for both methods.
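
The randomized method listed above can be sketched as follows. This is a generic leverage-score sampler for features, written under stated assumptions: rows of `X` are samples and columns are features, scores come from the right singular vectors of `X`, and features are drawn with replacement and rescaled. The function names and the exact rescaling are illustrative; the paper's procedure may differ in detail.

```python
import numpy as np

def feature_leverage_scores(X):
    """Leverage score of each column (feature) of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Thin SVD: X = U @ diag(s) @ Vt; column j of Vt belongs to feature j.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    tol = s[0] * max(X.shape) * np.finfo(float).eps
    rank = int(np.sum(s > tol))
    V = Vt[:rank].T                     # (n_features, rank)
    return np.sum(V ** 2, axis=1)       # scores sum to rank(X)

def sample_features(X, r, rng):
    """Draw r features with probability proportional to leverage score."""
    scores = feature_leverage_scores(X)
    p = scores / scores.sum()
    idx = rng.choice(X.shape[1], size=r, replace=True, p=p)
    # Rescale so the sampled Gram matrix matches X @ X.T in expectation.
    scale = 1.0 / np.sqrt(r * p[idx])
    return idx, X[:, idx] * scale

# Synthetic example: 20 samples, 500 features, keep 100 (assumed sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))
idx, X_s = sample_features(X, r=100, rng=rng)
```

The sampler never looks at the labels, which is what makes it unsupervised: the probabilities depend only on `X`.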

Main Results:

  • Both single-set spectral sparsification and leverage-score sampling provide worst-case generalization guarantees.
  • The risk in the sampled feature space is comparable to the risk in the full-feature space.
  • Experimental results on synthetic and real-world datasets (TechTC-300) show superior performance compared to existing methods.
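
To make "risk in the sampled feature space" versus "risk in the full-feature space" concrete, the sketch below estimates a fixed-design risk by Monte Carlo: the design matrix stays fixed, only the label noise is redrawn, and the error is measured against the noiseless responses. The risk definition, the helper names, the noise model, and the placeholder subset (the first 150 columns) are all assumptions for illustration; they are not the paper's definitions or its selection methods.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge weights via the dual form (hypothetical helper)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def fixed_design_risk(X, w_true, cols, lam, sigma, trials, seed):
    """Average in-sample error of ridge restricted to the columns in cols.

    X is held fixed; each trial redraws Gaussian label noise only.
    """
    rng = np.random.default_rng(seed)
    mean = X @ w_true                     # noiseless responses
    X_sel = X[:, cols]
    total = 0.0
    for _ in range(trials):
        y = mean + sigma * rng.standard_normal(X.shape[0])
        w_hat = ridge_fit(X_sel, y, lam)
        total += np.mean((X_sel @ w_hat - mean) ** 2)
    return total / trials

# Synthetic example: 30 samples, 300 features (assumed sizes).
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 300))
w_true = rng.standard_normal(300) / np.sqrt(300)

all_cols = np.arange(300)
some_cols = np.arange(150)   # arbitrary placeholder subset, not a selection rule

risk_full = fixed_design_risk(X, w_true, all_cols, 1.0, 0.1, 200, seed=0)
risk_sub = fixed_design_risk(X, w_true, some_cols, 1.0, 0.1, 200, seed=0)
```

Passing the indices returned by a feature selection method as `cols` gives one way to check empirically how close the two risks are on a given dataset.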

Conclusions:

  • The proposed feature selection methods are effective for regularized least-squares classification and ridge regression.
  • Because the methods are unsupervised, features can be selected without access to labels, while the risk bounds still hold.
  • The findings support using spectral sparsification and leverage-score sampling to reduce the number of features without a large loss in accuracy.