
Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions.

Kai Chen¹, Yuqian Zhang¹

¹Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, China.

August 27, 2025

Summary

This summary is machine-generated.

This study demonstrates that unlabeled data in semi-supervised learning significantly enhances parameter estimation accuracy, even for correctly specified linear models in high-dimensional settings. These findings challenge existing assumptions and offer improved methods for regression analysis.

Keywords:

debiased Lasso; high-dimensional linear models; non-sparse models; semi-supervised learning; statistical inference


Area of Science:

Machine Learning
Statistics
Statistical Learning Theory

Background:

The prevailing understanding in semi-supervised learning holds that unlabeled data benefit linear parameter estimation only under model misspecification.
This paradigm breaks down in high-dimensional settings, where unlabeled data can improve estimation even when the linear model is correctly specified.
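A standard identity helps explain why covariate-only (unlabeled) samples can matter for a linear slope at all. The block below assumes mean-zero covariates and no intercept for simplicity. It writes the population least-squares coefficient in terms of the covariate second-moment matrix, which depends only on the distribution of X and therefore can also be estimated from unlabeled rows. This decomposition is illustrative and is not the estimator proposed in the paper.

```latex
\beta^{*} = \Sigma^{-1}\,\mathbb{E}[XY],
\qquad
\Sigma = \mathbb{E}\!\left[XX^{\top}\right] \in \mathbb{R}^{p \times p}

\hat{\Sigma}_{\mathrm{lab}} = \frac{1}{n}\sum_{i=1}^{n} X_i X_i^{\top},
\qquad
\operatorname{rank}\bigl(\hat{\Sigma}_{\mathrm{lab}}\bigr) \le n < p
\quad \text{when } p > n

\hat{\Sigma}_{\mathrm{pool}} = \frac{1}{n+N}\sum_{i=1}^{n+N} X_i X_i^{\top}
\quad \text{(labeled plus } N \text{ unlabeled covariate vectors)}
```

With p > n the labeled-only matrix is necessarily singular. The pooled matrix can be invertible once n + N ≥ p, with probability one when X has a density and Σ is nonsingular.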

Purpose of the Study:

To challenge the prevailing view of when unlabeled data are useful in semi-supervised learning.
To demonstrate the benefits of incorporating unlabeled samples for linear parameter estimation in high-dimensional settings.
To develop robust and efficient semi-supervised estimators for regression coefficients.

Main Methods:

Development of robust semi-supervised estimators for regression coefficients, first in dense settings where the population slope is not assumed to be sparse.
Extension of these methods to gain efficiency when the linear slope is sparse.
Extensive numerical studies to validate the performance of the proposed semi-supervised methods.
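The keywords mention the debiased Lasso. As a hedged illustration of how unlabeled covariates could enter such a procedure, the sketch applies the standard one-step debiasing correction b_d = b + M Xᵀ(y − Xb)/n. Here M is taken to be the inverse of the second-moment matrix estimated from labeled and unlabeled covariates pooled together. This is a hypothetical construction for exposition, not the authors' estimator. It assumes mean-zero covariates, no intercept, and n + N ≥ p so the pooled matrix is invertible. All function names (`lasso_cd`, `debiased_lasso_ss`, and so on) are made up for this example, and only the Python standard library is used.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |b|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso without intercept via cyclic coordinate descent, minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def second_moment(X):
    """Uncentered sample second-moment matrix (1/m) * sum_i x_i x_i^T."""
    m, p = len(X), len(X[0])
    return [[sum(x[j] * x[k] for x in X) / m for k in range(p)] for j in range(p)]


def invert(A, tol=1e-12):
    """Gauss-Jordan inversion with partial pivoting; raises on (near-)singular input."""
    p = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < tol:
            raise ValueError("matrix is singular to working precision")
        M[c], M[piv] = M[piv], M[c]
        s = M[c][c]
        M[c] = [v / s for v in M[c]]
        for r in range(p):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[p:] for row in M]


def debiased_lasso_ss(X_lab, y, X_unlab, lam):
    """Hypothetical semi-supervised one-step debiased Lasso:
    b_d = b_lasso + M X^T (y - X b_lasso) / n, with M the inverse of the
    second-moment matrix estimated from labeled AND unlabeled covariates."""
    n, p = len(X_lab), len(X_lab[0])
    b = lasso_cd(X_lab, y, lam)
    M = invert(second_moment(X_lab + X_unlab))  # needs n + N >= p to be invertible
    resid = [y[i] - sum(X_lab[i][j] * b[j] for j in range(p)) for i in range(n)]
    score = [sum(X_lab[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
    return [b[j] + sum(M[j][k] * score[k] for k in range(p)) for j in range(p)]


if __name__ == "__main__":
    rng = random.Random(0)
    p, n, N = 20, 15, 200  # p > n: the labeled-only moment matrix has rank <= 15
    beta = [2.0, -1.0] + [0.0] * (p - 2)  # illustrative sparse slope

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(p)]

    X_lab = [draw() for _ in range(n)]
    X_unlab = [draw() for _ in range(N)]
    y = [sum(bj * xj for bj, xj in zip(beta, x)) + rng.gauss(0.0, 0.5) for x in X_lab]
    est = debiased_lasso_ss(X_lab, y, X_unlab, lam=0.1)
    print([round(v, 2) for v in est[:4]])
```

In the demo, `second_moment(X_lab)` alone is singular because p > n. Pooling the unlabeled rows is what makes this particular correction computable. A practical high-dimensional implementation would more likely replace the exact inverse with a regularized precision-matrix estimate, such as nodewise Lasso. The value of `lam` here is arbitrary.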

Main Results:

Demonstrated that, contrary to prior beliefs, additional unlabeled samples improve estimation accuracy for linear parameters in high-dimensional settings.
Showed that leveraging unlabeled data reduces estimation bias and makes inference more robust, even when the true model is linear.
Proposed new semi-supervised methods with improved efficiency, particularly when the linear slope is sparse.

Conclusions:

Unlabeled data provides significant benefits for parameter estimation in semi-supervised learning within high-dimensional contexts, irrespective of model specification.
The developed robust semi-supervised estimators effectively reduce bias and improve accuracy and robustness in regression analysis.
The proposed methods offer practical advancements for utilizing unlabeled data in statistical modeling.