Semi-supervised linear regression: enhancing efficiency and robustness in high dimensions.

Kai Chen¹, Yuqian Zhang¹

  • ¹Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, China.

Biometrics | August 27, 2025 | PubMed
Summary
This summary is machine-generated.

This study shows that unlabeled data in semi-supervised learning can substantially improve the accuracy of parameter estimation, even for correctly specified linear models in high-dimensional settings. The findings challenge the assumption that unlabeled data help only under model misspecification, and they lead to improved methods for regression analysis.

Keywords:
debiased Lasso; high-dimensional linear models; non-sparse models; semi-supervised learning; statistical inference

Area of Science:

  • Machine Learning
  • Statistics
  • Statistical Learning Theory

Background:

  • The prevailing view in semi-supervised learning holds that unlabeled data improve linear parameter estimation only when the model is misspecified.
  • This view does not hold in high-dimensional settings, where unlabeled data can help even when the model is correctly specified.

Purpose of the Study:

  • To challenge the prevailing view of when unlabeled data are useful in semi-supervised learning.
  • To demonstrate the benefits of incorporating unlabeled samples in high-dimensional settings for linear parameter estimation.
  • To develop robust and efficient semi-supervised estimators for regression coefficients.

Main Methods:

  • Development of robust semi-supervised estimators for regression coefficients, first in dense settings that do not assume a sparse population slope.
  • Extension of these methods to gain efficiency when the linear slope is sparse.
  • Extensive numerical studies to validate the performance of the proposed semi-supervised methods.
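As a toy illustration of where unlabeled covariates can enter a linear estimator at all, the sketch below writes the population least-squares slope as β = Σ⁻¹E[XY] with Σ = E[XXᵀ]. It then compares ordinary least squares with a naive plug-in that estimates Σ from labeled and unlabeled covariates pooled together, while E[XY] still comes from labeled pairs only. This is a low-dimensional sketch under stated assumptions, not the estimator proposed by Chen and Zhang. It has no intercept, no Lasso, and no debiasing step, and it makes no claim about reproducing the paper's efficiency or robustness results. All function names are hypothetical.

```python
"""Toy comparison: OLS vs. a naive semi-supervised plug-in (illustrative only).

Assumed setup (hypothetical): beta = Sigma^{-1} E[X Y], Sigma = E[X X^T].
Labeled pairs supply E[X Y]; the plug-in estimates Sigma from pooled
labeled + unlabeled covariates. Not the method of the paper.
"""
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gram(xs: Sequence[Sequence[float]]) -> Matrix:
    """Empirical second-moment matrix (1/m) * sum_i x_i x_i^T."""
    m, p = len(xs), len(xs[0])
    return [[sum(x[j] * x[k] for x in xs) / m for k in range(p)] for j in range(p)]


def cross_moment(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> Vector:
    """Empirical cross-moment (1/n) * sum_i x_i y_i from labeled pairs."""
    n, p = len(xs), len(xs[0])
    return [sum(x[j] * y for x, y in zip(xs, ys)) / n for j in range(p)]


def supervised_ls(x_lab, y_lab) -> Vector:
    """Ordinary least squares without intercept: labeled data only."""
    return solve(gram(x_lab), cross_moment(x_lab, y_lab))


def semi_supervised_plugin(x_lab, y_lab, x_unlab) -> Vector:
    """Naive plug-in: Sigma estimated from pooled labeled + unlabeled covariates."""
    pooled = list(x_lab) + list(x_unlab)
    return solve(gram(pooled), cross_moment(x_lab, y_lab))


if __name__ == "__main__":
    x_lab = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y_lab = [2.0 * a - b for a, b in x_lab]  # noise-free y = 2*x1 - x2
    print(supervised_ls(x_lab, y_lab))  # prints [2.0, -1.0]
    print(semi_supervised_plugin(x_lab, y_lab, [[3.0, 1.0], [1.0, 2.0]]))
```

If the "unlabeled" covariates are copies of the labeled ones, the pooled second-moment matrix is unchanged, so the plug-in coincides with OLS. The two estimators differ only when the extra covariates change the estimate of Σ. Whether that difference helps or hurts depends on assumptions that this toy does not address.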

Main Results:

  • Demonstrated that additional unlabeled samples improve estimation accuracy for linear parameters in high-dimensional settings, contrary to the prevailing view.
  • Showed that leveraging unlabeled data reduces estimation bias and makes inference more robust, even when the true model is linear.
  • Proposed new semi-supervised methods with improved efficiency, particularly when the linear slope is sparse.
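The keywords list the debiased Lasso, a standard tool for correcting the shrinkage bias of the Lasso so that individual coefficients can be used for inference. For orientation only, the textbook one-step form from the debiased-Lasso literature is shown below. Here $\mathbf{X}$ is the $n \times p$ labeled design, $\mathbf{y}$ the labeled responses, and $\lambda > 0$ a tuning parameter. This is not a reconstruction of the estimators in this paper.

```latex
\hat{\beta}^{\mathrm{L}} = \arg\min_{\beta \in \mathbb{R}^{p}}
  \Big\{ \tfrac{1}{2n} \lVert \mathbf{y} - \mathbf{X}\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \Big\},
\qquad
\hat{\beta}^{\mathrm{d}} = \hat{\beta}^{\mathrm{L}}
  + \tfrac{1}{n}\, \hat{\Theta}\, \mathbf{X}^{\top} \big( \mathbf{y} - \mathbf{X}\hat{\beta}^{\mathrm{L}} \big)
```

Here $\hat{\Theta}$ estimates $\Sigma^{-1}$, with $\Sigma = \mathbb{E}[XX^{\top}]$. Because $\hat{\Theta}$ depends on the covariates alone, it is one natural place where unlabeled samples could enter. This summary does not state whether or how the authors' estimators use unlabeled data in this way.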

Conclusions:

  • Unlabeled data can substantially improve parameter estimation in high-dimensional semi-supervised learning, whether or not the model is correctly specified.
  • The developed robust semi-supervised estimators effectively reduce bias and improve accuracy and robustness in regression analysis.
  • The proposed methods offer practical advancements for utilizing unlabeled data in statistical modeling.