Inference for the Case Probability in High-dimensional Logistic Regression.

Zijian Guo (1), Prabrisha Rakshit (1), Daniel S Herman (2)

  • (1) Department of Statistics, Rutgers University, Piscataway, New Jersey, USA.

Journal of Machine Learning Research (JMLR) | August 8, 2022
Summary
This summary is machine-generated.

This study introduces a bias-corrected estimator of the case probability in high-dimensional logistic regression models fitted to electronic health record (EHR) data. The estimator supports valid statistical inference when labeling patients as cases or controls.

Keywords:
Case-control; Contraction principle; EHR phenotyping; Outcome labelling; Re-weighting

Area of Science:

  • Biostatistics
  • Health Informatics
  • Machine Learning

Background:

  • Labeling patients in electronic health records (EHRs) as cases or controls increasingly relies on prediction models.
  • These models use high-dimensional variables derived from EHR data, which complicates statistical inference.
  • Valid statistical inference methods for the case probability have been lacking.

Purpose of the Study:

  • To develop a novel bias-corrected estimator for case probability.
  • To enable valid statistical inference in high-dimensional sparse logistic regression models for EHR data.
  • To improve patient case-control labeling accuracy.
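
For reference, the setting named in these aims can be written out as below. This is a sketch of the standard logistic regression formulation; the symbols (X_i, y_i, β, h, x_*) are notation assumed here for illustration and are not taken from this summary.

```latex
% Logistic regression model for n patients with p covariates (p may exceed n),
% where the coefficient vector \beta is assumed sparse:
\[
  \mathbb{P}(y_i = 1 \mid X_i) = h(X_i^{\top}\beta),
  \qquad
  h(z) = \frac{e^{z}}{1 + e^{z}},
  \qquad i = 1, \dots, n .
\]
% Case probability for a new patient with covariate vector x_*:
\[
  \mathbb{P}(y_* = 1 \mid x_*) = h(x_*^{\top}\beta).
\]
```

The inference target is the second quantity: the probability that a given patient is a case, rather than any single coefficient of β.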

Main Methods:

  • Proposed a bias-corrected estimator for case probability.
  • Utilized linearization and variance enhancement techniques.
  • Established asymptotic normality for the estimator in high dimensions.
  • Developed confidence intervals and hypothesis testing procedures.
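
The general idea of turning an asymptotically normal estimate into a confidence interval and a labeling decision can be sketched in Python. This is not the paper's bias-corrected high-dimensional estimator: it swaps in classical low-dimensional maximum likelihood with a Wald interval, which is only valid when the number of covariates is small relative to the sample size. The function names (`fit_logistic`, `case_probability_ci`, `label_patient`), the simulated data, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression via Newton-Raphson.

    Classical low-dimensional fit (an assumption of this sketch),
    returning the coefficients and their estimated covariance.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        w = prob * (1.0 - prob)
        hessian = X.T @ (X * w[:, None])   # Fisher information
        gradient = X.T @ (y - prob)        # score vector
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    w = prob * (1.0 - prob)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, cov

def case_probability_ci(x_new, beta, cov, z=1.96):
    """Plug-in case probability with a Wald interval.

    The interval is built on the linear-predictor scale and then
    mapped through the logistic function, so it stays inside (0, 1).
    """
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))
    eta = x_new @ beta
    se = np.sqrt(x_new @ cov @ x_new)
    return expit(eta), expit(eta - z * se), expit(eta + z * se)

def label_patient(lower, upper, threshold=0.5):
    """Hypothetical decision rule: label only when the interval clears the threshold."""
    if lower > threshold:
        return "case"
    if upper < threshold:
        return "control"
    return "undetermined"

# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 2000
beta_true = np.array([-0.5, 1.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

beta_hat, cov_hat = fit_logistic(X, y)
x_new = np.array([1.0, 2.0, -1.0])  # intercept term plus two covariates
estimate, lower, upper = case_probability_ci(x_new, beta_hat, cov_hat)
label = label_patient(lower, upper)
print(estimate, lower, upper, label)
```

In the high-dimensional setting the paper addresses, a plug-in estimate from a penalized fit is biased, which is why a bias correction is needed before an interval of this kind can be trusted.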

Main Results:

  • Demonstrated the validity of the proposed estimator through extensive simulation studies.
  • Successfully applied the method to real-world electronic health record data.
  • The novel estimator provides accurate case probability estimation.

Conclusions:

  • The proposed method offers a robust solution for statistical inference of case probability in high-dimensional EHR data.
  • This advancement facilitates more reliable patient case-control labeling.
  • The techniques enhance the utility of prediction models in healthcare.