Related Concept Videos

Multiple Regression

Multiple regression assesses a linear relationship between one response (dependent) variable and two or more independent variables. It has many practical applications.
For example, farmers can use multiple regression to predict crop yield from several factors, such as water availability, fertilizer, and soil properties. Here, crop yield is the response (dependent) variable because it depends on the independent variables. The analysis requires the construction of a scatter plot...
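As a minimal sketch of the crop-yield example, the code below fits a multiple regression by ordinary least squares with NumPy's `np.linalg.lstsq`. The field data and the function name `fit_multiple_regression` are hypothetical; the yields are generated from an exact linear rule so the recovered coefficients can be checked against it.

```python
import numpy as np

# Hypothetical field data (illustrative only): water (mm) and fertilizer (kg/ha)
water = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
fertilizer = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
# Yields built from an exact linear rule so the fit can be verified:
# crop_yield = 2 + 0.5 * water + 1.5 * fertilizer
crop_yield = 2.0 + 0.5 * water + 1.5 * fertilizer

def fit_multiple_regression(predictors, response):
    """Ordinary least squares fit; returns [intercept, b1, b2, ...]."""
    # Prepend a column of ones so the model includes an intercept term
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef

coef = fit_multiple_regression([water, fertilizer], crop_yield)
print(coef)
```

With real field measurements the response would contain noise, and the same call returns the least-squares estimates rather than an exact recovery.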

Regression Analysis

Regression analysis is a statistical tool that describes a mathematical relationship between a dependent variable and one or more independent variables.
In regression analysis, the regression equation is determined from the line of best fit, that is, the line that best fits the data points plotted on a graph. This line is also called the regression line, and its algebraic equation is called the regression equation. It is represented as:
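The source's own equation is not shown here. The sketch below assumes the common simple-regression form y_hat = a + b*x and computes the least-squares intercept and slope directly. The function name and the sample points are hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line of best fit y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data points
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = least_squares_line(x, y)
print(a, b)
```

For these points the slope works out to 19.7 / 10 = 1.97 and the intercept to 6.0 - 1.97 * 3 = 0.09, up to floating-point rounding.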

Assumptions of Survival Analysis

Survival models analyze the time until one or more events occur, such as death in biological organisms or failure in mechanical systems. These models are widely used across fields like medicine, biology, engineering, and public health to study time-to-event phenomena. To ensure accurate results, survival analysis relies on key assumptions and careful study design.

Truncation in Survival Analysis

Truncation in survival analysis refers to the exclusion of individuals or events from the dataset based on specific criteria related to the time of the event. This exclusion can happen in two primary forms: left truncation and right truncation.
Left truncation occurs when individuals who experienced the event of interest before a certain time are not included in the study. This is often due to a "delayed entry" into the study where only those who survive until a certain entry point are...
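To make delayed entry concrete, the following is a small simulation sketch, not a method from the source. Event times are assumed exponential with mean 1 and study entry times uniform on [0, 1]. A subject is enrolled only if still event-free at entry, so short event times are systematically excluded.

```python
import random

def simulate_left_truncation(n, seed=0):
    """Return (all_event_times, observed_event_times) under delayed entry."""
    rng = random.Random(seed)
    event_times = [rng.expovariate(1.0) for _ in range(n)]   # assumed: mean 1
    entry_times = [rng.uniform(0.0, 1.0) for _ in range(n)]  # assumed entry times
    # Left truncation: keep a subject only if the event occurs after entry
    observed = [t for t, e in zip(event_times, entry_times) if t > e]
    return event_times, observed

all_times, observed_times = simulate_left_truncation(10_000)
mean_all = sum(all_times) / len(all_times)
mean_observed = sum(observed_times) / len(observed_times)
print(len(observed_times), mean_all, mean_observed)
```

Because early events never enter the dataset, the mean event time among observed subjects is noticeably larger than in the full population, which is why truncation must be accounted for in the analysis.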

Statistical Inference Techniques in Hypothesis Testing: Parametric Versus Nonparametric Data

Statistical inference techniques, which are central to hypothesis testing, fall into two broad categories: parametric and nonparametric statistics.
Parametric statistics, as the name suggests, assume that the data follow a specific distribution described by a small number of parameters, often a normal distribution. When this assumption holds, it enables powerful hypothesis testing and efficient estimation. Parametric methods, such as Student's t-test or the goodness-of-fit test, are frequently employed in biostatistics for this reason. For instance,...
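The contrast can be sketched with two one-sample tests of location. The parametric one-sample t statistic assumes roughly normal data. The nonparametric sign test uses only whether each value lies above or below the hypothesized median and needs no distributional assumption. Function names and data are illustrative, and the t statistic is shown without its p-value to stay within the standard library.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Parametric: t statistic for H0: mean == mu0 (assumes roughly normal data)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    return (statistics.mean(data) - mu0) / se

def sign_test_p(data, m0):
    """Nonparametric: exact two-sided sign test p-value for H0: median == m0."""
    above = sum(x > m0 for x in data)
    below = sum(x < m0 for x in data)  # values equal to m0 are discarded
    n = above + below
    k = min(above, below)
    # Probability of k or fewer in the smaller group under Binomial(n, 0.5)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

sample = [2, 4, 6]
print(one_sample_t(sample, 0), sign_test_p(sample, 0))
```

With only three observations the sign test cannot reject at the usual 5% level (its smallest possible two-sided p-value is 0.25), which illustrates the power that parametric tests gain when their assumptions hold.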

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test is conducted to determine whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. When the expected frequencies are all equal, as when predicting how often each number appears when a fair die is cast, the expected frequency of each category is the ratio of the total number of observations (n) to the number of categories (k), that is, E = n/k.
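A short sketch of the equal-expected-frequency case: compute E = n/k, then the Pearson chi-square statistic, the sum of (O - E)^2 / E over categories. The die counts are hypothetical, and the function names are illustrative.

```python
def expected_equal_frequencies(n, k):
    """Expected count per category when all k categories are equally likely."""
    return n / k

def chi_square_statistic(observed):
    """Pearson chi-square statistic against equal expected frequencies."""
    n = sum(observed)
    k = len(observed)
    expected = expected_equal_frequencies(n, k)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts from 60 rolls of a die (faces 1 to 6)
rolls = [8, 12, 9, 11, 10, 10]
print(expected_equal_frequencies(sum(rolls), len(rolls)))  # 10.0
print(chi_square_statistic(rolls))
```

Here n = 60 and k = 6, so each face is expected 10 times, and the statistic equals 1.0 up to floating-point rounding. It would then be compared with a chi-square distribution with k - 1 = 5 degrees of freedom.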

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

[Catalytic synthesis of dihydroavenanthramide D by lipase RWL].

Sheng wu gong cheng xue bao = Chinese journal of biotechnology·2026

Same author

Non-interruptive decision support to increase appropriate screening for primary hyperparathyroidism.

American journal of surgery·2026

Same author

Early Pregnancy Blood Pressure Trajectory Groups Predict Hypertensive Disorders of Pregnancy.

JACC. Advances·2026

Same author

Case Report: A rare co-occurrence of IgA pemphigus and pyoderma gangrenosum associated with IgA-κ type monoclonal gammopathy of undetermined significance: a 19-year diagnostic and therapeutic journey.

Frontiers in immunology·2026

Same author

Association of sarcopenia with the long-term risk of overall infections and infectious diseases: a prospective cohort study of 458 332 participants.

MedScience·2026

Same author

Association between serum meteorin-like protein and metabolic dysfunction-associated steatotic liver disease in patients with type 2 diabetes mellitus: a cross-sectional study.

Endocrine connections·2026

Same journal

Classification Under Local Differential Privacy with Model Reversal and Model Averaging.

Journal of machine learning research : JMLR·2026

Same journal

Sparse Semiparametric Discriminant Analysis for High-dimensional Zero-inflated Data.

Journal of machine learning research : JMLR·2026

Same journal

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis.

Journal of machine learning research : JMLR·2026

Same journal

Unsupervised Tree Boosting for Learning Probability Distributions.

Journal of machine learning research : JMLR·2026

Same journal

A Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations.

Journal of machine learning research : JMLR·2026

Same journal

Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes.

Journal of machine learning research : JMLR·2026

Related Experiment Video

Updated: Sep 2, 2025

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments

Published on: March 1, 2022

Inference for the Case Probability in High-dimensional Logistic Regression.

Zijian Guo¹, Prabrisha Rakshit¹, Daniel S Herman²

¹Department of Statistics, Rutgers University, Piscataway, New Jersey, USA.

Journal of Machine Learning Research : JMLR

August 8, 2022

Summary

This summary is machine-generated.

This study introduces a bias-corrected estimator of the case probability in high-dimensional logistic regression, motivated by electronic health records (EHRs). The estimator supports valid statistical inference for labeling patients as cases or controls from high-dimensional data.

Keywords:

Case-control; Contraction principle; EHR phenotyping; Outcome labelling; Re-weighting

More Related Videos

Tactile Semiautomatic Passive-Finger Angle Stimulator TSPAS

Published on: July 30, 2020

Development of an Individual-Tree Basal Area Increment Model using a Linear Mixed-Effects Approach

Published on: July 3, 2020

Area of Science:

Biostatistics
Health Informatics
Machine Learning

Background:

Electronic health records (EHRs) are crucial for patient status labeling.
High-dimensional data from EHRs present challenges for prediction models.
Existing statistical inference methods for case probability are limited.

Purpose of the Study:

To develop a novel bias-corrected estimator for case probability.
To enable valid statistical inference in high-dimensional sparse logistic regression models for EHR data.
To improve patient case-control labeling accuracy.
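For orientation, in a standard logistic regression model the case probability for a new covariate vector is the logistic function of a linear predictor. The notation below (x_new, beta, h) is assumed here rather than taken from the summary. The second line is the generic first-order (linearization, or delta-method) variance approximation, shown for reference; it is not the paper's bias-corrected estimator.

```latex
\mathbb{P}(y = 1 \mid X = x_{\mathrm{new}}) = h\!\left(x_{\mathrm{new}}^{\top}\beta\right),
\qquad h(z) = \frac{\exp(z)}{1 + \exp(z)},
\qquad h'(z) = h(z)\bigl(1 - h(z)\bigr)

\operatorname{Var}\!\left[h\!\left(x_{\mathrm{new}}^{\top}\hat{\beta}\right)\right]
\approx \left[h'\!\left(x_{\mathrm{new}}^{\top}\beta\right)\right]^{2}
\, x_{\mathrm{new}}^{\top} \operatorname{Var}(\hat{\beta}) \, x_{\mathrm{new}}
```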

Main Methods:

Proposed a bias-corrected estimator for case probability.
Utilized linearization and variance enhancement techniques.
Established asymptotic normality for the estimator in high dimensions.
Developed confidence intervals and hypothesis testing procedures.
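As a low-dimensional point of reference only, the sketch below fits an ordinary (unpenalized) logistic regression by Newton-Raphson and builds a linearized, delta-method confidence interval for the case probability at a new covariate vector. This is not the authors' bias-corrected high-dimensional estimator: plain maximum likelihood breaks down when the number of covariates is large relative to the sample size, which is the setting the paper addresses. The simulated data, the function names, and the 1.96 critical value (an approximate 95% interval) are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic link h(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25):
    """Unpenalized logistic MLE via Newton-Raphson (reasonable only when n >> p)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = sigmoid(X @ beta)
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)  # estimate and its asymptotic covariance

def case_probability_ci(beta, cov, x_new, z=1.96):
    """Plug-in case probability with a delta-method (linearized) interval."""
    p_hat = sigmoid(x_new @ beta)
    # Linearization: gradient of h(x'beta) with respect to beta is h'(x'beta) * x
    se = p_hat * (1.0 - p_hat) * np.sqrt(x_new @ cov @ x_new)
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

# Hypothetical simulated data: intercept plus two covariates
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, -1.0])
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)

beta_hat, cov_hat = fit_logistic_newton(X, y)
x_new = np.array([1.0, 0.5, 0.2])
p_hat, lower, upper = case_probability_ci(beta_hat, cov_hat, x_new)
print(p_hat, lower, upper)
```

The interval is clipped to [0, 1] for readability. In high dimensions the plug-in estimate from a penalized fit is biased, so an interval built this way would not have valid coverage without a correction step.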

Main Results:

Demonstrated the validity of the proposed estimator through extensive simulation studies.
Successfully applied the method to real-world electronic health record data.
The novel estimator provides accurate case probability estimation.

Conclusions:

The proposed method enables valid statistical inference for the case probability in high-dimensional EHR data.
This advancement facilitates more reliable patient case-control labeling.
The techniques enhance the utility of prediction models in healthcare.