Optimal Subsampling for Large Sample Logistic Regression.

HaiYing Wang¹, Rong Zhu², Ping Ma³

¹Department of Mathematics and Statistics, University of New Hampshire, Durham, NH 03824.

Journal of the American Statistical Association

August 7, 2018

Summary

This summary is machine-generated.

This study introduces fast subsampling algorithms for logistic regression, enabling efficient approximation of maximum likelihood estimates for large datasets. The proposed methods significantly reduce computational burden while maintaining statistical accuracy.

Keywords:

A-optimality; Logistic Regression; Massive Data; Optimal Subsampling; Rare Event

Area of Science:

Statistics
Machine Learning
Computational Statistics

Background:

Subsampling algorithms are crucial for managing massive datasets and reducing computational load.
Current research primarily focuses on approximating ordinary least squares estimates in linear regression, often using statistical leverage scores for subsampling probabilities.
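
For context, the leverage-score approach mentioned above can be sketched as follows. This is a minimal illustration rather than any particular paper's algorithm: the function name `leverage_subsample_ols`, the use of sampling with replacement, and the inverse-probability reweighting are assumptions chosen for this example.

```python
import numpy as np


def leverage_subsample_ols(X, y, r, rng):
    """Approximate the OLS estimate from r rows sampled by leverage score.

    Hypothetical helper for illustration only.
    """
    # Leverage scores are the squared row norms of Q in a thin QR of X.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)
    probs = leverage / leverage.sum()

    # Sample r row indices with replacement, proportional to leverage.
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)

    # Rescale each sampled row by 1/sqrt(r * pi_i) so the subsampled
    # normal equations are unbiased for the full-data ones.
    w = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta, leverage


# Small synthetic demonstration (all values are made up for the example).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5000, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=5000)
beta_sub, _ = leverage_subsample_ols(X_demo, y_demo, r=500, rng=rng)
```

Note that computing exact leverage scores already requires a factorization of the full design matrix, which is one reason fast approximations and alternative sampling probabilities are of interest for massive data.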

Purpose of the Study:

To develop and evaluate fast subsampling algorithms for efficiently approximating maximum likelihood estimates in logistic regression.
To establish theoretical guarantees for the proposed subsampling methods in terms of consistency and asymptotic normality.

Main Methods:

Derivation of optimal subsampling probabilities to minimize asymptotic mean squared error.
Development of a two-step algorithm to approximate optimal subsampling probabilities, addressing their dependence on full data estimates.
Theoretical analysis establishing consistency and asymptotic normality for estimators obtained through general and two-step subsampling algorithms.
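
The two-step idea above can be sketched roughly as follows. This summary does not state the subsampling probabilities, so the sketch assumes probabilities proportional to |y_i - p_i| times the norm of x_i, with p_i evaluated at a pilot estimate; that choice, the function names, the way the two subsamples are combined, and the sample sizes are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def weighted_logistic_mle(X, y, w, iters=50, tol=1e-8):
    """Newton's method for a weighted logistic log-likelihood (no intercept added)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample(X, y, r0, r, rng):
    """Hypothetical two-step subsampling estimator for logistic regression."""
    n = X.shape[0]

    # Step 1: uniform pilot subsample gives a rough estimate beta0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))

    # Assumed probabilities: pi_i proportional to |y_i - p_i(beta0)| * ||x_i||.
    p = 1.0 / (1.0 + np.exp(-(X @ beta0)))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()

    # Step 2: draw r more points with those probabilities, then fit on both
    # subsamples using inverse-probability weights (uniform pilot: pi = 1/n).
    idx1 = rng.choice(n, size=r, replace=True, p=probs)
    Xc = np.vstack([X[idx0], X[idx1]])
    yc = np.concatenate([y[idx0], y[idx1]])
    wc = np.concatenate([np.full(r0, float(n)), 1.0 / probs[idx1]])
    wc = wc / wc.mean()  # rescaling the weights does not change the maximizer
    return weighted_logistic_mle(Xc, yc, wc)


# Synthetic demonstration (coefficients and sizes are made up).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20000, 3))
p_demo = 1.0 / (1.0 + np.exp(-(X_demo @ np.array([0.5, -0.5, 0.5]))))
y_demo = (rng.random(20000) < p_demo).astype(float)
beta_two_step = two_step_subsample(X_demo, y_demo, r0=500, r=2000, rng=rng)
```

The pilot step is what removes the circularity: the probabilities depend on the unknown coefficients, so a cheap first estimate stands in for the full-data estimate when computing them.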

Main Results:

The proposed subsampling algorithms efficiently approximate maximum likelihood estimates in logistic regression.
The two-step algorithm offers significant computational time reduction compared to full data approaches.
Theoretical consistency and asymptotic normality are established for the proposed methods.

Conclusions:

Fast subsampling algorithms provide an efficient and statistically sound approach for logistic regression with massive data.
The developed two-step algorithm effectively approximates optimal subsampling strategies, balancing computational efficiency and accuracy.
Empirical evaluations on synthetic and real data confirm the practical utility of the proposed methods.