



Optimal Subsampling for Large Sample Logistic Regression.

HaiYing Wang (1), Rong Zhu (2), Ping Ma (3)

  • (1) Department of Mathematics and Statistics, University of New Hampshire, Durham, NH 03824.

Journal of the American Statistical Association | August 7, 2018
Summary
This summary is machine-generated.

This study introduces fast subsampling algorithms for logistic regression, enabling efficient approximation of maximum likelihood estimates for large datasets. The proposed methods significantly reduce computational burden while maintaining statistical accuracy.

Keywords:

  • A-optimality
  • Logistic Regression
  • Massive Data
  • Optimal Subsampling
  • Rare Event


Area of Science:

  • Statistics
  • Machine Learning
  • Computational Statistics

Background:

  • Subsampling algorithms are crucial for managing massive datasets and reducing computational load.
  • Current research primarily focuses on approximating ordinary least squares estimates in linear regression, often using statistical leverage scores for subsampling probabilities.
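
To make the leverage-score idea in the background concrete, here is a minimal sketch of leverage-based subsampling for ordinary least squares. It is an illustration only and is not taken from the paper: the function names, the synthetic data, and the subsample size are all assumptions. Leverage scores are computed as the squared row norms of the Q factor from a thin QR factorization, and the subsample fit uses inverse-probability weights.

```python
import numpy as np


def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, via a thin QR factorization."""
    Q, _ = np.linalg.qr(X)           # Q has orthonormal columns spanning col(X)
    return np.sum(Q**2, axis=1)      # h_ii is the squared norm of row i of Q


def leverage_subsample_ols(X, y, r, rng):
    """Approximate the full-data OLS fit from r rows drawn proportionally to leverage."""
    h = leverage_scores(X)
    pi = h / h.sum()                 # subsampling probabilities
    idx = rng.choice(X.shape[0], size=r, replace=True, p=pi)
    # Weight each sampled row by 1/pi; scaling rows by sqrt(weight)
    # turns the weighted least-squares problem into a plain lstsq call.
    s = 1.0 / np.sqrt(pi[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * s[:, None], y[idx] * s, rcond=None)
    return beta


# Hypothetical synthetic data; heavy-tailed covariates give uneven leverage.
rng = np.random.default_rng(0)
n, d = 50_000, 5
X = rng.standard_t(df=3, size=(n, d))
y = X @ np.ones(d) + rng.normal(size=n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sub = leverage_subsample_ols(X, y, r=2_000, rng=rng)
print("distance from full-data OLS:", np.linalg.norm(beta_sub - beta_full))
```

The subsample fit touches only 2,000 of the 50,000 rows once the probabilities are known. Note that computing exact leverage scores here still requires a pass over the full data.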

Purpose of the Study:

  • To develop and evaluate fast subsampling algorithms for efficiently approximating maximum likelihood estimates in logistic regression.
  • To establish theoretical guarantees for the proposed subsampling methods in terms of consistency and asymptotic normality.
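
For readers who want the objects behind these goals written out, the following is the standard formulation of the logistic model, the full-data maximum likelihood estimate, and an inverse-probability-weighted subsample estimate. The summary itself gives no formulas, so the notation below is an assumption: n is the full sample size, r the subsample size, a star marks a subsampled observation, and each pi-star is the probability with which that observation was drawn.

```latex
\[
  P(y_i = 1 \mid x_i) = p_i(\beta)
    = \frac{\exp(x_i^{\top}\beta)}{1 + \exp(x_i^{\top}\beta)},
  \qquad i = 1, \dots, n,
\]
\[
  \hat\beta_{\mathrm{MLE}} = \arg\max_{\beta} \sum_{i=1}^{n}
    \bigl[\, y_i \log p_i(\beta) + (1 - y_i) \log\{1 - p_i(\beta)\} \,\bigr],
\]
\[
  \tilde\beta = \arg\max_{\beta} \frac{1}{r} \sum_{j=1}^{r} \frac{1}{\pi_j^{*}}
    \bigl[\, y_j^{*} \log p_j^{*}(\beta) + (1 - y_j^{*}) \log\{1 - p_j^{*}(\beta)\} \,\bigr].
\]
```

Consistency and asymptotic normality, in this notation, are statements about how the subsample estimate behaves relative to the full-data estimate as r grows.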

Main Methods:

  • Derivation of optimal subsampling probabilities to minimize asymptotic mean squared error.
  • Development of a two-step algorithm to approximate optimal subsampling probabilities, addressing their dependence on full data estimates.
  • Theoretical analysis establishing consistency and asymptotic normality for estimators obtained through general and two-step subsampling algorithms.
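
The two-step idea above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation. The summary does not give the form of the optimal probabilities, so the sketch assumes they are proportional to |y_i - p_i| times the norm of x_i. It also fits the second step on the second-stage subsample only, and the function names, sample sizes, and synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_weighted_logistic(X, y, w, n_iter=100, tol=1e-10):
    """Weighted logistic MLE by Newton's method (no intercept is added)."""
    w = w / w.sum()                              # rescaling leaves the maximizer unchanged
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))               # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def two_step_subsample(X, y, r0, r, rng):
    """Pilot fit on a uniform subsample, then a weighted fit on a targeted subsample."""
    n = X.shape[0]
    # Step 1: a uniform pilot subsample gives a rough estimate of beta.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))
    # Step 2: plug the pilot estimate into the assumed probability formula,
    # pi_i proportional to |y_i - p_i| * ||x_i||, then draw and refit.
    score = np.abs(y - sigmoid(X @ beta_pilot)) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=pi)
    beta = fit_weighted_logistic(X[idx], y[idx], 1.0 / pi[idx])
    return beta, pi


# Hypothetical synthetic data for a quick comparison with the full-data fit.
rng = np.random.default_rng(1)
n = 20_000
beta_true = np.array([0.5, -0.5, 0.5])
X = rng.normal(size=(n, 3))
y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

beta_full = fit_weighted_logistic(X, y, np.ones(n))
beta_sub, pi = two_step_subsample(X, y, r0=500, r=2_000, rng=rng)
print("distance from full-data MLE:", np.linalg.norm(beta_sub - beta_full))
```

The pilot step is what makes the scheme usable in practice: the targeted probabilities depend on the unknown full-data estimate, so a cheap rough estimate stands in for it. The inverse-probability weights in the second fit correct for the non-uniform draw.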

Main Results:

  • The proposed subsampling algorithms efficiently approximate maximum likelihood estimates in logistic regression.
  • The two-step algorithm offers significant computational time reduction compared to full data approaches.
  • Theoretical consistency and asymptotic normality are established for the proposed methods.

Conclusions:

  • Fast subsampling algorithms provide an efficient and statistically sound approach for logistic regression with massive data.
  • The developed two-step algorithm effectively approximates optimal subsampling strategies, balancing computational efficiency and accuracy.
  • Empirical evaluations on synthetic and real data confirm the practical utility of the proposed methods.