
Related Concept Videos

Steps in Outbreak Investigation

In public health, statistical analysis is a cornerstone for understanding and managing disease outbreaks. Using statistical tools, health professionals can anticipate potential outbreaks, analyze ongoing ones, and design responses that limit their impact. To do so, the analysis can proceed through several stages:

Prediction Intervals

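A prediction interval is an interval estimate for a single new observation of a variable, such as the value of y at a given x in a regression. It shows how far an individual observation may plausibly fall from the point estimate, which helps judge how dependable that point estimate is.
A point estimate is most likely not exactly equal to the quantity it estimates, although it should be close to it. After calculating point estimates, we therefore construct interval estimates. A confidence interval gives a range of plausible values for a population parameter, such as the mean of y at a given x, while a prediction interval gives a range of plausible values for an individual observed value of y. Because it must also account for the scatter of individual observations around that mean, a prediction interval is wider than the corresponding confidence interval, and it is the appropriate interval for predicting an observed sample value, y.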
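As a minimal sketch of the difference, the Python below fits a simple least-squares line and computes both intervals at a new x value. It assumes the standard simple linear regression formulas with normally distributed errors; the function name `interval_estimates`, the example data, and the t critical value (3.182, the two-sided 95% value for 3 degrees of freedom) are illustrative assumptions rather than part of the original text.

```python
import math


def interval_estimates(xs, ys, x0, t_crit):
    """Return (y_hat, confidence_interval, prediction_interval) at x = x0.

    Hypothetical helper. t_crit is the two-sided Student t critical value
    with n - 2 degrees of freedom; it is passed in because the Python
    standard library has no t-distribution quantile function.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # least-squares slope
    b0 = y_bar - b1 * x_bar             # least-squares intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))        # residual standard error

    y_hat = b0 + b1 * x0                # point estimate at x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)       # uncertainty of the mean response
    se_pred = s * math.sqrt(1 + leverage)   # adds the scatter of one new y

    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi


# Illustrative data (n = 5, so 3 degrees of freedom).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, ci, pi = interval_estimates(xs, ys, x0=6, t_crit=3.182)
print(f"point estimate {y_hat:.2f}")
print(f"95% CI for mean y: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for a new y: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two standard errors is the extra `1 +` term, which is why the prediction interval always contains the confidence interval for the same x value.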

Outliers and Influential Points

An outlier is an observation that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it appears not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least-squares line in the vertical direction. They have large "errors," where the "error" or residual is the...
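One common rule of thumb, used here as an assumption rather than the only definition, flags a point as a potential outlier when the absolute value of its residual exceeds two standard deviations of the residuals, s = sqrt(SSE / (n - 2)). The function name `flag_outliers` and the data below (one deliberately mistyped value at index 4) are hypothetical.

```python
import math


def flag_outliers(xs, ys, k=2.0):
    """Return indices whose |residual| exceeds k * s for the least-squares line.

    Hypothetical helper; k = 2 is a rule-of-thumb cutoff, not a fixed standard.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual = observed y minus the y predicted by the fitted line.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e) > k * s]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.0, 9.1, 30.0, 12.9, 15.1, 17.0]
print(flag_outliers(xs, ys))
```

A flagged point still needs to be checked by hand: it may be a recording error to correct, or a genuine unusual observation worth investigating.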

Censoring Survival Data

Survival analysis is a statistical method used to analyze time-to-event data, often employed in fields such as medicine, engineering, and social sciences. One of the key challenges in survival analysis is dealing with incomplete data, a phenomenon known as "censoring." Censoring occurs when the event of interest (such as death, relapse, or system failure) has not occurred for some individuals by the end of the study period or is otherwise unobservable, and it might have many different...
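A standard way to use censored observations rather than discard them is the Kaplan-Meier estimator, which the text above does not describe and which is included here only as an illustration. In this sketch each subject is recorded as a follow-up time plus an event flag (1 = event observed, 0 = right-censored); the function name `kaplan_meier` and the five-subject data set are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, estimated survival probability).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) that share time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both events and censorings leave the risk set
    return curve


# Subjects 2 and 5 are censored: their events were not observed.
print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))
```

The censored subject at time 2 is not treated as an event, but it does shrink the risk set, so later drops in the curve are larger than they would be if that subject had simply been deleted.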

Survival Tree

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...
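The construction details are cut off above, so the following is only a sketch of one widely used splitting criterion, not necessarily the one the source describes: at a node, try thresholds on a covariate and keep the split whose two child groups differ most according to the two-sample log-rank statistic. Censored subjects (event flag 0) still contribute to the at-risk counts. The function names, the minimum node size, and the toy data are assumptions for illustration; a full tree would apply `best_split` recursively to each child node.

```python
def log_rank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    in_left[i] is True if subject i falls in the left child node;
    events[i] is 1 for an observed event and 0 for a censored time.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)                       # at risk
        n1 = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events
        d1 = sum(1 for u, e, g in zip(times, events, in_left)
                 if u == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(x, times, events, min_node_size=3):
    """Return (c, statistic) for the split x <= c with the largest log-rank value."""
    best = None
    for c in sorted(set(x))[:-1]:
        in_left = [v <= c for v in x]
        n_left = sum(in_left)
        # Skip splits that would leave a child node too small.
        if n_left < min_node_size or len(x) - n_left < min_node_size:
            continue
        stat = log_rank_statistic(times, events, in_left)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best


x = [1, 2, 3, 4, 5, 6]            # a single covariate
times = [1, 2, 3, 10, 11, 12]     # survival times
events = [1, 1, 1, 1, 1, 1]       # all events observed in this toy example
print(best_split(x, times, events, min_node_size=3))
```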

Residuals and Least-Squares Property

A residual is the vertical distance between the actual value of y and the estimated value of y. In other words, it measures the vertical distance between the actual data point and the predicted point on the line.
If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative, and the line overestimates the actual data value for y.
The process of fitting the best-fit...
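The sketch below computes residuals as observed minus predicted y, so a point above the line gets a positive residual, matching the sign convention above. It also checks two consequences of the least-squares property: the residuals of the fitted line sum to zero, and nudging the slope or intercept away from the fitted values increases the sum of squared errors (SSE). The function names and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y: positive when the point lies above the line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum(e * e for e in residuals(xs, ys, b0, b1))


xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.5, 5.5, 8.0, 9.0]
b0, b1 = least_squares_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print("intercept", b0, "slope", b1)
print("residuals", res)
print("sum of residuals", sum(res))

# Moving the line away from the least-squares fit increases SSE.
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sse(xs, ys, b0 + db0, b1 + db1) > sse(xs, ys, b0, b1)
```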

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Association of RNA m7G Modification Gene Polymorphisms with Pediatric Glioma Risk.

BioMed research international·2023

Same author

Treatment of sewage sludge hydrothermal carbonization aqueous phase by Fe(II)/CaO₂ system: Oxidation behaviors and mechanism of organic compounds.

Waste management (New York, N.Y.)·2023

Same author

TeCD: The eccDNA Collection Database for extrachromosomal circular DNA.

BMC genomics·2023

Same author

Genetic variants in m5C modification core genes are associated with the risk of Chinese pediatric acute lymphoblastic leukemia: A five-center case-control study.

Frontiers in oncology·2023

Same author

New association between splicing factor-coding gene polymorphisms and the risk of acute lymphoblastic leukemia in southern Chinese children: A five-center case-control study.

The journal of gene medicine·2023

Same author

The Mef2c/AdipoR1 axis is responsible for myogenic differentiation and is regulated by resistin in skeletal muscles.

Gene·2023

Same journal

Combination Chemotherapy Optimization with Discrete Dosing.

INFORMS journal on computing·2024

Same journal

A High-Fidelity Model to Predict Length-of-Stay in the Neonatal Intensive Care Unit (NICU).

INFORMS journal on computing·2022

Same journal

Supervised t-distributed stochastic neighbor embedding for data visualization and classification.

INFORMS journal on computing·2021

Same journal

Palindromes in SARS and Other Coronaviruses.

INFORMS journal on computing·2014

Same journal

Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction.

INFORMS journal on computing·2010

Related Experiment Video

Updated: Oct 19, 2025

Inverse Probability of Treatment Weighting Propensity Score using the Military Health System Data Repository and National Death Index

Published on: January 8, 2020

Predictive Analytics with Strategically Missing Data.

Juheng Zhang¹, Xiaoping Liu², Xiao-Bai Li¹

¹Department of Operations and Information Systems, University of Massachusetts, Lowell, Massachusetts 01854.

INFORMS Journal on Computing

September 27, 2021

Summary

This summary is machine-generated.

This study introduces a new method to handle missing data in predictive analytics. Our approach uses Support Vector Regression to accurately impute missing values, encouraging honest data disclosure.

Keywords:

business analytics data manipulation information disclosure strategic learning support vector regression

More Related Videos

Evidence-based Knowledge Synthesis and Hypothesis Validation: Navigating Biomedical Knowledge Bases via Explainable AI and Agentic Systems

Published on: June 13, 2025

Area of Science:

Data Science
Machine Learning
Predictive Analytics

Background:

Real-world data often has strategically missing values due to intentional concealment by data providers.
This strategic data omission occurs in various domains like finance, admissions, and marketing, impacting decision-making.
Existing methods struggle to address the incentive problem behind strategically missing data.
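To make the background concrete, here is a toy illustration of why simply filling concealed values with the mean of the disclosed ones is biased when the hidden values are systematically the unfavorable ones. The scores and the concealment rule (providers below 65 withhold their value) are hypothetical assumptions; this is not the study's data, and mean imputation is a naive baseline, not the SVR-based method the study proposes.

```python
from statistics import mean

# Hypothetical scores for ten data providers (illustrative numbers only).
true_scores = [52, 58, 61, 64, 67, 70, 73, 77, 81, 88]

# Assumed strategic rule: providers with a score below 65 conceal it.
disclosed = [s if s >= 65 else None for s in true_scores]

# Naive mean imputation ignores *why* the values are missing.
observed = [s for s in disclosed if s is not None]
naive_fill = mean(observed)
imputed = [s if s is not None else naive_fill for s in disclosed]

print("true mean:", mean(true_scores))
print("mean after naive imputation:", mean(imputed))
```

Because every concealed score is replaced by a value higher than the true one, the imputed data overstate the average, and a provider with a low score gains by hiding it. This is the incentive problem the study aims to address.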

Purpose of the Study:

To develop a novel approach for handling strategically missing data in regression prediction.
To create a mechanism that incentivizes data providers to disclose truthful information.
To minimize imputation errors for missing values in predictive models.

Main Methods:

Utilizing Support Vector Regression (SVR) models to derive imputation values for missing data.
Developing a framework that aligns data provider incentives with accurate data disclosure.
Applying the proposed method to real-world datasets for validation.

Main Results:

The proposed method effectively imputes strategically missing data.
Support Vector Regression models are leveraged for accurate imputation.
Imputation errors are minimized under specific conditions, as demonstrated by experiments.
The approach incentivizes data providers to reveal true information, improving data quality.

Conclusions:

The novel approach effectively addresses strategically missing data problems in predictive analytics.
Support Vector Regression provides a robust foundation for imputing missing values.
The method offers a practical solution for decision-makers facing data concealment.
Experimental validation confirms the approach's effectiveness on real-world data.