

Preserving Missing Data Distribution in Synthetic Data.

Xinyue Wang¹, Hafiz Asif¹, Jaideep Vaidya¹

  • ¹Rutgers University, Newark, USA.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference | January 28, 2025

Summary
This summary is machine-generated.

This study introduces novel methods for generating synthetic data that retain the informational value of missing data points. This approach enhances privacy-preserving data analysis by preserving crucial missing data distributions.

Keywords:
GAN, Missing Data, Privacy, Synthetic Data Generation

Area of Science:

  • Computer Science
  • Data Science
  • Statistics

Background:

  • Web data is often sensitive and requires privacy-preserving methods for analysis.
  • Synthetic data generation is a key technique for protecting sensitive information.
  • Missing data in web artifacts contains valuable information often lost during traditional data preprocessing.

Purpose of the Study:

  • To develop and evaluate methods for generating synthetic data that preserve both observable and missing data distributions.
  • To address the loss of information inherent in imputation or deletion of missing data before synthetic data generation.
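
To make the information loss concrete, here is a minimal Python sketch on toy data. Everything in it is an assumption for illustration (the `group` and `income` fields, the missingness rates, and the helper names); none of it comes from the study. It shows that when missingness depends on another attribute, both mean imputation and listwise deletion erase that dependence before any generator sees the data.

```python
import random

def make_toy_records(n=1000, seed=0):
    """Hypothetical toy data: 'income' is missing more often when group == 'B'."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        income = rng.gauss(50.0, 10.0)
        p_missing = 0.1 if group == "A" else 0.5
        if rng.random() < p_missing:
            income = None  # None marks a missing value
        records.append({"group": group, "income": income})
    return records

def missing_rate_by_group(records):
    """Fraction of records with a missing 'income', computed per group."""
    counts, missing = {}, {}
    for r in records:
        g = r["group"]
        counts[g] = counts.get(g, 0) + 1
        missing[g] = missing.get(g, 0) + (r["income"] is None)
    return {g: missing[g] / counts[g] for g in counts}

def mean_impute(records):
    """Replace every missing 'income' with the mean of the observed values."""
    observed = [r["income"] for r in records if r["income"] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, "income": mean if r["income"] is None else r["income"]}
            for r in records]

def listwise_delete(records):
    """Drop every record that has a missing 'income'."""
    return [r for r in records if r["income"] is not None]

records = make_toy_records()
rates_raw = missing_rate_by_group(records)
rates_imputed = missing_rate_by_group(mean_impute(records))
rates_deleted = missing_rate_by_group(listwise_delete(records))

print("raw:    ", rates_raw)      # group B is missing far more often than A
print("imputed:", rates_imputed)  # all rates are zero: the pattern is gone
print("deleted:", rates_deleted)  # all rates are zero: the pattern is gone
```

In the raw records the missingness rate differs sharply between groups, which is itself a signal. After either preprocessing step every rate is zero, so a generator trained on the result cannot reproduce that signal.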

Main Methods:

  • Proposed novel methods for synthetic data generation.
  • Focused on preserving the distribution of both observed and missing data.
  • Conducted extensive empirical evaluations on simulated and real-world datasets.
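
The idea of preserving both distributions can be sketched with a deliberately simple stand-in generator. This is not the authors' method (the keywords suggest a GAN-based approach) and it offers no privacy guarantee. The toy fields, the per-group Gaussian model, and all function names are assumptions made for illustration. The generator treats "missing" as an outcome to be modeled, then checks how closely the synthetic missingness rates track the real ones.

```python
import random
import statistics

def make_toy_records(n=2000, seed=1):
    """Hypothetical toy data: 'income' is missing more often when group == 'B'."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        income = rng.gauss(50.0, 10.0)
        if rng.random() < (0.1 if group == "A" else 0.5):
            income = None  # None marks a missing value
        records.append({"group": group, "income": income})
    return records

def fit_mask_aware_model(records):
    """Estimate P(group), P(missing | group), and a Gaussian over observed income."""
    model = {}
    for g in sorted({r["group"] for r in records}):
        rows = [r for r in records if r["group"] == g]
        observed = [r["income"] for r in rows if r["income"] is not None]
        model[g] = {
            "p_group": len(rows) / len(records),
            "p_missing": 1 - len(observed) / len(rows),
            "mean": statistics.fmean(observed),
            "stdev": statistics.stdev(observed),
        }
    return model

def sample_synthetic(model, n, seed=2):
    """Draw synthetic records, sampling the missingness flag before the value."""
    rng = random.Random(seed)
    groups = list(model)
    weights = [model[g]["p_group"] for g in groups]
    out = []
    for _ in range(n):
        g = rng.choices(groups, weights=weights)[0]
        m = model[g]
        if rng.random() < m["p_missing"]:
            income = None
        else:
            income = rng.gauss(m["mean"], m["stdev"])
        out.append({"group": g, "income": income})
    return out

def missing_rate_by_group(records):
    """Fraction of records with a missing 'income', computed per group."""
    counts, missing = {}, {}
    for r in records:
        g = r["group"]
        counts[g] = counts.get(g, 0) + 1
        missing[g] = missing.get(g, 0) + (r["income"] is None)
    return {g: missing[g] / counts[g] for g in counts}

real = make_toy_records()
synthetic = sample_synthetic(fit_mask_aware_model(real), n=len(real))
real_rates = missing_rate_by_group(real)
synth_rates = missing_rate_by_group(synthetic)
max_gap = max(abs(real_rates[g] - synth_rates[g]) for g in real_rates)

print("real missing rates:     ", real_rates)
print("synthetic missing rates:", synth_rates)
print("largest per-group gap:  ", max_gap)
```

Comparing per-group missingness rates is only one possible check. A fuller evaluation would also compare the distributions of the observed values and downstream task performance.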

Main Results:

  • Demonstrated the effectiveness of the proposed methods in preserving missing data distributions.
  • Showcased the ability of synthetic data to retain informational content from missingness.
  • Empirical evaluations confirmed the utility of the approach across various datasets.

Conclusions:

  • The proposed methods offer a significant advancement in privacy-preserving synthetic data generation.
  • Preserving missing data distributions is crucial for maintaining data utility in sensitive web data analysis.
  • This approach enables more robust and informative data analysis from web artifacts.