Preserving Missing Data Distribution in Synthetic Data

Xinyue Wang¹, Hafiz Asif¹, Jaideep Vaidya¹

¹Rutgers University, Newark, USA.

Proceedings of the ... International World-Wide Web Conference. International WWW Conference

January 28, 2025

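Summary

This summary is machine-generated.

This study introduces new methods for generating synthetic data that retain the informational value of missing data points. By preserving the distribution of missing data, the approach strengthens privacy-preserving data analysis.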

Keywords:

missing data; privacy; synthetic data generation

Scientific fields:

Computer science
Data science
Statistics

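Background:

Web data is often sensitive and calls for privacy-preserving analysis methods.
Synthetic data generation is a key technique for protecting sensitive information.
Missing data in web artifacts carries valuable information that is usually lost during conventional data preprocessing.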

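Purpose of the study:

Develop and evaluate methods that generate synthetic data preserving both the observed and the missing data distributions.
Address the information loss caused by imputing or deleting missing data before synthetic data generation.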

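Main methods:

Proposed new methods for synthetic data generation.
Focused on preserving the distributions of both observed and missing data.
Conducted extensive empirical evaluation on simulated and real datasets.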
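
The idea of keeping missingness as part of the data, rather than deleting incomplete rows first, can be illustrated with a toy sketch. This is not the authors' method: the generator below is a deliberately naive per-column resampler, and the function names (`missing_rates`, `synthesize_keep_missing`, `synthesize_after_dropping`) and the example data are assumptions made only for illustration. It shows that treating a missing entry as a value preserves the per-column missingness rate, while dropping incomplete rows before generation erases it.

```python
import random


def missing_rates(rows):
    """Return the fraction of missing (None) entries in each column."""
    n_cols = len(rows[0])
    return [sum(r[j] is None for r in rows) / len(rows) for j in range(n_cols)]


def synthesize_keep_missing(rows, n, seed=0):
    """Toy generator (illustrative only): resample each column independently
    from its empirical values, treating None as a legitimate value."""
    rng = random.Random(seed)
    cols = list(zip(*rows))
    return [tuple(rng.choice(col) for col in cols) for _ in range(n)]


def synthesize_after_dropping(rows, n, seed=0):
    """Same toy generator, but incomplete rows are deleted first, as in
    conventional preprocessing."""
    rng = random.Random(seed)
    complete = [r for r in rows if None not in r]
    cols = list(zip(*complete))
    return [tuple(rng.choice(col) for col in cols) for _ in range(n)]


# Hypothetical "real" data: the second column is missing in half of the rows.
real = [(i, None if i % 2 else i * 10) for i in range(100)]

kept = synthesize_keep_missing(real, 5000)
dropped = synthesize_after_dropping(real, 5000)

print("real   :", missing_rates(real))
print("kept   :", missing_rates(kept))
print("dropped:", missing_rates(dropped))
```

Because columns are resampled independently, this sketch ignores any dependence between missingness and the other values, and it offers no privacy guarantee; it only makes the information-loss argument concrete.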

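Main results:

Demonstrated that the proposed methods effectively preserve the missing data distribution.
Showed that the synthetic data retains the information carried by missingness.
Empirical evaluation confirmed the utility of the approach across a variety of datasets.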

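Conclusions:

The proposed methods are a significant advance in privacy-preserving synthetic data generation.
Preserving the missing data distribution is essential for maintaining data utility when analyzing sensitive web data.
The approach makes analysis of data from web artifacts more robust and more informative.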