


Evaluating the utility of data integration with synthetic data and statistical matching.

Eunjeong Ji1, Jung Hun Ohn2, Hyemin Jo2,3

  • 1Division of Statistics, Medical Research Collaborating Center, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do, 13620, South Korea.

Scientific Reports | September 1, 2025


Summary
This summary is machine-generated.

Synthetic data integration can maintain data utility while reducing privacy risks. This study found that using synthetic data for statistical matching offers comparable accuracy to real data, especially when integrating datasets.

Keywords:
Biomedical data, Statistical matching, Synthetic data


Area of Science:

  • Bioinformatics
  • Data Science
  • Epidemiology

Background:

  • Data integration improves dataset utility but poses privacy risks.
  • Synthetic data is a potential privacy-preserving solution.
  • The role of synthetic data in data integration requires further investigation.

Purpose of the Study:

  • To assess synthetic data integration's impact on data utility.
  • To evaluate how varying the common variables affects statistical matching.
  • To explore synthetic-real dataset combinations in donor-recipient scenarios.

Main Methods:

  • Utilized the Korean Genome and Epidemiology Study (KoGES) cohort.
  • Generated multiple synthetic datasets with varied common variables.
  • Performed statistical matching using the nearest-neighbor hotdeck method.
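
The nearest-neighbor hot-deck idea named above can be illustrated with a minimal sketch. This is not the study's implementation: the function name nnd_hotdeck, the variable names (age, bmi, hba1c), and all values are hypothetical toy data, not KoGES records. The sketch assumes numeric common variables, scales each by its standard deviation in the donor file, and lets donors be reused.

```python
from math import sqrt
from statistics import pstdev


def nnd_hotdeck(recipients, donors, common_vars, donor_var):
    """Copy donor_var from the nearest donor onto each recipient record.

    recipients, donors: lists of dicts (toy stand-ins for two data files).
    common_vars: names of variables present in both files.
    donor_var: name of a variable observed only in the donor file.
    """
    # Scale each common variable by its donor-file standard deviation
    # so that no single variable dominates the distance (an assumption
    # of this sketch, not a claim about the study's settings).
    scale = {}
    for v in common_vars:
        sd = pstdev([d[v] for d in donors])
        scale[v] = sd if sd > 0 else 1.0

    def distance(r, d):
        return sqrt(sum(((r[v] - d[v]) / scale[v]) ** 2 for v in common_vars))

    matched = []
    for r in recipients:
        # Unconstrained matching: the same donor may serve many recipients.
        nearest = min(donors, key=lambda d: distance(r, d))
        fused = dict(r)
        fused[donor_var] = nearest[donor_var]
        matched.append(fused)
    return matched


# Hypothetical toy records.
recipients = [
    {"age": 50, "bmi": 24.0},
    {"age": 63, "bmi": 30.5},
]
donors = [
    {"age": 49, "bmi": 23.5, "hba1c": 5.4},
    {"age": 65, "bmi": 31.0, "hba1c": 6.8},
    {"age": 35, "bmi": 20.0, "hba1c": 5.0},
]

fused = nnd_hotdeck(recipients, donors, ["age", "bmi"], "hba1c")
for row in fused:
    print(row)
```

In this sketch, changing the common_vars argument is what "varying the common variables" amounts to: a smaller or different set changes which donor is nearest and therefore which value is imputed.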

Main Results:

  • Synthetic data matched on all available common variables generally outperformed other conditions.
  • Matching on clinically relevant variables alone sometimes showed equivalent performance.
  • Synthetic data demonstrated comparable model accuracy to real data.

Conclusions:

  • Statistically matched synthetic data offers utility comparable to real data.
  • This approach can reduce privacy risks while preserving data utility.
  • Further research is needed to fully understand performance differences.