False Discovery Estimation in Record Linkage.

Kayané Robach (1,2), Michel H Hof (1,2), Mark A van de Wiel (1,2)

  • (1) Department of Epidemiology and Data Science, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.

Statistics in Medicine | October 17, 2025
Summary
This summary is machine-generated.

This study introduces a novel method that uses synthetic data to estimate the false discovery proportion (FDP) in record linkage (RL). The estimate makes it possible to assess the reliability of linked datasets, which is crucial for accurate downstream analysis.

Keywords:
false discovery proportion, linkage error, record linkage

Area of Science:

  • Data Science
  • Biostatistics
  • Epidemiology

Background:

  • Integrating diverse datasets offers research advantages, but the datasets often lack a shared unique identifier because of privacy constraints and varied collection methods.
  • Record linkage (RL) algorithms probabilistically link records using identifying variables, but imperfect matches necessitate assessing false discoveries.
  • The false discovery proportion (FDP) is critical for validating linked data reliability in subsequent analyses.
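
For reference, the FDP is conventionally the fraction of declared links that are false. The symbols below are assumptions introduced here for illustration and are not taken from the paper: $L$ is the set of record pairs that the RL algorithm links, and $M$ is the set of pairs that truly refer to the same entity.

```latex
\[
\mathrm{FDP} \;=\; \frac{\lvert L \setminus M \rvert}{\max\bigl(\lvert L \rvert,\, 1\bigr)}
\]
```

For example, if 200 pairs are linked and 10 of them are not true matches, the FDP is 10/200 = 0.05. Because $M$ is unknown in practice, the FDP has to be estimated.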

Purpose of the Study:

  • To introduce a novel method for estimating the FDP in RL for two overlapping datasets.
  • To provide a reliable approach for assessing and improving the quality of linked data across various RL techniques and settings.
  • To highlight the importance of accounting for linkage errors in healthcare record analysis.

Main Methods:

  • A novel FDP estimation method using synthetic data generated from empirical distributions alongside real data.
  • Synthetic records correspond to no real entity, so any link involving one is false by construction; counting these links quantifies falsely linked pairs.
  • The method is applicable to all RL techniques, especially in complex scenarios with poorly discriminative variables.
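
The idea in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' estimator: the helper names (`make_synthetic`, `toy_linker`, `estimate_fdp`), the field names, and the scaling rule are all hypothetical. The scaling assumes that a synthetic record is as likely to be falsely linked as a real record without a true match, so the number of synthetic links is rescaled by the ratio of real to synthetic records.

```python
import random


def make_synthetic(records, m, seed=0):
    """Draw m synthetic records, sampling each field independently
    from its empirical distribution in `records` (an assumed scheme)."""
    rng = random.Random(seed)
    fields = list(records[0])
    return [{f: rng.choice(records)[f] for f in fields} for _ in range(m)]


def toy_linker(file_a, file_b, keys):
    """Stand-in for any RL method: link an A record to the first B record
    that agrees on every key. Returns (index_a, index_b) pairs."""
    first_seen = {}
    for j, rec in enumerate(file_b):
        first_seen.setdefault(tuple(rec[k] for k in keys), j)
    links = []
    for i, rec in enumerate(file_a):
        j = first_seen.get(tuple(rec[k] for k in keys))
        if j is not None:
            links.append((i, j))
    return links


def estimate_fdp(links, n_real, n_synth):
    """Illustrative estimator (an assumption, not the paper's formula).
    File A is expected to hold n_real real records followed by
    n_synth synthetic ones, so indices >= n_real are synthetic."""
    synth_links = sum(1 for i, _ in links if i >= n_real)
    real_links = len(links) - synth_links
    if real_links == 0 or n_synth == 0:
        return 0.0
    # Rescale synthetic links to the size of the real file.
    expected_false = synth_links * n_real / n_synth
    return min(1.0, expected_false / real_links)


# Usage with made-up, poorly discriminative identifying variables.
real_a = [{"birth_year": 1990 + i % 5, "postcode": f"10{i % 7}"} for i in range(20)]
file_b = [{"birth_year": 1990 + i % 5, "postcode": f"10{i % 7}"} for i in range(0, 20, 2)]
synthetic = make_synthetic(real_a, m=10)
links = toy_linker(real_a + synthetic, file_b, ["birth_year", "postcode"])
print(estimate_fdp(links, n_real=len(real_a), n_synth=len(synthetic)))
```

The linker is deliberately a black box here: `estimate_fdp` only needs the list of links and the sizes of the real and synthetic parts of the file, which reflects the claim that the approach can be paired with any RL technique.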

Main Results:

  • The proposed method effectively estimates FDP in RL, enabling assessment and improvement of linked data reliability.
  • Performance was evaluated using established RL algorithms and benchmark datasets.
  • Successfully applied to link siblings in the Netherlands Perinatal Registry, confirming its practical utility.

Conclusions:

  • The developed method provides a robust way to estimate FDP in RL, enhancing data reliability.
  • Accurate FDP estimation is vital for trustworthy research outcomes derived from linked datasets.
  • Accounting for linkage errors is essential, particularly in studies of sensitive healthcare data, such as those examining mother-child dynamics.