
Related Concept Videos

Wilcoxon Signed-Ranks Test for Matched Pairs (01:09)

The Wilcoxon signed-rank test for matched pairs evaluates the null hypothesis by combining the ranks of the differences with their signs. It essentially tests whether the median of the differences in a population of matched pairs is zero. Because the test uses more information than the sign test, it generally yields more reliable conclusions. The test does not require the data to follow a normal distribution, but two conditions must be met for it to be applicable: (1) the data must...
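
The rank-and-sign computation described above can be sketched in Python. This is a minimal illustration rather than the video's own worked example: the function names and data are hypothetical, zero differences are dropped, tied absolute differences receive the average of their ranks (a common textbook convention), and the statistic T = min(W+, W-) would still need to be compared with a critical value from a Wilcoxon table.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, n) for matched pairs, discarding zero differences."""
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical pre- and post-treatment measurements on the same five subjects.
before = [10, 12, 9, 15, 11]
after = [12, 11, 13, 18, 11]
w_plus, w_minus, n = wilcoxon_signed_rank(before, after)
t_stat = min(w_plus, w_minus)  # compared with a tabulated critical value
print(w_plus, w_minus, n, t_stat)
```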

Expected Frequencies in Goodness-of-Fit Tests (01:19)

A goodness-of-fit test determines whether the observed frequencies in a dataset are statistically consistent with the frequencies expected for it. When all expected frequencies are equal, as when predicting how often each number appears when a die is cast, the expected frequency of each category is the total number of observations (n) divided by the number of categories (k).
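
A short sketch of the expected-frequency rule, extended with the Pearson chi-square statistic that a goodness-of-fit test then compares with a chi-square critical value (k - 1 degrees of freedom when no parameters are estimated from the data). The function names and die counts are hypothetical; the optional probabilities argument covers the unequal case, where each expected frequency is n times that category's probability.

```python
def expected_frequencies(observed, probabilities=None):
    """Expected count per category: n / k if all equal, otherwise n * p_i."""
    n = sum(observed)
    k = len(observed)
    if probabilities is None:
        return [n / k] * k
    return [n * p for p in probabilities]


def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of each face from 60 rolls of a six-sided die.
rolls = [8, 12, 9, 11, 10, 10]
expected = expected_frequencies(rolls)
print(expected)
print(chi_square_statistic(rolls, expected))
```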

Sign Test for Matched Pairs (01:17)

The sign test for matched pairs offers a robust method for comparing two paired samples, often to assess the effect of an intervention applied to one of them. It is especially useful when the underlying distribution of the data is unknown. The test compares two related samples, often pre- and post-treatment measurements on the same subjects, to determine whether their median values differ significantly.
To conduct the sign test, we first calculate the differences in...
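
The description above stops before the full procedure, so the sketch below assumes the usual exact form of the sign test: count the positive and negative differences, discard zeros, and compute a two-sided binomial p-value with success probability 0.5. The function name and data are hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test: returns (#positive, #negative, p-value)."""
    diffs = [b - a for a, b in zip(before, after)]
    positive = sum(1 for d in diffs if d > 0)
    negative = sum(1 for d in diffs if d < 0)
    n = positive + negative  # zero differences (ties) are discarded
    if n == 0:
        return positive, negative, 1.0
    x = min(positive, negative)
    # Under H0 each nonzero difference is + or - with probability 1/2,
    # so the count of the rarer sign follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(x + 1)) / 2 ** n
    return positive, negative, min(1.0, 2 * tail)


# Hypothetical pre- and post-treatment measurements.
before = [10, 12, 9, 15, 11]
after = [12, 11, 13, 18, 11]
print(sign_test(before, after))
```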

Friedman Two-way Analysis of Variance by Ranks (01:21)

Friedman's two-way analysis of variance by ranks is a nonparametric test designed to identify differences across multiple test attempts when the traditional assumptions of normality and equal variances do not hold. Unlike conventional ANOVA, which requires normally distributed data with equal variances, Friedman's test is suited to ordinal or non-normally distributed data, making it particularly useful for analyzing dependent samples, such as matched subjects over time or repeated measures...
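
A sketch of the Friedman statistic, assuming the common textbook formula chi2_r = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), where the n rows are subjects (blocks), the k columns are conditions, and R_j is the rank sum of condition j after ranking within each row. The function names and scores are hypothetical, and no tie correction is applied; for larger samples the statistic is usually compared with a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_statistic(data):
    """Friedman chi-square for n rows (subjects) x k columns (conditions)."""
    n = len(data)
    k = len(data[0])
    rank_sums = [0.0] * k
    for row in data:  # rank within each subject, then accumulate per condition
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    return stat, rank_sums


# Hypothetical scores: 4 subjects measured under 3 conditions.
scores = [
    [8.0, 9.5, 7.0],
    [6.0, 8.0, 5.5],
    [7.5, 9.0, 7.0],
    [5.0, 7.0, 6.0],
]
print(friedman_statistic(scores))
```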

Identifying Statistically Significant Differences: The F-Test (01:14)

The F-test compares two sample variances with each other, or a sample variance with the population variance, to decide whether the difference between them can be explained by indeterminate error. The test assumes that the data set or sets are normally distributed and that the data sets are independent of each other. The test statistic F is calculated by dividing one variance by another. In other words, the square of one standard...
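
The variance ratio described above can be computed as below. This sketch covers only the two-sample case, uses sample variances (n - 1 denominator), and places the larger variance in the numerator, a common convention that keeps F at or above 1; the function name and measurements are hypothetical. The result would then be compared with a tabulated F critical value for the returned degrees of freedom.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a = variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical replicate measurements from two methods.
method_1 = [1, 3, 5, 7, 9]
method_2 = [2, 3, 4, 5, 6]
F, df_num, df_den = f_statistic(method_1, method_2)
print(F, df_num, df_den)
```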

Quantifying and Rejecting Outliers: The Grubbs Test (01:02)

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming that the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the mean. Then calculate the ratio of this difference to the standard deviation of the sample. This...
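
A minimal sketch of the Grubbs statistic G = |x_suspect - mean| / s, where the suspect is the value farthest from the mean and s is the sample standard deviation. The function name and readings are hypothetical; the observation would be rejected as an outlier only if G exceeds the tabulated critical value for the sample size and chosen significance level.

```python
from statistics import mean, stdev


def grubbs_statistic(data):
    """Return (G, suspect) for the two-sided Grubbs test."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 denominator)
    suspect = max(data, key=lambda x: abs(x - m))  # value farthest from the mean
    return abs(suspect - m) / s, suspect


# Hypothetical replicate readings with one suspiciously high value.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 12.5]
g, suspect = grubbs_statistic(readings)
print(suspect, round(g, 3))
```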

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author:

  • Outcome-Assisted Multiple Imputation of Missing Treatments. Observational studies, 2026.
  • Fully Synthetic Data for Complex Surveys. Survey methodology, 2025.
  • Studying Chinese immigrants' spatial distribution in the Raleigh-Durham area by linking survey and commercial data using romanized names. Journal of the Royal Statistical Society. Series A, (Statistics in Society), 2025.
  • Evaluating Binary Outcome Classifiers Estimated from Survey Data. Epidemiology (Cambridge, Mass.), 2024.
  • The association between long-term PM2.5 exposure and risk for pancreatic cancer: an application of social informatics. American journal of epidemiology, 2024.
  • Regression-Assisted Bayesian Record Linkage for Causal Inference in Observational Studies with Covariates Spread Over Two Files. Journal of statistical planning and inference, 2024.

Same journal:

  • Neural posterior estimation on exponential random graph models: evaluating bias and implementation challenges. Statistics and computing, 2026.
  • Subgroup Analysis of Differential Networks with Latent Variables. Statistics and computing, 2026.
  • Non-negative matrix factorization algorithms generally improve topic model fits. Statistics and computing, 2026.
  • Approximating evidence via bounded harmonic means. Statistics and computing, 2026.
  • Efficient Inference in First Passage Time Models. Statistics and computing, 2026.
  • Accelerated inference for stochastic compartmental models with over-dispersed partial observations. Statistics and computing, 2026.

Optimal F-score Matching for Bipartite Record Linkage.

Eric A Bai¹, Olivier Binette², Jerome P Reiter²

  • ¹Duke University, Department of Electrical and Computer Engineering, Durham, NC, USA.

Statistics and Computing | March 30, 2026

Summary
This summary is machine-generated.

This study introduces a new estimator that improves the accuracy of probabilistic record linkage. The estimator maximizes the expected F-score, improving the matching of records between files that may contain errors.

Keywords:
Bayesian, Clustering, Entity resolution, Fusion

Related Experiment Videos

  • A Visual Guide to Sorting Electrophysiological Recordings Using 'SpikeSorter' (10:31). Published on: February 10, 2017.
  • A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments (08:12). Published on: March 1, 2022.
  • Combined Immunofluorescence and DNA FISH on 3D-preserved Interphase Nuclei to Study Changes in 3D Nuclear Organization (13:55). Published on: February 3, 2013.

Area of Science:

  • Statistics
  • Data Science
  • Computer Science

Background:

  • Probabilistic record linkage is crucial for matching records across datasets, especially when identifying fields such as names contain errors.
  • Bipartite record linkage matches records between two files, each without internal duplicates, in which some entities may appear in both files.

Purpose of the Study:

  • To introduce a novel estimator for probabilistic record linkage that optimizes the F-score.
  • To provide a point estimate for the linkage structure, ensuring each record is matched to at most one record in the other file.
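
To make the two goals concrete, the sketch below scores an estimated set of links against the true links with the standard F-score (harmonic mean of precision and recall) and checks the one-to-one constraint. Links are written as hypothetical (i, j) pairs, with i a record in the first file and j a record in the second; the function names and example links are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]  # (record index in file A, record index in file B)


def is_one_to_one(links: Iterable[Link]) -> bool:
    """True if no record in either file appears in more than one link."""
    links = list(links)
    left = [i for i, _ in links]
    right = [j for _, j in links]
    return len(left) == len(set(left)) and len(right) == len(set(right))


def linkage_f_score(estimated: Iterable[Link], truth: Iterable[Link]) -> float:
    """F-score of estimated links against true links: 2*TP / (|est| + |truth|)."""
    est_set, true_set = set(estimated), set(truth)
    denominator = len(est_set) + len(true_set)
    if denominator == 0:
        return 0.0  # undefined when both sets are empty; 0.0 is a convention here
    true_positives = len(est_set & true_set)
    # Algebraically equal to 2 * precision * recall / (precision + recall).
    return 2 * true_positives / denominator


truth = {(0, 0), (1, 1), (2, 3)}
estimate = [(0, 0), (1, 2), (2, 3)]
print(is_one_to_one(estimate), linkage_f_score(estimate, truth))
```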

Main Methods:

  • Developed an estimator that maximizes the expected F-score for the linkage structure.
  • Targeted linkage methods that produce posterior distributions or match probabilities for record pairs.
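
As a toy illustration of the idea, and not the authors' estimator, the sketch below picks a one-to-one linkage that maximizes an approximate expected F-score computed from pairwise match probabilities. It assumes a hypothetical matrix p[i][j] of marginal match probabilities (for example, from a Bayesian linkage posterior) and approximates the expected F-score by a ratio of expectations: 2 times the sum of p_ij over the chosen links, divided by the number of chosen links plus the sum of all p_ij (the expected number of true links). The exact expected F-score is not generally equal to this ratio, and the brute-force search over all one-to-one matchings is feasible only for a handful of records. All names and numbers are hypothetical.

```python
from typing import Iterator, List, Tuple

Link = Tuple[int, int]  # (record in file A, record in file B)


def all_matchings(n_rows: int, n_cols: int) -> Iterator[List[Link]]:
    """Yield every one-to-one (partial) matching between two files."""
    def extend(row: int, used: set, current: List[Link]) -> Iterator[List[Link]]:
        if row == n_rows:
            yield list(current)
            return
        # Option 1: leave this record of file A unlinked.
        yield from extend(row + 1, used, current)
        # Option 2: link it to a record of file B that is still free.
        for col in range(n_cols):
            if col not in used:
                used.add(col)
                current.append((row, col))
                yield from extend(row + 1, used, current)
                current.pop()
                used.remove(col)

    yield from extend(0, set(), [])


def approx_expected_f(links: List[Link], probs: List[List[float]], expected_true: float) -> float:
    """Ratio-of-expectations approximation to the expected F-score."""
    denominator = len(links) + expected_true
    if denominator == 0:
        return 0.0
    expected_tp = sum(probs[i][j] for i, j in links)
    return 2 * expected_tp / denominator


def max_expected_f_matching(probs: List[List[float]]) -> Tuple[List[Link], float]:
    """Brute-force search for the one-to-one linkage with the highest score."""
    n_rows = len(probs)
    n_cols = len(probs[0]) if probs else 0
    expected_true = sum(sum(row) for row in probs)  # expected number of true links
    best: List[Link] = []
    best_score = -1.0
    for links in all_matchings(n_rows, n_cols):
        score = approx_expected_f(links, probs, expected_true)
        if score > best_score:
            best, best_score = links, score
    return best, best_score


# Hypothetical marginal match probabilities: rows = file A, columns = file B.
probs = [
    [0.90, 0.05, 0.00],
    [0.10, 0.60, 0.10],
    [0.00, 0.10, 0.30],
]
links, score = max_expected_f_matching(probs)
print(links, round(score, 3))
```

In this toy case the pair (2, 2), with match probability 0.3, is left unlinked because adding it lowers the approximate score (3.6 / 5.15 versus 3.0 / 4.15). Realistic file sizes would call for an optimization routine, such as an assignment solver, in place of enumeration.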

Main Results:

  • The proposed F-score estimator demonstrates desirable properties in simulations.
  • Applications with real-world data validate the effectiveness of the F-score estimators.

Conclusions:

  • The developed F-score maximization estimator offers an improved approach to probabilistic record linkage.
  • This method is suitable for linkage techniques that provide probabilistic outputs, enhancing matching accuracy.