Evaluation of a Binary Semi-supervised Classification Technique for Probabilistic Record Linkage.

D Nasseh¹, J Stausberg

  • ¹ Daniel Nasseh, Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, Ludwig-Maximilians-Universität München, Marchioninistraße 15, 81377 Munich, Germany

Methods of Information in Medicine | April 21, 2015
Summary
This summary is machine-generated.

In this evaluation, semi-supervised classification for record linkage performed on par with unsupervised classification on high-quality data and outperformed it on lower-quality data. The technique supports privacy-preserving entity matching when merging healthcare data.

Keywords:
Medical record linkage, classification

Area of Science:

  • Computer Science
  • Bioinformatics
  • Data Science

Background:

  • Record linkage merges data from various sources, crucial in healthcare.
  • Privacy concerns necessitate transforming sensitive data into encrypted pseudonyms.
  • Automated record linkage requires robust binary classification for entity matching.
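
To make the background concrete, here is a minimal sketch of how identifying fields might be turned into pseudonyms and then compared field by field. It is an illustration only, not the procedure used in the study: it substitutes a keyed hash (HMAC-SHA256) for whatever encryption the authors applied, and the key, field names, and example records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would manage this key securely.
SECRET_KEY = b"demo-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not change the pseudonym."""
    return " ".join(value.strip().lower().split())


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a sensitive value with a keyed hash (assumed stand-in for an encrypted pseudonym)."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict) -> dict:
    """Pseudonymize every field of a record."""
    return {field: pseudonymize(value) for field, value in record.items()}


def comparison_vector(a: dict, b: dict, fields: list) -> list:
    """1 where the pseudonyms of a field agree exactly, 0 otherwise."""
    return [1 if a[f] == b[f] else 0 for f in fields]


FIELDS = ["first_name", "last_name", "birth_date"]  # hypothetical field names
rec_a = pseudonymize_record(
    {"first_name": "Anna", "last_name": "Schmidt", "birth_date": "1980-02-14"}
)
rec_b = pseudonymize_record(
    {"first_name": " anna ", "last_name": "Schmitt", "birth_date": "1980-02-14"}
)
vector = comparison_vector(rec_a, rec_b, FIELDS)
print(vector)  # [1, 0, 1]
```

The resulting agreement pattern is what a binary classifier would label as match or non-match. Exact comparison of hashes cannot see that "Schmidt" and "Schmitt" are similar, which is one reason data quality matters so much for the classification step.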

Purpose of the Study:

  • Introduce and evaluate an automatable semi-supervised binary classification system for record linkage.
  • Compare its performance against advanced unsupervised classification techniques.
  • Demonstrate its capability in privacy-preserving data merging.

Main Methods:

  • Developed and implemented an automatable semi-supervised binary classification model.
  • Compared its performance against an unsupervised classifier based on an active learning approach.
  • Evaluated both systems on 400 artificial test sets of varying quality derived from real patient data.

Main Results:

  • Semi-supervised classification achieved an F-measure of 0.996 in high-quality data.
  • Unsupervised classification achieved an F-measure of 0.993 in high-quality data.
  • Performance diverged markedly in lower-quality data: 0.964 (semi-supervised) vs. 0.803 (unsupervised).
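
For readers unfamiliar with the metric, the sketch below computes an F-measure from counts of true positives, false positives, and false negatives. It assumes the balanced form (F1, the harmonic mean of precision and recall). The summary does not state which variant the study used, and the counts are invented for illustration rather than taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)  # share of predicted matches that are true matches
    recall = tp / (tp + fn)     # share of true matches that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct links, 10 false links, 30 missed links.
example = f_measure(tp=90, fp=10, fn=30)
print(round(example, 3))  # 0.818
```

Because true non-matches vastly outnumber true matches in record linkage, a metric that ignores true negatives, as the F-measure does, is generally more informative than plain accuracy.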

Conclusions:

  • Semi-supervised classification offers a viable model for automated probabilistic record linkage.
  • Semi-supervised techniques show potential to outperform unsupervised methods, particularly with lower data quality.
  • This approach enhances privacy-preserving record linkage in sensitive data environments.