Evaluation of a Binary Semi-supervised Classification Technique for Probabilistic Record Linkage.

D Nasseh¹, J Stausberg

¹Daniel Nasseh, Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie, Ludwig-Maximilians-Universität München, Marchioninistraße 15, 81377 Munich, Germany

Methods of Information in Medicine

April 21, 2015

Summary

This summary is machine-generated.

Semi-supervised classification for record linkage outperforms unsupervised methods, especially with lower data quality. This privacy-preserving technique enhances entity matching in healthcare data merging.

Keywords:

Medical record linkage classification

Area of Science:

Computer Science
Bioinformatics
Data Science

Background:

Record linkage merges data from multiple sources, a task that is crucial in healthcare.
Privacy concerns necessitate transforming sensitive data into encrypted pseudonyms.
Automated record linkage requires robust binary classification for entity matching.
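
As a rough illustration of the background points above, the sketch below turns identifying fields into keyed-hash pseudonyms and compares two records field by field. It is only an assumption about how such a step could look: the summary does not say which encryption scheme the study used, and the key, field names, and helper functions (`pseudonymize`, `encode`, `agreement_vector`) are hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key and field list; neither comes from the study.
KEY = b"demo-key-not-for-production"
FIELDS = ["first_name", "last_name", "birth_date"]


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a normalized identifier."""
    normalized = value.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def encode(record: dict) -> dict:
    """Replace every identifying field with its pseudonym."""
    return {field: pseudonymize(record[field], KEY) for field in FIELDS}


def agreement_vector(rec_a: dict, rec_b: dict) -> list:
    """1 where the pseudonyms of a field agree, 0 where they differ."""
    return [int(rec_a[field] == rec_b[field]) for field in FIELDS]


a = encode({"first_name": "Anna", "last_name": "Meier", "birth_date": "1980-02-01"})
b = encode({"first_name": " anna", "last_name": "Meyer", "birth_date": "1980-02-01"})
vector = agreement_vector(a, b)  # input to a match / non-match classifier
```

Because exact hashes agree only on identical normalized values, the spelling variant in the last name yields a disagreement. The binary classifier then has to decide from such agreement patterns whether the two records belong to the same entity.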

Purpose of the Study:

Introduce and evaluate an automatable semi-supervised binary classification system for record linkage.
Compare its performance against advanced unsupervised classification techniques.
Demonstrate its capability in privacy-preserving data merging.

Main Methods:

Developed and implemented an automatable semi-supervised binary classification model.
Compared its performance against an active learning approach (unsupervised).
Evaluated systems on 400 artificial test sets derived from real patient data with varying quality.
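
The summary does not describe the authors' classifier in detail, so the following is not their method. It is a minimal sketch of one generic semi-supervised technique (self-training) applied to match/non-match decisions, assuming each record pair has already been reduced to a single similarity score between 0 and 1. The function names, the `margin` parameter, and the example scores are hypothetical.

```python
def midpoint(labeled):
    """Threshold halfway between the mean match and mean non-match score."""
    matches = [score for score, is_match in labeled if is_match]
    non_matches = [score for score, is_match in labeled if not is_match]
    return (sum(matches) / len(matches) + sum(non_matches) / len(non_matches)) / 2


def self_train_threshold(labeled, unlabeled, margin=0.2, max_rounds=10):
    """Grow a small labeled seed set with confidently classified pairs.

    labeled:   list of (similarity_score, is_match) pairs, at least one per class
    unlabeled: list of similarity scores without labels
    Returns the final decision threshold on the similarity score.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        threshold = midpoint(labeled)
        # Pairs far from the threshold are treated as confident predictions.
        confident = [s for s in pool if abs(s - threshold) >= margin]
        if not confident:
            break
        labeled += [(s, s > threshold) for s in confident]
        pool = [s for s in pool if abs(s - threshold) < margin]
    return midpoint(labeled)


seed = [(0.95, True), (0.10, False)]          # tiny hand-labeled seed set
scores = [0.90, 0.85, 0.20, 0.15, 0.50, 0.60]  # unlabeled record pairs
threshold = self_train_threshold(seed, scores)
predictions = [s > threshold for s in scores]
```

The idea is that a few labeled pairs anchor the classifier, while the unlabeled pairs refine the decision boundary. Ambiguous pairs near the threshold are never used for training in this sketch; they are only classified at the end.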

Main Results:

Semi-supervised classification achieved an F-measure of 0.996 in high-quality data.
Unsupervised classification achieved an F-measure of 0.993 in high-quality data.
Performance diverged significantly in lower-quality data: F-measures of 0.964 (semi-supervised) vs. 0.803 (unsupervised).
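
For readers unfamiliar with the metric, the sketch below computes an F-measure from match counts. It assumes the balanced F1 variant (harmonic mean of precision and recall); the summary does not state which variant the study reports, and the counts in the example are invented for illustration only.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_pos + false_pos   # pairs classified as matches
    actual = true_pos + false_neg      # pairs that truly are matches
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not data from the study:
# 90 correct links, 10 false links, 10 missed links.
example = f_measure(true_pos=90, false_pos=10, false_neg=10)
```

Because the measure ignores true non-matches, it suits record linkage, where the overwhelming majority of candidate record pairs are non-matches and plain accuracy would look high for almost any classifier.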

Conclusions:

Semi-supervised classification offers a viable model for automated probabilistic record linkage.
Semi-supervised techniques show potential to outperform unsupervised methods, particularly with lower data quality.
This approach enhances privacy-preserving record linkage in sensitive data environments.