Interpretable Active Learning for Pedigree Data Deduplication in Cancer Genetics.

Maria S Rosito (1,2), Aleck E Cervantes (3), Christine Hong (3)

  • (1) Department of Data Science, Dana-Farber Cancer Institute, Boston, MA.

JCO Clinical Cancer Informatics | March 16, 2026
Summary
This summary is machine-generated.

This study introduces an interpretable active learning method to efficiently identify duplicate pedigree records in multicenter genetic studies. The approach significantly automates the deduplication process, improving data quality for rare genetic conditions like Li-Fraumeni syndrome.

Area of Science:

  • Genetics and Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • Multicenter studies are crucial for rare genetic conditions, but because the data are deidentified, the same family can be recorded more than once when relatives enroll at different sites.
  • Duplicate pedigree data can introduce bias in family-based genetic studies, necessitating robust deduplication methods.
  • The Li-Fraumeni and TP53: Understanding and Progress (LiFT UP) study involves families with TP53 mutations, highlighting the need for accurate data.

Purpose of the Study:

  • To develop and evaluate an interpretable, active learning-based approach for efficient pedigree deduplication in multicenter genetic studies.
  • To address the challenge of duplicate records arising from relatives enrolled at different sites in deidentified datasets.
  • To improve data quality for genetic studies, specifically focusing on families with TP53 mutations.

Main Methods:

  • Combined heuristic labeling with graph-based features and a machine learning model for iterative duplicate detection.
  • Generated a partially labeled dataset using mutation variant diversity and family characteristics.
  • Trained a random forest classifier and employed active learning for refinement, applied to LiFT UP study data.
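
As a rough illustration of the first two steps above, the sketch below builds candidate pedigree pairs, computes a few graph-based features, and assigns heuristic labels. It is not the authors' implementation. The record layout, the feature names, the function names, and the rule that pedigrees carrying different TP53 variants are labeled non-duplicates are all assumptions made for this example; the variant values are illustrative only.

```python
from itertools import combinations

# Hypothetical pedigree records (not study data): a site, a TP53 variant,
# and parent -> child edges over site-local, deidentified member IDs.
pedigrees = {
    "A1": {"site": "A", "variant": "R175H",
           "edges": [(1, 3), (2, 3), (3, 4), (3, 5)]},
    "B7": {"site": "B", "variant": "R175H",
           "edges": [(10, 12), (11, 12), (12, 13), (12, 14)]},
    "B9": {"site": "B", "variant": "R248Q",
           "edges": [(1, 2), (1, 3)]},
}

def degree_sequence(edges):
    """Sorted node degrees: a graph summary that ignores member IDs."""
    degree = {}
    for parent, child in edges:
        degree[parent] = degree.get(parent, 0) + 1
        degree[child] = degree.get(child, 0) + 1
    return sorted(degree.values())

def pair_features(p, q):
    """Assumed feature set comparing two pedigrees structurally."""
    dp, dq = degree_sequence(p["edges"]), degree_sequence(q["edges"])
    return {
        "same_variant": int(p["variant"] == q["variant"]),
        "size_diff": abs(len(dp) - len(dq)),
        "same_degree_sequence": int(dp == dq),
    }

def heuristic_label(p, q):
    """Assumed rule: different variants -> 0 (non-duplicate), else unlabeled."""
    if p["variant"] != q["variant"]:
        return 0
    return None

rows = []
for (id_p, p), (id_q, q) in combinations(pedigrees.items(), 2):
    if p["site"] == q["site"]:
        continue  # assumption: duplicates only arise across sites
    rows.append((id_p, id_q, pair_features(p, q), heuristic_label(p, q)))

for row in rows:
    print(row)
```

Pairs that the heuristic leaves unlabeled (here, A1 and B7) are the ones a classifier such as the random forest mentioned above would need to score.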

Main Results:

  • Achieved 99.95% automated processing in the pedigree deduplication workflow for the LiFT UP study data.
  • Minimized manual effort by prioritizing likely duplicates for human review, ensuring high specificity.
  • Demonstrated a scalable, automated solution that avoids reliance on traditional identifier-matching filters.
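
The sketch below shows one way such a review queue could work, assuming a classifier has already assigned each candidate pair a duplicate probability. The thresholds, pair IDs, scores, and function names are hypothetical, and the source does not define how its automation percentage was computed; here it is taken to be the share of pairs resolved without human review.

```python
def triage(scores, low=0.05, high=0.95):
    """Split pair scores into auto non-duplicates, auto duplicates, and review.

    `low` and `high` are hypothetical confidence thresholds.
    """
    auto_non_dup, auto_dup, review = [], [], []
    for pair_id, prob in scores.items():
        if prob <= low:
            auto_non_dup.append(pair_id)
        elif prob >= high:
            auto_dup.append(pair_id)
        else:
            review.append(pair_id)
    # Uncertainty sampling: pairs with probability nearest 0.5 come first.
    review.sort(key=lambda pid: abs(scores[pid] - 0.5))
    return auto_non_dup, auto_dup, review

def automated_fraction(scores, review):
    """Share of pairs that did not need human review."""
    return 1 - len(review) / len(scores)

# Made-up duplicate probabilities for five candidate pairs.
scores = {"p1": 0.01, "p2": 0.99, "p3": 0.40, "p4": 0.02, "p5": 0.70}
non_dup, dup, review = triage(scores)
print(review)  # ['p3', 'p5']
print(automated_fraction(scores, review))
```

In an active learning loop, the labels a reviewer gives to the queued pairs would be added to the training set and the classifier retrained before the next round.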

Conclusions:

  • Interpretable active learning is an effective strategy for pedigree deduplication in multicenter genetic research.
  • The developed method offers a scalable solution for enhancing data quality in genetic studies.
  • Future research will focus on refining duplicate identification and assessing generalizability to other genetic datasets.