Epidemiological cluster identification using multiple data sources: an approach using logistic regression.

Kurnia Susvitasari (1), Paul F Tupper (1), Irving Cancino-Muñoz (2,3)

  • (1) Department of Mathematics, Simon Fraser University, Burnaby, Canada.

Microbial Genomics | March 3, 2023
Summary
This summary is machine-generated.

This study develops a statistical model that assigns unsequenced infectious disease cases to existing genomic clusters using demographic and location data. The method identifies the correct cluster more often than multinomial regression or random selection, which aids outbreak management when not every case can be sequenced.

Keywords:
TB cases, genomic clustering

Area of Science:

  • Epidemiology
  • Genomic epidemiology
  • Statistical modeling

Background:

  • Infectious disease outbreak management relies on cluster identification and epidemiological understanding.
  • Genomic epidemiology often uses pathogen sequences, but not all isolates can be sequenced.
  • Unsequenced cases, though lacking sequence data, possess demographic, clinical, and location information crucial for transmission analysis.

Purpose of the Study:

  • To develop a statistical model for assigning unsequenced cases to existing genomic clusters.
  • To leverage available non-sequence data for improved cluster identification and epidemiological insights.
  • To estimate the true size of known clusters by incorporating unsequenced cases.

Main Methods:

  • Statistical modeling based on pairwise case similarity.
  • Predicting cluster assignment for unsequenced cases without direct contact tracing.
  • Developing methods to assess pairwise clustering, assign cases to probable clusters, identify membership in specific clusters, and estimate cluster size.
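
The pairwise approach listed above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each pair of cases is described by two features (spatial distance and a same-nationality flag), fits a logistic regression on pairs of sequenced cases labelled by whether they share a genomic cluster, and then scores an unsequenced case against each cluster by the mean pairwise probability. The toy data, the function names (`pair_features`, `fit_logistic`, `assign_cluster`), the gradient-descent fit, and the mean-probability scoring rule are all assumptions made for illustration.

```python
import math
import random

def pair_features(a, b):
    """Bias term, spatial distance, and same-nationality flag for a pair of cases."""
    dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    same_nat = 1.0 if a["nat"] == b["nat"] else 0.0
    return [1.0, dist, same_nat]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent on the logistic log-loss (illustrative, not tuned)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def pair_prob(w, a, b):
    """Estimated probability that cases a and b belong to the same cluster."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, pair_features(a, b))))

def assign_cluster(w, case, sequenced):
    """Score each cluster by mean pairwise probability and return the best one."""
    by_cluster = {}
    for s in sequenced:
        by_cluster.setdefault(s["cluster"], []).append(pair_prob(w, case, s))
    scores = {c: sum(p) / len(p) for c, p in by_cluster.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: two genomic clusters of sequenced cases.
rng = random.Random(0)
sequenced = []
for cluster, (cx, cy), nat in [("A", (0.0, 0.0), "P"), ("B", (10.0, 10.0), "Q")]:
    for _ in range(8):
        sequenced.append({"x": cx + rng.gauss(0, 1), "y": cy + rng.gauss(0, 1),
                          "nat": nat, "cluster": cluster})

# Training set: every pair of sequenced cases, labelled 1 if they share a cluster.
X, y = [], []
for i in range(len(sequenced)):
    for j in range(i + 1, len(sequenced)):
        X.append(pair_features(sequenced[i], sequenced[j]))
        y.append(1.0 if sequenced[i]["cluster"] == sequenced[j]["cluster"] else 0.0)
w = fit_logistic(X, y)

# Unsequenced cases have location and nationality but no cluster label.
unsequenced = [{"x": 0.5, "y": -0.3, "nat": "P"}, {"x": 9.2, "y": 10.4, "nat": "Q"}]
assignments = [assign_cluster(w, u, sequenced)[0] for u in unsequenced]

# Estimated cluster size: sequenced members plus assigned unsequenced cases.
sizes = {c: sum(s["cluster"] == c for s in sequenced) + assignments.count(c)
         for c in ("A", "B")}
print(assignments, sizes)
```

In this sketch the estimated cluster size is simply the count of sequenced members plus the unsequenced cases assigned to that cluster; a probabilistic estimate would instead sum assignment probabilities.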

Main Results:

  • The model successfully assigned unsequenced cases to clusters using spatial distance and nationality similarity.
  • Achieved approximately 35% accuracy in identifying the correct cluster for an unsequenced case among 38 possibilities.
  • Outperformed direct multinomial regression (17%) and random selection (<5%) in cluster assignment accuracy.
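
To put the reported accuracies in context, the short calculation below compares them with a uniform random guess among 38 clusters. The uniform guess is an assumption made here for illustration; the summary only states that random selection scored below 5% and does not say how the random baseline was drawn.

```python
n_clusters = 38
uniform_guess = 1 / n_clusters   # assumed baseline: pick one of 38 clusters at random
model_acc = 0.35                 # approximate accuracy reported for the pairwise model
multinomial_acc = 0.17           # accuracy reported for direct multinomial regression

print(f"uniform random guess: {uniform_guess:.1%}")
print(f"model vs multinomial: {model_acc / multinomial_acc:.1f}x")
print(f"model vs uniform guess: {model_acc / uniform_guess:.1f}x")
```

Under that assumption, a uniform guess is correct about 2.6% of the time, which is consistent with the reported figure of less than 5%.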

Conclusions:

  • Statistical modeling effectively integrates non-sequence data to assign unsequenced cases to genomic clusters.
  • This approach enhances infectious disease outbreak management by providing a more complete epidemiological picture.
  • The method offers a valuable tool for understanding transmission dynamics when sequencing every case is not feasible.