



Classifying disease outbreak reports using n-grams and semantic features.

Mike Conway1, Son Doan, Ai Kawazoe

  • 1National Institute of Informatics, Tokyo, Japan. mike@nii.ac.jp

International Journal of Medical Informatics
May 19, 2009
Summary
This summary is machine-generated.

Combining feature selection with n-gram and semantic features significantly improves the accuracy of disease outbreak report classification. This approach enhances the BioCaster text mining system.


Area of Science:

  • Natural Language Processing
  • Computational Linguistics
  • Epidemiology

Background:

  • The BioCaster system aims to mine disease outbreak reports for epidemiological surveillance.
  • Classifying these reports accurately is crucial for timely public health response.
  • Existing methods may not fully leverage diverse textual features.

Purpose of the Study:

  • To evaluate the effectiveness of n-grams and semantic features for classifying disease outbreak reports.
  • To investigate the contribution of a general-purpose semantic tagger (USAS) in this classification task.
  • To compare different machine learning algorithms and feature selection techniques.

Main Methods:

  • Utilized the BioCaster corpus (1000 documents) for classification experiments.
  • Employed feature sets including named entity recognition, n-grams (unigrams, bigrams, trigrams), and USAS semantic tags.
  • Applied Naïve Bayes, Support Vector Machine, and C4.5 decision tree algorithms.
  • Performed feature selection using the chi-squared (χ²) statistic.
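
The n-gram and feature selection steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the four toy documents, the binary labels, whitespace tokenization, and the function names `ngrams`, `chi2_scores`, and `select_top_k` are all assumptions made for illustration. Each n-gram is scored with the chi-squared statistic of its 2x2 presence-by-class contingency table, and the highest-scoring n-grams are kept.

```python
from collections import Counter

def ngrams(tokens, n_max=3):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def chi2_scores(docs, labels):
    """Chi-squared score per n-gram from its 2x2 presence-by-class table.

    Assumes binary labels (1 = relevant, 0 = not relevant) and naive
    whitespace tokenization; both are simplifications for this sketch.
    """
    n = len(docs)
    n_pos = sum(labels)
    feats = [set(ngrams(d.lower().split())) for d in docs]
    df = Counter(f for fs in feats for f in fs)  # documents containing f
    df_pos = Counter(f for fs, y in zip(feats, labels) if y for f in fs)
    scores = {}
    for f, total in df.items():
        a = df_pos[f]          # feature present, positive class
        b = total - a          # feature present, negative class
        c = n_pos - a          # feature absent, positive class
        d = n - n_pos - b      # feature absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]

# Hypothetical toy data, not taken from the BioCaster corpus.
docs = [
    "cholera outbreak reported in the region",
    "new outbreak of avian influenza confirmed",
    "influenza vaccine research funding announced",
    "hospital opens new research wing",
]
labels = [1, 1, 0, 0]

scores = chi2_scores(docs, labels)
print(select_top_k(scores, 2))
```

In this toy data, an n-gram such as "new" occurs equally often in both classes and scores zero, so selection discards it. Semantic tags or named entity features could be scored the same way by adding them to each document's feature set.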

Main Results:

  • A combination of unigrams, bigrams, trigrams, and semantic features with the Naïve Bayes algorithm and feature selection achieved the highest classification accuracy and F-score.
  • This performance improvement was statistically significant compared to baseline and prior work.
  • Feature selection was identified as the primary driver of improved performance, more so than semantic tagging.
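
For reference, the two reported metrics can be computed as in the sketch below. The summary says only "F-score", so the balanced F1 variant and a binary labeling scheme are assumptions here, and the gold and predicted labels are invented for illustration rather than taken from the study.

```python
def accuracy_and_f1(gold, predicted, positive=1):
    """Return (accuracy, precision, recall, F1) for binary labels.

    F1 is the harmonic mean of precision and recall; treating the
    reported F-score as F1 is an assumption of this sketch.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical labels: 1 = outbreak-relevant report, 0 = not relevant.
gold = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
accuracy, precision, recall, f1 = accuracy_and_f1(gold, predicted)
print(accuracy, precision, recall, f1)
```

Reporting both metrics matters when classes are imbalanced, since accuracy alone can look high for a classifier that rarely predicts the relevant class.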

Conclusions:

  • The study demonstrates that integrating bag-of-words, n-grams, and semantic features, coupled with feature selection, significantly enhances disease outbreak report classification.
  • This optimized approach offers a statistically validated improvement over previous methods in the domain.
  • The findings provide valuable insights for developing more effective text mining systems for public health surveillance.