Active learning for clinical text classification: is it better than random sampling?

Rosa L Figueroa¹, Qing Zeng-Treitler, Long H Ngo

  • ¹ Departamento de Ingeniería Eléctrica, Facultad de Ingeniería, Universidad de Concepción, Concepción, Chile.

Journal of the American Medical Informatics Association : JAMIA, June 19, 2012
Summary
This summary is machine-generated.

Active learning algorithms can substantially reduce the amount of labeled training data needed for medical text classification. The distance-based (DIST) and combined (CMB) algorithms outperformed passive learning (random sampling), and how well each algorithm performed depended on the diversity and uncertainty of the dataset.

Area of Science:

  • Medical informatics
  • Machine learning
  • Natural language processing

Background:

  • Large labeled datasets are crucial for training effective medical text classification models.
  • Active learning strategies aim to reduce the annotation burden by intelligently selecting informative data points for labeling.
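
The idea of selecting informative points for labeling can be illustrated with a minimal pool-based loop. The sketch below is a toy under stated assumptions, not the study's setup: the one-dimensional integer "documents", the threshold classifier, and the names `fit_threshold`, `run`, `POOL`, and `SEEDS` are all hypothetical. It contrasts passive learning (random picks) with an active rule that always queries the unlabeled item closest to the current decision boundary.

```python
import random

def fit_threshold(labeled):
    """Toy 1-D classifier: cut halfway between the largest labeled negative
    and the smallest labeled positive (assumes both classes are present)."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2

def accuracy(threshold, data):
    """Fraction of items whose predicted class (x > threshold) matches y."""
    return sum((x > threshold) == (y == 1) for x, y in data) / len(data)

def run(pool, seeds, budget, strategy, rng):
    """Label `budget` items, one per round, and return the final threshold."""
    labeled = list(seeds)
    unlabeled = [item for item in pool if item not in seeds]
    for _ in range(budget):
        threshold = fit_threshold(labeled)
        if strategy == "passive":
            pick = rng.choice(unlabeled)  # passive learning: random sampling
        else:
            # active learning: query the item nearest the current boundary
            pick = min(unlabeled, key=lambda item: abs(item[0] - threshold))
        unlabeled.remove(pick)
        labeled.append(pick)  # the "oracle" reveals the label stored in the pool
    return fit_threshold(labeled)

# Hypothetical pool: feature x in 0..100, true class is 1 when x > 60.
POOL = [(x, 1 if x > 60 else 0) for x in range(101)]
SEEDS = [(0, 0), (100, 1)]  # one labeled example per class to start

active = run(POOL, SEEDS, 7, "active", random.Random(0))
passive = run(POOL, SEEDS, 7, "passive", random.Random(0))
print("active accuracy :", accuracy(active, POOL))
print("passive accuracy:", accuracy(passive, POOL))
```

On this cleanly separable toy pool the active rule behaves like a binary search for the class boundary, which is why a small labeling budget goes a long way. Real clinical text is noisier, so the gap between the two strategies has to be measured rather than assumed.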

Purpose of the Study:

  • To evaluate the efficacy of active learning algorithms in reducing training set requirements for medical text classification.
  • To compare the performance of distance-based (DIST), diversity-based (DIV), and combined (CMB) active learning algorithms against passive learning, that is, random sampling of training examples.
  • To investigate the influence of dataset characteristics (diversity, uncertainty) on active learning algorithm performance.

Main Methods:

  • Three active learning algorithms (DIST, DIV, CMB) were applied to five medical text datasets.
  • Performance was assessed using classification accuracy and Area Under the ROC Curve (AUC) at varying sample sizes.
  • Dataset diversity and uncertainty were quantified using relative entropy and correlated with algorithm performance.
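
The three selection rules named above can be sketched as scoring functions. This summary does not give the study's exact formulas, so everything here is an assumption: DIST is taken to prefer the candidate closest to the classifier's decision boundary, DIV the candidate least similar (by cosine) to anything already labeled, and CMB a weighted sum of the two with a hypothetical weight `lam`. The candidate vectors and margins are made-up illustration values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dist_score(margin):
    """Lower is better: absolute distance to the decision boundary."""
    return abs(margin)

def div_score(vec, labeled_vecs):
    """Lower is better: highest similarity to any already-labeled example."""
    return max(cosine(vec, lv) for lv in labeled_vecs)

def cmb_score(margin, vec, labeled_vecs, lam=0.5):
    """Assumed combination: weighted sum of the two scores above.
    Margins should be scaled to a range comparable with cosine similarity."""
    return lam * dist_score(margin) + (1 - lam) * div_score(vec, labeled_vecs)

def select(candidates, labeled_vecs, rule, lam=0.5):
    """candidates: list of (name, vector, margin). Returns the chosen name."""
    def score(candidate):
        _, vec, margin = candidate
        if rule == "DIST":
            return dist_score(margin)
        if rule == "DIV":
            return div_score(vec, labeled_vecs)
        if rule == "CMB":
            return cmb_score(margin, vec, labeled_vecs, lam)
        raise ValueError(f"unknown rule: {rule}")
    return min(candidates, key=score)[0]

# Hypothetical data: one labeled vector and three unlabeled candidates.
LABELED = [[1.0, 0.0]]
CANDIDATES = [
    ("a", [1.0, 0.0], 0.10),  # near the boundary, but a near-duplicate
    ("b", [0.0, 1.0], 0.90),  # very different, but far from the boundary
    ("c", [1.0, 1.0], 0.15),  # a compromise between the two
]

for rule in ("DIST", "DIV", "CMB"):
    print(rule, "->", select(CANDIDATES, LABELED, rule))
```

With these illustrative values each rule picks a different candidate, which shows why the choice of rule can interact with how diverse or how uncertain a dataset is.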

Main Results:

  • The DIST and CMB active learning algorithms outperformed passive learning across multiple datasets.
  • DIST demonstrated superior performance over passive learning in all five datasets.
  • Significant correlations were observed between dataset diversity and DIV performance, and dataset uncertainty and DIST performance.
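
The two quantities behind this kind of analysis, relative entropy and a correlation coefficient, can be written down compactly. The sketch below assumes the standard Kullback-Leibler definition of relative entropy and a Pearson coefficient. The summary does not say how the study mapped relative entropy onto "diversity" and "uncertainty" or which correlation measure it used, so the function names and the choice of Pearson are assumptions.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(P || Q) in bits.
    Assumes p and q are probability distributions over the same outcomes
    and that q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Identical distributions diverge by zero bits; a point mass measured
# against a fair coin diverges by exactly one bit.
print(relative_entropy([0.5, 0.5], [0.5, 0.5]))
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))
```

In an analysis of this shape, one value per dataset (a diversity or uncertainty score) would be paired with one performance figure per dataset and passed to `pearson`. With only five datasets, any such coefficient rests on very few points.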

Conclusions:

  • Active learning algorithms can achieve performance comparable to passive learning with substantially smaller training sets in medical text classification.
  • The DIV algorithm is more effective on diverse datasets, while the DIST algorithm performs better on datasets with lower uncertainty.