Substring selection for biomedical document classification.

Bo Han¹, Zoran Obradovic, Zhang-Zhi Hu

  • ¹Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA.

Bioinformatics (Oxford, England) | July 14, 2006
Summary
This summary is machine-generated.

This study introduces a novel attribute selection method for document classification, bypassing traditional stemming. The new approach uses discriminative substrings, improving accuracy, especially for small biomedical datasets.

Area of Science:

  • Biomedical Informatics
  • Natural Language Processing
  • Machine Learning

Background:

  • Attribute selection is crucial for document classification systems.
  • Standard practice is to stem words first, but stemming can reduce classification accuracy on text with complex biomedical terminology, particularly when labeled data are limited.
  • General-purpose stemmers may remove informative word stems.

Purpose of the Study:

  • To propose a new algorithm for attribute selection in document classification.
  • To address the limitations of stemming algorithms in biomedical text analysis.
  • To improve classification accuracy, especially when dealing with small labeled datasets.

Main Methods:

  • Developed an algorithm that skips word stemming.
  • Used the most discriminative substrings as attributes for classification.

  • Tested the approach on five annotated abstract datasets related to protein post-translational modifications.
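
The idea of using discriminative substrings as attributes can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the summary does not say how discriminative power is measured, so the score used here (the absolute difference in document frequency between the two classes), the substring length limits, the function names, and the toy documents are all assumptions made for the example.

```python
from collections import Counter


def substrings(text, min_len=3, max_len=6):
    """Return the distinct character substrings found inside each word of `text`."""
    found = set()
    for word in text.lower().split():
        for n in range(min_len, max_len + 1):
            for i in range(len(word) - n + 1):
                found.add(word[i:i + n])
    return found


def select_substrings(docs, labels, k=3):
    """Keep the k substrings whose document frequency differs most between classes.

    Assumed score: |P(s in doc | label 1) - P(s in doc | label 0)|.
    Ties are broken by preferring longer substrings, then alphabetical order.
    """
    pos, neg = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one document of each class")
    for doc, y in zip(docs, labels):
        # Each substring is counted once per document (document frequency).
        (pos if y == 1 else neg).update(substrings(doc))

    def score(s):
        return abs(pos[s] / n_pos - neg[s] / n_neg)

    candidates = set(pos) | set(neg)
    return sorted(candidates, key=lambda s: (-score(s), -len(s), s))[:k]


def to_features(doc, selected):
    """Binary attribute vector: 1 if the substring occurs in the document."""
    text = doc.lower()
    return [1 if s in text else 0 for s in selected]


# Toy documents (invented for illustration): label 1 = relevant, 0 = not relevant.
docs = [
    "phosphorylation of serine residues",
    "protein is phosphorylated by kinase",
    "gene expression in yeast cells",
    "the structure of the membrane",
]
labels = [1, 1, 0, 0]

selected = select_substrings(docs, labels, k=3)
print(selected)
print(to_features("kinase-mediated phosphorylation", selected))
```

In this toy data, "phosphorylation" and "phosphorylated" share long substrings even though they are different word forms, so the selected attributes link the two relevant documents without any stemming step.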

Main Results:

  • The proposed attribute selection method consistently outperformed the Porter stemmer algorithm.
  • Naive Bayes and Support Vector Machine classifiers achieved a higher area under the ROC curve (AUC) with the new method (0.92-0.97) than with Porter stemming (0.86-0.93).
  • The approach was particularly effective with small labeled datasets.
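
For readers unfamiliar with the metric, AUC can be read as the probability that a classifier scores a randomly chosen positive document above a randomly chosen negative one. The sketch below computes it directly from that definition. The function name and the toy scores are assumptions for illustration and are unrelated to the values reported in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy classifier scores (invented): three of the four positive/negative pairs
# are ranked correctly, so the AUC is 3 / 4.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.86-0.93 and 0.92-0.97 ranges above.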

Conclusions:

  • The proposed substring-based attribute selection method improves document classification accuracy in the biomedical domain.
  • The method outperformed traditional stemming in the reported experiments, especially on datasets with limited examples.
  • The algorithm provides a more robust approach to handling complex biomedical terminology.