Doublet method for very fast autocoding.

Jules J Berman¹

  • ¹Cancer Diagnosis Program, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA. bermanj@mail.nih.gov.

BMC Medical Informatics and Decision Making | September 17, 2004
Summary
This summary is machine-generated.

The doublet method is a fast algorithm for autocoding biomedical text. In a side-by-side test it ran 8.4 times faster than a phrase-based autocoder and matched terms that the phrase autocoder missed. The open-source implementation organizes large document collections by concept, which matters when text accumulates faster than slower autocoders can process it.

Area of Science:

  • Computational biology
  • Bioinformatics
  • Natural language processing in medicine

Background:

  • Autocoding, or automatic concept indexing, organizes text by mapping extracted terms to a standard nomenclature.
  • Rapid accumulation of biomedical textual data necessitates computationally efficient autocoding methods.
  • Existing methods may lack the speed required for large-scale biomedical text organization.

Purpose of the Study:

  • To introduce and describe the doublet method, a new algorithm for highly efficient text autocoding.
  • To provide a computational solution for organizing large volumes of biomedical text rapidly.

Main Methods:

  • Developed an autocoder that transforms plain text into intercalated word doublets.
  • Checked doublets against a nomenclature index; matching doublets were assigned numeric codes.
  • Concatenated runs of matching doublets to identify and match nomenclature terms.
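
The three steps listed under Main Methods can be illustrated with a minimal sketch. It is written in Python for readability and is not the author's Perl implementation. The function names, the tokenizer, the two sample terms, and the codes `C0001` and `C0002` are hypothetical. The sketch also simplifies in two ways: it tests doublets by set membership instead of assigning numeric codes to them, and it ignores single-word terms, which contain no doublets.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def doublets(words):
    """Return overlapping word pairs: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(words, words[1:]))


def build_index(nomenclature):
    """Build a term -> code map and the set of every doublet found in any term."""
    term_codes = {}
    doublet_index = set()
    for code, term in nomenclature:
        words = tokenize(term)
        term_codes[" ".join(words)] = code
        doublet_index.update(doublets(words))
    return term_codes, doublet_index


def match_terms_in_run(run, term_codes):
    """Find nomenclature terms inside one run of words, longest candidate first."""
    hits = []
    i = 0
    while i < len(run) - 1:
        for j in range(len(run), i + 1, -1):  # candidates of two or more words
            phrase = " ".join(run[i:j])
            if phrase in term_codes:
                hits.append((phrase, term_codes[phrase]))
                i = j  # continue after the matched term
                break
        else:
            i += 1  # no term starts at this word
    return hits


def autocode(text, term_codes, doublet_index):
    """Return (term, code) pairs found by concatenating runs of matching doublets."""
    hits = []
    run = []  # words covered by the current run of consecutive matching doublets
    for pair in doublets(tokenize(text)) + [None]:  # None flushes the final run
        if pair is not None and pair in doublet_index:
            if not run:
                run = [pair[0]]
            run.append(pair[1])
        elif run:
            hits.extend(match_terms_in_run(run, term_codes))
            run = []
    return hits


# Hypothetical two-term nomenclature, for illustration only.
term_codes, doublet_index = build_index(
    [("C0001", "renal cell carcinoma"), ("C0002", "Wilms tumor")]
)
print(autocode("Renal cell carcinoma was found; no Wilms tumor was seen.",
               term_codes, doublet_index))
```

Most doublets in ordinary text never occur in any nomenclature term, so a single set lookup per doublet discards them immediately, and only the short runs that survive are compared against full terms.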

Main Results:

  • The doublet method autocoder was 8.4 times faster than a phrase autocoder (211 vs. 1,776 seconds) on a 170+ MB text collection.
  • Achieved a coding speed of 0.8 MB/second on a standard desktop computer.
  • The doublet method identified terms missed by the phrase autocoder, demonstrating superior recall.

Conclusions:

  • The doublet method is a novel and rapid algorithm for text autocoding applicable to any nomenclature and plain text.
  • An open-source Perl implementation of the algorithm is provided.
  • This efficient method facilitates the organization of large biomedical text datasets.