Tools for loading MEDLINE into a local relational database.

Diane E Oliver¹, Gaurav Bhalotia, Ariel S Schwartz

  • ¹Department of Genetics, Stanford University, Stanford, CA, USA. oliver@SMI.Stanford.EDU

BMC Bioinformatics | October 9, 2004
Summary
This summary is machine-generated.

Researchers can now locally manage and query MEDLINE data using relational databases. Software tools parse MEDLINE XML files, enabling efficient text mining and information extraction for biological and medical research.

Area of Science:

  • Biomedical Informatics
  • Computational Biology
  • Medical Research

Background:

  • MEDLINE is distributed as XML files, a format that is difficult to query directly for local text mining and information extraction.
  • Researchers require efficient methods to query and manage large biomedical datasets locally.
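
Because MEDLINE distribution files are large, a streaming parse that handles one record at a time, rather than loading a whole file into memory, is a natural way to read them. The Python sketch below illustrates the idea with the standard library's `xml.etree.ElementTree.iterparse`. The two-record sample and the `iter_citations` helper are hypothetical, only the `MedlineCitation`, `PMID`, and `Article/ArticleTitle` elements are read, and this is not the authors' parser.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical two-record fragment shaped like a MEDLINE XML file.
# Real distribution files hold many thousands of records each.
SAMPLE = b"""<?xml version="1.0"?>
<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article><ArticleTitle>First title.</ArticleTitle></Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article><ArticleTitle>Second title.</ArticleTitle></Article>
  </MedlineCitation>
</MedlineCitationSet>
"""

def iter_citations(source) -> Iterator[Tuple[int, str]]:
    """Yield (PMID, title) pairs one record at a time (hypothetical helper).

    iterparse reports each element when its closing tag is read, so a
    record can be processed and then cleared, freeing most of the memory
    its subtree used before the next record is parsed.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "MedlineCitation":
            pmid = int(elem.findtext("PMID"))
            title = elem.findtext("Article/ArticleTitle") or ""
            yield pmid, title
            elem.clear()  # drop the finished record's children and text

for pmid, title in iter_citations(io.BytesIO(SAMPLE)):
    print(pmid, title)  # prints "1 First title." then "2 Second title."
```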

Purpose of the Study:

  • To develop and evaluate software tools for parsing MEDLINE XML files and loading them into a relational database.
  • To provide researchers with a manageable local version of MEDLINE for advanced text analysis.
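
A minimal sketch of the loading step, assuming records have already been parsed into `(pmid, title, authors)` tuples. It uses Python's built-in `sqlite3` module purely for illustration; the two-table schema and the `load` helper are hypothetical simplifications, not the published schemas.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one row per citation and one
# row per author position. Real MEDLINE schemas have many more tables.
SCHEMA = """
CREATE TABLE citation (
    pmid  INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE author (
    pmid      INTEGER NOT NULL REFERENCES citation(pmid),
    position  INTEGER NOT NULL,
    last_name TEXT,
    PRIMARY KEY (pmid, position)
);
"""

def load(conn, records):
    """Insert parsed records, each a (pmid, title, [author last names])."""
    with conn:  # commit the whole batch as one transaction
        for pmid, title, authors in records:
            conn.execute(
                "INSERT INTO citation (pmid, title) VALUES (?, ?)",
                (pmid, title))
            conn.executemany(
                "INSERT INTO author (pmid, position, last_name) VALUES (?, ?, ?)",
                [(pmid, i, name) for i, name in enumerate(authors, start=1)])

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, [
    (1, "First title.", ["Smith", "Jones"]),
    (2, "Second title.", ["Lee"]),
])
print(conn.execute("SELECT COUNT(*) FROM author").fetchone()[0])  # 3
```

Authors get their own table because one citation has many authors; storing them as rows keyed by `(pmid, position)` lets the database index and join them instead of packing a list into a single column.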

Main Methods:

  • Developed three software packages to parse MEDLINE data and load it into relational database management systems (RDBMS).
  • Installed separate MEDLINE database instances using different configurations (DBMS, processors, programming languages).
  • Collected data on loading times and disk-space utilization for each installation.
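
The two measurements collected for each installation, loading time and disk-space utilization, can be gathered with a small harness like the one below. It is a sketch under stated assumptions: SQLite stands in for whichever database systems were actually used, the records are synthetic, and `measure_load` is a hypothetical helper. Loading the same records with and without a secondary index shows how an indexing choice alone changes the on-disk size.

```python
import os
import sqlite3
import tempfile
import time

def measure_load(records, with_index):
    """Load records into a fresh on-disk SQLite file (hypothetical helper).

    Returns (elapsed seconds, database file size in bytes).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "medline.db")
        conn = sqlite3.connect(path)
        start = time.perf_counter()
        conn.execute("CREATE TABLE citation (pmid INTEGER PRIMARY KEY, title TEXT)")
        with conn:  # commit all inserts as one transaction
            conn.executemany("INSERT INTO citation VALUES (?, ?)", records)
        if with_index:
            # A secondary index speeds up title lookups but costs disk space.
            conn.execute("CREATE INDEX idx_citation_title ON citation(title)")
        conn.commit()
        elapsed = time.perf_counter() - start
        conn.close()  # close before measuring and before the directory is removed
        size = os.path.getsize(path)
    return elapsed, size

# Synthetic stand-in data; real MEDLINE records are far larger and more varied.
records = [(pmid, f"Synthetic title number {pmid}") for pmid in range(1, 20001)]
for with_index in (False, True):
    seconds, size = measure_load(records, with_index)
    print(f"index={with_index}: {seconds:.3f} s, {size / 1024:.0f} KiB")
```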

Main Results:

  • Loading times ranged from 76 to 196 hours, depending on the system configuration and on whether processing was sequential or parallel.
  • Disk-space utilization ranged from 31.6 GB to 46.3 GB, depending on indexing and data-storage choices.
  • Performance was judged reasonable across all of the hardware and software configurations tested.
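
Parsing is one step where sequential and parallel processing differ. The sketch below parses several synthetic MEDLINE-style files either one after another or through a process pool from Python's `concurrent.futures`, so independent files can be parsed on separate processors. The `parse_file`, `parse_all`, and `make_file` helpers, and the in-memory byte strings standing in for files on disk, are assumptions for illustration; this is not the authors' implementation and does not reproduce the reported timings.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Optional, Tuple

def parse_file(data: bytes) -> List[Tuple[int, str]]:
    """Parse one MEDLINE-style XML document into (PMID, title) pairs."""
    root = ET.parse(io.BytesIO(data)).getroot()
    return [
        (int(c.findtext("PMID")), c.findtext("Article/ArticleTitle") or "")
        for c in root.iter("MedlineCitation")
    ]

def parse_all(files: List[bytes],
              executor: Optional[Executor] = None) -> List[List[Tuple[int, str]]]:
    """Parse every file sequentially (executor=None) or through the given
    executor. Results keep the input order either way."""
    if executor is None:
        return [parse_file(f) for f in files]     # sequential
    return list(executor.map(parse_file, files))  # parallel

def make_file(first_pmid: int, n: int) -> bytes:
    """Build a synthetic MEDLINE-style file (hypothetical test data)."""
    body = "".join(
        f"<MedlineCitation><PMID>{p}</PMID>"
        f"<Article><ArticleTitle>Title {p}</ArticleTitle></Article>"
        "</MedlineCitation>"
        for p in range(first_pmid, first_pmid + n))
    return f"<MedlineCitationSet>{body}</MedlineCitationSet>".encode()

if __name__ == "__main__":
    files = [make_file(i * 1000 + 1, 1000) for i in range(4)]
    sequential = parse_all(files)
    # Processes rather than threads: CPU-bound parsing does not run in
    # parallel across threads in CPython.
    with ProcessPoolExecutor() as pool:
        parallel = parse_all(files, pool)
    assert parallel == sequential
    print(sum(len(r) for r in parallel))  # 4000
```

Only parsing is parallelized here; writing the parsed rows into a single database is left sequential, which suits systems such as SQLite that allow only one writer at a time.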

Conclusions:

  • Relational database technology effectively supports indexing and querying of large MEDLINE datasets.
  • Local MEDLINE installations support tasks beyond what the PubMed application programming interface offers.
  • The database schemas and conversion software are publicly available to aid other researchers.
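
As an example of a question that a local relational copy answers in a single SQL statement, the sketch below asks which MeSH descriptors most often co-occur with a given descriptor, using one self-join over the whole table. The `mesh` table, its index, and the four sample citations are hypothetical; MEDLINE records do carry MeSH headings, but the published schemas may name and structure these tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mesh (
    pmid       INTEGER NOT NULL,
    descriptor TEXT    NOT NULL
);
-- Index so lookups by descriptor avoid a full table scan.
CREATE INDEX idx_mesh_descriptor ON mesh(descriptor);
""")
# Hypothetical sample: (PMID, MeSH descriptor) pairs for four citations.
conn.executemany("INSERT INTO mesh VALUES (?, ?)", [
    (1, "Databases, Factual"), (1, "MEDLINE"),
    (2, "MEDLINE"), (2, "Software"),
    (3, "MEDLINE"), (3, "Software"),
    (4, "Software"),
])

# Which descriptors co-occur most often with a given descriptor?
CO_OCCURRENCE = """
SELECT b.descriptor, COUNT(*) AS n
FROM mesh AS a
JOIN mesh AS b ON a.pmid = b.pmid AND a.descriptor <> b.descriptor
WHERE a.descriptor = ?
GROUP BY b.descriptor
ORDER BY n DESC, b.descriptor
"""
rows = conn.execute(CO_OCCURRENCE, ("MEDLINE",)).fetchall()
print(rows)  # [('Software', 2), ('Databases, Factual', 1)]
```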