

A Distributed Look-up Architecture for Text Mining Applications using MapReduce.

Atilla Soner Balkir¹, Ian Foster², Andrey Rzhetsky³

  • ¹Department of Computer Science, University of Chicago.

Proceedings of the ... International Symposium on High Performance Distributed Computing
October 31, 2014
PubMed
Summary
This summary is machine-generated.

This study addresses scalability challenges in text mining by proposing a novel multi-layered look-up architecture. This approach optimizes distributed parameter management for large datasets, improving performance in Hadoop clusters.

Area of Science:

  • Computer Science
  • Data Science
  • Natural Language Processing

Background:

  • Text mining statistical models require iterative parameter access and updates.
  • Large datasets lead to parameter-rich models, causing scalability issues with naive parallel implementations.
  • Maintaining distributed look-up tables for model parameters is a key challenge.

Purpose of the Study:

  • To evaluate existing coordination alternatives for worker nodes in Hadoop clusters.
  • To propose a new multi-layered look-up architecture for parameter management.
  • To optimize text mining scalability for large corpora.

Main Methods:

  • Evaluation of existing coordination methods in Hadoop clusters.
  • Development of a multi-layered look-up architecture.
  • Exploitation of power-law distribution of n-gram counts.
  • Integration of Bloom Filter, in-memory cache, and HBase cluster.
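
The three layers named in the methods (Bloom filter, in-memory cache, HBase) can be combined roughly as in the sketch below. This is an illustration only, not the authors' implementation: the class names `BloomFilter` and `LayeredLookup`, the LRU eviction policy, the ordering of the layers, and the use of a plain Python dict as a stand-in for the HBase table are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Compact membership test: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class LayeredLookup:
    """Hypothetical three-layer look-up: Bloom filter, LRU cache, backing store."""

    def __init__(self, backing_store: dict, cache_size: int = 2):
        self.store = backing_store      # assumption: dict stands in for HBase
        self.bloom = BloomFilter()
        for key in backing_store:
            self.bloom.add(key)
        self.cache = OrderedDict()      # small in-memory LRU cache
        self.cache_size = cache_size
        self.stats = {"bloom_reject": 0, "cache_hit": 0, "store_read": 0}

    def get(self, key: str, default: int = 0) -> int:
        # Layer 1: keys the Bloom filter has never seen cannot be in the store.
        if key not in self.bloom:
            self.stats["bloom_reject"] += 1
            return default
        # Layer 2: frequently requested keys are served from local memory.
        if key in self.cache:
            self.cache.move_to_end(key)
            self.stats["cache_hit"] += 1
            return self.cache[key]
        # Layer 3: fall back to the (remote, slow) backing store.
        self.stats["store_read"] += 1
        value = self.store.get(key, default)
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used key
        return value
```

In this sketch a worker would call `LayeredLookup(counts).get("of the")` for each n-gram it encounters; only keys that pass the Bloom filter and miss the cache reach the backing store, which is the expensive path the layering is meant to avoid.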

Main Results:

  • Naive parallel implementations fail to scale for parameter-rich text mining models.
  • The proposed multi-layered architecture offers an optimized solution for specific problem domains.
  • The architecture effectively manages distributed look-up tables for model parameters.
  • Leveraging power-law distributions enhances the efficiency of the look-up system.

Conclusions:

  • The novel multi-layered look-up architecture significantly improves scalability for parameter-rich text mining models.
  • The solution is optimized for domains exhibiting power-law characteristics in data, such as n-gram counts.
  • The integration of Bloom Filter, cache, and HBase provides a robust and scalable parameter management system.
  • This work advances distributed computing techniques for large-scale text analysis.
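
Why a power-law distribution of n-gram counts helps can be seen with a small calculation. The sketch below assumes an idealized Zipf distribution, in which the frequency of the key at rank r is proportional to 1 / r^s; the function name `zipf_coverage`, the exponent, and the vocabulary and cache sizes are hypothetical choices for illustration and are not taken from the paper.

```python
def zipf_coverage(vocab_size: int, cache_size: int, exponent: float = 1.0) -> float:
    """Fraction of look-ups served by caching the `cache_size` most frequent keys,
    assuming key frequency is proportional to 1 / rank**exponent (idealized Zipf)."""
    if not 0 <= cache_size <= vocab_size:
        raise ValueError("cache_size must be between 0 and vocab_size")
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    # Share of look-ups covered when 1% of one million distinct keys is cached.
    print(f"{zipf_coverage(1_000_000, 10_000):.2f}")
```

Under these assumptions, caching the top 1% of keys covers roughly 68% of look-ups, which suggests why a small in-memory layer in front of a remote table can absorb much of the traffic. Real n-gram distributions will deviate from the idealized curve, so the figure is indicative only.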