A statistical methodology for analyzing co-occurrence data from a large sample.

Hui Cao¹, George Hripcsak, Marianthi Markatou

  • ¹Department of Biomedical Informatics, 622 West 168th Street, VC-5, Columbia University, New York, NY 10032, USA.

Journal of Biomedical Informatics | January 2, 2007
Summary
This summary is machine-generated.

Identifying significant associations in large datasets is difficult. This study introduces a novel method combining the volume test and p-value plots to establish a rigorous, non-arbitrary threshold for detecting meaningful item associations.

Area of Science:

  • Statistics
  • Data Mining
  • Bioinformatics

Background:

  • Identifying significant associations in large databases is challenging due to numerous hypotheses and the risk of selecting statistically significant but clinically irrelevant associations.
  • Standard chi-squared (χ²) tests often flag clinically inappropriate associations as significant at traditional thresholds (e.g., α = 0.05).
  • Arbitrarily choosing stricter thresholds can lead to the exclusion of potentially important findings.
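
The sample-size problem described above can be illustrated with a minimal sketch. It computes the Pearson χ² statistic for a 2x2 co-occurrence table and its p-value with 1 degree of freedom. The counts are hypothetical and are not taken from the study. The function name `chi2_2x2` is an assumption introduced only for this example.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: identical cell proportions, 100 times more observations.
small_stat, small_p = chi2_2x2(26, 24, 24, 26)          # n = 100
large_stat, large_p = chi2_2x2(2600, 2400, 2400, 2600)  # n = 10,000

print(f"n=100:   chi2={small_stat:.2f}, p={small_p:.3g}")
print(f"n=10000: chi2={large_stat:.2f}, p={large_p:.3g}")
```

With the same weak association in both tables, only the larger sample falls below α = 0.05, which is why a fixed threshold alone is a poor filter in large databases.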

Purpose of the Study:

  • To develop a more rigorous and less arbitrary method for selecting thresholds to identify meaningful associations in large datasets.
  • To improve the clinical relevance of identified associations by filtering out statistically significant but weak findings.

Main Methods:

  • Combined the volume test of Diaconis and Efron with p-value plots to adjust the p-value of the χ² statistic.
  • Utilized a plot of adjusted p-values (1 - p versus N(p)) to identify deviations from linearity, indicating true associations.
  • Employed linear regression for reproducible threshold selection.
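
The p-value plot and regression steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure. It takes N(p) to be the number of p-values greater than p, fits a least-squares line to the points with small 1 - p (where most p-values are presumed to come from true nulls), and reports the first p-value at which N(p) rises above the fitted line by more than a fixed count. The volume-test adjustment is omitted. The names `fit_upto` and `min_excess`, their default values, and the synthetic p-values are all hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def select_threshold(pvals, fit_upto=0.5, min_excess=5.0):
    """Return (threshold_p, slope). threshold_p is None if no deviation is found.

    Assumptions (not from the source): the line is fitted on points with
    1 - p <= fit_upto, and a deviation is an excess of more than min_excess
    p-values above the fitted line. Ties among p-values are not handled.
    """
    ordered = sorted(pvals, reverse=True)  # largest p first
    # With no ties, the rank in this order equals the number of larger p-values.
    plot = [(1.0 - p, rank, p) for rank, p in enumerate(ordered)]
    base = [(x, n_p) for x, n_p, _ in plot if x <= fit_upto]
    if len(base) < 2:
        raise ValueError("not enough points to fit the baseline line")
    slope, intercept = fit_line(base)
    for x, n_p, p in plot:
        if x > fit_upto and n_p - (slope * x + intercept) > min_excess:
            return p, slope
    return None, slope


# Synthetic data: 900 evenly spread "null" p-values and 100 very small ones.
nulls = [(i + 0.5) / 900 for i in range(900)]
signals = [1e-6 * (j + 1) for j in range(100)]
threshold, null_slope = select_threshold(nulls + signals)
print(threshold, round(null_slope))
```

In this kind of plot, the slope of the fitted line is often read as a rough estimate of the number of true null hypotheses. Because the cut-off comes from a regression rather than visual inspection, the same inputs always produce the same threshold.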

Main Results:

  • The proposed method successfully identified a threshold for associations.
  • The selected threshold was comparable to thresholds obtained through manual review in experimental settings.
  • The approach offers a systematic way to differentiate true associations from spurious ones.

Conclusions:

  • The volume test combined with p-value plots provides a robust and reproducible method for threshold selection in association studies.
  • This technique enhances the reliability of identifying clinically significant associations in large databases.
  • The method addresses the limitations of traditional statistical tests in handling multiple comparisons and weak associations.