Assessing the effects of hyperparameters on knowledge graph embedding quality.

Oliver Lloyd¹, Yi Liu¹, Tom R Gaunt¹

¹MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK.

Journal of Big Data

May 11, 2023

Summary

This summary is machine-generated.

Optimizing knowledge graph embeddings is computationally expensive. This study uses Sobol sensitivity analysis to identify crucial hyperparameters, reducing computational cost and improving embedding quality. A leakage-robust variant of the UMLS knowledge graph is also presented.

Keywords:

Embedding; Hyperparameter tuning; Knowledge graph; Sensitivity analysis

Area of Science:

Artificial Intelligence
Machine Learning
Data Science

Background:

Knowledge graph embeddings are vital for tasks like link prediction and node classification.
Current embedding methods require significant computational resources due to hyperparameter optimization.
Hyperparameter tuning involves extensive sampling and testing, leading to high costs.

Purpose of the Study:

To reduce the computational cost of knowledge graph embedding by identifying and prioritizing important hyperparameters.
To investigate the impact of hyperparameter tuning on embedding quality using sensitivity analysis.
To address data leakage issues in the UMLS knowledge graph and propose a robust variant.

Main Methods:

Sobol sensitivity analysis was employed to assess hyperparameter influence on embedding quality variance.
Thousands of embedding trials were conducted with varying hyperparameter configurations.
Regression models were used to calculate Sobol indices for each hyperparameter.
The UMLS knowledge graph was analyzed for inverse relations causing data leakage.
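To make the idea of a Sobol index concrete, the sketch below estimates first-order indices, S_i = Var(E[Y | X_i]) / Var(Y), for a toy scoring function. It is only an illustration under stated assumptions: the study computed its indices from regression models fitted to embedding trials, whereas this sketch swaps in a direct "pick-freeze" Monte Carlo estimator, and `toy_quality`, its three inputs, and its coefficients are hypothetical stand-ins rather than anything taken from the paper.

```python
import random
from statistics import fmean


def toy_quality(h):
    """Hypothetical stand-in for an embedding-quality score.

    h holds three hyperparameters scaled to [0, 1]; the coefficients are
    invented so that the first input dominates and the third is irrelevant.
    """
    lr, dim, reg = h
    return 4.0 * lr + 1.0 * dim + 0.0 * reg


def first_order_sobol(model, n_params, n_samples=40000, seed=0):
    """Estimate first-order Sobol indices with a pick-freeze estimator."""
    rng = random.Random(seed)
    # Two independent sample matrices A and B, uniform on [0, 1].
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    y_a = [model(row) for row in A]
    y_b = [model(row) for row in B]

    all_y = y_a + y_b
    mean_y = fmean(all_y)
    total_var = fmean([(y - mean_y) ** 2 for y in all_y])

    indices = []
    for i in range(n_params):
        # A with column i taken from B: shares only input i with B.
        y_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Centring y_b by the sample mean reduces estimator noise.
        v_i = fmean([(yb - mean_y) * (yab - ya)
                     for yb, yab, ya in zip(y_b, y_ab, y_a)])
        indices.append(v_i / total_var)
    return indices


sobol = first_order_sobol(toy_quality, n_params=3)
for name, s in zip(["lr", "dim", "reg"], sobol):
    print(f"{name}: {s:.3f}")
```

For this toy function the analytic values are 16/17, 1/17, and 0, so the estimates should land close to those. In a real setting each call to the model would be an embedding trial, which is why a cheap surrogate such as a regression model is attractive.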

Main Results:

Significant variability in hyperparameter sensitivity was observed across different knowledge graph datasets.
Dataset characteristics were identified as a probable cause for these inconsistencies.
Several relations in the UMLS knowledge graph were found to contribute to data leakage.
A leakage-robust variant, UMLS-43, was derived from the original UMLS graph.
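Inverse relations leak information because a test triple (h, r, t) can be answered trivially when its mirror (t, r', h) sits in the training set. The sketch below shows one simple way such relation pairs might be flagged. The overlap score, the 0.9 threshold, and the example triples are assumptions made for illustration; they are not the criterion or the data used to derive UMLS-43.

```python
from collections import defaultdict


def find_inverse_relations(triples, threshold=0.9):
    """Flag relation pairs (r1, r2) where reversing r1's triples mostly yields r2's.

    The score is the number of (head, tail) pairs of r1 whose reversal appears
    under r2, divided by the size of the smaller relation. This heuristic is an
    assumption for illustration only.
    """
    pairs_by_rel = defaultdict(set)
    for head, rel, tail in triples:
        pairs_by_rel[rel].add((head, tail))

    flagged = []
    for r1, pairs1 in pairs_by_rel.items():
        reversed1 = {(t, h) for h, t in pairs1}
        for r2, pairs2 in pairs_by_rel.items():
            if r1 >= r2:
                # Visit each unordered pair once and skip self-comparison.
                continue
            overlap = len(reversed1 & pairs2)
            score = overlap / min(len(pairs1), len(pairs2))
            if score >= threshold:
                flagged.append((r1, r2, score))
    return flagged


# Hypothetical toy triples, not actual UMLS content.
toy_triples = [
    ("aspirin", "treats", "headache"),
    ("headache", "treated_by", "aspirin"),
    ("ibuprofen", "treats", "fever"),
    ("fever", "treated_by", "ibuprofen"),
    ("aspirin", "interacts_with", "warfarin"),
]

flagged = find_inverse_relations(toy_triples)
print(flagged)
```

Once a pair is flagged, one option is to drop one relation of the pair before splitting the graph into training and test sets, so that no test triple has its mirror in training.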

Conclusions:

Prioritizing key hyperparameters can significantly reduce the computational burden of knowledge graph embedding.
Understanding dataset-specific hyperparameter importance is crucial for efficient embedding.
Addressing data leakage is essential for reliable knowledge graph analysis, as demonstrated by the UMLS-43 variant.