



Assessing the effects of hyperparameters on knowledge graph embedding quality.

Oliver Lloyd¹, Yi Liu¹, Tom R Gaunt¹

  • ¹ MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK.

Journal of Big Data | May 11, 2023
Summary
This summary is machine-generated.

Optimizing knowledge graph embeddings is computationally expensive. This study uses Sobol sensitivity analysis to identify crucial hyperparameters, reducing computational cost and improving embedding quality. A leakage-robust variant of the UMLS knowledge graph is also presented.

Keywords:
Embedding, Hyperparameter tuning, Knowledge graph, Sensitivity analysis


Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Data Science

Background:

  • Knowledge graph embeddings are vital for tasks like link prediction and node classification.
  • Current embedding methods require significant computational resources due to hyperparameter optimization.
  • Hyperparameter tuning involves extensive sampling and testing, leading to high costs.

Purpose of the Study:

  • To reduce the computational cost of knowledge graph embedding by identifying and prioritizing important hyperparameters.
  • To investigate the impact of hyperparameter tuning on embedding quality using sensitivity analysis.
  • To address data leakage issues in the UMLS knowledge graph and propose a robust variant.

Main Methods:

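  • Sobol sensitivity analysis was employed to quantify how much of the variance in embedding quality is attributable to each hyperparameter.
  • Thousands of embedding trials were conducted, each with a different hyperparameter configuration, and the quality of the resulting embeddings was measured.
  • Regression models of embedding quality on hyperparameter configuration were fitted and used to calculate Sobol indices for each hyperparameter.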
  • The UMLS knowledge graph was analyzed for inverse relations causing data leakage.
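
The regression-then-Sobol workflow described above can be illustrated with a minimal sketch. It is not the study's implementation: the hyperparameter names, the synthetic "embedding quality" function, the quadratic least-squares surrogate, and the pick-freeze estimator are all assumptions chosen to keep the example self-contained with NumPy only. All hyperparameters are assumed to be rescaled to the unit interval.

```python
import numpy as np

# Hypothetical hyperparameter names, each assumed rescaled to [0, 1].
HYPERPARAMETERS = ["learning_rate", "embedding_dim", "num_negatives"]


def fit_quadratic_surrogate(X, y):
    """Least-squares regression of y on [1, x, x^2]; returns a predictor."""
    def features(X):
        return np.hstack([np.ones((len(X), 1)), X, X ** 2])

    coef, *_ = np.linalg.lstsq(features(X), y, rcond=None)
    return lambda X_new: features(X_new) @ coef


def first_order_sobol(model, n_params, n_samples=65536, seed=0):
    """Estimate first-order Sobol indices of `model` on the unit hypercube.

    Pick-freeze estimator: AB_i is sample matrix A with column i taken
    from the independent sample matrix B.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_params))
    B = rng.random((n_samples, n_params))
    f_A, f_B = model(A), model(B)
    both = np.concatenate([f_A, f_B])
    mean, total_var = both.mean(), both.var()
    indices = np.empty(n_params)
    for i in range(n_params):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]
        # Centering f_B leaves the expectation unchanged but reduces noise.
        indices[i] = np.mean((f_B - mean) * (model(AB_i) - f_A)) / total_var
    return indices


# Synthetic stand-in for "thousands of embedding trials" (made-up data):
# quality depends strongly on the first input, weakly on the second,
# and not at all on the third.
rng = np.random.default_rng(42)
X_trials = rng.random((2000, len(HYPERPARAMETERS)))
quality = (0.3 + 0.5 * X_trials[:, 0] + 0.1 * X_trials[:, 1]
           + rng.normal(0.0, 0.02, size=len(X_trials)))

surrogate = fit_quadratic_surrogate(X_trials, quality)
sobol_indices = first_order_sobol(surrogate, n_params=len(HYPERPARAMETERS))

for name, index in zip(HYPERPARAMETERS, sobol_indices):
    print(f"{name}: {index:.2f}")
```

In this toy setup the first hyperparameter should account for most of the variance, which is the kind of ranking that would justify tuning it first and fixing the others. A dedicated sensitivity analysis library would normally replace the hand-written estimator.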

Main Results:

  • Significant variability in hyperparameter sensitivity was observed across different knowledge graph datasets.
  • Dataset characteristics were identified as a probable cause for these inconsistencies.
  • Several relations in the UMLS knowledge graph were found to contribute to data leakage.
  • A leakage-robust variant, UMLS-43, was derived from the original UMLS graph.
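
One way to see how inverse relations leak information is to check, for every ordered pair of relations, how often a triple (h, r1, t) is mirrored by (t, r2, h). The sketch below is only an illustration of that idea: the function name, the 0.9 threshold, and the toy triples are hypothetical, and this is not necessarily the procedure used to derive UMLS-43.

```python
from collections import defaultdict
from itertools import permutations


def find_inverse_relations(triples, threshold=0.9):
    """Return (r1, r2, overlap) for relation pairs where r2 mirrors r1.

    `overlap` is the fraction of (head, tail) pairs of r1 that also
    appear as (tail, head) pairs of r2. Symmetric relations (r1 == r2)
    are not checked here.
    """
    pairs_by_relation = defaultdict(set)
    for head, relation, tail in triples:
        pairs_by_relation[relation].add((head, tail))

    flagged = []
    for r1, r2 in permutations(pairs_by_relation, 2):
        pairs_r1 = pairs_by_relation[r1]
        reversed_r2 = {(t, h) for h, t in pairs_by_relation[r2]}
        overlap = len(pairs_r1 & reversed_r2) / len(pairs_r1)
        if overlap >= threshold:
            flagged.append((r1, r2, overlap))
    return flagged


# Toy triples (made up for illustration, not taken from UMLS).
triples = [
    ("aspirin", "treats", "headache"),
    ("headache", "treated_by", "aspirin"),
    ("ibuprofen", "treats", "fever"),
    ("fever", "treated_by", "ibuprofen"),
    ("aspirin", "interacts_with", "warfarin"),
]

flagged = find_inverse_relations(triples)
for r1, r2, overlap in flagged:
    print(f"{r1} <-> {r2}: overlap {overlap:.2f}")
```

If a flagged pair is split across training and test sets, a model can answer a test triple simply by reversing a training triple, so a leakage-robust variant would keep only one relation from each such pair.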

Conclusions:

  • Prioritizing key hyperparameters can significantly reduce the computational burden of knowledge graph embedding.
  • Understanding dataset-specific hyperparameter importance is crucial for efficient embedding.
  • Addressing data leakage is essential for reliable knowledge graph analysis, as demonstrated by the UMLS-43 variant.