Detecting Interactions in High-Dimensional Data Using Cross Leverage Scores.

Sven Teschke¹, Katja Ickstadt¹,², Alexander Munteanu¹

¹Faculty of Statistics, TU Dortmund University, Dortmund, Germany.

Biometrical Journal. Biometrische Zeitschrift

November 29, 2024

Summary

This summary is machine-generated.

We developed a scalable method using cross leverage scores (CLSs) to identify gene interactions influencing health outcomes. This approach efficiently detects important genetic interactions in large datasets, including genome-wide data.

Keywords:

cross leverage scores; genetics; high-dimensional data; interaction effects; sketching; variable selection

Area of Science:

Genetics
Statistical genetics
Bioinformatics

Background:

Investigating interactions between genetic variants (e.g., single-nucleotide polymorphisms, or SNPs) is crucial for understanding complex health outcomes.
Analyzing interactions in large genetic datasets is computationally challenging due to the high dimensionality.

Purpose of the Study:

To develop a computationally efficient variable selection method for detecting interactions in large-scale regression models.
To introduce and evaluate cross leverage scores (CLSs) for identifying important variable interactions while maintaining interpretability.

Main Methods:

Developed a variable selection method based on cross leverage scores (CLSs) for interaction detection.
Implemented data batching and windowing techniques to scale computations for large datasets.
Utilized sketching-based approximations to further enhance computational efficiency.
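
As a rough illustration of the quantity these methods build on, the sketch below uses the standard linear-algebra definition: for a matrix A with an orthonormal basis Q of its column space, the leverage scores are the diagonal entries of H = QQᵀ and the cross leverage scores are its off-diagonal entries. Which matrix the authors apply this to, and how the scores are turned into a ranking of interactions, is not stated in this summary, so the function name, the random data, and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

def projection_matrix(A):
    """Return H = Q Q^T, where Q is an orthonormal basis of A's column space.

    Assumes A has shape (n, d) with n >= d and full column rank.
    Diagonal entries of H are leverage scores; off-diagonal entries
    are cross leverage scores (standard definition, not necessarily
    the exact construction used in the paper).
    """
    Q, _ = np.linalg.qr(A)  # reduced QR: Q has shape (n, d)
    return Q @ Q.T

# Hypothetical toy data: 50 observations, 5 columns.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))

H = projection_matrix(A)
leverage = np.diag(H)        # leverage score of each row
cross_leverage_0_1 = H[0, 1]  # cross leverage score between rows 0 and 1
```

Because H is an orthogonal projection, it is symmetric, its leverage scores lie between 0 and 1, and they sum to the rank of A, which gives a quick sanity check on any implementation.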

Main Results:

Cross leverage scores (CLSs) were shown to be directly correlated with the importance of variables in interaction effects.
Approximation methods using sketching were found to be effective for large-scale data analysis, preserving the interaction detection capabilities of CLSs.
The methods demonstrated scalability for genome-wide data analysis.
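
To give a sense of what a sketching-based approximation can look like, the example below applies one generic recipe from the randomized linear algebra literature: compress A with a random matrix S, take the triangular factor R from a QR decomposition of SA, and use AR⁻¹ as an approximately orthonormal basis. This is a minimal sketch under stated assumptions and not the authors' algorithm; the dense Gaussian sketch, the sketch size, and all function names are hypothetical choices.

```python
import numpy as np

def exact_projection(A):
    """Exact H = Q Q^T from a reduced QR decomposition of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

def sketched_projection(A, sketch_rows, rng):
    """Approximate H using a Gaussian sketch of A (illustrative only).

    Assumes A has shape (n, d) with full column rank and sketch_rows >= d.
    """
    n, _ = A.shape
    # Dense Gaussian sketching matrix, scaled so that E[S^T S] = I.
    S = rng.normal(size=(sketch_rows, n)) / np.sqrt(sketch_rows)
    # R from the small sketched matrix S A (shape sketch_rows x d).
    _, R = np.linalg.qr(S @ A)
    # U_approx = A R^{-1}, computed by solving R^T X = A^T and transposing.
    U_approx = np.linalg.solve(R.T, A.T).T
    return U_approx @ U_approx.T

# Hypothetical toy data: 1000 observations, 5 columns, 300 sketch rows.
rng = np.random.default_rng(1)
A = rng.normal(size=(1000, 5))

H_exact = exact_projection(A)
H_sketch = sketched_projection(A, sketch_rows=300, rng=rng)

# Relative Frobenius-norm error of the approximation.
rel_err = np.linalg.norm(H_sketch - H_exact) / np.linalg.norm(H_exact)
```

Forming the full n × n matrix is only practical for toy sizes like this one; at scale, one would compute only the entries of interest from the rows of the approximate basis. A dense Gaussian sketch is also the simplest choice rather than the fastest, and sparse or structured sketching matrices are the usual way to obtain a real speed-up.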

Conclusions:

The developed CLS method and its approximations offer a scalable solution for identifying gene-gene interactions in large genetic datasets.
These methods facilitate efficient analysis of complex genetic architectures influencing health outcomes.
The approach is validated through simulations and application to real-world genetic data (HapMap project).