Updated: Dec 4, 2025

Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time.

Matti Karppa¹,², Petteri Kaski¹,², Jukka Kohonen¹,²

¹Helsinki Institute for Information Technology (HIIT), Espoo, Finland.

October 22, 2020

Summary

This summary is machine-generated.

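This study presents a deterministic subquadratic-time algorithm for finding outlier correlations in binary data. It derandomizes Valiant's randomized algorithm, showing that a high-dimensional similarity join can be performed in subquadratic time without relying on randomness, although the savings over quadratic scaling are more modest than in the randomized version.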

Keywords:

Correlation, Derandomization, Expander graph, Outlier, Similarity search

Area of Science:

Computer Science
Algorithms
Data Mining

Background:

Comparing every pair of vectors to find outlier correlations takes time quadratic in the number of vectors, so subquadratic-time algorithms are crucial for analyzing large high-dimensional data sets.
Valiant's randomized algorithm provides a subquadratic-time solution but lacks deterministic guarantees.
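To make the quadratic baseline concrete, here is a minimal sketch of the brute-force approach that subquadratic algorithms aim to beat. It is not taken from the paper. The function names, the threshold parameter `rho`, and the toy data are illustrative assumptions, and vectors are assumed to have entries in {-1, +1}.

```python
from itertools import combinations


def inner(x, y):
    """Inner product of two equal-length vectors with entries in {-1, +1}."""
    return sum(a * b for a, b in zip(x, y))


def brute_force_outliers(vectors, rho):
    """Return index pairs (i, j), i < j, with |<x_i, x_j>| >= rho * d.

    Every one of the n*(n-1)/2 pairs is examined, so the work grows
    quadratically with the number of vectors n.
    """
    d = len(vectors[0])
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(vectors), 2)
        if abs(inner(x, y)) >= rho * d
    ]


# Hypothetical toy data: vectors 0 and 2 agree in 7 of 8 coordinates,
# while every other pair is only weakly correlated.
data = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, 1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
pairs = brute_force_outliers(data, rho=0.7)
print(pairs)  # [(0, 2)]
```

With n vectors of dimension d, this loop performs on the order of n² · d operations, which is the scaling the algorithms discussed here improve on.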

Purpose of the Study:

To derandomize Valiant's subquadratic-time algorithm for finding outlier correlations in binary data.
To establish a deterministic subquadratic-time similarity join for high-dimensional data.

Main Methods:

Derandomization of Valiant's algorithm using explicit correlation amplifiers.
Construction of correlation amplifiers via zigzag-product expanders.
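The idea of amplifying correlations can be illustrated with the simplest exact amplifier, the full tensor (Kronecker) power, for which the inner product of the p-th powers equals the p-th power of the inner product. The sketch below is only an illustration under that assumption: it is not the paper's expander-based construction, the names `tensor_power`, `x`, `y`, and `z` are hypothetical, and the output length d**p of the full tensor power grows very quickly with p.

```python
from itertools import product
from math import prod


def inner(x, y):
    """Inner product of two equal-length vectors with entries in {-1, +1}."""
    return sum(a * b for a, b in zip(x, y))


def tensor_power(x, p):
    """p-fold tensor (Kronecker) power of x; the result has len(x)**p entries.

    Each output entry is a product of p input entries, so a vector with
    entries in {-1, +1} maps to another vector with entries in {-1, +1}.
    """
    return [prod(coords) for coords in product(x, repeat=p)]


# Hypothetical toy vectors of dimension d = 8.
x = [1, 1, 1, 1, 1, 1, 1, 1]
y = [1, 1, 1, 1, 1, 1, 1, -1]     # <x, y> = 6, normalized 0.75 ("outlier")
z = [1, 1, 1, 1, 1, -1, -1, -1]   # <x, z> = 2, normalized 0.25 ("background")

p = 2
tx, ty, tz = (tensor_power(v, p) for v in (x, y, z))
D = len(tx)  # 8**2 = 64

print(inner(tx, ty) / D)  # 0.5625 = 0.75**2
print(inner(tx, tz) / D)  # 0.0625 = 0.25**2
```

Before amplification the two normalized correlations differ by a factor of 3, and after squaring they differ by a factor of 9. Widening this gap between outlier and background pairs is the point of a correlation amplifier.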

Main Results:

A deterministic subquadratic-time algorithm for outlier correlation detection in binary data.
Deterministic subquadratic scaling is achieved for high-dimensional similarity joins over a parameter range similar to that of the randomized version.

Conclusions:

It is possible to achieve deterministic subquadratic-time similarity joins for high-dimensional data.
The derandomized approach offers deterministic guarantees, but its savings over quadratic scaling are more modest than those of the randomized algorithm.