

Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time.

Matti Karppa1,2, Petteri Kaski1,2, Jukka Kohonen1,2

  • 1Helsinki Institute for Information Technology (HIIT), Espoo, Finland.

Algorithmica | October 22, 2020
Summary
This summary is machine-generated.

This study presents a deterministic subquadratic-time algorithm for finding outlier correlations in binary data, obtained by derandomizing Valiant's randomized algorithm. It shows that a similarity join on high-dimensional data can be computed in subquadratic time without randomness.

Keywords:
Correlation, Derandomization, Expander graph, Outlier, Similarity search


Area of Science:

  • Computer Science
  • Algorithms
  • Data Mining

Background:

  • Subquadratic-time algorithms for outlier correlation detection are crucial for high-dimensional data analysis.
  • Valiant's randomized algorithm provides a subquadratic-time solution but lacks deterministic guarantees.
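
To make the problem concrete, the following is a minimal sketch of the quadratic-time baseline that subquadratic algorithms aim to beat. It is not taken from the paper: the function names, the ±1 encoding of binary data, the threshold parameter `rho`, and the planted test instance are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def inner(x, y):
    """Inner product of two equal-length +/-1 vectors."""
    return sum(a * b for a, b in zip(x, y))


def brute_force_outliers(vectors, rho):
    """Return all index pairs (i, j), i < j, whose normalized correlation
    |<x_i, x_j>| / d is at least rho. Checks every pair explicitly."""
    d = len(vectors[0])
    return [
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if abs(inner(vectors[i], vectors[j])) >= rho * d
    ]


def planted_instance(n, d, flips, seed=0):
    """Hypothetical test data: n random +/-1 vectors of dimension d, where
    vector 1 is a copy of vector 0 with `flips` coordinates negated, so the
    pair (0, 1) has inner product exactly d - 2 * flips."""
    rng = random.Random(seed)
    vectors = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    planted = list(vectors[0])
    for k in rng.sample(range(d), flips):
        planted[k] = -planted[k]
    vectors[1] = planted
    return vectors


if __name__ == "__main__":
    data = planted_instance(n=20, d=400, flips=40, seed=1)
    print(brute_force_outliers(data, rho=0.5))
```

For n vectors of dimension d, this baseline performs on the order of n²·d operations, since it evaluates all n(n-1)/2 pairs. That is the quadratic scaling referred to throughout this summary.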

Purpose of the Study:

  • To derandomize Valiant's subquadratic-time algorithm for finding outlier correlations in binary data.
  • To establish a deterministic subquadratic-time similarity join for high-dimensional data.

Main Methods:

  • Derandomization of Valiant's algorithm using explicit correlation amplifiers.
  • Construction of correlation amplifiers via zigzag-product expanders.
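
The idea of correlation amplification can be illustrated with a small sketch. It uses the full tensor power, for which the inner product of the amplified vectors equals the p-th power of the original inner product. This is a swapped-in textbook construction, not the paper's explicit amplifier: it does not implement the zigzag-product expander construction, and the names `tensor_power` and `inner` are hypothetical.

```python
from itertools import product


def inner(x, y):
    """Inner product of two equal-length +/-1 vectors."""
    return sum(a * b for a, b in zip(x, y))


def tensor_power(x, p):
    """Return the p-fold tensor power of x as a flat list of length len(x)**p.
    Each output coordinate is the product of p entries of x, one per position
    of an index tuple."""
    out = []
    for idx in product(range(len(x)), repeat=p):
        value = 1
        for k in idx:
            value *= x[k]
        out.append(value)
    return out


if __name__ == "__main__":
    x = [1, 1, 1, 1]
    y = [1, 1, 1, -1]  # normalized correlation with x is 2/4 = 0.5
    fx, fy = tensor_power(x, 2), tensor_power(y, 2)
    # The amplified inner product equals inner(x, y) ** 2 = 4, so the
    # normalized correlation drops from 0.5 to 4/16 = 0.25, while a
    # perfectly correlated pair would stay at 1.
    print(inner(x, y), inner(fx, fy), len(fx))
```

Raising correlations to a power shrinks weak correlations much faster than strong ones, which widens the gap between background pairs and outliers. The full tensor power has output dimension d^p, which is too large to use directly; a practical amplifier needs a much shorter output that approximately preserves this behaviour, and according to the summary above the paper achieves this deterministically via zigzag-product expanders.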

Main Results:

  • A deterministic subquadratic-time algorithm for outlier correlation detection in binary data.
  • Deterministic subquadratic scaling for high-dimensional similarity joins, over essentially the same parameter range as the randomized version.

Conclusions:

  • It is possible to achieve deterministic subquadratic-time similarity joins for high-dimensional data.
  • The derandomized approach offers deterministic guarantees, but the constants it saves over quadratic scaling are more modest than those of the randomized algorithm.