Applying test case prioritization to software microbenchmarks.

Christoph Laaber¹, Harald C Gall¹, Philipp Leitner²

¹Department of Informatics, University of Zurich, Zurich, Switzerland.

Empirical Software Engineering | November 15, 2021

Summary

This summary is machine-generated.

Test case prioritization (TCP) techniques can reorder software microbenchmarks so that performance regressions are detected earlier in a suite run. The total greedy strategy and dynamic coverage are the most effective, making TCP a viable option for performance regression testing with manageable overhead.

Keywords:

JMH; performance testing; regression testing; software microbenchmarking; test case prioritization

Area of Science:

Software Engineering
Software Testing
Performance Analysis

Background:

Regression testing is crucial for software evolution, but performance regression testing, especially for microbenchmarks, is under-researched.
Microbenchmark suites are time-consuming to execute, making efficient fault detection critical.

Purpose of the Study:

To empirically investigate the effectiveness and efficiency of coverage-based test case prioritization (TCP) techniques for software microbenchmarks.
To compare different TCP strategies (total vs. additional greedy) and coverage types (static vs. dynamic).
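The difference between the two greedy strategies can be illustrated with a minimal sketch. It assumes that coverage is available as a mapping from benchmark name to the set of covered units; the benchmark names, unit names, and the tie-breaking and reset rules below are hypothetical choices for illustration and are not taken from the study.

```python
from typing import Dict, List, Set

# Hypothetical coverage data: benchmark name -> set of covered units.
Coverage = Dict[str, Set[str]]


def total_greedy(coverage: Coverage) -> List[str]:
    """Total strategy: rank benchmarks by how many units each covers."""
    # Ties are broken by name so the ordering is deterministic (an assumption).
    return sorted(coverage, key=lambda b: (-len(coverage[b]), b))


def additional_greedy(coverage: Coverage) -> List[str]:
    """Additional strategy: repeatedly pick the benchmark that covers
    the most units not yet covered by the benchmarks already chosen."""
    remaining = dict(coverage)
    covered: Set[str] = set()
    order: List[str] = []
    while remaining:
        best = min(remaining, key=lambda b: (-len(remaining[b] - covered), b))
        if covered and not (remaining[best] - covered):
            # No remaining benchmark adds new coverage: reset and continue
            # (one common variant of the additional strategy).
            covered = set()
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


example: Coverage = {
    "benchA": {"m1", "m2", "m3"},
    "benchB": {"m1", "m2"},
    "benchC": {"m4"},
}
print(total_greedy(example))       # ['benchA', 'benchB', 'benchC']
print(additional_greedy(example))  # ['benchA', 'benchC', 'benchB']
```

In this toy input the total strategy schedules benchB second because it covers two units, while the additional strategy prefers benchC because it is the only benchmark that reaches a unit benchA has not already covered.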

Main Methods:

Empirical study of 54 unique coverage-based TCP technique instantiations.
Application of total and additional greedy strategies across multiple parameterization dimensions.
Evaluation using average percentage of fault-detection on performance (APFD-P) and analysis of runtime overhead.
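For orientation, the classic APFD metric that APFD-P builds on can be sketched as follows. This summary does not give the definition of APFD-P, so the code shows only the standard APFD formula, 1 - (sum of first-detection positions) / (n * m) + 1 / (2n), for n tests and m faults. The function name and the fault mapping are hypothetical.

```python
from typing import Dict, List, Set


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """Classic APFD for a prioritized test order.

    `faults` maps a fault id to the set of tests that detect it.
    Assumes every fault is detected by at least one test in `order`.
    """
    n = len(order)
    m = len(faults)
    # 1-based position of each test in the prioritized order.
    position = {test: i + 1 for i, test in enumerate(order)}
    # Position of the first test that reveals each fault.
    first_detection = [
        min(position[t] for t in detecting) for detecting in faults.values()
    ]
    return 1 - sum(first_detection) / (n * m) + 1 / (2 * n)


# Hypothetical data: two faults, three benchmarks.
faults = {"f1": {"benchB"}, "f2": {"benchC"}}
early = apfd(["benchB", "benchC", "benchA"], faults)
late = apfd(["benchA", "benchC", "benchB"], faults)
print(round(early, 3), round(late, 3))  # 0.667 0.333
```

Higher values mean faults are revealed earlier in the ordering, which is how an ordering that runs the fault-revealing benchmarks first scores above one that runs them last.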

Main Results:

TCP techniques achieved a mean APFD-P between 0.54 and 0.71.
The top three performance regressions were detected after executing between 29% and 66% of the microbenchmark suite.
The most effective TCP technique incurred an 11% runtime overhead.
The total strategy outperformed the additional strategy, and dynamic-coverage was generally preferred over static-coverage.

Conclusions:

Test case prioritization is a viable technique for performance regression testing of microbenchmarks.
Dynamic-coverage TCP techniques are recommended when analysis time permits, while static-coverage offers an alternative for time-constrained scenarios.
The total greedy strategy is superior for performance regression detection in microbenchmarks.