Related Concept Videos

Decision Making: P-value Method

The P-value method of hypothesis testing involves calculating a P-value from the sample data and interpreting it.
First, a specific claim about the population parameter is proposed. The claim is based on the research question and is stated in a simple form. Next, a statement opposing the claim is formulated. Together, these statements serve as the null and alternative hypotheses: the null hypothesis is a neutral statement, while the alternative hypothesis can...

Difference from Background: Limit of Detection

The limit of detection (LOD) is the smallest amount of analyte that can be distinguished from the background noise. It corresponds to the concentration at which the net analyte signal equals three times the standard deviation of the blank signal; below this value, the analyte signal cannot be differentiated from the background noise. The LOD is therefore calculated by dividing 3 times the standard deviation of the blank signals by the slope of the calibration curve.
The LOD indicates the presence or absence...
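As a worked illustration of the usual 3-sigma definition, LOD = 3 × s_blank / m (s_blank is the standard deviation of replicate blank readings, m the calibration slope), here is a minimal sketch assuming a linear calibration curve. The function name `limit_of_detection` and the blank readings and slope are hypothetical values chosen only for illustration.

```python
import statistics


def limit_of_detection(blank_signals, calibration_slope):
    """Return the LOD in concentration units as 3 * s_blank / slope.

    blank_signals: replicate signal readings of the blank (hypothetical data).
    calibration_slope: signal change per unit concentration from a linear calibration.
    """
    s_blank = statistics.stdev(blank_signals)  # sample standard deviation of the blank
    return 3 * s_blank / calibration_slope


# Hypothetical blank readings (arbitrary signal units) and slope (signal per concentration unit)
blanks = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
slope = 0.25
print(f"LOD = {limit_of_detection(blanks, slope):.4f} concentration units")
```

Because the slope sits in the denominator, a steeper (more sensitive) calibration lowers the LOD, while a noisier blank raises it.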

Force Classification

Forces play a crucial role in the study of physics and engineering. They are essential in describing the motion, behavior, and equilibrium of objects in the physical world. Forces can be classified based on their origin, type, and direction of action.
Contact and non-contact forces are two of the most widely used categories of forces. As the name suggests, contact forces require physical contact between two objects to act upon each other. Examples of contact forces include frictional,...

Multiple Comparison Tests

A multiple comparison test (MCT) is a post hoc analysis generally performed after multiple samples have been compared with one or more tests. An MCT helps identify which sample among several, or which factor among several, differs significantly.
Comparing two samples at a significance level of α = 0.05 is straightforward, because there is only one sample pair to compare. However, it would be difficult to identify a significantly different sample if the number...

Classification of Signals

In signal processing, signals are classified based on various characteristics: continuous-time versus discrete-time, periodic versus aperiodic, analog versus digital, and causal versus noncausal. Each category highlights distinct properties crucial for understanding and manipulating signals.
A continuous-time signal holds a value at every instant in time, representing information seamlessly. In contrast, a discrete-time signal holds values only at specific moments, often denoted as x(n), where...

Multi-input and Multi-variable systems

Cruise control systems in cars are designed as multi-input systems to maintain a driver's desired speed while compensating for external disturbances such as changes in terrain. The block diagram for a cruise control system typically includes two main inputs: the desired speed set by the driver and any external disturbances, such as the incline of the road. By adjusting the engine throttle, the system maintains the vehicle's speed as close to the desired value as possible.
In the absence...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Representation Learning for Interpersonal and Multimodal Behavior Dynamics: A Multiview Extension of Latent Change Score Models.

Proceedings of the ... ACM International Conference on Multimodal Interaction. ICMI (Conference)·2025

Same author

Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions.

Findings of ACL. EMNLP. Conference on Empirical Methods in Natural Language Processing·2025

Same author

Computational Analysis of Expressive Behavior in Clinical Assessment.

Annual review of clinical psychology·2025

Same author

Dynamic and dyadic relationships between facial behavior, working alliance, and treatment outcomes during depression therapy.

Journal of consulting and clinical psychology·2025

Same author

Advances in Behavioral Science Using Automated Facial Image Analysis and Synthesis.

IEEE signal processing magazine·2025

Same author

Big team science reveals promises and limitations of machine learning efforts to model physiological markers of affective experience.

Royal Society open science·2025

Same journal

Regional patch-based MRI brain age modeling with an interpretable cognitive reserve proxy.

Pattern recognition letters·2026

Same journal

Plug and Play Labeling Strategies for Boosting Small Brain Lesion Segmentation.

Pattern recognition letters·2026

Same journal

MedLesSynth-LD: Lesion synthesis using physics-based noise models for robust lesion segmentation in low-data medical imaging regimes.

Pattern recognition letters·2025

Same journal

On the bias in the AUC variance estimate.

Pattern recognition letters·2024

Same journal

A too-good-to-be-true prior to reduce shortcut reliance.

Pattern recognition letters·2023

Same journal

An efficient region expansion algorithm for regular triangulated meshes.

Pattern recognition letters·2023

Related Experiment Video

Updated: Jun 18, 2025

Design and Analysis for Fall Detection System Simplification

Published on: April 6, 2020

Time to retire F1-binary score for action unit detection.

Saurabh Hinduja¹, Tara Nourivandi², Jeffrey F Cohn¹

¹Department of Psychology, University of Pittsburgh, Pittsburgh, USA.

Pattern Recognition Letters

August 1, 2024

Summary

This summary is machine-generated.

The F1-binary score is unreliable for evaluating action unit detection because of class imbalance. The authors recommend replacing it with the F1-micro score for more accurate evaluation in facial expression analysis.

Keywords:

Action units, Data imbalance, F1 score, Machine learning

More Related Videos

Targeted Training of Ultrasonic Vocalizations in Aged and Parkinsonian Rats

Published on: August 8, 2011

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Area of Science:

Computer Vision
Machine Learning
Human-Computer Interaction

Background:

Action unit detection is crucial for facial expression recognition, as expressions can be broken down into individual action units.
The F1-binary score is commonly used to evaluate action unit detection models.
Class imbalance poses a significant challenge in machine learning tasks, including face analysis.
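The base-rate sensitivity of F1-binary shows up even for a detector with no skill at all. The following is a minimal sketch, assuming frame-level binary labels (AU present / absent) for a single AU; the helper names `f1_binary` and `always_positive_f1` are hypothetical. F1-binary scores only the "present" class, 2TP / (2TP + FP + FN), so a detector that always predicts "present" scores 2p / (1 + p) at base rate p.

```python
def f1_binary(tp, fp, fn):
    """F1 for the positive class only (AU present): 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def always_positive_f1(n_pos, n_neg):
    """F1-binary of a trivial detector that predicts 'AU present' for every frame."""
    # Every positive frame becomes a TP, every negative frame a FP, and there are no FNs.
    return f1_binary(tp=n_pos, fp=n_neg, fn=0)


n_frames = 1000
for base_rate in (0.5, 0.2, 0.05):
    n_pos = round(base_rate * n_frames)
    score = always_positive_f1(n_pos, n_frames - n_pos)
    print(f"base rate {base_rate:.2f}: trivial F1-binary = {score:.3f}")
```

This prints about 0.667, 0.333 and 0.095: the same skill-free detector looks respectable on a balanced test set and poor on a skewed one, so the score partly reflects the data's base rate rather than the model.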

Purpose of the Study:

To argue against the use of the F1-binary score for evaluating action unit detection models.
To demonstrate the negative impact of class imbalance on the reliability of the F1-binary score.
To propose and justify the F1-micro score as a more suitable replacement metric.
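To make the proposed replacement concrete, here is a minimal sketch under one stated assumption: F1-micro pools true positives, false positives and false negatives over both the "present" and "absent" classes of a single AU, as scikit-learn's `f1_score(..., average="micro")` does for binary (single-label) targets. The paper's exact computation may differ, for example if counts are pooled across AUs. Under this assumption every error counts once as a FP for one class and once as a FN for the other, so F1-micro reduces to accuracy. The function name `f1_micro_binary` is hypothetical.

```python
def f1_micro_binary(tp, fp, fn, tn):
    """Micro-averaged F1 over both classes of a single binary AU label.

    The 'absent' class has TP = original TN, FP = original FN, FN = original FP,
    so its counts are added to those of the 'present' class before computing F1.
    """
    pooled_tp = tp + tn
    pooled_fp = fp + fn  # FP of 'present' plus FP of 'absent' (= original FN)
    pooled_fn = fn + fp  # FN of 'present' plus FN of 'absent' (= original FP)
    denom = 2 * pooled_tp + pooled_fp + pooled_fn
    return 2 * pooled_tp / denom if denom else 0.0


# Hypothetical counts for one AU on 1000 frames
tp, fp, fn, tn = 40, 10, 20, 930
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f1_micro_binary(tp, fp, fn, tn), accuracy)
```

Both values are 0.97 here. Because the negative class contributes to the score, F1-micro does not collapse when positives are rare; the trade-off is that it can be dominated by the majority class, which is worth keeping in mind when reading it.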

Main Methods:

Investigated the influence of class imbalance on action unit detection performance.
Evaluated the impact of class imbalance in training sets, testing sets, and on generalizability to new data.
Conducted empirical analyses to compare the performance of F1-binary and F1-micro scores under imbalanced conditions.
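The test-set part of this analysis can be illustrated with a small simulation. This is a minimal sketch of the idea, not the authors' protocol: a hypothetical detector with fixed sensitivity and specificity is scored on test sets that differ only in AU base rate, using expected rather than randomly sampled counts. F1-micro is again assumed to pool counts over both classes of a single AU; all function names and parameter values are illustrative.

```python
def expected_counts(n, base_rate, sensitivity, specificity):
    """Expected (TP, FP, FN, TN) for a detector with fixed skill on n frames."""
    n_pos = n * base_rate
    n_neg = n - n_pos
    tp = sensitivity * n_pos
    fn = n_pos - tp
    tn = specificity * n_neg
    fp = n_neg - tn
    return tp, fp, fn, tn


def f1_binary(tp, fp, fn):
    """F1 for the 'AU present' class only."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_micro(tp, fp, fn, tn):
    """F1 pooled over both classes (assumption); each error is a FP for one class and a FN for the other."""
    pooled_tp = tp + tn
    pooled_errors = fp + fn
    return 2 * pooled_tp / (2 * pooled_tp + 2 * pooled_errors)


# Same hypothetical detector (sensitivity = specificity = 0.8), different test-set base rates
for p in (0.5, 0.2, 0.05, 0.01):
    tp, fp, fn, tn = expected_counts(10_000, p, 0.8, 0.8)
    print(f"p={p:.2f}  F1-binary={f1_binary(tp, fp, fn):.3f}  F1-micro={f1_micro(tp, fp, fn, tn):.3f}")
```

F1-binary falls from 0.800 at p = 0.5 to about 0.074 at p = 0.01 although the detector is unchanged, while F1-micro stays at 0.800. F1-micro is exactly constant here only because sensitivity equals specificity; with unequal values it also shifts with base rate, but far less sharply.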

Main Results:

Class imbalance significantly undermines the reliability of the F1-binary score in action unit detection.
The F1-binary score provides a misleading evaluation when dealing with imbalanced datasets common in face analysis.
Empirical evidence supports the superiority of the F1-micro score in accurately reflecting model performance.

Conclusions:

The F1-binary score should be retired as an evaluation metric for action unit detection due to its susceptibility to class imbalance.
The F1-micro score is a more robust and reliable metric for evaluating action unit detection models, especially in the presence of class imbalance.
Adopting F1-micro should lead to more accurate assessments of facial expression recognition systems and, in turn, to advances in them.