Time to retire F1-binary score for action unit detection.

Saurabh Hinduja (1), Tara Nourivandi (2), Jeffrey F Cohn (1)

  • (1) Department of Psychology, University of Pittsburgh, Pittsburgh, USA.

Pattern Recognition Letters | August 1, 2024
Summary
This summary is machine-generated.

The F1-binary score is unreliable for evaluating action unit detection because it is sensitive to class imbalance. The authors recommend replacing it with the F1-micro score for more accurate facial expression analysis.

Keywords:
Action units, Data imbalance, F1 score, Machine learning

Area of Science:

  • Computer Vision
  • Machine Learning
  • Human-Computer Interaction

Background:

  • Action unit detection is crucial for facial expression recognition, as expressions can be broken down into individual action units.
  • The F1-binary score is commonly used to evaluate action unit detection models.
  • Class imbalance poses a significant challenge in machine learning tasks, including face analysis.

Purpose of the Study:

  • To argue against the use of the F1-binary score for evaluating action unit detection models.
  • To demonstrate the negative impact of class imbalance on the reliability of the F1-binary score.
  • To propose and justify the F1-micro score as a more suitable replacement metric.
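
The dependence on class balance can be made explicit with a short derivation. This is a sketch under stated assumptions, not the paper's own notation: the symbols π (proportion of positive frames, i.e. the base rate), TPR (true positive rate), and FPR (false positive rate) are introduced here for illustration, and F1-micro is assumed to pool counts over both the positive and the negative class, in which case it reduces to accuracy for a single binary label.

```latex
\begin{align}
F_1^{\text{binary}} &= \frac{2\,TP}{2\,TP + FP + FN}
  = \frac{2\,\mathrm{TPR}\,\pi}{\pi\,(1 + \mathrm{TPR}) + \mathrm{FPR}\,(1 - \pi)}, \\
F_1^{\text{micro}} &= \frac{TP + TN}{TP + TN + FP + FN}
  = \mathrm{TPR}\,\pi + (1 - \mathrm{FPR})\,(1 - \pi).
\end{align}
```

For a fixed detector (fixed TPR and FPR > 0), the first expression tends to 0 as π tends to 0, so a rare action unit receives a low F1-binary even when the detector itself is unchanged. Under the same assumption the second expression tends to 1 − FPR; it still varies with π unless TPR = 1 − FPR.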

Main Methods:

  • Investigated the influence of class imbalance on action unit detection performance.
  • Evaluated the impact of class imbalance in training sets, testing sets, and on generalizability to new data.
  • Conducted empirical analyses to compare the performance of F1-binary and F1-micro scores under imbalanced conditions.

Main Results:

  • Class imbalance significantly undermines the reliability of the F1-binary score in action unit detection.
  • The F1-binary score provides a misleading evaluation when dealing with imbalanced datasets common in face analysis.
  • Empirical evidence supports the superiority of the F1-micro score in accurately reflecting model performance.
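
The effect described in these results can be illustrated with a small numerical sketch. It is not the authors' experiment: the helper names (`f1_binary`, `f1_micro`, `confusion_counts`, `sweep`), the detector with 80% sensitivity and 80% specificity, and the base rates are all hypothetical choices made for illustration. F1-micro is assumed to pool true positives, false positives, and false negatives over both classes, which for a single binary label equals accuracy.

```python
from typing import List, Tuple


def f1_binary(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class only: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro(tp: int, fp: int, fn: int, tn: int) -> float:
    """Micro-averaged F1 pooled over both classes (assumed definition).

    For the negative class, TN acts as its true positives, FN as its
    false positives, and FP as its false negatives.
    """
    pooled_tp = tp + tn
    pooled_fp = fp + fn
    pooled_fn = fn + fp
    denom = 2 * pooled_tp + pooled_fp + pooled_fn
    return 2 * pooled_tp / denom if denom else 0.0


def confusion_counts(n: int, base_rate: float, tpr: float, tnr: float) -> Tuple[int, int, int, int]:
    """Expected (tp, fp, fn, tn) for a detector with fixed TPR and TNR."""
    n_pos = round(n * base_rate)
    n_neg = n - n_pos
    tp = round(tpr * n_pos)
    tn = round(tnr * n_neg)
    return tp, n_neg - tn, n_pos - tp, tn


def sweep(base_rates: List[float], n: int = 1000,
          tpr: float = 0.8, tnr: float = 0.8) -> List[Tuple[float, float, float]]:
    """Return (base_rate, F1-binary, F1-micro) for the same detector."""
    rows = []
    for rate in base_rates:
        tp, fp, fn, tn = confusion_counts(n, rate, tpr, tnr)
        rows.append((rate, f1_binary(tp, fp, fn), f1_micro(tp, fp, fn, tn)))
    return rows


if __name__ == "__main__":
    for rate, binary, micro in sweep([0.5, 0.1, 0.02]):
        print(f"base rate {rate:.2f}: F1-binary={binary:.3f}  F1-micro={micro:.3f}")
```

With these hypothetical settings the detector is identical in every row, yet F1-binary drops as the action unit becomes rarer while the pooled score stays at 0.8. The pooled score is constant here only because sensitivity and specificity were chosen to be equal; with unequal values it would also shift with the base rate, though it would not collapse toward zero.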

Conclusions:

  • The F1-binary score should be retired as an evaluation metric for action unit detection due to its susceptibility to class imbalance.
  • The F1-micro score is a more robust and reliable metric for evaluating action unit detection models, especially in the presence of class imbalance.
  • Adopting F1-micro should yield more accurate assessments of facial expression recognition systems and make results easier to compare across datasets with different action unit base rates.