Related Concept Videos

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...

Censoring Survival Data

Survival analysis is a statistical method used to analyze time-to-event data, often employed in fields such as medicine, engineering, and social sciences. One of the key challenges in survival analysis is dealing with incomplete data, a phenomenon known as "censoring." Censoring occurs when the event of interest (such as death, relapse, or system failure) has not occurred for some individuals by the end of the study period or is otherwise unobservable, and it might have many different...

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...

Response Surface Methodology

Response Surface Methodology (RSM) is a collection of statistical and mathematical techniques used to develop, improve, and optimize processes. It is particularly valuable when many input variables or factors potentially influence a response variable.
The process of RSM involves several key steps:

Surveys

Often, psychologists develop surveys as a means of gathering data. Surveys are lists of questions to be answered by research participants, and can be delivered as paper-and-pencil questionnaires, administered electronically, or conducted verbally. Generally, the survey itself can be completed in a short time, and the ease of administering a survey makes it easy to collect data from a large number of people.

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test determines whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies are all equal, as when predicting how often each number appears when casting a die. In that case, the expected frequency is the ratio of the total number of observations (n) to the number of categories (k).

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Beyond the hype: A simulation study evaluating the predictive performance of machine learning models in psychology.

Psychological methods·2026

Same author

Comparing Different Approaches of (Not) Accounting for Rapid Guessing in Plausible Values Estimation.

Educational and psychological measurement·2026

Same author

Predicting Juvenile Delinquency and Criminal Behavior in Adulthood Using Machine Learning.

International journal of behavioral development·2025

Same author

Revisiting the structure of Diagnostic and Statistical Manual of Mental Disorders, fifth edition, Section II personality disorder criteria using individual participant data meta-analysis.

Personality disorders·2025

Same author

Data from the National Educational Panel Study (NEPS) in Germany: Educational Pathways of Students in Grade 5 and Higher.

Journal of open psychology data·2025

Same author

Data for Psychological Research in the Educational Field: Spotlights, Data Infrastructures, and Findings from Research.

Journal of open psychology data·2025

Same journal

A Simple Approach for Differential Test Functioning Based on Sum Scores.

Educational and psychological measurement·2026

Same journal

Evaluating Factor Retention in Large Factor Analysis Models: A Simulation Study Comparing 15 Methods.

Educational and psychological measurement·2026

Same journal

Agreement and Alignment in Binary Rating Tasks: Strategic Convergence as an Equilibrium Outcome.

Educational and psychological measurement·2026

Same journal

Interactions Between Termination Criteria and Ability Estimators in Computerized Adaptive Testing.

Educational and psychological measurement·2026

Same journal

Identification and Diagnosis of Misreporting in Surveys.

Educational and psychological measurement·2026

Same journal

The Aggregated Latent Profile Index: Measuring Person Profile Differentiation Within a Bootstrap-Validated Latent Profile Space.

Educational and psychological measurement·2026

Related Experiment Video

Updated: Oct 7, 2025

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment

Published on: January 11, 2020

Detecting Careless Responding in Survey Data Using Stochastic Gradient Boosting.

Ulrich Schroeders¹, Christoph Schmidt², Timo Gnambs³

¹University of Kassel, Kassel, Germany.

Educational and Psychological Measurement

January 7, 2022

Summary

This summary is machine-generated.

Gradient boosted trees, a machine learning method, were tested for detecting careless survey responses. While effective in simulations, this approach did not outperform traditional methods in real-world studies.

Keywords:

careless responding, data cleaning, gradient boosted trees, outlier detection, response times

More Related Videos

Dual-Task Stroop Paradigm for Detecting Cognitive Deficits in High-Functioning Stroke Patients

Published on: December 16, 2022

Lexical Decision Task for Studying Written Word Recognition in Adults with and without Dementia or Mild Cognitive Impairment

Published on: June 25, 2019

Area of Science:

Psychological Measurement
Survey Methodology
Machine Learning Applications

Background:

Careless responding, in which respondents answer without regard to item content, poses a significant threat to the reliability and validity of psychological measurements.
Existing methods for detecting aberrant responses include probing questions, paradata (e.g., response times), and statistical techniques (e.g., Mahalanobis distance).
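
As an illustration of the statistical techniques mentioned above, the following is a minimal sketch of a Mahalanobis distance screen. It is not taken from the study: the function names, the two-item data set, and the choice to rank respondents rather than apply a cutoff are all assumptions made for the example. In practice a cutoff is often taken from a chi-square quantile, and the covariance matrix must be non-singular for the computation to work.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix (denominator n - 1).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # d^T S^{-1} d, computed by solving S z = d instead of inverting S.
    return [sum(d * z for d, z in zip(row, solve(cov, row))) for row in centered]


# Hypothetical answers to two similarly worded items (1-5 scale).
# The last respondent answers them in opposite directions.
responses = [[1, 2], [2, 1], [3, 3], [4, 5], [5, 4], [1, 5]]
distances = mahalanobis_sq(responses)
most_atypical = max(range(len(responses)), key=lambda i: distances[i])
print(most_atypical, [round(d, 2) for d in distances])
```

In this toy data the last respondent receives the largest distance, because the pair of answers breaks the positive association shown by the other respondents.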

Purpose of the Study:

To introduce gradient boosted trees, a machine learning technique, for identifying careless respondents in survey data.
To compare the performance of gradient boosting machines against established detection methods using simulated and empirical data.

Main Methods:

Gradient boosted trees were employed as a novel machine learning approach to detect careless responding.
Performance was evaluated against traditional methods (outlier methods, consistency analyses, response pattern functions).
Both simulated data and empirical data from an experimentally induced careless responding study were utilized.
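
To make the idea of gradient boosted trees concrete, here is a minimal sketch of boosting with one-split regression trees (stumps) and squared-error loss. It is an illustration under stated assumptions, not the authors' implementation: the two features, the labels, the learning rate, and the number of rounds are hypothetical, and the row subsampling that makes boosting "stochastic" is omitted for brevity. A real analysis would normally rely on a maintained library implementation.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


def fit_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the residuals of the current predictions."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_score(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical features: [longest run of identical answers, seconds per item].
X = [[2, 6.0], [3, 5.5], [2, 7.1], [4, 4.8],
     [12, 1.2], [15, 0.9], [9, 2.0], [11, 1.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = instructed to respond carelessly
model = fit_boosting(X, y)
print(round(predict_score(model, [13, 1.1]), 3),
      round(predict_score(model, [2, 6.0]), 3))
```

The toy data are deliberately easy to separate, which mirrors the kind of clean structure a simulation can provide and that empirical responses may lack.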

Main Results:

In simulation studies, gradient boosting machines demonstrated superior performance in flagging aberrant responses compared to traditional methods.
This performance advantage did not translate to the empirical study; precision was unsatisfactory for both novel and traditional methods.
Real-world survey responses appeared more erratic than the simulation studies anticipated, which reduced detection accuracy.
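
Precision, the criterion named above, is the share of flagged respondents who really were careless. The sketch below shows how it is computed alongside recall. The function name and the ten-respondent example are hypothetical and are not results from the study.

```python
def precision_recall(flagged, truly_careless):
    """Return (precision, recall) for binary flags against known labels."""
    pairs = list(zip(flagged, truly_careless))
    tp = sum(1 for f, t in pairs if f and t)      # correctly flagged
    fp = sum(1 for f, t in pairs if f and not t)  # attentive but flagged
    fn = sum(1 for f, t in pairs if not f and t)  # careless but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: 10 respondents, 2 of them careless.
# The detector flags 4 respondents, including both careless ones.
flagged = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
truly_careless = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(precision_recall(flagged, truly_careless))
```

Here every careless respondent is caught (recall 1.0), yet only half of the flagged respondents are actually careless (precision 0.5), which shows how a detector can look sensitive while still discarding attentive respondents.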

Conclusions:

Gradient boosting machines look promising for detecting careless responding in simulations, but their effectiveness requires further validation in real-world settings.
Current detection methods, both traditional and novel, exhibit limitations in precision for identifying aberrant response patterns.
Future research should focus on improving the generalizability and accuracy of detection methods for real-world survey data.