
Using ROC Analysis to Refine Cut Scores Following a Standard Setting Process.

Dongwei Wang¹, Lisa A Keller¹

¹University of Massachusetts Amherst, USA.

Educational and Psychological Measurement

November 18, 2024

Summary

This summary is machine-generated.

Optimizing educational assessment cut scores involves considering sample distribution, prevalence, and cost ratios. Adjusting cut scores based on these factors improves classification accuracy, especially in low-prevalence scenarios.

Keywords:

ROC analysis; cut score; refine; standard setting


Area of Science:

Educational Measurement
Psychometrics
Statistical Analysis

Background:

Standard setting establishes cut scores in educational assessment from the judgments of subject matter experts.
Refining cut scores requires statistical and theoretical evidence for improved classification accuracy.

Purpose of the Study:

Investigate the impact of sample distribution, prevalence, and cost ratio on classification accuracy.
Provide statistical evidence for refining cut scores in educational assessments.
Examine how receiver operating characteristic (ROC) analysis can inform cut score adjustments.

Main Methods:

Simulated responses to 40 items under four sample distributions.
Manipulated the prevalence of positive events and the cost ratio (cost of a false negative relative to a false positive).
Utilized receiver operating characteristic (ROC) analysis and the Youden Index (J) to identify optimal cut scores.
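
As an illustration of the last step, the Youden Index is J = sensitivity + specificity - 1, and the candidate cut score with the largest J is taken as optimal. The sketch below is a minimal example under stated assumptions, not the authors' implementation: the function name `youden_cut_score`, the toy scores, and the rule that examinees scoring at or above the cut are classified positive are all hypothetical choices made for illustration.

```python
import numpy as np

def youden_cut_score(scores, is_positive):
    """Return (cut, J) for the cut score that maximizes Youden's J.

    Assumption: an examinee is classified positive when score >= cut.
    """
    scores = np.asarray(scores)
    is_positive = np.asarray(is_positive, dtype=bool)
    best_cut, best_j = None, -np.inf
    # Every observed score is a candidate cut; each one is a point on the ROC curve.
    for cut in np.unique(scores):
        predicted = scores >= cut
        sensitivity = np.mean(predicted[is_positive])     # true positive rate
        specificity = np.mean(~predicted[~is_positive])   # true negative rate
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical toy data: total test scores and true proficiency status (1 = proficient).
scores = [10, 12, 15, 18, 20, 22, 25, 28, 30, 33]
truth = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
cut, j = youden_cut_score(scores, truth)
print(cut, round(j, 3))
```

With these toy data the cut at 22 classifies all four non-proficient examinees correctly and five of the six proficient ones, so J = 5/6. Note that J weights sensitivity and specificity equally and does not depend on prevalence or on misclassification costs.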

Main Results:

Optimal cut scores shift towards the mode of the proficiency distribution.
Cut score adjustments are influenced by prevalence and cost ratio.
Raising the cut score improves classification when the positive event has low prevalence; lowering it improves classification when prevalence is high.
Higher cost ratios (false negatives costlier relative to false positives) lead to lower optimal cut scores.
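
The direction of these shifts can be reproduced with a standard expected-cost criterion. The sketch below is an assumption-laden illustration rather than the study's simulation design: it assumes positives and negatives have normally distributed scores with means +1 and -1 and unit variance, classifies scores at or above the cut as positive, and picks the cut that minimizes cost_ratio × prevalence × (false negative rate) + (1 - prevalence) × (false positive rate). The function names and the grid of candidate cuts are hypothetical.

```python
import math
import numpy as np

def normal_cdf(x, mean, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def min_cost_cut(prevalence, cost_ratio, cuts=np.linspace(-3, 3, 601)):
    """Return the cut score with the lowest expected misclassification cost.

    Assumptions (illustrative only): positives ~ N(+1, 1), negatives ~ N(-1, 1),
    and cost_ratio = cost of a false negative / cost of a false positive.
    """
    best_cut, best_cost = None, math.inf
    for cut in cuts:
        fn_rate = normal_cdf(cut, mean=1.0)          # positives scoring below the cut
        fp_rate = 1.0 - normal_cdf(cut, mean=-1.0)   # negatives scoring at or above the cut
        cost = cost_ratio * prevalence * fn_rate + (1 - prevalence) * fp_rate
        if cost < best_cost:
            best_cut, best_cost = float(cut), cost
    return best_cut

baseline = min_cost_cut(prevalence=0.5, cost_ratio=1.0)
low_prevalence = min_cost_cut(prevalence=0.1, cost_ratio=1.0)
high_prevalence = min_cost_cut(prevalence=0.9, cost_ratio=1.0)
high_cost_ratio = min_cost_cut(prevalence=0.5, cost_ratio=4.0)
print(baseline, low_prevalence, high_prevalence, high_cost_ratio)
```

Under these assumptions the minimum-cost cut rises above the baseline when prevalence is low, falls below it when prevalence is high, and falls when false negatives are made costlier, which matches the pattern reported in the results.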

Conclusions:

Cut score refinement is essential for accurate educational assessment.
Statistical evidence supports adjusting cut scores based on prevalence and cost ratios.
Findings offer guidance for policy decisions regarding cut score optimization.