


Updated: Jan 8, 2026


Reconceptualizing Scoring Reliability Through Linguistic Similarity.

Ji Yoon Jung¹, Ummugul Bezirhan¹, Matthias von Davier¹

¹Boston College, Chestnut Hill, MA, USA.

Educational and Psychological Measurement

December 22, 2025

Summary

This summary is machine-generated.

This study introduces the Linguistic-integrated Reliability Audit (LiRA), a new method for assessing scoring reliability in large-scale assessments. LiRA enhances reliability estimation across multilingual datasets, offering more comprehensive results than traditional methods.

Keywords:

ILSAs; cross-country scoring consistency; scoring reliability; semantic similarity; weighted majority voting


Area of Science:

Educational Measurement
Psychometrics
Computational Linguistics

Background:

Cross-country scoring reliability in large-scale assessments is traditionally estimated through double scoring, often on limited multilingual samples.
Existing methods face challenges in efficiently estimating reliability across diverse linguistic datasets.

Purpose of the Study:

To introduce the Linguistic-integrated Reliability Audit (LiRA), a novel method for comprehensive scoring reliability estimation.
To extend reliability analysis to entire datasets in large-scale, multilingual assessment contexts.

Main Methods:

LiRA automatically generates a second score for each response by analyzing semantic alignment within similar responses.
A weighted majority voting mechanism determines a consensus score, ensuring robust reliability measurement.
The method is designed for application to entire datasets, accommodating multilingual responses.
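The voting step described above can be illustrated with a rough sketch. This is not the authors' implementation: the summary does not specify how responses are embedded, how many similar responses are consulted, or how votes are weighted. The sketch assumes that each response already has a numeric embedding vector, that similarity is cosine similarity, and that the `k` most similar responses vote with weights equal to their similarity. The function names `cosine` and `second_score` and the parameter `k` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def second_score(target_idx, embeddings, scores, k=3):
    """Hypothetical second score for one response.

    Assumption: the k most similar *other* responses vote for their own
    human-assigned score, each vote weighted by its cosine similarity.
    Returns None when there are no other responses to consult.
    """
    neighbours = [
        (cosine(embeddings[target_idx], emb), score)
        for i, (emb, score) in enumerate(zip(embeddings, scores))
        if i != target_idx
    ]
    # Most similar responses first.
    neighbours.sort(key=lambda pair: pair[0], reverse=True)

    votes = defaultdict(float)
    for sim, score in neighbours[:k]:
        votes[score] += max(sim, 0.0)  # ignore negative similarity
    if not votes:
        return None
    # Score with the largest total weight wins.
    return max(votes, key=votes.get)


# Toy data (invented for illustration): 2-D "embeddings" and human scores.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
scores = [1, 1, 1, 0, 0]

print(second_score(0, embeddings, scores))  # neighbours mostly scored 1
print(second_score(3, embeddings, scores))  # one very close neighbour scored 0
```

In the toy data, the response at index 3 has two distant neighbours scored 1 and one very close neighbour scored 0. An unweighted vote would pick 1, whereas the similarity-weighted vote picks 0, which is the motivation for weighting in this sketch.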

Main Results:

LiRA provides a more comprehensive and systematic estimation of scoring reliability.
Reliability is assessed effectively at item, country, and language levels.
The method preserves the core principles of traditional reliability estimation.
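Reporting at item, country, and language levels could be sketched as follows. The summary does not state which reliability statistic LiRA reports, so this example assumes simple exact agreement between the original score and the automatically generated second score, as in conventional double scoring. The record fields (`item`, `country`, `language`, `score`, `second_score`), the function name `agreement_by`, and all data values are hypothetical placeholders.

```python
from collections import defaultdict


def agreement_by(records, key):
    """Exact-agreement rate between 'score' and 'second_score', grouped by `key`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        hits[group] += int(rec["score"] == rec["second_score"])
    return {group: hits[group] / totals[group] for group in totals}


# Invented example records; country codes "A"/"B" are placeholders.
records = [
    {"item": "Q1", "country": "A", "language": "en", "score": 1, "second_score": 1},
    {"item": "Q1", "country": "B", "language": "fr", "score": 0, "second_score": 1},
    {"item": "Q2", "country": "A", "language": "en", "score": 2, "second_score": 2},
    {"item": "Q2", "country": "B", "language": "en", "score": 1, "second_score": 1},
]

for level in ("item", "country", "language"):
    print(level, agreement_by(records, level))
```

Because every response receives a second score in this setup, the same grouping can be applied to the whole dataset rather than to a double-scored subsample.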

Conclusions:

LiRA offers an advanced, scalable approach to scoring reliability in international assessments.
The method enhances the systematic evaluation of scoring quality across diverse linguistic contexts.
LiRA represents a significant advancement in psychometric analysis for multilingual educational data.