Reconceptualizing Scoring Reliability Through Linguistic Similarity.

Ji Yoon Jung¹, Ummugul Bezirhan¹, Matthias von Davier¹

  • ¹ Boston College, Chestnut Hill, MA, USA.

Educational and Psychological Measurement
December 22, 2025
Summary
This summary is machine-generated.

This study introduces the Linguistic-integrated Reliability Audit (LiRA), a method for assessing scoring reliability in large-scale assessments. Rather than relying on a limited double-scored sample, LiRA generates a second score for every response, so reliability can be estimated across entire multilingual datasets.

Keywords:
ILSAs, cross-country scoring consistency, scoring reliability, semantic similarity, weighted majority voting

Area of Science:

  • Educational Measurement
  • Psychometrics
  • Computational Linguistics

Background:

  • Traditional cross-country scoring reliability in large-scale assessments relies on double scoring, often with limited multilingual samples.
  • Existing methods face challenges in efficiently estimating reliability across diverse linguistic datasets.

Purpose of the Study:

  • To introduce the Linguistic-integrated Reliability Audit (LiRA), a novel method for comprehensive scoring reliability estimation.
  • To extend reliability analysis to entire datasets in large-scale, multilingual assessment contexts.

Main Methods:

  • LiRA automatically generates a second score for each response by analyzing semantic alignment within similar responses.
  • A weighted majority voting mechanism determines a consensus score, ensuring robust reliability measurement.
  • The method is designed for application to entire datasets, accommodating multilingual responses.
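The second-score idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, the choice of cosine similarity, the neighbour count `k`, and the use of similarity as the vote weight are all assumptions made for illustration. In practice the vectors would presumably come from a multilingual sentence encoder.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_score(idx, embeddings, human_scores, k=3):
    """Hypothetical helper: derive a second score for response `idx`
    by a similarity-weighted majority vote over its k nearest neighbours.

    Assumes at least one other response exists and k >= 1.
    """
    # Similarity of the target response to every other response.
    sims = [
        (cosine(embeddings[idx], emb), j)
        for j, emb in enumerate(embeddings)
        if j != idx
    ]
    # Keep the k most similar responses.
    neighbours = sorted(sims, reverse=True)[:k]
    # Each neighbour votes for its own human score, weighted by similarity
    # (negative similarities are clipped to zero).
    votes = defaultdict(float)
    for sim, j in neighbours:
        votes[human_scores[j]] += max(sim, 0.0)
    return max(votes, key=votes.get)


# Toy data (assumed): six responses to one item, embedded in 2-D.
embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
]
human_scores = [1, 1, 0, 0, 0, 0]

# Response 2 resembles responses 0 and 1 (both scored 1) but was scored 0,
# so the consensus score disagrees with the human score.
print(second_score(2, embeddings, human_scores))  # 1
print(second_score(4, embeddings, human_scores))  # 0
```

A disagreement between the human score and the consensus score, as for response 2 here, is the kind of case a reliability audit would flag.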

Main Results:

  • LiRA provides a more comprehensive and systematic estimation of scoring reliability than double scoring of a limited sample.
  • Reliability is assessed effectively at item, country, and language levels.
  • The method preserves the core principles of traditional reliability estimation.
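As a rough illustration of reporting reliability at the item, country, and language levels, the sketch below computes the proportion of exact agreement between the human score and an automatically generated second score within each group. The summary does not say which agreement statistic LiRA reports, so exact agreement is an assumption here, and the record layout, field names, and toy data are hypothetical.

```python
from collections import defaultdict


def exact_agreement(records, level):
    """Proportion of responses whose human and second scores match,
    grouped by `level` ("item", "country", or "language")."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        key = rec[level]
        totals[key] += 1
        hits[key] += int(rec["human"] == rec["second"])
    return {key: hits[key] / totals[key] for key in totals}


# Toy records (assumed): one row per scored response.
records = [
    {"item": "Q1", "country": "A", "language": "en", "human": 1, "second": 1},
    {"item": "Q1", "country": "B", "language": "fr", "human": 0, "second": 1},
    {"item": "Q2", "country": "A", "language": "en", "human": 2, "second": 2},
    {"item": "Q2", "country": "B", "language": "fr", "human": 1, "second": 1},
]

print(exact_agreement(records, "item"))      # {'Q1': 0.5, 'Q2': 1.0}
print(exact_agreement(records, "country"))   # {'A': 1.0, 'B': 0.5}
print(exact_agreement(records, "language"))  # {'en': 1.0, 'fr': 0.5}
```

Because every response carries a second score, the same function can be applied to the full dataset rather than to a double-scored subsample.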

Conclusions:

  • LiRA offers an advanced, scalable approach to scoring reliability in international assessments.
  • The method enhances the systematic evaluation of scoring quality across diverse linguistic contexts.
  • LiRA represents a significant advancement in psychometric analysis for multilingual educational data.