Item Difficulty Modeling Using Fine-tuned Small and Large Language Models.

Ming Li¹, Hong Jiao¹, Tianyi Zhou¹

¹University of Maryland, College Park, MD, USA.

Educational and Psychological Measurement

July 9, 2025

Summary

This summary is machine-generated.

Novel data augmentation strategies significantly improve item difficulty modeling in large-scale assessments using small language models (SLMs). Fine-tuned SLMs like BERT outperformed benchmarks, while large language models (LLMs) showed limited success.

Keywords:

data augmentation; item difficulty modeling; large language models; small language models

Area of Science:

Educational Measurement
Natural Language Processing
Machine Learning

Background:

Item difficulty modeling is crucial for large-scale assessments.
Existing methods face challenges with data imbalance and feature extraction.
The efficacy of small language models (SLMs) and large language models (LLMs) for this task still needs to be evaluated.

Purpose of the Study:

To investigate and enhance item difficulty modeling using advanced language models.
To develop and validate novel data augmentation strategies.
To compare the performance of small language models (SLMs) and LLMs in predicting item difficulty.

Main Methods:

Implementation of novel data augmentation techniques: augmentation on the fly and distribution balancing.
Fine-tuning of SLMs (BERT, RoBERTa) and evaluation of domain-specific models (BioClinicalBERT, PubMedBERT).
Exploration of LLM (GPT-4) capabilities with chain-of-thought prompting and rationale generation; utilization of embedding-based methods (NV-Embed-v2).
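
The summary names distribution balancing but does not say how it was carried out. As a rough illustration only, the sketch below shows one common way to balance a skewed difficulty distribution: bin the items by difficulty and oversample the sparse bins until every occupied bin has the same count. The function name `balance_by_difficulty`, the bin count, and the assumption that difficulty is scaled to [0, 1] are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def balance_by_difficulty(items, n_bins=5, seed=0):
    """Oversample items so every occupied difficulty bin has the same size.

    items: list of (text, difficulty) pairs, difficulty assumed in [0, 1].
    This is an illustrative assumption, not the procedure used in the study.
    """
    rng = random.Random(seed)

    # Group items into equal-width difficulty bins.
    bins = defaultdict(list)
    for text, difficulty in items:
        idx = min(int(difficulty * n_bins), n_bins - 1)
        bins[idx].append((text, difficulty))

    # Every occupied bin is grown to the size of the largest one.
    target = max(len(members) for members in bins.values())

    balanced = []
    for idx in sorted(bins):
        members = bins[idx]
        balanced.extend(members)
        # Draw the missing items with replacement from the same bin.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```

With four easy items and one hard item, for example, the hard item would be repeated until both bins hold four entries. A real pipeline might paraphrase or otherwise augment the duplicated items rather than repeat them verbatim.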

Main Results:

Augmentation strategies significantly improved performance, outperforming benchmarks and mitigating data imbalance.
Fine-tuned SLMs achieved lower root mean squared error than the top model in the BEA 2024 Shared Task.
LLMs showed generalization but struggled with difficulty prediction; ensemble learning with SLMs enhanced accuracy.
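
For readers unfamiliar with the metric, root mean squared error (RMSE) is the square root of the mean squared difference between predicted and observed difficulty. The sketch below computes it and shows a simple averaging ensemble. The summary does not specify which ensembling scheme was used, so unweighted averaging here is an assumption, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(prediction_sets):
    """Unweighted mean of several models' predictions, item by item.

    Averaging is one simple ensembling choice, assumed for illustration.
    """
    return [sum(column) / len(column) for column in zip(*prediction_sets)]
```

When two models err in opposite directions on the same items, averaging their predictions can yield a lower RMSE than either model alone, which is one reason ensembles often help in regression tasks like difficulty prediction.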

Conclusions:

Novel data augmentation strategies are highly effective for item difficulty modeling.
Fine-tuned SLMs, particularly when used in ensembles, outperform LLMs on this specific task.
Further research is needed to improve LLM performance, potentially through increased training data or advanced reasoning techniques.