



Item Difficulty Modeling Using Fine-tuned Small and Large Language Models.

Ming Li¹, Hong Jiao¹, Tianyi Zhou¹

  • ¹University of Maryland, College Park, MD, USA.

Educational and Psychological Measurement | July 9, 2025
Summary
This summary is machine-generated.

Novel data augmentation strategies significantly improve item difficulty modeling in large-scale assessments using small language models (SLMs). Fine-tuned SLMs like BERT outperformed benchmarks, while large language models (LLMs) showed limited success.

Keywords:
data augmentation; item difficulty modeling; large language models; small language models


Area of Science:

  • Educational Measurement
  • Natural Language Processing
  • Machine Learning

Background:

  • Item difficulty modeling is crucial for large-scale assessments.
  • Existing methods face challenges with data imbalance and feature extraction.
  • The efficacy of small and large language models (SLMs and LLMs) for this task still needs to be evaluated.

Purpose of the Study:

  • To investigate and enhance item difficulty modeling using advanced language models.
  • To develop and validate novel data augmentation strategies.
  • To compare the performance of small language models (SLMs) and LLMs in predicting item difficulty.

Main Methods:

  • Implementation of novel data augmentation techniques: augmentation on the fly and distribution balancing.
  • Fine-tuning of SLMs (BERT, RoBERTa) and evaluation of domain-specific models (BioClinicalBERT, PubMedBERT).
  • Exploration of LLM (GPT-4) capabilities with chain-of-thought prompting and rationale generation; utilization of embedding-based methods (NV-Embed-v2).
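
The summary does not describe how distribution balancing was implemented. As a minimal sketch of one common reading, assuming it means oversampling items from under-represented difficulty ranges, the hypothetical helper below bins items by difficulty and resamples each bin up to the size of the largest one. The function name, the bin count, and the (text, difficulty) item layout are all illustrative assumptions, not the authors' method.

```python
import random
from collections import defaultdict


def balance_by_difficulty(items, n_bins=5, seed=0):
    """Oversample (text, difficulty) pairs so every non-empty difficulty
    bin ends up with as many items as the largest bin.

    Hypothetical helper for illustration only; the source does not
    specify the balancing procedure.
    """
    rng = random.Random(seed)
    lo = min(d for _, d in items)
    hi = max(d for _, d in items)
    # Fall back to width 1.0 when all difficulties are identical.
    width = (hi - lo) / n_bins or 1.0

    # Assign each item to an equal-width bin; the maximum value is
    # clamped into the last bin.
    bins = defaultdict(list)
    for text, d in items:
        idx = min(int((d - lo) / width), n_bins - 1)
        bins[idx].append((text, d))

    target = max(len(members) for members in bins.values())
    balanced = []
    for idx in sorted(bins):
        members = bins[idx]
        balanced.extend(members)
        # Top up sparse bins by sampling with replacement.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced
```

In a text setting, the resampled copies would typically be perturbed (for example by paraphrasing) rather than duplicated verbatim; that step is omitted here because the summary gives no details.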

Main Results:

  • Augmentation strategies significantly improved performance, outperforming benchmarks and mitigating data imbalance.
  • Fine-tuned SLMs achieved lower root mean squared error than the top model in the BEA 2024 Shared Task.
  • LLMs showed generalization but struggled with difficulty prediction; ensemble learning with SLMs enhanced accuracy.
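
For readers unfamiliar with the metric, root mean squared error and a simple prediction-averaging ensemble can be sketched as follows. The averaging scheme is an assumption chosen for illustration; the summary does not say which ensemble method was used, and the numbers in the usage note are toy values, not results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def ensemble_mean(predictions):
    """Average per-item predictions across models.

    `predictions` is a list of per-model prediction lists, all of the
    same length. Plain averaging is an illustrative assumption.
    """
    return [sum(column) / len(column) for column in zip(*predictions)]
```

With toy values, two models that err in opposite directions can cancel out when averaged: for true difficulties `[0.5, 0.5]`, predictions `[0.0, 1.0]` and `[1.0, 0.0]` each have an RMSE of 0.5, while their mean `[0.5, 0.5]` has an RMSE of 0.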

Conclusions:

  • Novel data augmentation strategies are highly effective for item difficulty modeling.
  • Fine-tuned SLMs, particularly through ensemble methods, offer superior performance over LLMs for this specific task.
  • Further research is needed to improve LLM performance, potentially through increased training data or advanced reasoning techniques.