Evaluating language models for mathematics through interactions.

Katherine M Collins¹, Albert Q Jiang¹, Simon Frieder²

¹University of Cambridge, Cambridge CB2 1TN, United Kingdom.

Proceedings of the National Academy of Sciences of the United States of America

June 3, 2024

Summary

This summary is machine-generated.

Evaluating large language models (LLMs) as interactive problem-solving assistants requires more than static input-output tests. The study shows that although models such as GPT-4 perform well on mathematics problems, human interaction reveals cases where a response's perceived helpfulness and its mathematical correctness diverge.

Keywords:

AI; human–computer interaction; language models; theorem proving

Area of Science:

Artificial Intelligence
Human-Computer Interaction
Mathematics Education

Background:

Large language models (LLMs) show promise as problem-solving assistants.
Current LLM evaluation methods using static input-output pairs are inadequate for interactive settings.
Understanding LLM capabilities in dynamic, real-world applications is crucial.

Purpose of the Study:

To introduce CheckMate, a platform for interactive LLM evaluation.
To assess InstructGPT, ChatGPT, and GPT-4 as mathematical problem-solving assistants.
To analyze human interaction patterns and LLM performance in a mathematical context.

Main Methods:

Developed the CheckMate platform and used it to record human-LLM interactions.
Conducted a study involving undergraduate mathematics students and professors.
Collected interaction data and ratings to form the MathConverse dataset.
Performed case studies on GPT-4's mathematical problem-solving capabilities.

Main Results:

Derived a taxonomy of human query behaviors during LLM interaction.
Observed divergence between LLM output correctness and perceived helpfulness.
Identified specific strengths and weaknesses of GPT-4 in mathematical proofs.
Released the MathConverse dataset for further research.
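
The divergence between correctness and perceived helpfulness noted above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `RatedResponse` schema, the integer rating scales, the `min_gap` threshold, and the toy values are hypothetical. They are not the MathConverse format or data from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatedResponse:
    """One model response with two human ratings (hypothetical schema)."""
    response_id: str
    correctness: int  # assumed ordinal scale, higher = more correct
    helpfulness: int  # assumed ordinal scale, higher = more helpful


def divergent_responses(
    ratings: List[RatedResponse], min_gap: int = 2
) -> List[RatedResponse]:
    """Return responses whose two ratings differ by at least min_gap."""
    return [r for r in ratings if abs(r.correctness - r.helpfulness) >= min_gap]


# Toy ratings invented for this example; not taken from the study.
toy = [
    RatedResponse("a", correctness=6, helpfulness=6),
    RatedResponse("b", correctness=2, helpfulness=5),  # helpful but flawed
    RatedResponse("c", correctness=6, helpfulness=3),  # correct but unhelpful
    RatedResponse("d", correctness=4, helpfulness=5),
]

flagged = divergent_responses(toy)
print([r.response_id for r in flagged])  # ['b', 'c']
```

Flagging responses by rating gap is only one way to surface such cases. It is shown because a single aggregate score would hide both the "helpful but flawed" and the "correct but unhelpful" patterns.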

Conclusions:

Interactive evaluation is essential for understanding LLM utility.
LLMs that communicate uncertainty and respond well to user corrections may make better assistants.
Mathematicians and ML practitioners should be aware of LLM limitations and potential fallibility.