Embers of autoregression show how large language models are shaped by the problem they are trained to solve

R Thomas McCoy¹, Shunyu Yao¹, Dan Friedman¹, Mathew D Hardy², Thomas L Griffiths¹,²

Affiliations

¹Department of Computer Science, Princeton University, Princeton, NJ 08542.
²Department of Psychology, Princeton University, Princeton, NJ 08542.

Proceedings of the National Academy of Sciences of the United States of America

October 4, 2024

Abstract

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach, which we call the teleological approach, we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4’s accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this task is a deterministic one for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system, one that has been shaped by its own particular set of pressures.
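To make concrete why cipher decoding is "a deterministic task for which probability should not matter," here is a minimal sketch. The abstract does not name the cipher; this sketch assumes a letter-rotation (shift) cipher such as rot-13, and the function names `shift_letter`, `encode_shift`, and `decode_shift` are hypothetical, chosen for illustration. An exact algorithm applies the same fixed mapping to every letter, so it recovers a common sentence and a scrambled, low-probability rearrangement of it equally well.

```python
import string


def shift_letter(ch: str, shift: int) -> str:
    """Rotate one ASCII letter by `shift` positions, preserving case.

    Non-letters (spaces, punctuation, digits) are returned unchanged.
    """
    if ch in string.ascii_lowercase:
        base = ord("a")
    elif ch in string.ascii_uppercase:
        base = ord("A")
    else:
        return ch
    # Python's % always returns a non-negative result here, so negative shifts work too
    return chr((ord(ch) - base + shift) % 26 + base)


def encode_shift(text: str, shift: int) -> str:
    """Encode text with a shift cipher (assumed cipher family, e.g. shift=13 for rot-13)."""
    return "".join(shift_letter(c, shift) for c in text)


def decode_shift(text: str, shift: int) -> str:
    """Decode by applying the opposite shift; no notion of sentence probability is involved."""
    return encode_shift(text, -shift)


if __name__ == "__main__":
    # A high-probability sentence and a low-probability rearrangement of the same words
    for plain in ["The cat sat on the mat.", "Mat the on sat cat the."]:
        cipher = encode_shift(plain, 13)
        recovered = decode_shift(cipher, 13)
        assert recovered == plain
        print(f"{cipher!r} -> {recovered!r}")
```

Both inputs round-trip exactly, which is what makes the reported gap between high- and low-probability outputs a sign that the model is not executing the mapping as a fixed procedure.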
