Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach.

Jiancong Xiao¹, Bojian Hou¹, Zhanliang Wang¹

¹University of Pennsylvania, PA, USA.

Proceedings of Machine Learning Research

March 23, 2026

Summary

This summary is machine-generated.

Preference alignment degrades the calibration of Large Language Models (LLMs), leaving them overconfident. This study uses domain-specific fine-tuning and calibration-aware methods to restore calibration without sacrificing performance.

Area of Science:

Artificial Intelligence
Machine Learning
Natural Language Processing

Background:

Preference alignment is a key ingredient in the success of Large Language Models (LLMs).
Preference alignment often degrades model calibration, leaving aligned models overconfident in their predictions.
Pre-trained models are typically well calibrated, but their calibration degrades after alignment.
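For readers unfamiliar with the term, the standard textbook notion of calibration can be written down as follows. This is a general definition, not notation taken from the study; the symbols (predicted answer \hat{Y}, true answer Y, reported confidence \hat{P}) are assumptions introduced here for illustration.

```latex
% Perfect calibration: among predictions made with confidence p,
% a fraction p is correct.
\Pr\big(\hat{Y} = Y \mid \hat{P} = p\big) = p
\quad \text{for all } p \in [0, 1]

% Overconfidence: at confidence level p, the observed accuracy falls below p.
\Pr\big(\hat{Y} = Y \mid \hat{P} = p\big) < p
```

For example, a model that reports 90% confidence on a set of answers but is right on only 60% of them is overconfident on that set.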

Purpose of the Study:

Investigate the reasons behind calibration degradation in LLMs post-preference alignment.
Develop methods to address and mitigate poor calibration in aligned LLMs.
Analyze the impact of calibration on LLM performance and propose solutions for different model regimes.

Main Methods:

Showed that preference collapse, a failure mode of alignment, carries over to calibration and causes overconfidence.
Demonstrated the effectiveness of fine-tuning with domain-specific knowledge to reduce overconfidence.
Categorized models into 'calibratable' and 'non-calibratable' regimes, defined by bounds on the Expected Calibration Error (ECE).
Proposed a calibration-aware fine-tuning approach for the calibratable regime.
Developed an EM-algorithm-based ECE regularization for the non-calibratable regime.
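The methods above are organized around Expected Calibration Error (ECE). A minimal sketch of the commonly used binned ECE estimator follows: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The function name, the choice of 10 bins, and the toy inputs are assumptions for illustration; the study's exact estimator and regime bounds are not reproduced here.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bins are (0, 0.1], (0.1, 0.2], ..., (0.9, 1.0] when n_bins == 10.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)

# Toy data (hypothetical): an overconfident model vs. a well-calibrated one.
overconfident = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 0, 0])
calibrated = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
print(overconfident, calibrated)
```

In the toy data the first model claims 95% confidence but is right half the time, so its ECE is roughly 0.45; the second claims 75% and is right 75% of the time, so its ECE is 0.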

Main Results:

Preference alignment leads to overconfidence and poor calibration in LLMs.
Domain-specific fine-tuning alleviates overconfidence.
Calibration-aware fine-tuning restores calibration in the calibratable regime while maintaining performance.
ECE regularization effectively reduces calibration error in the non-calibratable regime.
Proposed methods were validated through extensive experiments.
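To make the idea of a calibration regularizer concrete, here is a deliberately simplified sketch: a cross-entropy loss plus a penalty on the gap between mean confidence and accuracy over a batch. This is not the study's EM-algorithm-based ECE regularization, which is not described in enough detail here to reproduce. The function names, the single-bin gap penalty, and the weight `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def calibration_regularized_loss(logits, labels, lam=1.0):
    """Return (total, cross_entropy, gap), where total = cross_entropy + lam * gap.

    gap is |mean confidence - accuracy| on the batch, a one-bin stand-in
    for a calibration penalty (hypothetical, not the paper's regularizer).
    """
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    probs = softmax(logits)
    n = len(labels)
    cross_entropy = -np.log(probs[np.arange(n), labels]).mean()
    confidence = probs.max(axis=1)                      # top-class probability
    accuracy = (probs.argmax(axis=1) == labels).mean()  # fraction correct
    gap = abs(confidence.mean() - accuracy)
    return float(cross_entropy + lam * gap), float(cross_entropy), float(gap)

# Toy batch (hypothetical): very confident logits, but only one of two is correct.
total, ce, gap = calibration_regularized_loss([[10.0, 0.0], [10.0, 0.0]], [0, 1], lam=2.0)
print(total, ce, gap)
```

Setting `lam=0` recovers plain cross-entropy. In a real training loop the accuracy term is not differentiable and would be treated as a constant, so only the confidence term would carry gradient.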

Conclusions:

Preference alignment degrades LLM calibration because preference collapse carries over to calibration.
Domain-specific knowledge and calibration-aware fine-tuning are crucial for improving LLM calibration.
Tailored methods for calibratable and non-calibratable models effectively address overconfidence and maintain performance.