Calibrating Structured Output Predictors for Natural Language Processing.

Abhyuday Jagannatha¹, Hong Yu¹,²

¹College of Information and Computer Sciences, University of Massachusetts Amherst.

Proceedings of the Conference. Association for Computational Linguistics. Meeting

February 22, 2021

Summary

This summary is machine-generated.

This study introduces a calibration method for structured-output natural language processing (NLP) models, so that the confidence scores they report can be relied on in safety-critical applications. The technique improves both calibration and model performance without extra training data.

Area of Science:

Natural Language Processing (NLP)
Machine Learning
Computational Linguistics

Background:

Calibrated confidence scores are crucial for NLP applications, especially in safety-critical domains like healthcare.
Existing calibration methods struggle with the large output spaces of structured prediction models.
The need for reliable uncertainty quantification in AI decision-making is growing.
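
To make "calibrated" concrete, the sketch below computes expected calibration error (ECE), a common way to measure how far reported confidences drift from observed accuracy. The choice of ECE, the function name `expected_calibration_error`, and the equal-width binning are assumptions made for illustration; the summary does not state which metric the study uses.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of confidences falling in each bin
    hits = [0] * n_bins        # number of correct predictions in each bin
    counts = [0] * n_bins      # number of predictions in each bin

    for c, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += int(ok)
        counts[idx] += 1

    ece = 0.0
    for s, h, k in zip(conf_sum, hits, counts):
        if k:
            ece += (k / n) * abs(h / k - s / k)
    return ece
```

A model that reports 0.9 confidence on predictions that are right only 75% of the time would score an ECE of about 0.15 under this sketch; a perfectly calibrated model would score 0.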

Purpose of the Study:

To propose a generalizable calibration scheme for output entities in neural network-based structured prediction models.
To enhance the reliability of confidence scores in NLP tasks such as named entity recognition and question answering.
To develop a method that improves model performance without additional training costs.

Main Methods:

A novel calibration scheme designed to work with any binary-class calibration method and any neural network architecture.
Integration of the calibration method as an uncertainty-aware, entity-specific decoding step.
Empirical evaluation across multiple NLP tasks and benchmark datasets.
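
The idea in the list above can be sketched roughly as follows: treat "is this candidate entity correct?" as a binary question, fit a binary calibrator on held-out candidates, then re-rank new candidates by calibrated confidence. This is a minimal illustration and not the authors' implementation. The logistic calibrator, the two input features (a raw model score and an uncertainty estimate), and all function names are hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, Tuple

# (raw model score, uncertainty estimate): both features are assumptions.
Features = Tuple[float, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(x: Features, w: Sequence[float], b: float) -> float:
    """Calibrated probability that a candidate entity is correct."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def log_loss(data: Sequence[Features], labels: Sequence[int],
             w: Sequence[float], b: float) -> float:
    """Mean binary cross-entropy of the calibrator on labelled candidates."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(data, labels):
        p = min(max(predict(x, w, b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def fit_calibrator(data: Sequence[Features], labels: Sequence[int],
                   lr: float = 0.1, steps: int = 500) -> Tuple[List[float], float]:
    """Fit a logistic calibrator with plain batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for x, y in zip(data, labels):
            err = predict(x, w, b) - y
            g0 += err * x[0]
            g1 += err * x[1]
            gb += err
        w[0] -= lr * g0 / n
        w[1] -= lr * g1 / n
        b -= lr * gb / n
    return w, b


def rerank(candidates: List[Tuple[str, Features]],
           w: Sequence[float], b: float) -> List[Tuple[str, float]]:
    """Decoding step: order candidates by calibrated confidence, best first."""
    scored = [(name, predict(x, w, b)) for name, x in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the calibrator sees an uncertainty feature in addition to the raw score, the re-ranked order can differ from the model's original order. That is how a calibration step could also act as a decoding step in this sketch. Any binary calibration method could stand in for the logistic model here.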

Main Results:

The proposed method significantly outperforms existing calibration techniques for named entity recognition, part-of-speech tagging, and question answering.
Improved model performance was observed across various tasks and datasets when using the decoding step.
Enhanced calibration and model performance were demonstrated even on out-of-domain test scenarios.

Conclusions:

The developed calibration scheme effectively addresses the challenge of confidence score calibration in large-scale NLP models.
The method offers a dual benefit of improved calibration and enhanced model performance, acting as an effective decoding strategy.
This approach provides a valuable tool for deploying NLP applications reliably in sensitive domains.