


APPLS: Evaluating Evaluation Metrics for Plain Language Summarization.

Yue Guo¹, Tal August¹, Gondy Leroy²

¹University of Illinois Urbana-Champaign.

Proceedings of the Conference on Empirical Methods in Natural Language Processing

March 27, 2025

Summary

This summary is machine-generated.

Evaluating plain language summarization (PLS) is difficult. Our study created APPLS, a testbed for assessing PLS metrics, and found that no single metric captures all quality criteria.


Area of Science:

Natural Language Processing
Computational Linguistics
Artificial Intelligence

Background:

Plain Language Summarization (PLS) models are advancing, but reliable evaluation is lacking.
Existing text generation metrics may not suit PLS because PLS involves distinctive transformations, such as removing jargon and adding explanations.
There is no dedicated assessment metric for PLS quality.
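To illustrate why overlap-based metrics can misjudge PLS, here is a simplified, ROUGE-1-style unigram F1 in Python. It is a toy under stated assumptions (whitespace tokenization, no stemming, invented example sentences), not the official ROUGE implementation and not a metric taken from the study. A candidate that copies a jargon-heavy reference scores 1.0, while a plain-language paraphrase with similar meaning shares only one word and scores far lower.

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared tokens, counts clipped by min
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Invented example: a technical reference, a verbatim copy, and a plain rewrite.
reference = "patients with hypertension received antihypertensive therapy"
verbatim = "patients with hypertension received antihypertensive therapy"
plain = "people with high blood pressure got medicine to lower it"

print(round(unigram_f1(verbatim, reference), 3))  # 1.0
print(round(unigram_f1(plain, reference), 3))     # 0.125
```

The plain rewrite is penalized almost entirely for replacing jargon, which is exactly the kind of transformation PLS is supposed to perform.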

Purpose of the Study:

To introduce APPLS, a granular meta-evaluation testbed for assessing Plain Language Summarization metrics.
To identify and define criteria (informativeness, simplification, coherence, faithfulness) crucial for PLS.
To create perturbations sensitive to these PLS criteria for testbed development.
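Perturbations that target a single criterion can be applied at graded strengths. As a minimal sketch, consider a hypothetical informativeness perturbation that deletes a growing share of sentences. The function names, the naive regex sentence splitter, and the keep-at-least-one-sentence rule are illustrative assumptions, not APPLS's documented procedure. Each level removes more content from the summary.

```python
import math
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def drop_sentences(text: str, ratio: float) -> str:
    """Hypothetical informativeness perturbation: remove the last
    ceil(ratio * n) sentences while always keeping at least one."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    sentences = split_sentences(text)
    if not sentences:
        return ""
    n_drop = min(math.ceil(ratio * len(sentences)), len(sentences) - 1)
    return " ".join(sentences[: len(sentences) - n_drop])

summary = (
    "The drug lowered blood pressure. Side effects were mild. "
    "Doctors should monitor patients. More studies are needed."
)

for ratio in (0.0, 0.25, 0.5, 0.75):
    perturbed = drop_sentences(summary, ratio)
    print(ratio, len(split_sentences(perturbed)))
# 0.0 4
# 0.25 3
# 0.5 2
# 0.75 1
```

Grading the strength of a perturbation, rather than applying it once, makes it possible to ask whether a metric's score responds in proportion to the damage done.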

Main Methods:

Developed APPLS by applying defined perturbations to two PLS datasets.
Used APPLS to evaluate 14 diverse metrics, including automated scores, lexical features, and LLM prompt-based evaluations.
Assessed metric sensitivity to informativeness, simplification, coherence, and faithfulness.
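One plausible way to quantify a metric's sensitivity is to check whether its scores move consistently as perturbation strength increases. This sketch assumes Kendall's tau-a between perturbation level and score as the statistic and uses a toy unigram-recall metric. Neither is claimed to be the study's actual setup, and the example texts are invented. A tau near -1 means the score falls steadily as the text is degraded; a tau of 0 means the metric does not react.

```python
from itertools import combinations
from typing import Callable

def kendall_tau(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

def unigram_recall(candidate: str, reference: str) -> float:
    """Toy metric: share of reference word types found in the candidate."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 0.0

def sensitivity(
    metric: Callable[[str, str], float],
    reference: str,
    perturbed_by_level: dict[float, str],
) -> float:
    """Rank-correlate perturbation strength with the metric's scores."""
    levels = sorted(perturbed_by_level)
    scores = [metric(perturbed_by_level[lv], reference) for lv in levels]
    return kendall_tau(levels, scores)

# Invented example: stronger perturbation levels remove more content.
reference = "the drug lowered blood pressure with mild side effects"
perturbed = {
    0.0: "the drug lowered blood pressure with mild side effects",
    0.5: "the drug lowered blood pressure",
    1.0: "the drug",
}

print(sensitivity(unigram_recall, reference, perturbed))    # -1.0
print(sensitivity(lambda c, r: 1.0, reference, perturbed))  # 0.0
```

Because tau-a counts ties as neither concordant nor discordant, a metric that returns a constant score yields 0, which flags it as insensitive to that perturbation.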

Main Results:

No single evaluated metric effectively captured all four PLS quality criteria simultaneously.
Some metrics demonstrated sensitivity to specific PLS criteria.
Current metrics show limitations in comprehensively evaluating PLS.

Conclusions:

A suite of automated metrics is recommended for robust PLS quality assessment.
APPLS serves as the first meta-evaluation testbed for PLS.
Further research is needed to develop metrics that holistically evaluate PLS.