


Updated: May 20, 2025

APPLS: Evaluating Evaluation Metrics for Plain Language Summarization.

Yue Guo (1), Tal August (1), Gondy Leroy (2)

  • (1) University of Illinois Urbana-Champaign.

Proceedings of the Conference on Empirical Methods in Natural Language Processing, March 27, 2025
Summary
This summary is machine-generated.

Evaluating plain language summarization (PLS) is difficult. This study introduces APPLS, a meta-evaluation testbed for PLS metrics, and finds that no single metric captures all four quality criteria.


Area of Science:

  • Natural Language Processing
  • Computational Linguistics
  • Artificial Intelligence

Background:

  • Plain Language Summarization (PLS) models are advancing, but reliable evaluation is lacking.
  • Existing text generation metrics may not suit PLS due to unique transformations like jargon removal and added explanations.
  • There is no dedicated assessment metric for PLS quality.

Purpose of the Study:

  • To introduce APPLS, a granular meta-evaluation testbed for assessing Plain Language Summarization metrics.
  • To identify and define criteria (informativeness, simplification, coherence, faithfulness) crucial for PLS.
  • To define perturbations that target each of these criteria and use them to build the testbed.

Main Methods:

  • Developed APPLS by applying defined perturbations to two PLS datasets.
  • Evaluated 14 diverse metrics, including automated scores, lexical features, and LLM prompt-based evaluations, using APPLS.
  • Assessed metric sensitivity to informativeness, simplification, coherence, and faithfulness.
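
The perturbation-based procedure above can be illustrated with a minimal sketch. It is not the APPLS implementation: the overlap metric (`unigram_f1`), the two perturbations, and the example text are all hypothetical stand-ins chosen to show the idea of measuring how much a metric's score drops when a summary is deliberately degraded along one criterion.

```python
from collections import Counter
from typing import Callable, List


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1; a simple stand-in for an overlap-based metric."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def split_sentences(text: str) -> List[str]:
    # Naive splitter: assumes every sentence ends with a period.
    return [s.strip() for s in text.split(".") if s.strip()]


def join_sentences(sentences: List[str]) -> str:
    return ". ".join(sentences) + "."


def delete_last_sentence(text: str) -> str:
    """Hypothetical informativeness perturbation: drop content."""
    return join_sentences(split_sentences(text)[:-1])


def reverse_sentences(text: str) -> str:
    """Hypothetical coherence perturbation: reorder sentences."""
    return join_sentences(split_sentences(text)[::-1])


def sensitivity(metric: Callable[[str, str], float],
                perturb: Callable[[str], str],
                summary: str, reference: str) -> float:
    """Score drop caused by a perturbation; larger means more sensitive."""
    return metric(summary, reference) - metric(perturb(summary), reference)


# Made-up example text; the summary equals the reference for simplicity.
reference = ("The drug lowered blood pressure. Side effects were mild. "
             "More trials are needed.")
summary = reference

info_drop = sensitivity(unigram_f1, delete_last_sentence, summary, reference)
coh_drop = sensitivity(unigram_f1, reverse_sentences, summary, reference)
print(f"informativeness perturbation: drop = {info_drop:.3f}")
print(f"coherence perturbation:       drop = {coh_drop:.3f}")
```

In this toy setup the overlap score falls when content is deleted but does not change when sentences are reordered, because reordering leaves the bag of tokens untouched. That is one way a metric can be sensitive to one criterion and blind to another.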

Main Results:

  • No single evaluated metric effectively captured all four PLS quality criteria simultaneously.
  • Some metrics demonstrated sensitivity to specific PLS criteria.
  • Current metrics show limitations in comprehensively evaluating PLS.

Conclusions:

  • A suite of automated metrics is recommended for robust PLS quality assessment.
  • APPLS serves as the first meta-evaluation testbed for PLS.
  • Further research is needed to develop metrics that holistically evaluate PLS.
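
As a rough illustration of reporting a suite of metrics rather than a single score, the sketch below returns one score per criterion. The two proxies (`source_token_recall`, `mean_word_length`), the criterion labels attached to them, and the example sentences are assumptions made for this example; they are not the metrics evaluated or recommended in the study.

```python
from typing import Callable, Dict


def source_token_recall(summary: str, source: str) -> float:
    """Share of distinct source tokens that reappear in the summary."""
    src = set(source.lower().split())
    if not src:
        return 0.0
    return len(src & set(summary.lower().split())) / len(src)


def mean_word_length(summary: str, source: str) -> float:
    """Average characters per token (lower suggests simpler wording).

    `source` is unused; it is kept so every metric shares one signature.
    """
    words = summary.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


# Hypothetical suite: each entry is a crude proxy for one criterion.
SUITE: Dict[str, Callable[[str, str], float]] = {
    "informativeness (proxy)": source_token_recall,
    "simplification (proxy)": mean_word_length,
}


def score_profile(summary: str, source: str) -> Dict[str, float]:
    """Return one score per criterion instead of a single aggregate."""
    return {name: metric(summary, source) for name, metric in SUITE.items()}


# Made-up example pair.
source = "hypertension was attenuated by the intervention"
summary = "the treatment lowered high blood pressure"
profile = score_profile(summary, source)
for name, value in profile.items():
    print(f"{name}: {value:.3f}")
```

Here the plain-language rewrite shares almost no tokens with its source, so the overlap proxy alone would rate it poorly even though the meaning is preserved. Keeping the scores separate makes that kind of disagreement visible.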