APPLS: Evaluating Evaluation Metrics for Plain Language Summarization

Yue Guo¹, Tal August¹, Gondy Leroy²

¹University of Illinois Urbana-Champaign.

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

March 27, 2025

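Summary

This summary is machine-generated.

Evaluating plain language summaries (PLS) is difficult. Our study created APPLS, a testbed for evaluating PLS metrics, and found that no single metric captures all quality criteria.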

Scientific fields:

Natural language processing
Computational linguistics
Artificial intelligence

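Background:

Plain language summarization (PLS) models are advancing, but reliable ways to evaluate them are lacking.
Existing text generation metrics may be ill-suited to PLS because of its distinctive transformations, such as removing jargon and adding explanations.
No evaluation metric has been designed specifically for PLS quality.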

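Purpose of the study:

Introduce APPLS, a fine-grained meta-evaluation testbed for assessing plain language summarization metrics.
Identify and define the criteria essential to PLS (informativeness, simplification, coherence, faithfulness).
Create perturbations that are sensitive to these criteria in order to build the PLS testbed.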

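Main methods:

APPLS was developed by applying the defined perturbations to two PLS datasets.
Fourteen metrics were evaluated with APPLS, including automated scores, lexical features, and LLM prompt-based evaluations.
Each metric's sensitivity to informativeness, simplification, coherence, and faithfulness was assessed.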
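
The perturb-then-score idea behind this kind of meta-evaluation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `delete_sentences` is a simplistic stand-in for an informativeness perturbation, `unigram_recall` is a toy metric and not one of the fourteen metrics evaluated, and `sensitivity_curve` is a hypothetical helper. None of this is the APPLS implementation.

```python
import random
from typing import Callable, List


def delete_sentences(summary: str, fraction: float, seed: int = 0) -> str:
    """Hypothetical informativeness perturbation: drop a fraction of sentences."""
    sentences = [s.strip() for s in summary.split(".") if s.strip()]
    if not sentences:
        return summary
    rng = random.Random(seed)
    # Always keep at least one sentence so the output is never empty.
    n_drop = min(len(sentences) - 1, round(fraction * len(sentences)))
    dropped = set(rng.sample(range(len(sentences)), n_drop))
    kept = [s for i, s in enumerate(sentences) if i not in dropped]
    return ". ".join(kept) + "."


def unigram_recall(candidate: str, reference: str) -> float:
    """Toy stand-in metric: share of reference word types found in the candidate."""
    ref = set(reference.lower().replace(".", " ").split())
    cand = set(candidate.lower().replace(".", " ").split())
    return len(ref & cand) / len(ref) if ref else 0.0


def sensitivity_curve(
    metric: Callable[[str, str], float],
    perturb: Callable[[str, float], str],
    summary: str,
    reference: str,
    fractions: List[float],
) -> List[float]:
    """Score the summary after perturbing it at each strength in `fractions`."""
    return [metric(perturb(summary, f), reference) for f in fractions]


# Made-up example text, used only to exercise the functions.
text = (
    "Aspirin lowers fever. Doctors advise rest. "
    "Patients drink water. Symptoms fade quickly."
)
scores = sensitivity_curve(
    unigram_recall, delete_sentences, text, text, [0.0, 0.5, 0.75]
)
print(scores)  # [1.0, 0.5, 0.25]
```

In this toy setting, a metric counts as sensitive to the perturbed criterion if its score moves in the expected direction as the perturbation gets stronger. A metric whose score stays flat would be judged insensitive to that criterion.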

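Main results:

No single evaluation metric captured all four PLS quality criteria at once.
Some metrics were sensitive to specific PLS criteria.
Current metrics show limitations in assessing PLS comprehensively.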

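Conclusions:

A suite of automated metrics is recommended for robust PLS quality assessment.
APPLS serves as the first meta-evaluation testbed for PLS.
Further research is needed to develop metrics that can evaluate PLS comprehensively.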