
A survey on LLM-as-a-judge.

Jiawei Gu1,2, Xuhui Jiang1,3, Zhichao Shi1,3,4

  • 1IDEA Research, Shenzhen, China.

Innovation (Cambridge (Mass.)) | June 8, 2026
Summary
This summary is machine-generated.

Large language models (LLMs) can now act as judges for complex tasks, offering scalable evaluations. This paper surveys LLM-as-a-judge systems, focusing on building reliable and trustworthy AI evaluation methods.

Keywords:
LLM-as-a-judge, automated evaluation, large language models, reliability assessment, trustworthy AI

Area of Science:

  • Artificial Intelligence
  • Natural Language Processing
  • Machine Learning

Background:

  • Accurate evaluation is vital, but subjectivity and scale make it hard to achieve.
  • Large Language Models (LLMs) show promise for automated evaluation.
  • Ensuring the reliability of LLM-as-a-judge systems is a key challenge.

Purpose of the Study:

  • To provide a comprehensive survey of LLM-as-a-judge systems.
  • To address the core question of building reliable LLM-as-a-judge systems.
  • To offer a unified framework and research agenda for trustworthy AI evaluation.

Main Methods:

  • Formal definition and detailed classification of LLM-as-a-judge.
  • Exploration of strategies to enhance LLM evaluation reliability (consistency, bias mitigation).
  • Development of methodologies and a novel benchmark for evaluating LLM reliability.
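
As an illustration of the consistency and bias-mitigation strategies listed above, the sketch below shows one common way to check a pairwise judge for position bias: ask for a verdict in both answer orders and accept it only when the two agree. This is not taken from the paper. The `Judge` signature, the "A"/"B"/"tie" labels, and the two toy judges are assumptions made for the example; a real system would wrap an LLM call behind the same interface.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical judge interface (an assumption, not the paper's API):
# (question, answer_a, answer_b) -> "A", "B", or "tie".
Judge = Callable[[str, str, str], str]

# Maps a verdict given in swapped order back to the original labelling.
_FLIP = {"A": "B", "B": "A", "tie": "tie"}


def swap_consistent_verdict(judge: Judge, question: str, first: str, second: str) -> str:
    """Query the judge in both answer orders; keep the verdict only if they agree."""
    forward = judge(question, first, second)
    backward = _FLIP[judge(question, second, first)]
    return forward if forward == backward else "inconsistent"


def consistency_rate(judge: Judge, items: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of (question, first, second) items whose verdict survives the swap."""
    verdicts = [swap_consistent_verdict(judge, q, a, b) for q, a, b in items]
    if not verdicts:
        raise ValueError("items must not be empty")
    return sum(v != "inconsistent" for v in verdicts) / len(verdicts)


# Toy stand-ins for an LLM judge, used only to exercise the check.
def always_first(question: str, a: str, b: str) -> str:
    """A fully position-biased judge: always prefers whichever answer comes first."""
    return "A"


def longer_wins(question: str, a: str, b: str) -> str:
    """An order-independent judge: prefers the longer answer."""
    if len(a) == len(b):
        return "tie"
    return "A" if len(a) > len(b) else "B"


if __name__ == "__main__":
    sample = [("q1", "short", "a longer answer"), ("q2", "same", "same")]
    print(consistency_rate(always_first, sample))
    print(consistency_rate(longer_wins, sample))
```

A low consistency rate suggests that verdicts depend on answer order rather than content. Discarding or re-querying the inconsistent cases is one possible mitigation, though the right policy depends on the application.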

Main Results:

  • A structured survey and unified framework for LLM-as-a-judge.
  • Proposed strategies and methodologies for improving LLM evaluation reliability.
  • A novel benchmark for assessing the trustworthiness of LLM-based evaluations.
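
One generic way to quantify how trustworthy a judge's labels are is to compare them with human labels on the same items. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic. It is offered as an assumption about how such a comparison might look, not as the metric or benchmark the paper proposes; the function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(judge_labels: Sequence[Hashable], human_labels: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(judge_labels)

    # Observed agreement: share of items where both raters give the same label.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n

    # Expected agreement if the two raters labelled independently,
    # each following their own label frequencies.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(judge_counts[c] * human_counts[c] for c in judge_counts) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels for four items.
    judge = ["good", "good", "bad", "bad"]
    human = ["good", "bad", "bad", "bad"]
    print(cohens_kappa(judge, human))
```

Kappa near 1 indicates agreement well beyond chance, while values near 0 indicate agreement no better than chance. Raw accuracy alone can look high when one label dominates, which is why a chance-corrected measure is often preferred.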

Conclusions:

  • LLM-as-a-judge offers a scalable alternative to traditional evaluations.
  • Careful design and standardization are crucial for reliable LLM-based assessment.
  • This work provides theoretical foundations and practical guidance for developing trustworthy LLM evaluators.