A framework for mitigating malicious RLHF feedback in LLM training using consensus based reward.

Zafaryab Haider (1), Md Hafizur Rahman (2), Vijay Devabhaktuni (3)

  • (1) Department of Electrical and Computer Engineering (ECE), University of Maine, Orono, ME, USA. zafaryab.haider@maine.edu

Scientific Reports | March 18, 2025
Summary
This summary is machine-generated.

A new framework called COBRA addresses security risks in training Large Language Models (LLMs) using Reinforcement Learning from Human Feedback (RLHF). COBRA effectively filters out malicious human feedback, improving LLM performance and safety in real-world applications.

Keywords:
Reinforcement learning via human feedback; Secure artificial intelligence; Trustworthy large language models

Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Natural Language Processing

Background:

  • Large Language Models (LLMs) are increasingly adopted across industries, but face security and privacy challenges.
  • Reinforcement Learning from Human Feedback (RLHF) is a key stage of LLM training that aligns model outputs with human preferences.
  • The RLHF process is vulnerable to malicious feedback, potentially degrading LLM performance and causing harmful outputs.
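
To make the threat concrete, the following is a minimal sketch of one way malicious feedback could enter a preference dataset: an annotator flips a share of pairwise preference labels. The function names `flip_labels` and `agreement`, the label encoding, and the 30% flip rate are illustrative assumptions and are not taken from the paper.

```python
import random

def flip_labels(labels, fraction, seed=0):
    """Return a copy of binary preference labels with a fraction flipped.

    labels: list of 0/1 values, where 1 means "response A preferred".
    fraction: share of labels a hypothetical attacker flips (0.0 to 1.0).
    """
    rng = random.Random(seed)
    n_flip = int(round(fraction * len(labels)))
    # Choose distinct positions to corrupt, reproducibly for a given seed.
    flipped_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flipped_idx else y for i, y in enumerate(labels)]

def agreement(clean, noisy):
    """Fraction of labels that still match the clean annotation."""
    return sum(c == n for c, n in zip(clean, noisy)) / len(clean)

# Hypothetical clean annotations for ten response pairs.
clean = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
poisoned = flip_labels(clean, fraction=0.3)
print(agreement(clean, poisoned))
```

A reward model trained on `poisoned` rather than `clean` would learn from contradictory preferences, which is the kind of degradation the summary describes.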

Purpose of the Study:

  • To propose a novel framework, COBRA (COnsensus-Based RewArd), to mitigate malicious feedback in RLHF.
  • To enhance LLM training performance and robustness in mixed-trust environments.
  • To validate COBRA's effectiveness against state-of-the-art methods.

Main Methods:

  • Developed the COBRA framework, a consensus-based technique for filtering noisy human feedback during RLHF.
  • Evaluated COBRA on sentiment analysis and conversational tasks using several LLMs (e.g., GPT-2 XL).
  • Compared COBRA's performance against standard RLHF and a prior method (Coste et al.).
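
The summary does not give COBRA's exact procedure, so the sketch below shows only the general idea of consensus-based reward aggregation: scores from several reward models are compared against their median, and scores that stray too far are excluded before averaging. The function `consensus_reward`, the `tolerance` parameter, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, median

def consensus_reward(scores, tolerance=0.5):
    """Aggregate reward-model scores while ignoring outliers.

    scores: rewards assigned to one response by several reward models.
    tolerance: maximum allowed distance from the median (assumed value).
    """
    if not scores:
        raise ValueError("at least one score is required")
    center = median(scores)
    # Keep only the scores that agree with the majority view.
    trusted = [s for s in scores if abs(s - center) <= tolerance]
    # If no score lies near the median, fall back to the median itself.
    return mean(trusted) if trusted else center

# Three reward models agree; a fourth (hypothetically trained on
# poisoned feedback) reports a strongly negative reward.
scores = [0.8, 0.9, 0.85, -1.0]
print("naive mean:", mean(scores))
print("consensus:", consensus_reward(scores))
```

In this example the naive mean is pulled down by the single dissenting model, while the consensus value stays close to what the agreeing models report. A median-based filter is only one of several possible robust aggregators.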

Main Results:

  • COBRA significantly improved LLM performance, outperforming unprotected reward generation by [Formula: see text] for conversational tasks and [Formula: see text] for sentiment analysis.
  • Quantitative comparisons showed COBRA achieved state-of-the-art performance, especially with fewer reward models.
  • COBRA achieved higher reward accuracy ([Formula: see text]) with fewer reward models ([Formula: see text]).

Conclusions:

  • COBRA effectively neutralizes malicious feedback in RLHF, enhancing LLM training outcomes.
  • The proposed framework offers a robust solution for secure and reliable LLM development in critical applications.
  • COBRA presents a significant advancement in ensuring the integrity and quality of LLM training data.