A framework for mitigating malicious RLHF feedback in LLM training using consensus based reward.

Zafaryab Haider¹, Md Hafizur Rahman², Vijay Devabhaktuni³

¹Department of Electrical and Computer Engineering (ECE), University of Maine, Orono, ME, USA. zafaryab.haider@maine.edu.

Scientific Reports | March 18, 2025

Summary

This summary is machine-generated.

A new framework called COBRA addresses security risks in training Large Language Models (LLMs) using Reinforcement Learning from Human Feedback (RLHF). COBRA effectively filters out malicious human feedback, improving LLM performance and safety in real-world applications.

Keywords:

Reinforcement learning via human feedback; Secure artificial intelligence; Trustworthy large language models

Area of Science:

Artificial Intelligence
Machine Learning
Natural Language Processing

Background:

Large Language Models (LLMs) are increasingly adopted across industries, but face security and privacy challenges.
Reinforcement Learning from Human Feedback (RLHF) is crucial for LLM training because it aligns model outputs with human preferences.
The RLHF process is vulnerable to malicious feedback, potentially degrading LLM performance and causing harmful outputs.

Purpose of the Study:

To propose a novel framework, COBRA (COnsensus-Based RewArd), to mitigate malicious feedback in RLHF.
To enhance LLM training performance and robustness in mixed-trust environments.
To validate COBRA's effectiveness against state-of-the-art methods.

Main Methods:

Developed the COBRA framework, a consensus-based technique for filtering noisy human feedback during RLHF.
Evaluated COBRA on sentiment analysis and conversational tasks using several LLMs (e.g., GPT-2 XL).
Compared COBRA's performance against standard RLHF and a prior method (Coste et al.).
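
This summary does not spell out how COBRA computes its consensus. As a rough illustration of what consensus-based filtering of reward signals can look like, the sketch below assumes that several reward models score the same response, keeps only the scores that lie close to the median, and averages those. The function name `consensus_reward`, the `tolerance` parameter, and the median rule are all illustrative assumptions, not the authors' method.

```python
from statistics import median_low


def consensus_reward(scores, tolerance=0.5):
    """Average the reward scores that agree with the median score.

    Hypothetical helper: `scores` holds one score per reward model for the
    same response, and `tolerance` is an assumed agreement threshold.
    """
    if not scores:
        raise ValueError("need at least one reward score")
    # median_low always returns an actual element of `scores`,
    # so at least one score is guaranteed to pass the filter below.
    center = median_low(scores)
    # Scores far from the median are treated as untrusted and dropped.
    trusted = [s for s in scores if abs(s - center) <= tolerance]
    return sum(trusted) / len(trusted)


# One reward model (assumed to be trained on malicious feedback) disagrees.
scores = [0.8, 0.9, 0.7, -5.0, 0.85]
print(consensus_reward(scores))
```

With these made-up scores, the outlier of -5.0 is discarded and the result is about 0.81, whereas a plain average of all five scores would be -0.35. A median is used here only because it tolerates a minority of corrupted scores; other agreement rules would fit the same pattern.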

Main Results:

COBRA significantly improved LLM performance, outperforming unprotected reward generation by [Formula: see text] for conversational tasks and [Formula: see text] for sentiment analysis.
Quantitative comparisons showed COBRA achieved state-of-the-art performance, especially with fewer reward models.
COBRA achieved higher reward accuracy ([Formula: see text]) with fewer reward models ([Formula: see text]).

Conclusions:

COBRA effectively neutralizes malicious feedback in RLHF, enhancing LLM training outcomes.
The proposed framework offers a robust solution for secure and reliable LLM development in critical applications.
COBRA presents a significant advancement in ensuring the integrity and quality of LLM training data.