MEC--a near-optimal online reinforcement learning algorithm for continuous deterministic systems.

Dongbin Zhao, Yuanheng Zhu

IEEE Transactions on Neural Networks and Learning Systems

December 5, 2014

Summary

This summary is machine-generated.

This study introduces a novel probably approximately correct (PAC) algorithm for continuous deterministic systems, offering efficient exploration and near-optimal policies without system dynamics knowledge. The algorithm demonstrates superior performance and reduced complexity compared to existing PAC methods.

Area of Science:

Machine Learning
Control Theory
Reinforcement Learning

Background:

Continuous deterministic systems pose challenges for learning optimal control policies.
Existing methods often require system dynamics knowledge or are computationally intensive.
Efficient exploration and sample utilization are critical for effective learning in these systems.

Purpose of the Study:

To propose the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not require prior system dynamics knowledge.
To develop an algorithm that efficiently utilizes online observed samples and balances exploration-exploitation.
To achieve near-optimal policies within a PAC framework with provable performance bounds.

Main Methods:

State aggregation using a grid to partition the continuous state space.
Definition of a near-upper Q operator for generating a near-upper Q function within each state cell.
Implementation of a greedy policy that balances exploration and exploitation.
Rigorous analysis to establish a polynomial bound on the number of time steps in which non-optimal actions are executed.
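
The methods listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `GridOptimisticQ`, its parameters, and the update rule are all assumptions made for illustration. In particular, the paper's near-upper Q operator is replaced here by a simpler optimistic backup, in which every unvisited cell-action pair keeps the upper bound `r_max / (1 - gamma)` and visited pairs are backed up from the single stored sample. A greedy policy on such optimistic values is drawn toward untried actions, which is one common way to balance exploration and exploitation.

```python
import numpy as np


class GridOptimisticQ:
    """Hypothetical sketch: grid state aggregation with optimistic Q values.

    A simplified stand-in, not the near-upper Q operator from the paper.
    `low`, `high`, and `bins` must be sequences, one entry per state dimension.
    """

    def __init__(self, low, high, bins, n_actions, gamma=0.9, r_max=1.0):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = np.asarray(bins, dtype=int)
        self.gamma = gamma
        # Upper bound on any discounted return when rewards lie in [0, r_max].
        self.v_max = r_max / (1.0 - gamma)
        n_cells = int(np.prod(self.bins))
        # Optimistic initialisation: unvisited pairs look as good as possible.
        self.q = np.full((n_cells, n_actions), self.v_max)
        # One stored transition per (cell, action); the system is deterministic.
        self.samples = {}

    def cell(self, state):
        """Map a continuous state to the flat index of its grid cell."""
        state = np.asarray(state, dtype=float)
        frac = (state - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return int(np.ravel_multi_index(tuple(idx), tuple(self.bins)))

    def act(self, state):
        """Greedy action with respect to the optimistic Q table."""
        return int(np.argmax(self.q[self.cell(state)]))

    def observe(self, state, action, reward, next_state):
        """Store an online sample, keyed by the cell it was observed in."""
        key = (self.cell(state), action)
        self.samples[key] = (reward, self.cell(next_state))

    def sweep(self, n_iters=200):
        """Repeatedly back up every stored sample; other entries stay at v_max."""
        for _ in range(n_iters):
            for (c, a), (r, c_next) in self.samples.items():
                self.q[c, a] = r + self.gamma * self.q[c_next].max()
```

In use, an agent would call `act` to choose an action, apply it to the system, pass the observed transition to `observe`, and then call `sweep` to refresh the table. Storing a single sample per cell-action pair is a simplification that treats all states inside a cell as interchangeable, which is only an approximation on a continuous state space.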

Main Results:

The number of time steps in which the proposed algorithm executes non-optimal actions is bounded by a polynomial.
The algorithm converges to a near-optimal policy in finite steps under the PAC framework.
The implementation requires no system dynamics knowledge and has lower computational complexity than existing PAC methods.
Simulation studies indicate superior performance compared to other similar PAC algorithms.
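
For orientation, a PAC-style guarantee of the kind described in these results is often written in the generic form below. This is the usual shape of such a bound, not the paper's exact statement: the symbols $\pi_t$ (policy followed at step $t$), $s_t$ (state at step $t$), $V^{*}$ (optimal value function), $\varepsilon$ (accuracy), $\gamma$ (discount factor), $N$ (number of grid cells), and $|A|$ (number of actions) are assumptions introduced here, and the true dependence on each quantity is given in the original article.

```latex
\Bigl|\bigl\{\, t \ge 0 \;:\; V^{\pi_t}(s_t) < V^{*}(s_t) - \varepsilon \,\bigr\}\Bigr|
\;\le\;
\operatorname{poly}\!\Bigl(N,\; |A|,\; \tfrac{1}{\varepsilon},\; \tfrac{1}{1-\gamma}\Bigr)
```

Read this way, the left-hand side counts the steps on which the executed policy is more than $\varepsilon$ worse than optimal, and the guarantee says that this count stays finite and polynomially bounded.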

Conclusions:

The developed PAC algorithm offers an effective and efficient approach for learning control policies in continuous deterministic systems.
The method's independence from system dynamics and reduced complexity make it broadly applicable.
The algorithm provides a strong theoretical guarantee of convergence to near-optimal solutions.