Unseen From Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language

Ziming Wei, Bingqian Lin, Yunshuang Nie

IEEE Transactions on Neural Networks and Learning Systems

November 10, 2025

Summary

This summary is machine-generated.

Data scarcity in vision-language navigation (VLN) is addressed by Rewriting-driven AugMentation (RAM). RAM generates new training data by rewriting existing examples, improving agent generalization to unseen environments without simulators or extensive manual labor.

Area of Science:

Artificial Intelligence
Robotics
Computer Vision

Background:

Data scarcity is a major limitation in vision-language navigation (VLN), hindering agent generalization to new environments.
Existing augmentation methods rely on extra simulator data, which has limited diversity, or on web data, which requires significant manual cleaning.
This limits the ability of agents to navigate effectively in real-world, unseen scenarios.

Purpose of the Study:

To introduce a novel Rewriting-driven AugMentation (RAM) paradigm for VLN.
To overcome data scarcity by generating diverse, unseen observation-instruction pairs from existing data.
To improve the generalization capabilities of VLN agents in a simulator-free and labor-saving manner.

Main Methods:

Object-enriched observation rewriting using vision-language models (VLMs) and large language models (LLMs) to create diverse scene descriptions.
Text-to-image generation models (T2IMs) synthesize new observations based on rewritten descriptions.
Observation-contrast instruction rewriting uses LLMs to align new instructions with synthesized observations.
A mixing-then-focusing training strategy with random observation cropping enhances data diversity and reduces noise.
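The data flow behind these four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name below is hypothetical, the foundation models (VLM, LLM, T2IM) are replaced by trivial string stubs so the example runs on its own, and observations are plain strings or small grids rather than real images. The summary does not spell out what "focusing" means, so the sketch assumes it means switching from mixed data to the original data alone in later epochs.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    observation: str  # stand-in for an image; a real pipeline would hold pixels
    instruction: str


# --- Hypothetical stubs standing in for the foundation models ---------------

def describe_scene(observation: str) -> str:
    """Stub for a VLM that captions an observation."""
    return f"a scene showing {observation}"


def rewrite_description(description: str, extra_objects: List[str]) -> str:
    """Stub for an LLM that enriches a scene description with new objects."""
    return description + " with " + " and ".join(extra_objects)


def synthesize_observation(description: str) -> str:
    """Stub for a text-to-image model; returns a placeholder 'image'."""
    return f"<image of {description}>"


def rewrite_instruction(instruction: str, old_desc: str, new_desc: str) -> str:
    """Stub for an LLM that contrasts old and new observations to realign
    the instruction with the synthesized scene."""
    return f"{instruction} [realigned: '{old_desc}' -> '{new_desc}']"


# --- Rewriting-driven augmentation of one observation-instruction pair ------

def augment(sample: Sample, extra_objects: List[str]) -> Sample:
    old_desc = describe_scene(sample.observation)              # observation -> text
    new_desc = rewrite_description(old_desc, extra_objects)    # object-enriched rewrite
    new_obs = synthesize_observation(new_desc)                 # text -> new observation
    new_instr = rewrite_instruction(sample.instruction, old_desc, new_desc)
    return Sample(new_obs, new_instr)


# --- Training-side helpers ---------------------------------------------------

def random_crop(image: List[List[int]], crop_h: int, crop_w: int,
                rng: random.Random) -> List[List[int]]:
    """Cut a random crop_h x crop_w window out of a 2D grid of pixel values."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def training_pool(epoch: int, mix_epochs: int,
                  original: List[Sample], augmented: List[Sample]) -> List[Sample]:
    """Assumed schedule: mix original and augmented data for the first
    `mix_epochs` epochs, then focus on the original data only."""
    if epoch < mix_epochs:
        return original + augmented
    return original


if __name__ == "__main__":
    seed_sample = Sample("a hallway", "Walk down the hallway and stop at the door.")
    new_sample = augment(seed_sample, ["a red sofa"])
    print(new_sample.observation)
    print(new_sample.instruction)
```

In a real system each stub would call an actual model, and `random_crop` would operate on image tensors; the sketch only shows how a rewritten description drives both the new observation and the realigned instruction, so the pair stays consistent.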

Main Results:

The RAM paradigm successfully generates novel observation-instruction pairs for VLN training.
Experiments demonstrate superior performance and enhanced generalization on discrete (R2R, REVERIE, R4R) and continuous (R2R-CE) VLN datasets.
The method effectively improves agent performance in unseen environments.

Conclusions:

Rewriting-driven AugMentation (RAM) is an effective approach to address data scarcity in VLN.
The proposed method offers a simulator-free and labor-saving solution for data augmentation.
RAM significantly improves the generalization ability of VLN agents, paving the way for more robust navigation systems.