Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation.

Jisu Hwang¹, Incheol Kim¹

¹Department of Computer Science, Kyonggi University, Suwon-si 16227, Korea.

Sensors (Basel, Switzerland)

February 5, 2021

Summary

This summary is machine-generated.

This study introduces JMEBS, a novel deep neural network model for vision-and-language navigation (VLN). It improves task success rates and yields more efficient navigation paths by combining joint multimodal embedding with backtracking search.

Keywords:

backtracking-enabled greedy local search; deep neural network; multimodal embedding; natural language instruction; panoramic image; pretrained model; three-dimensional simulated indoor environment; vision-and-language navigation task

Area of Science:

Artificial Intelligence
Computer Vision
Natural Language Processing

Background:

Multimodal intelligent tasks integrating vision and language are gaining prominence.
Vision-and-language navigation (VLN) requires aligning and grounding image and text data for real-time task status perception.
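
As a rough illustration of what aligning image and text data in one embedding space can look like, the sketch below runs a single self-attention layer over the concatenation of instruction-token vectors and view-feature vectors. It is not the JMEBS architecture: the function `joint_embed`, the random weights, the feature dimension, and the token and view counts are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def joint_embed(text_tokens, view_feats, w_q, w_k, w_v):
    """Single-head self-attention over concatenated text and image tokens.

    Hypothetical helper: every output row mixes information from both
    modalities, which is the basic idea behind a joint embedding.
    """
    tokens = np.concatenate([text_tokens, view_feats], axis=0)  # (T + V, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))              # (T + V, T + V)
    return attn @ v, attn

rng = np.random.default_rng(0)
d = 8                                     # assumed feature size
text_tokens = rng.normal(size=(5, d))     # 5 made-up instruction tokens
view_feats = rng.normal(size=(4, d))      # 4 made-up view features
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

joint, attn = joint_embed(text_tokens, view_feats, w_q, w_k, w_v)
print(joint.shape)  # one joint vector per text token and per view
```

The block `attn[:5, 5:]` holds the attention from each instruction token to each view, which is one simple way to inspect how words are grounded in the visual input.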

Purpose of the Study:

To propose a novel deep neural network model, JMEBS, for enhanced performance in vision-and-language navigation tasks.
To improve task success rates and optimize navigation paths through advanced embedding and search algorithms.

Main Methods:

Developed a transformer-based joint multimodal embedding module for JMEBS, utilizing both multimodal and temporal contexts.
Implemented backtracking-enabled greedy local search (BGLS) with a novel global scoring method for action selection and trajectory evaluation.
Evaluated the model using the Matterport3D Simulator and room-to-room (R2R) benchmark datasets.
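
The search idea can be sketched on a toy graph. This summary does not give the scoring functions or the exact backtracking criterion, so the code below makes assumptions throughout: `local_score` ranks the next move, `global_score` compares a partial trajectory with the instruction, and the agent steps back whenever a move lowers the global score or reaches a dead end. All names, the graph, and the scores are hypothetical.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]

def bgls(graph: Graph, start: str,
         local_score: Callable[[str, str], float],
         global_score: Callable[[List[str]], float],
         is_stop: Callable[[str], bool],
         max_steps: int = 50) -> List[str]:
    """Greedy local search that backtracks when the global score drops."""
    path = [start]
    visited = {start}
    best_path, best_score = list(path), global_score(path)
    for _ in range(max_steps):
        current = path[-1]
        if is_stop(current):
            break
        candidates = [n for n in graph[current] if n not in visited]
        if candidates:
            prev_score = global_score(path)
            # Greedy step: take the neighbor with the best local score.
            nxt = max(candidates, key=lambda n: local_score(current, n))
            path.append(nxt)
            visited.add(nxt)
            score = global_score(path)
            if score < prev_score:
                path.pop()  # assumed criterion: the move hurt, so step back
            elif score > best_score:
                best_path, best_score = list(path), score
        elif len(path) > 1:
            path.pop()      # dead end: step back
        else:
            break
    return best_path

# Toy environment: the locally best first move (A -> B) is misleading.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
local = {("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "D"): 0.5, ("C", "E"): 0.7}
instruction = ["A", "C", "E"]  # stand-in for a parsed instruction

def global_score(path: List[str]) -> float:
    # +1 for each step that matches the instruction in order, -1 otherwise.
    return sum(1 if i < len(instruction) and p == instruction[i] else -1
               for i, p in enumerate(path))

route = bgls(graph, "A", lambda a, b: local[(a, b)], global_score,
             lambda n: n == "E")
print(route)  # ['A', 'C', 'E']
```

Here the greedy step first moves to B, the global score falls from 1 to 0, and the search returns to A before following C to E. A learned model would supply both scores in place of the lookup table and the matching rule used here.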

Main Results:

The JMEBS model demonstrated improved task success rates and optimized navigation paths compared to existing models.
The novel global scoring method effectively improved performance by comparing partial trajectories with natural language instructions.
Experimental results validated the model's effectiveness across various operations on benchmark datasets.

Conclusions:

The proposed JMEBS model offers a significant advancement in vision-and-language navigation.
The integration of joint multimodal embedding and backtracking search effectively addresses key challenges in VLN.
JMEBS provides a robust framework for future research in embodied AI and multimodal understanding.