A Study on Generative Models for Visual Recognition of Unknown Scenes Using a Textual Description.

Jose Martinez-Carranza¹, Delia Irazú Hernández-Farías¹, Victoria Eugenia Vazquez-Meza¹

¹Department of Computational Science, Instituto Nacional de Astrofisica, Optica y Electronica (INAOE), Puebla 72840, Mexico.

Sensors (Basel, Switzerland)

November 14, 2023

Summary

This summary is machine-generated.

Generative models and multi-modal embeddings help artificial agents like delivery drones visualize unknown locations from text descriptions. This technology enhances scene recognition for robots navigating unfamiliar environments.

Keywords:

CLIP; diffusion model; generative models; textual descriptions; visual scene recognition; VisualBERT

Area of Science:

Artificial Intelligence
Computer Vision
Robotics

Background:

Artificial agents require robust methods for navigating and recognizing unfamiliar environments.
Current methods often rely on pre-existing maps or direct visual input, limiting their utility in novel situations.

Purpose of the Study:

To investigate the use of generative models and multi-modal embedding representations to enable artificial agents to visualize unfamiliar destinations from textual descriptions.
To assess the effectiveness of combining image generation, text generation, and text enhancement strategies for scene recognition.

Main Methods:

Utilized generative models like Stable Diffusion for image creation from text.
Employed embedding representations such as CLIP and VisualBERT for comparing generated and real-world scene images.
Applied text enhancement, using tools such as ChatGPT, to refine textual descriptions for evaluation.
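
The embedding comparison step can be illustrated with a minimal sketch. It assumes that the generated image and each candidate real-world scene image have already been mapped to fixed-length vectors (for example by CLIP or VisualBERT) and that cosine similarity is the comparison measure; the summary does not state which measure the study used. The function names, scene names, and toy 4-dimensional vectors below are hypothetical placeholders, not values from the study.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_scenes(generated_embedding, candidate_embeddings):
    """Return (scene_name, similarity) pairs, best match first."""
    scores = [
        (name, cosine_similarity(generated_embedding, embedding))
        for name, embedding in candidate_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical embedding of an image generated from a textual description.
generated = [0.9, 0.1, 0.3, 0.0]

# Hypothetical embeddings of real-world scene images seen by the agent.
candidates = {
    "park_entrance": [0.8, 0.2, 0.4, 0.1],
    "parking_lot": [0.1, 0.9, 0.0, 0.3],
    "building_lobby": [0.0, 0.2, 0.1, 0.9],
}

ranking = rank_scenes(generated, candidates)
best_scene, best_score = ranking[0]
print(best_scene, round(best_score, 3))
```

In this sketch the agent would treat the top-ranked scene as the recognized destination. A real system would replace the toy vectors with model outputs and would likely apply a similarity threshold before accepting a match.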

Main Results:

Demonstrated the capability of generative models to produce relevant visual representations from textual scene descriptions.
Showcased the synergy between generative tools and multi-modal embeddings in improving scene recognition accuracy for artificial agents.
Validated the effectiveness of text enhancement in creating concise and informative descriptions for robot navigation.

Conclusions:

Combining generative models with multi-modal embeddings significantly enhances an artificial agent's ability to recognize unknown scenes based on text.
This approach offers practical solutions for autonomous systems, particularly in drone parcel delivery and service robots operating in unmapped areas.
Future applications include enabling robots to navigate and interact with environments using only textual guidance.