You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

  • Single-Pixel Near-Infrared 3D Image Reconstruction in Outdoor Conditions. Micromachines, 2022 (same author)
  • Towards Autonomous Drone Racing without GPU Using an OAK-D Smart Camera. Sensors (Basel, Switzerland), 2021 (same author)
  • A Review on Auditory Perception for Unmanned Aerial Vehicles. Sensors (Basel, Switzerland), 2020 (same author)
  • DeepPilot: A CNN for Autonomous Drone Racing. Sensors (Basel, Switzerland), 2020 (same author)
  • A Monocular SLAM-based Controller for Multirotors with Sensor Faults under Ground Effect. Sensors (Basel, Switzerland), 2019 (same author)
  • On the Use of the AIRA-UAS Corpus to Evaluate Audio Processing Algorithms in Unmanned Aerial Systems. Sensors (Basel, Switzerland), 2019 (same author)
  • RETRACTED: Zhang et al. A Novel Framework for Reconstruction and Imaging of Target Scattering Centers via Wide-Angle Incidence in Radar Networks. Sensors 2025, 25, 6802. Sensors (Basel, Switzerland), 2026 (same journal)
  • Enhancing Unsupervised Multi-Source Domain Adaptation for Person Re-Identification via Mixture of Experts and Graph-Based Relation. Sensors (Basel, Switzerland), 2026 (same journal)
  • Development of an Instrumented Glove for Palmar Pressure Assessment in Kayakers. Sensors (Basel, Switzerland), 2026 (same journal)
  • Development and Experimental Validation of an Autonomous IoT-Based Monitoring System for Real-Time Water Quality Assessment in the Amazon River. Sensors (Basel, Switzerland), 2026 (same journal)
  • Semi-Supervised Adversarial Learning Framework for Controller Area Network Bus Intrusion Detection. Sensors (Basel, Switzerland), 2026 (same journal)
  • Smart Optimization Method for Safety Signs in Innovative Manufacturing Environments Integrating Industrial Field IoT Sensors and Knowledge Graphs. Sensors (Basel, Switzerland), 2026 (same journal)

A Study on Generative Models for Visual Recognition of Unknown Scenes Using a Textual Description.

Jose Martinez-Carranza¹, Delia Irazú Hernández-Farías¹, Victoria Eugenia Vazquez-Meza¹

  • ¹Department of Computational Science, Instituto Nacional de Astrofisica, Optica y Electronica (INAOE), Puebla 72840, Mexico.

Sensors (Basel, Switzerland) | November 14, 2023 | PubMed
Summary
This summary is machine-generated.

Generative models and multi-modal embeddings help artificial agents, such as delivery drones, visualize unknown locations from text descriptions. This technology enhances scene recognition for robots navigating unfamiliar environments.

Keywords:
CLIP; diffusion model; generative models; textual descriptions; visual scene recognition; visualBERT

Area of Science:

  • Artificial Intelligence
  • Computer Vision
  • Robotics

Background:

  • Artificial agents require robust methods for navigating and recognizing unfamiliar environments.
  • Current methods often rely on pre-existing maps or direct visual input, limiting their utility in novel situations.

Purpose of the Study:

  • To investigate the use of generative models and multi-modal embedding representations to enable artificial agents to visualize unfamiliar destinations from textual descriptions.
  • To assess the effectiveness of combining image generation, text generation, and text enhancement strategies for scene recognition.

Main Methods:

  • Used text-to-image generative models, such as Stable Diffusion, to create images from textual descriptions.
  • Employed embedding representations such as CLIP and VisualBERT for comparing generated and real-world scene images.
  • Applied text enhancement, including rewriting with ChatGPT, to refine the textual descriptions used for evaluation.
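The matching step described above, which compares embeddings of generated images against embeddings of real scene images, can be sketched as a nearest-neighbour ranking by cosine similarity. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors are made-up stand-ins for real CLIP-style image embeddings, averaging the similarity over several generated images per description is a hypothetical aggregation choice, and the names `cosine_similarity` and `rank_scenes` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_scenes(
    generated: List[Sequence[float]],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate scenes by mean cosine similarity to the generated images.

    `generated` holds embeddings of images synthesized from one textual
    description; `candidates` maps a scene label to the embedding of a real
    image of that scene. (Hypothetical interface, for illustration only.)
    """
    if not generated:
        raise ValueError("need at least one generated image embedding")
    scores = []
    for label, real_emb in candidates.items():
        sims = [cosine_similarity(g, real_emb) for g in generated]
        scores.append((label, sum(sims) / len(sims)))
    # Highest mean similarity first; the top label is the predicted scene.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy 3-D vectors standing in for real image embeddings (made-up values).
generated_embs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
real_scene_embs = {
    "park": [1.0, 0.0, 0.0],
    "parking lot": [0.0, 1.0, 0.0],
    "warehouse": [0.0, 0.0, 1.0],
}
ranking = rank_scenes(generated_embs, real_scene_embs)
print(ranking[0][0])  # prints "park"
```

Cosine similarity compares only the direction of two embeddings, not their length, which is how CLIP scores image and text embeddings against each other; replacing the toy vectors with real encoder outputs would leave the ranking logic unchanged.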

Main Results:

  • Demonstrated the capability of generative models to produce relevant visual representations from textual scene descriptions.
  • Showed that combining generative tools with multi-modal embeddings improves scene recognition accuracy for artificial agents.
  • Validated the effectiveness of text enhancement in creating concise and informative descriptions for robot navigation.
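Recognition accuracy for this kind of retrieval is commonly reported as top-k accuracy: the fraction of queries whose correct scene appears among the k highest-scoring candidates. The abstract does not state which metric the authors used, so the function below is a hypothetical sketch with a made-up similarity matrix, included only to make "scene recognition accuracy" concrete; `top_k_accuracy` is an illustrative name.

```python
from typing import List, Sequence


def top_k_accuracy(
    similarity: List[Sequence[float]],
    true_index: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose correct candidate is among the k best-scoring.

    similarity[i][j] scores query i (a textual description, via its generated
    image) against candidate scene j; true_index[i] is the correct j.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(similarity) != len(true_index) or not true_index:
        raise ValueError("need one true index per query, and at least one query")
    hits = 0
    for row, target in zip(similarity, true_index):
        # Candidate indices ordered from most to least similar; keep the first k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return hits / len(true_index)


# Made-up similarity scores for three queries against three candidate scenes.
sim = [
    [0.91, 0.40, 0.12],  # correct scene 0 ranks first
    [0.35, 0.52, 0.60],  # correct scene 1 ranks second
    [0.10, 0.20, 0.88],  # correct scene 2 ranks first
]
truth = [0, 1, 2]
print(top_k_accuracy(sim, truth, k=1))  # 2 of 3 queries correct
print(top_k_accuracy(sim, truth, k=2))  # all 3 queries correct
```

Raising k can only keep or increase the score, so reporting several values of k shows how often the correct scene is near the top even when it is not ranked first.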

Conclusions:

  • Combining generative models with multi-modal embeddings significantly enhances an artificial agent's ability to recognize unknown scenes based on text.
  • This approach offers practical solutions for autonomous systems, particularly in drone parcel delivery and service robots operating in unmapped areas.
  • Future applications include enabling robots to navigate and interact with environments using only textual guidance.