A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint.

Ubaid Ullah¹, Jeong-Sik Lee¹, Chang-Hyeon An¹

¹Intelligent Computer Vision Software Laboratory (ICVSLab), Department of Electronic Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan 38541, Gyeongbuk, Korea.

Sensors (Basel, Switzerland)

September 23, 2022

Summary

This summary is machine-generated.

This review explores text-guided visual output (T2Vo), expanding beyond text-to-image synthesis. It proposes a new taxonomy to identify research gaps and guide future advancements in multimodal AI.

Keywords:

Text-to-Image, Text-to-Visual, computer vision, neural networks

Area of Science:

Artificial Intelligence
Computer Vision
Machine Learning

Background:

Cross-domain data correlation, particularly between text and visual data, is crucial for advancing machine capabilities.
Neural networks have shown significant promise in processing diverse data types, including natural language paired with images and videos.
Current research often focuses on Generative Adversarial Networks (GANs) for text-to-image synthesis, potentially limiting the field's scope.

Purpose of the Study:

To provide a comprehensive review of text-guided visual output (T2Vo) beyond traditional text-to-image synthesis.
To propose a novel taxonomy for categorizing T2Vo methods and identify existing research gaps.
To analyze state-of-the-art models and suggest future research directions in multimodal AI.

Main Methods:

Systematic review of literature from top-tier venues in computer vision and related fields (machine learning, human-computer interaction).
Critical examination and comparative analysis of existing text-guided visual output models.
Development of a comprehensive taxonomy for categorizing T2Vo approaches.

Main Results:

Identified limitations in current research, particularly the over-reliance on GANs for text-to-image synthesis.
Proposed a broader categorization of text-guided visual output, encompassing diverse visual modalities.
Highlighted shortcomings of existing methods and provided a structured overview of the field.

Conclusions:

The field of text-guided visual output needs a framework that extends beyond the current focus on text-to-image synthesis.
A proposed taxonomy can better guide future research and development in multimodal AI.
Further investigation into diverse generative models and visual outputs is essential for advancing the field.