A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint.

Ubaid Ullah¹, Jeong-Sik Lee¹, Chang-Hyeon An¹

  • ¹ Intelligent Computer Vision Software Laboratory (ICVSLab), Department of Electronic Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan 38541, Gyeongbuk, Korea.

Sensors (Basel, Switzerland) | September 23, 2022
Summary
This summary is machine-generated.

This review explores text-guided visual output (T2Vo), expanding beyond text-to-image synthesis. It proposes a new taxonomy to identify research gaps and guide future advancements in multimodal AI.

Keywords: Text-to-Image, Text-to-Visual, computer vision, neural networks

Area of Science:

  • Artificial Intelligence
  • Computer Vision
  • Machine Learning

Background:

  • Cross-domain data correlation, particularly between text and visual data, is crucial for advancing machine capabilities.
  • Neural networks have shown significant promise in processing diverse data types, including natural language paired with images and videos.
  • Current research often focuses on Generative Adversarial Networks (GANs) for text-to-image synthesis, potentially limiting the field's scope.

Purpose of the Study:

  • To provide a comprehensive review of text-guided visual output (T2Vo) beyond traditional text-to-image synthesis.
  • To propose a novel taxonomy for categorizing T2Vo methods and identify existing research gaps.
  • To analyze state-of-the-art models and suggest future research directions in multimodal AI.

Main Methods:

  • Systematic literature review of papers from top-tier computer vision venues and related fields (machine learning, human-computer interaction).
  • Critical examination and comparative analysis of existing text-guided visual output models.
  • Development of a comprehensive taxonomy for categorizing T2Vo approaches.

Main Results:

  • Identified limitations in current research, particularly the over-reliance on GANs for text-to-image synthesis.
  • Proposed a broader categorization of text-guided visual output, encompassing diverse visual modalities.
  • Highlighted shortcomings of existing methods and provided a structured overview of the field.

Conclusions:

  • The field of text-guided visual output requires a more comprehensive framework that extends beyond text-to-image synthesis.
  • A proposed taxonomy can better guide future research and development in multimodal AI.
  • Further investigation into diverse generative models and visual outputs is essential for advancing the field.