Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering.

Hao Li, Jinfa Huang, Peng Jin

IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society

May 31, 2023

Summary

This summary is machine-generated.

This study enhances Text-based Visual Question Answering (TextVQA) by incorporating 3D geometric information for improved spatial reasoning between objects and text. The new approach significantly boosts performance on TextVQA and ST-VQA datasets.

Area of Science:

Computer Vision
Artificial Intelligence

Background:

Text-based Visual Question Answering (TextVQA) requires understanding spatial relationships between scene text and objects.
Current 2D-based methods struggle with fine-grained spatial reasoning, limiting interpretability and performance.

Purpose of the Study:

To improve TextVQA by integrating 3D geometric information for more robust spatial reasoning.
To enhance the model's ability to interpret and utilize spatial context between visual elements and text.

Main Methods:

Introduction of 3D geometric information into the spatial reasoning process.
Proposal of a relation prediction module for precise object localization.
Design of a depth-aware attention calibration module to refine OCR token attention based on object context.
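
To make the idea of depth-aware attention more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), a per-region depth value, and a simple distance penalty with a hypothetical temperature `tau`. The function names `backproject` and `calibrate_attention` are invented for this example. The summary does not specify how the paper's modules are built, so every detail here is an assumption.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a 3D camera-frame point.

    Standard pinhole model; the intrinsics are hypothetical placeholders.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def calibrate_attention(attn_logits, token_points, object_point, tau=1.0):
    """Penalize each OCR token's logit by its 3D distance to the object.

    This penalty is an assumed stand-in for attention calibration,
    not the rule used in the paper.
    """
    calibrated = [
        logit - math.dist(point, object_point) / tau
        for logit, point in zip(attn_logits, token_points)
    ]
    return softmax(calibrated)

# Hypothetical scene: two OCR tokens share the same 2D position,
# 10 pixels from the object centre, but sit at different depths.
fx = fy = 500.0
cx, cy = 320.0, 240.0
obj = backproject(320, 240, 2.0, fx, fy, cx, cy)
tokens = [
    backproject(330, 240, 2.0, fx, fy, cx, cy),  # same depth as the object
    backproject(330, 240, 8.0, fx, fy, cx, cy),  # far behind the object
]
weights = calibrate_attention([0.0, 0.0], tokens, obj)
print(weights)
```

In this toy scene the two tokens are indistinguishable in 2D, yet the token at the object's depth receives most of the attention weight. This is the kind of ambiguity that the summary says 2D-based methods struggle with.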

Main Results:

Achieved state-of-the-art performance on TextVQA and ST-VQA datasets.
Demonstrated significant performance gains of 5.7% and 12.1% on spatial reasoning questions in TextVQA and ST-VQA, respectively.
Validated generalizability on the text-based image captioning task.

Conclusions:

Integrating 3D geometric information offers a superior approach to spatial reasoning in TextVQA.
The proposed modules effectively enhance the model's understanding of object-text spatial relationships.
The method shows promise for advancing visual question answering and related multimodal AI tasks.