Geometry Sensitive Cross-Modal Reasoning for Composed Query Based Image Retrieval.

Feifei Zhang, Mingliang Xu, Changsheng Xu

IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society | December 31, 2021

Summary

This summary is machine-generated.

This study introduces a geometry-sensitive cross-modal reasoning network for Composed Query Based Image Retrieval (CQBIR). The method targets the semantic gap between the reference image and the modification text, and reports favorable performance against state-of-the-art methods on three standard benchmarks.

Area of Science:

Computer Science
Artificial Intelligence
Machine Learning

Background:

Composed Query Based Image Retrieval (CQBIR) involves retrieving images based on a reference image and a textual modification.
Existing CQBIR methods struggle with the semantic gap between image and text, often failing to model query interactions or spatial-visual-semantic relationships.
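
To make the task setup concrete, the following is a minimal sketch of a CQBIR ranking loop, not the method proposed in the paper. It assumes that the reference image, the modification text, and every gallery image have already been embedded as vectors of the same length. The function names (`compose_query`, `retrieve`), the element-wise sum used to combine the two query parts, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose_query(image_feat, text_feat):
    # Placeholder composition (assumption): element-wise sum of the
    # reference-image embedding and the modification-text embedding.
    # The paper replaces this step with learned cross-modal reasoning.
    return [a + b for a, b in zip(image_feat, text_feat)]


def retrieve(image_feat, text_feat, gallery, top_k=3):
    """Rank gallery images by similarity to the composed query."""
    query = compose_query(image_feat, text_feat)
    scored = [(name, cosine(query, feat)) for name, feat in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical): axis 0 stands for the content of the
# reference image, axis 1 for the attribute requested by the text.
reference = [1.0, 0.0, 0.0]
modification = [0.0, 1.0, 0.0]
gallery = {
    "target": [1.0, 1.0, 0.0],          # reference content plus the change
    "reference_only": [1.0, 0.0, 0.0],  # matches the image, ignores the text
    "unrelated": [0.0, 0.0, 1.0],
}

results = retrieve(reference, modification, gallery)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In this toy setup, the gallery item that reflects both the reference image and the textual change ranks first, ahead of the item that matches the reference image alone. A naive sum cannot express how the text relates to specific parts of the image, which is the kind of interaction the Background items say existing methods often fail to model.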

Purpose of the Study:

To propose a geometry-sensitive cross-modal reasoning network to address the challenges in CQBIR.
To effectively model the geometric information and visual-semantic relationships within composed queries.

Main Methods:

Developed a geometry-sensitive inter-modal attention module (GS-IMA) to incorporate spatial structure into attention mechanisms.
Introduced a text-guided visual reasoning module (TG-VR) to handle semantics not present in the reference image.
Jointly modeled geometric information and visual-semantic relationships for effective feature learning.
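
One generic way to let spatial structure influence attention is to add a geometry-dependent term to the attention logits. The sketch below illustrates that general idea only. It is not the GS-IMA formulation, which this summary does not specify. It assumes a single text vector attending over image regions, region boxes given as normalized `(x_min, y_min, x_max, y_max)` tuples, and a hand-set weight vector `geo_weights`; all of these names and values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def box_geometry(box):
    """Describe a normalized box as [center_x, center_y, width, height]."""
    x0, y0, x1, y1 = box
    return [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def geometry_biased_attention(text_vec, region_feats, region_boxes, geo_weights):
    """Attend from one text vector over image regions.

    Each logit is a content score (dot product of text and region feature)
    plus a geometry score (dot product of geo_weights and box geometry).
    In a trained model geo_weights would be learned; here it is set by hand.
    """
    logits = []
    for feat, box in zip(region_feats, region_boxes):
        content = sum(t * f for t, f in zip(text_vec, feat))
        geometry = sum(w * g for w, g in zip(geo_weights, box_geometry(box)))
        logits.append(content + geometry)
    weights = softmax(logits)
    dim = len(region_feats[0])
    attended = [
        sum(w * feat[i] for w, feat in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, attended


# Two regions with identical appearance but different positions
# (image coordinates, y grows downward).
text_vec = [1.0, 0.0]
region_feats = [[1.0, 0.0], [1.0, 0.0]]
region_boxes = [(0.2, 0.0, 0.8, 0.3), (0.2, 0.7, 0.8, 1.0)]  # top, bottom

# Hypothetical weights that penalize a large center_y, i.e. prefer the top.
weights, attended = geometry_biased_attention(
    text_vec, region_feats, region_boxes, geo_weights=[0.0, -2.0, 0.0, 0.0]
)
# Without the geometry term the two regions are indistinguishable.
flat_weights, _ = geometry_biased_attention(
    text_vec, region_feats, region_boxes, geo_weights=[0.0, 0.0, 0.0, 0.0]
)
print(weights, flat_weights)
```

With content-only attention the two regions receive equal weight because their features are identical. Adding the geometry term shifts attention toward the top region, which shows why spatial information can matter when a modification text refers to position or layout.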

Main Results:

The proposed network learns effective features for composed queries, even without literal alignment.
Achieved favorable performance compared to state-of-the-art methods on three standard benchmarks.
Demonstrated the efficacy of jointly modeling geometric and visual-semantic information.

Conclusions:

The proposed geometry-sensitive cross-modal reasoning network offers a significant advancement in CQBIR.
The method successfully bridges the semantic gap by considering spatial structure and visual-semantic relationships.
This approach provides a more robust solution for complex image retrieval tasks.