Updated: Aug 4, 2025

Cross-Modality Pyramid Alignment for Visual Intention Understanding.

Mang Ye, Qinghongya Shi, Kehua Su

    IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society | April 5, 2023
    Summary
    This summary is machine-generated.

    This study introduces Cross-modality Pyramid Alignment with Dynamic optimization (CPAD) for visual intention understanding. CPAD improves comprehension of image meaning by modeling visual and textual data hierarchically, and it outperforms existing methods.

    Area of Science:

    • Computer Vision
    • Artificial Intelligence
    • Machine Learning

    Background:

    • Visual intention understanding aims to interpret the meaning within images.
    • Relying solely on object or background information causes comprehension bias.
    • Existing methods struggle with nuanced interpretation of visual content.

    Purpose of the Study:

    • To propose a novel method, Cross-modality Pyramid Alignment with Dynamic optimization (CPAD), for enhanced visual intention understanding.
    • To address comprehension bias by incorporating hierarchical modeling of visual and textual data.
    • To improve the global understanding of visual meaning in images.

    Main Methods:

    • CPAD utilizes hierarchical modeling, treating visual intention understanding as a hierarchical classification problem.
    • It captures multi-granular visual features from different layers and aligns them with hierarchical intention labels.
    • A cross-modality pyramid alignment module bridges the gap between visual and textual modalities through joint learning.
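
The idea of aligning visual features at several granularities with a hierarchy of intention labels can be sketched in a few lines. The following is an illustrative toy example only, not the authors' CPAD implementation: the function names, the cosine-similarity scoring, the temperature value, the fixed per-level weights, and the two-dimensional vectors are all assumptions made for illustration. It assumes one visual feature per hierarchy level and one text embedding per label at that level, and it does not model the dynamic optimization component named in the method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def level_loss(visual_feat, label_embeds, target, temperature=0.1):
    """Cross-entropy over similarities between one visual feature and
    every label embedding at a single hierarchy level (hypothetical form)."""
    sims = [cosine(visual_feat, e) / temperature for e in label_embeds]
    return -math.log(softmax(sims)[target])


def pyramid_alignment_loss(visual_pyramid, label_pyramid, targets, weights=None):
    """Weighted sum of per-level losses, coarse to fine.
    The fixed weights are an assumption, not taken from the paper."""
    weights = weights or [1.0] * len(visual_pyramid)
    return sum(
        w * level_loss(v, labels, t)
        for w, v, labels, t in zip(weights, visual_pyramid, label_pyramid, targets)
    )


def predict(visual_pyramid, label_pyramid):
    """Pick the most similar label at each level of the hierarchy."""
    return [
        max(range(len(labels)), key=lambda i: cosine(v, labels[i]))
        for v, labels in zip(visual_pyramid, label_pyramid)
    ]


# Toy data: a coarse level with 2 labels and a fine level with 3 labels.
visual_pyramid = [[1.0, 0.0], [0.0, 1.0]]
label_pyramid = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
]

loss_matching = pyramid_alignment_loss(visual_pyramid, label_pyramid, [0, 1])
loss_mismatched = pyramid_alignment_loss(visual_pyramid, label_pyramid, [1, 0])
print(predict(visual_pyramid, label_pyramid), loss_matching < loss_mismatched)
```

In this sketch the loss is small when each level's visual feature lies closest to the embedding of its correct label, which is one simple way to express hierarchical classification with joint visual and textual supervision.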

    Main Results:

    • The proposed CPAD method demonstrates superior performance in visual intention understanding.
    • Experiments show that CPAD effectively enhances the global comprehension of visual meaning.
    • The method outperforms existing approaches in benchmark evaluations.

    Conclusions:

    • CPAD offers a robust framework for visual intention understanding by exploiting hierarchical relationships.
    • The approach effectively mitigates comprehension bias through cross-modality alignment.
    • This method represents a significant advancement in interpreting the underlying meaning of images.