Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360° Videos

Mert Cokelek, Halit Ozsoy, Nevrez Imamoglu

IEEE Transactions on Pattern Analysis and Machine Intelligence

August 29, 2025

Summary

This summary is machine-generated.

This study introduces new models for predicting visual attention in 360-degree videos, incorporating spatial audio. Integrating audio cues significantly improves the accuracy of saliency prediction in omnidirectional videos.

Area of Science:

Computer Vision
Virtual Reality
Human-Computer Interaction

Background:

Omnidirectional videos (ODVs) offer immersive virtual reality (VR) experiences with a full field-of-view (FOV).
Predicting visual saliency in 360° environments presents unique challenges due to spherical distortion and the integration of spatial audio.
Existing datasets lack comprehensive audio-visual data for 360° saliency prediction.
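
To make the spherical distortion mentioned above concrete, here is a minimal sketch. It assumes the 360° frame is stored in an equirectangular layout (a common storage format, but not one this page names), in which rows near the poles are stretched and each pixel covers less of the sphere. The function name `equirect_row_weights` is hypothetical and is not taken from the paper.

```python
import math

def equirect_row_weights(height):
    """Return one weight per image row, proportional to the solid angle
    a pixel in that row covers on the sphere (cosine of its latitude).

    Assumes an equirectangular frame: row 0 is the top (north pole side).
    """
    weights = []
    for row in range(height):
        # Latitude of the row centre, from +pi/2 (top) down to -pi/2 (bottom).
        lat = math.pi / 2 - (row + 0.5) * math.pi / height
        weights.append(math.cos(lat))
    total = sum(weights)
    # Normalise so the weights sum to 1.
    return [w / total for w in weights]

weights = equirect_row_weights(8)
# Rows near the equator (middle of the list) get more weight than polar rows.
print([round(w, 3) for w in weights])
```

Weights like these can be used to down-weight polar rows when averaging any per-pixel quantity over an equirectangular frame, so that the stretched regions do not dominate.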

Purpose of the Study:

To extend saliency prediction to 360° video environments by addressing spherical distortion and spatial audio integration.
To develop and evaluate novel models for audio-visual saliency prediction in ODVs.
To introduce a new dataset, YT360-EyeTracking, for training and evaluating 360° saliency prediction models.

Main Methods:

Curated the YT360-EyeTracking dataset comprising 81 ODVs with varying audio-visual conditions.
Proposed SalViT360, a vision-transformer model with spherical geometry-aware attention for ODVs.
Developed SalViT360-AV, an extension incorporating transformer adapters conditioned on audio input.
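
This page does not describe how SalViT360 encodes spherical geometry, so the following is not the authors' attention mechanism. It is only a generic sketch of what position means on a sphere: the great-circle angle between two viewing directions given as (longitude, latitude) in radians. The helper names `to_unit_vector` and `angular_distance` are hypothetical.

```python
import math

def to_unit_vector(lon, lat):
    """Convert longitude/latitude in radians to a 3D unit vector."""
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )

def angular_distance(a, b):
    """Great-circle angle in radians between two (lon, lat) directions."""
    va = to_unit_vector(*a)
    vb = to_unit_vector(*b)
    dot = sum(x * y for x, y in zip(va, vb))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot)))

# Two directions near the left and right edges of a flat 360-degree frame
# are far apart in pixel terms but nearly adjacent on the sphere.
left_edge = (-math.pi + 0.01, 0.0)
right_edge = (math.pi - 0.01, 0.0)
print(angular_distance(left_edge, right_edge))
```

The example shows why flat 2D pixel offsets can be misleading for omnidirectional video: the two directions sit at opposite ends of the frame, yet the angle between them on the sphere is small.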

Main Results:

SalViT360 and SalViT360-AV significantly outperform existing methods on benchmark datasets, including YT360-EyeTracking.
Demonstrated the effectiveness of incorporating spatial audio cues for enhanced saliency prediction accuracy.
Validated the models' ability to predict viewer attention in complex 360° scenes.
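
The page does not say which metrics were used to compare models. As an illustration of how saliency prediction accuracy is often scored, the sketch below computes the Pearson linear correlation coefficient (CC), a widely used saliency metric, between a predicted map and a ground-truth map. The function name `pearson_cc` and the toy maps are assumptions made for this example, not data from the study.

```python
import math

def pearson_cc(pred, truth):
    """Pearson correlation between two saliency maps given as 2D lists."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("maps must have the same number of pixels")
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_t = sum((b - mean_t) ** 2 for b in t)
    if var_p == 0 or var_t == 0:
        # A constant map carries no spatial information; report 0 by convention here.
        return 0.0
    return cov / math.sqrt(var_p * var_t)

# Toy 2x3 maps (hypothetical values).
truth = [[0.0, 0.1, 0.0], [0.2, 0.9, 0.3]]
scaled = [[0.0, 0.2, 0.0], [0.4, 1.8, 0.6]]   # same shape, different scale
uniform = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]  # no spatial preference

print(pearson_cc(scaled, truth))
print(pearson_cc(uniform, truth))
```

CC is insensitive to the overall scale of a prediction, which is why the rescaled map scores as a perfect match while the uniform map scores zero.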

Conclusions:

Integrating spatial audio is crucial for accurate saliency prediction in omnidirectional videos.
The proposed SalViT360 and SalViT360-AV models represent significant advancements in 360° visual attention prediction.
The YT360-EyeTracking dataset facilitates further research in audio-visual saliency for immersive media.