360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos
Summary
This summary is machine-generated. Researchers developed a new method for tracking and segmenting objects in 360° videos, addressing challenges such as the wide field-of-view and spherical distortion. The extended bounding field-of-view (eBFoV) representation and a new dataset improve omnidirectional visual object tracking and segmentation.
Area Of Science
- Computer Vision
- Image Processing
- Machine Learning
Background
- Omnidirectional videos present unique challenges for object tracking and segmentation due to their wide field-of-view and significant spherical distortion.
- Existing methods struggle to accurately localize and track objects in 360° imagery.
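The spherical distortion mentioned above comes from the equirectangular projection most 360° videos use: a fixed pixel width spans a larger angular region near the equator than near the poles, so objects stretch horizontally as they move toward the top or bottom of the frame. A minimal sketch of this mapping (the function names and the simple pixel-to-angle convention are illustrative assumptions, not taken from the paper):

```python
import math

def equirect_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to spherical angles.

    Returns (longitude, latitude) in radians, with longitude in
    [-pi, pi) and latitude in [-pi/2, pi/2]. This is the standard
    equirectangular convention, assumed here for illustration.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return lon, lat

def horizontal_stretch(v, height):
    """Factor by which horizontal pixel extent at row v is stretched
    relative to the equator: 1 / cos(latitude).

    The factor grows without bound toward the poles, which is why
    axis-aligned pixel boxes localize polar objects poorly.
    """
    _, lat = equirect_to_sphere(0, v, 1, height)
    return 1.0 / math.cos(lat)
```

For a 1024x512 frame, `horizontal_stretch(256, 512)` is 1.0 at the equator, while a row halfway to the pole (latitude 45°) is already stretched by a factor of about 1.41.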
Purpose Of The Study
- To introduce a novel representation and framework for robust visual object tracking and segmentation in omnidirectional videos.
- To establish a comprehensive dataset and benchmark for evaluating 360° video object segmentation (360VOS) algorithms.
Main Methods
- A new representation, extended bounding field-of-view (eBFoV), was developed for target localization.
- A general 360 tracking framework was proposed, building upon prior omnidirectional visual object tracking (360VOT) work.
- A new dataset, 360VOS, comprising 290 sequences with pixel-wise masks, was created and divided into training (170 sequences) and testing (120 sequences) subsets.
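The summary does not spell out the eBFoV parameterization itself. As a rough, hypothetical sketch of the underlying idea, a target on the sphere can be described by angular quantities (a center longitude/latitude and horizontal/vertical fields-of-view) rather than a pixel-space box; the function below converts a naive equirectangular pixel box into such parameters. The name `box_to_bfov` and the direct pixel-to-angle conversion are assumptions for illustration: the actual eBFoV representation is precisely designed to handle the latitude-dependent distortion that this naive conversion ignores.

```python
import math

def box_to_bfov(x, y, w, h, img_w, img_h):
    """Convert an equirectangular pixel box (top-left x, y, width w,
    height h) into angular parameters (clon, clat, hfov, vfov) in
    radians, loosely in the spirit of a bounding-field-of-view
    representation.

    NOTE: illustrative only; this naive conversion treats pixel width
    as proportional to angular width, which overstates the horizontal
    extent away from the equator.
    """
    cx = x + w / 2.0
    cy = y + h / 2.0
    clon = (cx / img_w) * 2.0 * math.pi - math.pi   # center longitude
    clat = math.pi / 2.0 - (cy / img_h) * math.pi   # center latitude
    hfov = (w / img_w) * 2.0 * math.pi              # horizontal extent
    vfov = (h / img_h) * math.pi                    # vertical extent
    return clon, clat, hfov, vfov
```

An angular representation like this lets a tracker reason about targets that wrap around the left/right image border or sit near the poles, where a single axis-aligned pixel box breaks down.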
Main Results
- The proposed eBFoV representation and 360 tracking framework demonstrate effectiveness for both omnidirectional tracking and segmentation tasks.
- Extensive experiments benchmark state-of-the-art approaches on the new 360VOS dataset.
- Tailored evaluation metrics were developed for rigorous assessment of omnidirectional tracking and segmentation performance.
Conclusions
- The novel eBFoV representation and the proposed 360 tracking framework significantly advance omnidirectional visual object tracking and segmentation.
- The 360VOS dataset and benchmark provide essential resources for future research and development in this domain.
- The study highlights the effectiveness of the proposed methods and dataset in addressing the complexities of 360° video analysis.