Subjective and Objective Audio-Visual Quality Assessment for Omnidirectional Videos
Summary
This summary is machine-generated. This study introduces a new method for assessing the audio-visual quality of omnidirectional videos (ODVs) in virtual reality (VR). The proposed OmniAVNet model predicts the overall perceived quality of ODVs, which can support work on improving user experience.
Area Of Science
- Computer Science
- Multimedia Systems
- Signal Processing
Background
- Virtual Reality (VR) and Omnidirectional Videos (ODVs) are increasingly popular for immersive experiences.
- Existing Quality of Experience (QoE) studies for ODVs primarily focus on visual aspects, neglecting audio's impact.
- Optimizing ODV quality across production and transmission is crucial for user satisfaction.
Purpose Of The Study
- To comprehensively study omnidirectional audio-visual quality assessment (OD-AVQA).
- To develop a large-scale database for OD-AVQA.
- To propose and validate novel objective OD-AVQA models.
Main Methods
- Established OAVQAD+, the largest database for ODV audio-visual quality assessment, containing 625 distorted sequences with mean opinion scores (MOS).
- Constructed a benchmark including Type I, II, and III OD-AVQA models.
- Proposed OmniAVNet, a novel network integrating audio, visual, and motion features for full-reference (FR) and no-reference (NR) OD-AVQA.
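The MOS attached to each distorted sequence is, by definition, the average of the ratings given by individual viewers. A minimal sketch of that aggregation follows. The rating scale, the sequence identifiers, and the function name `mean_opinion_scores` are illustrative assumptions, and any subject screening or score normalization the study may have applied is not shown.

```python
def mean_opinion_scores(ratings):
    """Return {sequence_id: MOS}, where MOS is the mean of that sequence's ratings.

    `ratings` maps a sequence id to a list of per-subject scores.
    Sequences with no ratings are skipped rather than given a made-up score.
    """
    return {
        seq_id: sum(scores) / len(scores)
        for seq_id, scores in ratings.items()
        if scores
    }


# Hypothetical ratings on a 1-5 scale; these are NOT data from the study.
ratings = {
    "seq_001": [4, 5, 4, 3],
    "seq_002": [2, 1, 2, 3],
    "seq_003": [],  # no ratings collected, so it is left out of the result
}

mos = mean_opinion_scores(ratings)
for seq_id, score in sorted(mos.items()):
    print(f"{seq_id}: MOS = {score:.2f}")
```

In a full-reference (FR) setting a model would see both the distorted sequence and its pristine source, while a no-reference (NR) model would see only the distorted sequence; in both cases the MOS is the target the model tries to predict.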
Main Results
- OmniAVNet significantly outperforms existing benchmark OD-AVQA models on multiple datasets.
- The proposed model demonstrates strong performance in predicting audio-visual quality for ODVs.
- The study provides valuable resources (database and code) for advancing OD-AVQA research.
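Comparing objective models usually means checking how closely their predictions track the MOS. This summary does not say which criteria the study used, so the sketch below assumes two that are common in quality assessment work: the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SRCC). All scores shown are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical MOS values and model predictions; NOT results from the study.
mos = [1.8, 2.5, 3.1, 4.2, 4.6]
predicted = [2.0, 2.2, 3.5, 3.9, 4.8]

print(f"PLCC = {pearson(mos, predicted):.3f}")
print(f"SRCC = {spearman(mos, predicted):.3f}")
```

Under this assumed protocol, a model whose predictions order the sequences the same way viewers did reaches an SRCC close to 1, even if its scores sit on a different scale from the MOS.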
Conclusions
- Audio modality significantly impacts the perceived quality of ODVs.
- OmniAVNet offers an effective solution for objective OD-AVQA, supporting both FR and NR modes.
- This work contributes to enhancing the QoE of immersive VR experiences through better ODV quality assessment.

