Deep Audio-Visual Speech Recognition.

Triantafyllos Afouras, Joon Son Chung, Andrew Senior

IEEE Transactions on Pattern Analysis and Machine Intelligence

December 25, 2018

Summary

This summary is machine-generated.

This study advances lip reading AI for unconstrained sentences in natural videos. New models and a large dataset significantly improve performance, showing lip reading complements noisy audio recognition.

Area of Science:

Computer Science
Artificial Intelligence
Machine Learning

Background:

Lip reading, or visual speech recognition, traditionally focused on limited vocabularies.
Existing methods struggle with natural, unconstrained language and real-world video conditions.

Purpose of the Study:

To develop and evaluate advanced lip reading models for open-world, natural language sentence recognition.
To assess the complementary role of lip reading alongside audio in noisy environments.
To introduce a novel, large-scale dataset for audio-visual speech recognition research.

Main Methods:

Comparison of two transformer-based self-attention models, one trained with a CTC loss and the other with a sequence-to-sequence loss.
Training and evaluation on a new, extensive dataset (LRS2-BBC) of natural sentences from broadcast television.
Investigating the synergy between visual speech recognition and noisy audio speech recognition.
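
To make the CTC side of that comparison concrete, the sketch below shows greedy CTC decoding: take the best label per video frame, merge consecutive repeats, then drop the blank symbol. It is an illustration only, not the authors' decoder. The function name, the three-symbol alphabet, the blank index, and the frame scores are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Collapse per-frame best labels into an output string."""
    # Pick the highest-scoring label index for each frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best)]
    # Drop blanks; a blank between two equal labels keeps them distinct.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy example: index 0 is the blank, shown here as "-".
alphabet = ["-", "h", "i"]
frame_scores = [
    [0.10, 0.80, 0.10],  # best label: h
    [0.10, 0.70, 0.20],  # best label: h (repeat, merged)
    [0.90, 0.05, 0.05],  # best label: blank
    [0.20, 0.10, 0.70],  # best label: i
    [0.10, 0.10, 0.80],  # best label: i (repeat, merged)
]
decoded = ctc_greedy_decode(frame_scores, alphabet)
print(decoded)
```

A sequence-to-sequence model, by contrast, emits output characters one at a time conditioned on the characters already produced, so it needs no blank symbol or collapse step.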

Main Results:

The trained models surpassed previously published results on a lip reading benchmark dataset by a significant margin.
Demonstrated the effectiveness of lip reading as a complementary modality to audio, especially under acoustic interference.
The LRS2-BBC dataset provides a valuable resource for advancing audio-visual speech recognition.
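
One common way to simulate acoustic interference when testing an audio-visual recognizer is to add noise to clean speech at a chosen signal-to-noise ratio (SNR). This summary does not say how the noisy audio was produced, so the sketch below is an assumption for illustration. The function name and the toy signals are hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    # Average power of each signal (mean of squared samples).
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose scale so that p_speech / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Toy signals: at 0 dB the scaled noise carries the same power as the speech.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, snr_db=0.0)
print(mixed)
```

Lower SNR values give noisier audio, which is the regime where the summary reports that lip reading helps most.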

Conclusions:

Open-world lip reading is feasible with advanced deep learning architectures.
Lip reading offers substantial benefits in noisy conditions, enhancing overall speech recognition accuracy.
The LRS2-BBC dataset facilitates future research in robust audio-visual speech recognition.