Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Machine learning early warning for urban heat risk with CMIP6 projections.

Journal of environmental management·2026
Same author

Mitochondrial dynamics in environmental neurotoxicity: beyond oxidative stress toward spatial and functional reorganization.

Archives of toxicology·2026
Same author

The CXCL9/SPP1 polarity axis in tumor-associated macrophages: immunoregulatory and prognostic significance in non-small cell lung cancer.

Frontiers in immunology·2026
Same author

Subspecialty-specific foundation model for intelligent gastrointestinal pathology.

NPJ digital medicine·2026
Same author

Youth perceptions of urban waterfront environments for stress relief: a social media text analysis study in Beijing.

Frontiers in public health·2026
Same author

Multi-omics integration reveals that pyrimidine metabolism in lung adenocarcinoma drives an immunosuppressive microenvironment.

iScience·2026
Same journal

Relation DETR+: Exploring Explicit Position Relation Prior for Dense Prediction.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

CAFE: Cross-View Adaptive Fusion and Cluster Center Enhancement for Robust Multi-View Clustering.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving.

IEEE transactions on pattern analysis and machine intelligence·2026
Same journal

Learning Shape Anchors for Holistic Indoor Scene Understanding.

IEEE transactions on pattern analysis and machine intelligence·2026

Related Experiment Video

Author Spotlight: Investigating the Impact of Emotional Prosodies on Voice Recognition and Perception (published August 9, 2024)

Scaling up Multimodal Pre-Training for Sign Language Understanding.

Wengang Zhou, Weichao Zhao, Hezhen Hu

    IEEE Transactions on Pattern Analysis and Machine Intelligence | August 14, 2025 | PubMed
    Summary
    This summary is machine-generated.

    This study introduces a multimodal sign language pre-training (SLP) framework, trained on a large-scale text-labeled sign pose dataset (SL-1.5M), to improve sign language understanding (SLU) models. By integrating visual and textual cues, the method aims to improve model generalization and produce better sign language video representations.

    More Related Videos

    Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness (published December 6, 2024)
    Foreign Accent and Forensic Speaker Identification in Voice Lineups: The Influence of Acoustic Features Based on Prosody (published September 27, 2024)

    Area of Science:

    • Computer Science
    • Artificial Intelligence
    • Natural Language Processing

    Background:

    • Sign language pre-training (SLP) enhances sign language understanding (SLU), but existing approaches generalize poorly and neglect textual cues.
    • Existing methods often rely on task-specific pre-training on small datasets or use visual information alone, which limits the model's representational capacity.

    Purpose of the Study:

    • To develop a multimodal SLP framework that leverages visual context and vision-language consistency for improved sign language video representation.
    • To address data scarcity by curating a large-scale text-labeled sign pose dataset (SL-1.5M).

    Main Methods:

    • Curated a large-scale text-labeled sign pose dataset (SL-1.5M) from diverse sources.
    • Proposed a pre-training framework integrating sign-text contrastive learning and masked pose modeling.
    • Jointly modeled manual and non-manual sign language information to represent the visual content holistically.
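The Main Methods above name two pre-training objectives, sign-text contrastive learning and masked pose modeling. As a minimal sketch of what such objectives commonly look like, the Python below implements a generic CLIP-style symmetric InfoNCE loss for aligning sign and text embeddings, plus a masked-frame reconstruction loss for pose sequences. This is not the authors' implementation. The function names, the temperature of 0.07, zeroing masked frames (instead of, say, a learned mask token), the 50% mask ratio, and the weighted sum of the two terms are all illustrative assumptions. The embeddings stand in for the outputs of hypothetical sign and text encoders.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def _cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def sign_text_contrastive_loss(sign_emb: List[Vector], text_emb: List[Vector],
                               temperature: float = 0.07) -> float:
    """Symmetric InfoNCE (assumed form): sign clip i and text i form the
    positive pair; every other pairing in the batch acts as a negative."""
    logits = [[cosine(s, t) / temperature for t in text_emb] for s in sign_emb]
    transposed = [list(col) for col in zip(*logits)]  # text-to-sign direction
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


def mask_frames(poses: List[List[float]], mask_ratio: float = 0.5,
                rng: Optional[random.Random] = None) -> Tuple[List[List[float]], List[int]]:
    """Zero out a random subset of pose frames (hypothetical masking scheme).
    Returns the corrupted sequence and the sorted masked frame indices."""
    rng = rng if rng is not None else random.Random(0)
    n = len(poses)
    k = min(n, max(1, int(n * mask_ratio)))
    masked = sorted(rng.sample(range(n), k))
    masked_set = set(masked)
    corrupted = [[0.0] * len(p) if i in masked_set else list(p)
                 for i, p in enumerate(poses)]
    return corrupted, masked


def masked_pose_loss(pred: List[Vector], target: List[Vector],
                     masked: List[int]) -> float:
    """Mean squared reconstruction error, computed only on the masked frames."""
    err = 0.0
    for i in masked:
        err += sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
    return err / len(masked)


def pretraining_loss(sign_emb, text_emb, pred, target, masked, mpm_weight=1.0):
    """Hypothetical joint objective: contrastive term plus weighted masked pose term."""
    return (sign_text_contrastive_loss(sign_emb, text_emb)
            + mpm_weight * masked_pose_loss(pred, target, masked))


if __name__ == "__main__":
    signs = [[1.0, 0.0], [0.0, 1.0]]
    print(sign_text_contrastive_loss(signs, signs))                      # matched pairs: close to 0
    print(sign_text_contrastive_loss(signs, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large
```

Summing the two terms is only one plausible way to combine the objectives. The summary does not say how the paper weights them or how it encodes manual and non-manual keypoints.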

    Main Results:

    • The framework captures contextual cues in sign pose sequences and aligns them with semantic text features.
    • Achieved new state-of-the-art performance on diverse SLU tasks, demonstrating superior generalization and effectiveness.
    • Validated the framework's ability to improve the representational capacity of sign language video models.

    Conclusions:

    • The proposed multimodal SLP framework significantly improves sign language understanding by integrating visual and textual information.
    • The approach overcomes limitations of existing methods, offering better generalization and representational capacity for sign language models.
    • This work sets a new state of the art for sign language understanding through large-scale multimodal pre-training.