TransXNet: Learning Both Global and Local Dynamics With a Dual Dynamic Token Mixer for Visual Recognition.

Meng Lou, Shu Zhang, Hong-Yu Zhou

    IEEE Transactions on Neural Networks and Learning Systems | April 3, 2025
    Summary
    This summary is machine-generated.

    This study introduces a dual dynamic token mixer (D-Mixer) for vision networks that adapts its processing to the input data. Built on D-Mixer, the hybrid CNN-transformer backbone TransXNet achieves higher accuracy at lower computational cost than comparable models in image classification and dense prediction tasks.

    Area of Science:

    • Computer Vision
    • Deep Learning
    • Artificial Intelligence

    Background:

    • Integrating convolutions into transformers aims to improve generalization by introducing inductive bias.
    • In existing hybrid networks, convolutions use static kernels that do not adapt to the input, whereas self-attention is input-dependent; this mismatch hampers the fusion of the two kinds of features.
    • As a result, current CNN-transformer architectures have suboptimal representation capacity.

    Purpose of the Study:

    • To address the limitations of static convolutions in hybrid vision networks.
    • To propose a novel, lightweight dual dynamic token mixer (D-Mixer) for enhanced feature representation.
    • To develop a new hybrid CNN-transformer backbone, TransXNet, for improved performance and efficiency.

    Main Methods:

    • Introduced a dual dynamic token mixer (D-Mixer) that learns global and local dynamics in an input-dependent manner.
    • D-Mixer splits the input features into two segments and processes them separately: one with an efficient global attention module and the other with an input-dependent depthwise convolution.
    • Constructed TransXNet, a hybrid CNN-transformer vision backbone, using D-Mixer as its fundamental building block.
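    The split-and-mix idea behind D-Mixer can be illustrated with a minimal pure-Python sketch. It rests on several simplifying assumptions that do not come from the paper: tokens form a 1-D sequence rather than a 2-D grid; the global branch is plain single-head self-attention with identity projections, not TransXNet's efficient attention module; and the input-dependent depthwise convolution is approximated by softmax-mixing a small set of basis kernels, with mixing weights derived from each channel's pooled statistics. The names `d_mixer`, `global_attention`, `input_dependent_dwconv`, `basis`, and `gate` are hypothetical and are not taken from the TransXNet code.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = channels


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def global_attention(x: Matrix) -> Matrix:
    """Single-head self-attention over all tokens.
    Assumption: identity Q/K/V projections and no spatial reduction."""
    n, c = len(x), len(x[0])
    scale = 1.0 / math.sqrt(c)
    out = []
    for i in range(n):
        scores = [scale * sum(x[i][d] * x[j][d] for d in range(c)) for j in range(n)]
        w = softmax(scores)  # attention weights are computed from the input
        out.append([sum(w[j] * x[j][d] for j in range(n)) for d in range(c)])
    return out


def input_dependent_dwconv(x: Matrix, basis: List[List[float]], gate: List[float]) -> Matrix:
    """Depthwise 1-D convolution whose kernel is generated from the input.
    Hypothetical generator: each channel's mean over tokens, scaled by the
    per-basis 'gate' weights, gives softmax weights over the basis kernels."""
    n, c = len(x), len(x[0])
    k = len(basis[0])
    pad = k // 2  # zero padding keeps the sequence length unchanged
    out = [[0.0] * c for _ in range(n)]
    for d in range(c):  # depthwise: every channel gets its own kernel
        pooled = sum(x[t][d] for t in range(n)) / n
        alpha = softmax([g * pooled for g in gate])
        kernel = [sum(alpha[g] * basis[g][j] for g in range(len(basis))) for j in range(k)]
        for t in range(n):
            acc = 0.0
            for j in range(k):
                src = t + j - pad
                if 0 <= src < n:
                    acc += kernel[j] * x[src][d]
            out[t][d] = acc
    return out


def d_mixer(x: Matrix, basis: List[List[float]], gate: List[float]) -> Matrix:
    """Split channels in half, mix globally and locally, then concatenate."""
    half = len(x[0]) // 2
    y_global = global_attention([row[:half] for row in x])
    y_local = input_dependent_dwconv([row[half:] for row in x], basis, gate)
    return [g + l for g, l in zip(y_global, y_local)]  # list concat = channel concat


if __name__ == "__main__":
    x = [[1.0, 0.0, 0.5, -0.5],
         [0.0, 1.0, 1.0, 0.0],
         [1.0, 1.0, -1.0, 2.0]]
    basis = [[0.0, 1.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]  # identity and box-blur kernels
    y = d_mixer(x, basis, gate=[1.0, -1.0])
    print(len(y), len(y[0]))  # prints: 3 4
```

    Because the local kernel is recomputed from each input's pooled statistics, different inputs are convolved with different kernels, which is the input adaptivity that static convolutions lack.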

    Main Results:

    • TransXNet-T achieved 0.3 percentage points higher top-1 accuracy than Swin-T on ImageNet-1K while requiring less than half of Swin-T's computational cost.
    • TransXNet-S and TransXNet-B demonstrated strong scalability, reaching 83.8% and 84.6% top-1 accuracy, respectively.
    • On dense prediction tasks, the architecture generalized better than state-of-the-art methods while requiring less computation.

    Conclusions:

    • The proposed D-Mixer effectively overcomes the limitations of static convolutions in hybrid networks.
    • TransXNet offers a compelling balance of high accuracy, efficiency, and strong generalization capabilities.
    • The D-Mixer approach presents a promising direction for designing efficient and effective vision backbone networks.