MoViT: Memorizing Vision Transformers for Medical Image Analysis.

Yiqing Shen¹, Pengfei Guo¹, Jingpu Wu¹

  • ¹Johns Hopkins University, Baltimore, USA.

Machine Learning in Medical Imaging. MLMI (Workshop) | April 15, 2024
Summary
This summary is machine-generated.

Memorizing Vision Transformer (MoViT) reduces the need for large datasets in medical imaging AI. It adds an external memory so that transformer models can be trained effectively even with limited data, achieving competitive performance with significantly less training data.

Keywords:
External Memory, Insufficient Data, Prototype Learning, Vision Transformer

Area of Science:

  • Artificial Intelligence
  • Medical Image Analysis
  • Computer Vision

Background:

  • Transformers and CNNs offer complementary benefits in medical image analysis.
  • Transformers require substantial training data, which is a challenge in medical imaging, where annotated data are limited.

Purpose of the Study:

  • To propose a novel Memorizing Vision Transformer (MoViT) to reduce the reliance on large datasets for training transformer-based medical image analysis models.
  • To enhance the efficiency and applicability of transformer architectures in data-scarce medical imaging scenarios.

Main Methods:

  • MoViT employs an external memory to cache attention snapshots during training.
  • An attention temporal moving average scheme updates the stored memories and helps prevent overfitting.
  • Prototypical attention learning distills the memory into smaller subsets to speed up inference.
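
The three methods above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names (`ema_update`, `attend`, `distill_prototypes`), the momentum value of 0.9, the use of plain k-means as the distillation step, and the choice to use each memory row as both key and value are all assumptions made for illustration.

```python
import math
import random


def ema_update(memory, snapshot, momentum=0.9):
    """Blend a new attention snapshot into the cached memory, element by element.

    momentum is a hypothetical hyperparameter: higher values keep more history.
    """
    return [[momentum * m + (1.0 - momentum) * s for m, s in zip(mrow, srow)]
            for mrow, srow in zip(memory, snapshot)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over rows of keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def distill_prototypes(memory, k, iters=10, seed=0):
    """Shrink the memory to k prototype rows using plain k-means.

    k-means is an assumed stand-in for prototype learning, not the paper's method.
    """
    rng = random.Random(seed)
    centers = [list(row) for row in rng.sample(memory, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for row in memory:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])),
            )
            groups[nearest].append(row)
        for c, group in enumerate(groups):
            if group:  # keep the previous center when no row was assigned to it
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers


# Toy run: 8 cached rows of width 4, refreshed once, then distilled to 3 prototypes.
rng = random.Random(1)
memory = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
snapshot = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
memory = ema_update(memory, snapshot)
prototypes = distill_prototypes(memory, k=3)

# The current tokens attend over their own rows plus the distilled prototypes.
local_rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
query = [0.5, -0.2, 0.1, 0.3]
context = local_rows + prototypes
output = attend(query, context, context)
print(len(prototypes), len(output))
```

In this sketch, attending over 3 prototypes rather than all 8 cached rows is what shortens the attention computation at inference time. The moving average keeps a single training batch from overwriting the cache.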

Main Results:

  • MoViT outperforms vanilla transformer models on histology and MRI datasets, particularly with limited annotated data.
  • The proposed method achieves performance competitive with Vision Transformer (ViT) using only 3.0% of the training data.
  • The method is effective across varied medical image analysis tasks.

Conclusions:

  • MoViT serves as a plug-in solution to significantly decrease the training data requirements for transformer architectures in medical image analysis.
  • The approach facilitates the development of effective AI models for medical imaging even when data availability is constrained.