MoViT: Memorizing Vision Transformers for Medical Image Analysis.

Yiqing Shen¹, Pengfei Guo¹, Jingpu Wu¹

¹Johns Hopkins University, Baltimore, USA.

Machine Learning in Medical Imaging. MLMI (Workshop)

April 15, 2024

Summary

This summary is machine-generated.

Memorizing Vision Transformer (MoViT) reduces the need for large datasets in medical imaging AI. The approach uses an external memory to train transformer models effectively even with limited data, achieving competitive performance with significantly less training data.

Keywords:

External Memory; Insufficient Data; Prototype Learning; Vision Transformer

Area of Science:

Artificial Intelligence
Medical Image Analysis
Computer Vision

Background:

Transformers and CNNs offer complementary benefits in medical image analysis.
Transformers require substantial training data, posing challenges in medical imaging due to data limitations.

Purpose of the Study:

To propose a novel Memorizing Vision Transformer (MoViT) to reduce the reliance on large datasets for training transformer-based medical image analysis models.
To enhance the efficiency and applicability of transformer architectures in data-scarce medical imaging scenarios.

Main Methods:

MoViT employs an external memory to cache attention snapshots during training.
An attention temporal moving average scheme updates the cached memories to prevent overfitting.
Prototypical attention learning distills the memory into a smaller representative subset to speed up inference.
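
The three mechanisms listed above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the class name `AttentionMemory`, the per-sample keying, the exponential form of the moving average, the `momentum` parameter, and the use of k-means as the distillation step are all assumptions made for illustration.

```python
import math


def ema_update(cached, snapshot, momentum=0.9):
    """Blend a new snapshot into the cached one (assumed moving-average form)."""
    return [momentum * c + (1.0 - momentum) * s for c, s in zip(cached, snapshot)]


class AttentionMemory:
    """Hypothetical external memory: one cached (key, value) pair per sample id."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.keys = {}
        self.values = {}

    def write(self, sample_id, key, value):
        # First visit stores the snapshot; later visits smooth it over time.
        if sample_id in self.keys:
            self.keys[sample_id] = ema_update(self.keys[sample_id], key, self.momentum)
            self.values[sample_id] = ema_update(self.values[sample_id], value, self.momentum)
        else:
            self.keys[sample_id] = list(key)
            self.values[sample_id] = list(value)

    def read(self, query):
        # Scaled dot-product attention of one query over all cached keys.
        ids = sorted(self.keys)
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, self.keys[i])) / scale for i in ids]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[ids[0]])
        return [
            sum(w * self.values[i][d] for w, i in zip(weights, ids)) / total
            for d in range(dim)
        ]


def distill_prototypes(vectors, k, iterations=10):
    """Plain k-means, used here as an assumed stand-in for prototype learning."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic initialisation
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids
```

In this sketch, `write` would be called during training each time a sample is seen, `read` lets a query attend over the cached entries, and `distill_prototypes` would shrink the cached keys to `k` representatives before inference so that fewer entries need to be attended over.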

Main Results:

MoViT outperforms vanilla transformer models on histology and MRI datasets, particularly with limited annotated data.
The proposed method achieves performance comparable to Vision Transformer (ViT) using only 3.0% of the training data.
The method is effective across varied medical image analysis tasks.

Conclusions:

MoViT serves as a plug-in solution to significantly decrease the training data requirements for transformer architectures in medical image analysis.
The approach facilitates the development of effective AI models for medical imaging even when data availability is constrained.