



Heimdall: A Modular Framework for Tokenization in Single-Cell Foundation Models.

Ellie Haber¹, Shahul Alam², Nicholas Ho²

  • ¹ Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

bioRxiv : the preprint server for biology | November 26, 2025
Summary
This summary is machine-generated.

Heimdall systematically evaluates tokenization strategies for single-cell foundation models (scFMs). It finds that gene identity and token order are crucial for generalization in single-cell RNA sequencing analysis, especially under distribution shifts.


Area of Science:

  • Computational Biology
  • Genomics
  • Machine Learning

Background:

  • Foundation models are increasingly used for single-cell RNA sequencing (scRNA-seq) data analysis.
  • The performance of these models relies heavily on cell tokenization strategies, which remain poorly understood.
  • Developing effective tokenization methods is critical for advancing scRNA-seq analysis.

Purpose of the Study:

  • To introduce Heimdall, a framework and toolkit for evaluating tokenization strategies in single-cell foundation models (scFMs).
  • To systematically assess the impact of different tokenization components on model performance.
  • To provide a foundation for reproducible research and development of next-generation scFMs.

Main Methods:

  • Decomposition of scFMs into modular components: gene identity encoder (F_G), expression encoder (F_E), and cell sentence constructor (F_C).
  • Evaluation of tokenization strategies using a transformer model trained from scratch.
  • Assessment across challenging transfer-learning scenarios: cross-tissue, cross-species, and spatial gene-panel shifts, as well as reverse perturbation prediction.
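
The three-part decomposition listed above can be illustrated with a minimal sketch. This is not Heimdall's actual API: every function name, the binning scheme, and the expression-ranked ordering are assumptions chosen for illustration, and integer codes stand in for the learned embeddings a real scFM would produce.

```python
from typing import Callable, Dict, List, Tuple

Cell = Dict[str, float]   # gene name -> expression value
Token = Tuple[int, int]   # (gene id from F_G, expression bin from F_E)


def make_gene_encoder(vocab: List[str]) -> Callable[[str], int]:
    """F_G (assumed form): map a gene name to its index in a fixed vocabulary."""
    index = {gene: i for i, gene in enumerate(vocab)}

    def encode(gene: str) -> int:
        return index[gene]

    return encode


def make_bin_encoder(edges: List[float]) -> Callable[[float], int]:
    """F_E (assumed form): discretize expression by counting bin edges at or below the value."""

    def encode(value: float) -> int:
        return sum(value >= edge for edge in edges)

    return encode


def rank_sentence(cell: Cell, f_g, f_e, max_len: int) -> List[Token]:
    """F_C (assumed form): keep expressed genes, order by descending expression, truncate."""
    expressed = [(gene, value) for gene, value in cell.items() if value > 0]
    # Ties are broken by gene name so the ordering is deterministic.
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [(f_g(gene), f_e(value)) for gene, value in expressed[:max_len]]


def tokenize(cell: Cell, f_g, f_e, f_c, max_len: int = 4) -> List[Token]:
    """Compose the three components; swapping any one of them changes the tokenization."""
    return f_c(cell, f_g, f_e, max_len)


# Toy input: hypothetical values, not data from the study.
vocab = ["ACTB", "CD3E", "GAPDH", "MS4A1"]
cell = {"ACTB": 5.0, "CD3E": 0.0, "GAPDH": 2.5, "MS4A1": 0.5}

tokens = tokenize(cell, make_gene_encoder(vocab), make_bin_encoder([1.0, 3.0]), rank_sentence)
print(tokens)  # [(0, 2), (2, 1), (3, 0)]
```

Because each component is passed in as a plain function, a different gene encoder, expression encoder, or sentence constructor can be substituted without touching the other two, which is the property a modular evaluation depends on.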

Main Results:

  • Tokenization choices have minimal impact on in-distribution data but are critical under distribution shifts.
  • The gene identity encoder (F_G) and token order significantly improve generalization.
  • The expression encoder (F_E) provides additional performance gains.
  • Recombining existing strategies can enhance model generalization.
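
One way to picture recombining existing strategies is as a grid over independent choices for each component. The sketch below only enumerates such a grid; the option names are hypothetical placeholders rather than strategies confirmed by the study, and no training or scoring is performed.

```python
from itertools import product
from typing import Dict, List

# Hypothetical option names for each component (placeholders, not Heimdall's registry).
F_G_OPTIONS = ["learned_id", "pretrained_embedding"]
F_E_OPTIONS = ["binned", "continuous"]
F_C_OPTIONS = ["expression_ranked", "fixed_gene_order"]


def enumerate_configs() -> List[Dict[str, str]]:
    """Return every combination of one F_G, one F_E, and one F_C option."""
    return [
        {"F_G": f_g, "F_E": f_e, "F_C": f_c}
        for f_g, f_e, f_c in product(F_G_OPTIONS, F_E_OPTIONS, F_C_OPTIONS)
    ]


configs = enumerate_configs()
print(len(configs))  # 8
print(configs[0])    # {'F_G': 'learned_id', 'F_E': 'binned', 'F_C': 'expression_ranked'}
```

Each resulting configuration would then be trained and evaluated under the same protocol, so that differences in generalization can be attributed to the tokenization choice rather than to the model or the data split.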

Conclusions:

  • Heimdall provides a standardized approach for evaluating single-cell tokenization strategies.
  • Tokenization design is a key determinant of scFM performance, particularly in out-of-distribution settings.
  • Heimdall accelerates the development of more robust and generalizable scFMs for single-cell analysis.