Scaling Large Language Models for Next-Generation Single-Cell Analysis.

Syed Asad Rizvi¹, Daniel Levine², Aakash Patel²

  • ¹ Yale University, Google Research.

bioRxiv: the preprint server for biology | November 24, 2025
Summary
This summary is machine-generated.

We developed C2S-Scale, a large language model (LLM) trained on over a billion tokens of transcriptomic and biological text data. This advanced model enhances single-cell RNA sequencing analysis and guides biological discovery by integrating diverse data types.

Area of Science:

  • Computational Biology
  • Genomics
  • Artificial Intelligence

Background:

  • Single-cell RNA sequencing (scRNA-seq) reveals cellular diversity, but current models scale poorly and do not integrate text.
  • Existing single-cell foundation models (scFMs) struggle with diverse tasks and combining transcriptomic data with biological text.
  • The Cell2Sentence (C2S) framework represents scRNA-seq profiles as text, offering a foundation for integrating diverse data.
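
As a rough illustration of what representing an expression profile as text can look like, the sketch below ranks genes by expression and writes the top gene names as a space-separated "cell sentence". The function name, the gene list, the counts, and the `top_k` cutoff are illustrative assumptions, not details taken from this summary.

```python
from typing import Mapping


def cell_to_sentence(expression: Mapping[str, float], top_k: int = 100) -> str:
    """Turn one cell's expression profile into a space-separated gene "sentence".

    Genes with zero expression are dropped. The rest are ordered from highest
    to lowest expression, with ties broken alphabetically for determinism.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in ranked[:top_k])


# Toy counts for a single hypothetical cell (made-up values, not real data).
toy_cell = {"CD3E": 12.0, "MS4A1": 0.0, "ACTB": 40.0, "GAPDH": 25.0, "IL7R": 12.0}
sentence = cell_to_sentence(toy_cell, top_k=4)
print(sentence)  # ACTB GAPDH CD3E IL7R
```

Because the output is ordinary text, it can be placed in the same prompt as free-text metadata or questions, which is the kind of integration the bullet above describes.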

Purpose of the Study:

  • To develop a scalable and versatile foundation model for single-cell analysis by integrating transcriptomic and textual data.
  • To enhance the capabilities of LLMs for biological research by training on a large corpus of biological data.
  • To demonstrate the utility of the C2S-Scale model in predicting biological responses and guiding experimental discovery.

Main Methods:

  • Trained a 27-billion-parameter LLM on over one billion tokens of transcriptomic data, biological text, and metadata using the C2S framework.
  • Employed reinforcement learning for targeted fine-tuning to improve performance on specific biological tasks.
  • Utilized a dual-context virtual screen to nominate drug candidates for context-selective biological effects.
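
One way to picture a dual-context screen is as a ranking of candidates by how much stronger a predicted effect is in one context than in another. The sketch below is a simplified assumption about that idea, not the authors' procedure: the `Prediction` type, the candidate names, the threshold, and all scores are hypothetical placeholders. In practice the scores would come from a model rather than being typed in.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    candidate: str
    target_context_score: float   # predicted effect in the context of interest
    control_context_score: float  # predicted effect in the comparison context

    @property
    def selectivity(self) -> float:
        # Positive values mean the effect is stronger in the target context.
        return self.target_context_score - self.control_context_score


def rank_context_selective(
    predictions: Iterable[Prediction], min_selectivity: float = 0.0
) -> List[Prediction]:
    """Keep candidates above the selectivity threshold, most selective first."""
    keep = [p for p in predictions if p.selectivity > min_selectivity]
    return sorted(keep, key=lambda p: p.selectivity, reverse=True)


# Made-up scores for three hypothetical candidates.
toy = [
    Prediction("candidate_A", 0.9, 0.8),  # active in both contexts
    Prediction("candidate_B", 0.7, 0.1),  # active mainly in the target context
    Prediction("candidate_C", 0.2, 0.3),  # slightly stronger in the control
]
hits = rank_context_selective(toy, min_selectivity=0.05)
print([p.candidate for p in hits])  # ['candidate_B', 'candidate_A']
```

Ranking by the difference between contexts, rather than by the raw target score, is what keeps a broadly active candidate such as `candidate_A` from outranking a context-selective one.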

Main Results:

  • C2S-Scale demonstrated consistent improvements in predictive and generative capabilities across various downstream tasks.
  • The model achieved strong performance in perturbation response prediction, natural language interpretation, and biological reasoning.
  • A virtual screen nominated silmitasertib (CX-4945) for context-selective antigen presentation upregulation, which was experimentally validated in human cell models.
  • C2S-Scale effectively integrates transcriptomic and textual data at an unprecedented scale, outperforming specialized scFMs and general LLMs.

Conclusions:

  • C2S-Scale represents a significant advancement in single-cell analysis, unifying diverse data types for next-generation research.
  • The model's ability to guide context-conditioned biological discovery opens new avenues for drug development and biological understanding.
  • C2S-Scale provides a powerful platform for creating "virtual cells" and pushing the boundaries of computational biology.