A survey on multimodal large language models.

Shukang Yin (1), Chaoyou Fu (2, 3), Sirui Zhao (1)

  • (1) School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei 230026, China.

National Science Review | December 16, 2024
Summary
This summary is machine-generated.

Multimodal large language models (MLLMs) show emergent capabilities, advancing towards artificial general intelligence. This paper surveys recent MLLM progress, covering architectures, training, and future research directions.

Keywords: large language model, multimodal large language model, vision language model

Area of Science:

  • Artificial Intelligence
  • Computer Vision
  • Natural Language Processing

Background:

  • Multimodal Large Language Models (MLLMs), exemplified by GPT-4V, are a rapidly advancing research area.
  • MLLMs leverage Large Language Models (LLMs) for complex multimodal tasks, exhibiting emergent capabilities beyond traditional methods.
  • The development of MLLMs is accelerating, with significant contributions from both academia and industry.

Purpose of the Study:

  • To provide a comprehensive overview and summary of recent advancements in MLLMs.
  • To delineate the fundamental formulation, architecture, training strategies, data, and evaluation metrics of MLLMs.
  • To explore extensions and challenges within the MLLM domain, including support for additional modalities and languages, and advanced reasoning techniques.

Main Methods:

  • Systematic review and summarization of current MLLM research.
  • Analysis of MLLM architectures, training methodologies, and datasets.
  • Exploration of emerging research topics such as enhanced granularity, modality support, language capabilities, and scenario applications.
  • Investigation of multimodal hallucination and advanced techniques like in-context learning, chain-of-thought, and LLM-aided visual reasoning.

Main Results:

  • Recent MLLMs demonstrate surprising emergent abilities, including image-based story generation and mathematical reasoning without optical character recognition (OCR-free).
  • Significant progress has been made in developing MLLMs that rival or surpass reference models such as GPT-4V.
  • The field is rapidly evolving, with ongoing efforts to enhance MLLM capabilities across various dimensions.

Conclusions:

  • The emergent capabilities of MLLMs suggest a potential path towards artificial general intelligence.
  • This survey provides a structured understanding of the MLLM landscape, highlighting key concepts and research trajectories.
  • Future research should address current challenges and explore promising directions to further advance MLLM technology.