



A survey on multimodal large language models.

Shukang Yin¹, Chaoyou Fu²,³, Sirui Zhao¹

¹School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei 230026, China.

National Science Review

December 16, 2024

Summary

This summary is machine-generated.

Multimodal large language models (MLLMs) show emergent capabilities, advancing towards artificial general intelligence. This paper surveys recent MLLM progress, covering architectures, training, and future research directions.

Keywords:

large language model; multimodal large language model; vision language model


Area of Science:

Artificial Intelligence
Computer Vision
Natural Language Processing

Background:

Multimodal Large Language Models (MLLMs), exemplified by GPT-4V, are a rapidly advancing research area.
MLLMs leverage Large Language Models (LLMs) for complex multimodal tasks, exhibiting emergent capabilities beyond traditional methods.
The development of MLLMs is accelerating, with significant contributions from both academia and industry.

Purpose of the Study:

To provide a comprehensive overview of recent advancements in Multimodal Large Language Models (MLLMs).
To delineate the fundamental formulation, architecture, training strategies, data, and evaluation metrics of MLLMs.
To explore extensions and challenges within the MLLM domain, including multimodality, multilingualism, and advanced reasoning techniques.

Main Methods:

Systematic review and summarization of current MLLM research.
Analysis of MLLM architectures, training methodologies, and datasets.
Exploration of emerging research topics: extending MLLMs to finer granularity, more modalities, more languages, and more application scenarios.
Investigation of multimodal hallucination and of advanced techniques such as in-context learning, chain-of-thought reasoning, and LLM-aided visual reasoning.

Main Results:

Recent MLLMs demonstrate surprising emergent abilities, including writing stories based on images and mathematical reasoning without optical character recognition (OCR-free).
Academia and industry have made significant progress in developing MLLMs that rival or surpass reference models such as GPT-4V.
The field is rapidly evolving, with ongoing efforts to enhance MLLM capabilities across various dimensions.

Conclusions:

The emergent capabilities of MLLMs suggest a potential path towards artificial general intelligence.
This survey provides a structured understanding of the MLLM landscape, highlighting key concepts and research trajectories.
Future research should address current challenges and explore promising directions to further advance MLLM technology.