Center-enhanced video captioning model with multimodal semantic alignment.

Benhui Zhang¹, Junyu Gao², Yuan Yuan³

¹School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China.

Neural Networks : the Official Journal of the International Neural Network Society

September 26, 2024

Summary

This summary is machine-generated.

This study introduces a novel video captioning model that unifies feature extraction and caption generation. The center-enhanced approach improves multimodal alignment, leading to higher-quality video descriptions.

Keywords:

Center enhancement
Multimodal semantic alignment
Video captioning

Area of Science:

Artificial Intelligence
Computer Vision
Natural Language Processing

Background:

Video captioning aims to generate descriptive text from video content, bridging visual and textual domains.
Existing methods often extract features offline, so the features are not adapted to the caption generation task.
Misaligned multimodal features and incomplete feature representations further limit the performance of current video captioning systems.

Purpose of the Study:

To propose an end-to-end video captioning model that integrates feature extraction and caption generation.
To enhance the completeness of semantic features using a center enhancement strategy.
To improve multimodal semantic alignment and alleviate misalignment issues in video captioning.

Main Methods:

An end-to-end framework that unifies video feature extraction and caption generation.
A center enhancement strategy employing incremental clustering to capture deep joint semantic features, using cluster centers for caption generation guidance.
Learning visual and textual representations in a shared latent semantic space to promote multimodal alignment and fusion.
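
The summary does not say which incremental clustering algorithm the center enhancement strategy uses. As a rough illustration only, the sketch below shows one common online scheme: each incoming feature vector joins its most similar center (updated as a running mean) or opens a new center when no existing one is similar enough. The class name `IncrementalCenters`, the cosine-similarity criterion, and the `threshold` parameter are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalCenters:
    """Hypothetical online clustering: centers are updated one feature at a time."""

    def __init__(self, threshold: float = 0.8) -> None:
        # Assumed hyperparameter: minimum similarity needed to join a center.
        self.threshold = threshold
        self.centers: List[Vector] = []
        self.counts: List[int] = []

    def update(self, feature: Vector) -> int:
        """Assign the feature to a center, creating one if needed; return its index."""
        best, best_sim = -1, -1.0
        for i, center in enumerate(self.centers):
            sim = cosine(feature, center)
            if sim > best_sim:
                best, best_sim = i, sim

        if best == -1 or best_sim < self.threshold:
            # No sufficiently similar center exists: open a new cluster.
            self.centers.append(list(feature))
            self.counts.append(1)
            return len(self.centers) - 1

        # Running-mean update of the matched center.
        self.counts[best] += 1
        n = self.counts[best]
        self.centers[best] = [
            c + (x - c) / n for c, x in zip(self.centers[best], feature)
        ]
        return best


if __name__ == "__main__":
    clusters = IncrementalCenters(threshold=0.8)
    for vec in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]):
        print(clusters.update(vec), clusters.centers)
```

In a captioning model, the vectors would presumably be joint visual-textual features from the shared latent space, and the resulting centers could serve as extra guidance to the caption decoder. How the paper wires this in is not described in the summary.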

Main Results:

The proposed model outperformed state-of-the-art methods on the MSVD and MSR-VTT datasets.
Experimental results indicate higher-quality caption generation through improved multimodal semantic alignment.
The integrated approach demonstrated enhanced applicability of extracted features for the downstream captioning task.

Conclusions:

The proposed center-enhanced video captioning model effectively addresses multimodal misalignment and improves feature representation.
The unified framework and center enhancement strategy lead to significant advancements in video captioning quality.
This research offers a promising direction for developing more accurate and robust automated video description systems.