



Center-enhanced video captioning model with multimodal semantic alignment.

Benhui Zhang1, Junyu Gao2, Yuan Yuan3

  • 1School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China.

Neural Networks: the Official Journal of the International Neural Network Society | September 26, 2024
Summary
This summary is machine-generated.

This study introduces a novel video captioning model that unifies feature extraction and caption generation. The center-enhanced approach improves multimodal alignment, leading to higher-quality video descriptions.

Keywords:
Center enhancement, Multimodal semantic alignment, Video captioning


Area of Science:

  • Artificial Intelligence
  • Computer Vision
  • Natural Language Processing

Background:

  • Video captioning aims to generate descriptive text from video content, bridging visual and textual domains.
  • Existing methods often fail to effectively align multimodal features and may extract features offline, limiting their adaptability for caption generation.
  • The resulting misalignment and incomplete feature representations limit the performance of current video captioning systems.

Purpose of the Study:

  • To propose an end-to-end video captioning model that integrates feature extraction and caption generation.
  • To enhance the completeness of semantic features using a center enhancement strategy.
  • To improve multimodal semantic alignment and alleviate misalignment issues in video captioning.
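To make "multimodal semantic alignment" concrete, the following is a minimal sketch of one common way to score how well paired video and text embeddings agree in a shared space: cosine similarity combined with a contrastive (InfoNCE-style) loss. This is a generic technique chosen for illustration, not the loss used in the paper; the function names, the `temperature` value, and the toy embeddings are all assumptions.

```python
import numpy as np

def similarity_matrix(video_emb, text_emb):
    """Cosine similarity between every video and every caption embedding."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return v @ t.T

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.07):
    """Video-to-text contrastive loss; row i of each input is a matched pair.

    Hypothetical helper: lower values mean each video is closer to its own
    caption than to the other captions in the batch.
    """
    logits = similarity_matrix(video_emb, text_emb) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy embeddings (assumed data): three videos and their three captions.
video = np.eye(3)
aligned_text = np.eye(3)               # caption i matches video i
misaligned_text = np.eye(3)[[1, 2, 0]]  # captions shuffled

loss_aligned = contrastive_alignment_loss(video, aligned_text)
loss_misaligned = contrastive_alignment_loss(video, misaligned_text)
```

In this toy setup the aligned pairing yields a lower loss than the shuffled one, which is the behavior an alignment objective is meant to reward.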

Main Methods:

  • An end-to-end framework that unifies video feature extraction and caption generation.
  • A center enhancement strategy employing incremental clustering to capture deep joint semantic features, using cluster centers for caption generation guidance.
  • Learning visual and textual representations in a shared latent semantic space to promote multimodal alignment and fusion.
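The summary does not describe the clustering procedure in detail. As a rough illustration of what incremental clustering with cluster centers can look like, here is a minimal sketch that processes features one at a time, assigns each to the most similar center, and updates that center as a running mean. The function name, the cosine-similarity criterion, and the `threshold` parameter are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def incremental_cluster(features, threshold=0.8):
    """Online clustering sketch (hypothetical, not the paper's algorithm).

    Each nonzero feature vector joins the most similar existing center if
    its cosine similarity is at least `threshold`; otherwise it starts a
    new cluster. Centers are kept as running means of their members.
    """
    features = np.asarray(features, dtype=float)
    centers, counts, labels = [], [], []
    for x in features:
        x = x / np.linalg.norm(x)  # unit-normalize the incoming feature
        k = -1
        if centers:
            sims = [float(c @ x / np.linalg.norm(c)) for c in centers]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                k = best
        if k >= 0:
            counts[k] += 1
            centers[k] += (x - centers[k]) / counts[k]  # running-mean update
        else:
            centers.append(x.copy())
            counts.append(1)
            k = len(centers) - 1
        labels.append(k)
    return np.stack(centers), labels

# Toy features (assumed data): two groups pointing in different directions.
feats = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
centers, labels = incremental_cluster(feats, threshold=0.8)
print(labels)  # [0, 0, 1, 1]
```

The resulting `centers` array is the kind of compact semantic summary that a caption decoder could attend to as guidance; how the paper actually feeds centers into generation is not specified in this summary.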

Main Results:

  • The proposed model outperformed state-of-the-art methods on the MSVD and MSR-VTT datasets.
  • Experimental results indicate higher-quality caption generation through improved multimodal semantic alignment.
  • The integrated approach demonstrated enhanced applicability of extracted features for the downstream captioning task.

Conclusions:

  • The proposed center-enhanced video captioning model effectively addresses multimodal misalignment and improves feature representation.
  • The unified framework and center enhancement strategy lead to significant advancements in video captioning quality.
  • This research offers a promising direction for developing more accurate and robust automated video description systems.