Updated: Jun 9, 2025

Image Captioning Based on Semantic Scenes.

Fengzhi Zhao¹,², Zhezhou Yu¹,²,³, Tao Wang¹,²

¹College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Entropy (Basel, Switzerland)

October 25, 2024

Summary

This summary is machine-generated.

The Semantic Scenes Encoder (SSE) improves image captioning by integrating scene and semantic graphs, generating more accurate and comprehensive descriptions for complex visual data.

Keywords:

attention mechanism; graph; image captioning; semantic scenes encoder

Area of Science:

Computer Vision
Natural Language Processing
Artificial Intelligence

Background:

Image captioning generates textual descriptions of images, a capability that is important for applications such as image retrieval and autonomous driving.
Existing region-based methods often focus on local features, neglecting overall scene understanding and leading to inaccurate captions for complex scenes.
Current methods struggle to extract complete semantic information, resulting in biased or deficient captions.

Purpose of the Study:

To address the limitations of existing image captioning methods, namely weak overall scene understanding and incomplete semantic information.
To propose a novel Semantic Scenes Encoder (SSE) for generating comprehensive and accurate image captions.
To enhance the understanding of both image content and semantic relationships for improved caption generation.

Main Methods:

The Semantic Scenes Encoder (SSE) extracts a scene graph from images and integrates it into image information encoding.
A semantic graph is extracted from captions, and its information is preserved through a learnable attention mechanism termed the 'dictionary'.
The model combines encoded image information and learned semantic information for caption generation.
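
The idea of reading from a learnable 'dictionary' through attention can be illustrated with a minimal sketch. This is not the authors' implementation: the function name dictionary_attention, the scaled dot-product scoring, the toy sizes, and the concatenation used to combine image and semantic information are all assumptions made for illustration. In a trained model the keys and values would be learned parameters; here they are fixed random lists.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dictionary_attention(query, keys, values):
    """Attend from one encoded image feature over the dictionary slots.

    keys and values stand in for learnable parameters (an assumption).
    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    read = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, read


# Hypothetical toy sizes: 3 dictionary slots, feature dimension 4.
rng = random.Random(0)
keys = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
values = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
image_feature = [0.5, -0.2, 0.1, 0.9]  # stand-in for an encoded image region

weights, semantic_read = dictionary_attention(image_feature, keys, values)

# Combine image and semantic information by concatenation (an assumption).
decoder_input = image_feature + semantic_read
```

The weights form a probability distribution over dictionary slots, so the read-out is a convex combination of the stored values. How the actual model combines this read-out with the encoded image features is not specified in this summary.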

Main Results:

The SSE model was evaluated on the MSCOCO dataset.
Experimental results demonstrated a significant improvement in the overall quality of generated captions.
The SSE achieved higher scores across multiple evaluation metrics, indicating superior performance in image captioning.

Conclusions:

The proposed Semantic Scenes Encoder (SSE) effectively enhances image captioning by incorporating scene and semantic graph information.
The SSE overcomes limitations of previous methods by considering global scene context and complete semantic information.
The model shows significant advantages in generating accurate and coherent captions, particularly for complex visual scenes.