Effective Multimodal Encoding for Image Paragraph Captioning.

Thanh-Son Nguyen, Basura Fernando

IEEE Transactions on Image Processing : a Publication of the IEEE Signal Processing Society

October 10, 2022

Summary

This summary is machine-generated.

This study introduces a regularization-based method for generating paragraph-length image descriptions using a multimodal encoding generator (MEG). The MEG-regularized model improves on all reported evaluation metrics and achieves state-of-the-art results on the Stanford paragraph dataset.

Area of Science:

Computer Science
Artificial Intelligence
Natural Language Processing

Background:

Image captioning and paragraph generation are challenging tasks in artificial intelligence.
Existing methods often struggle to capture complex visual and sequential information for detailed descriptions.

Purpose of the Study:

To propose a novel regularization-based method for image paragraph generation.
To introduce a multimodal encoding generator (MEG) for improved contextual understanding.
To enhance image captioning model performance using MEG-generated encodings.

Main Methods:

Developed a multimodal encoding generator (MEG) to capture sentence, visual, and sequential information.
Utilized MEG-generated encoding to regularize a paragraph generation model.
Optimized the paragraph generation model with reinforcement learning.
Conducted empirical analysis, including t-distributed stochastic neighbor embedding (t-SNE) visualization and multimodal retrieval tasks.
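
The idea of regularizing a generation model with an auxiliary encoding can be sketched in a few lines. The summary does not state the form of the regularizer, so the squared-L2 penalty, the `weight` parameter, and all function names below are assumptions made for illustration, not the authors' formulation.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under a probability vector."""
    return -math.log(probs[target_index])

def l2_distance_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def regularized_loss(token_probs, targets, model_encoding, meg_encoding, weight=0.1):
    """Mean token-level captioning loss plus a penalty that pulls the
    generator's encoding toward the MEG encoding.

    Assumption: the penalty is a squared-L2 distance scaled by `weight`;
    the source does not specify the actual regularization term.
    """
    caption_loss = sum(
        cross_entropy(p, t) for p, t in zip(token_probs, targets)
    ) / len(targets)
    penalty = l2_distance_sq(model_encoding, meg_encoding)
    return caption_loss + weight * penalty
```

When the two encodings coincide, the penalty vanishes and only the captioning loss remains; the further the generator's encoding drifts from the MEG encoding, the larger the total loss.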

Main Results:

The proposed MEG-regularized paragraph generation model achieved state-of-the-art results on the Stanford paragraph dataset.
The model demonstrated improvements across all evaluation metrics for image captioning.
Empirical analysis confirmed that MEG encoding captures semantic, textual, and visual information.
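
A multimodal retrieval check of the kind mentioned under Main Methods can be illustrated with a recall-at-k score over paired encodings. This is a minimal sketch under assumptions: cosine similarity as the ranking score, index-aligned query and gallery lists, and the hypothetical names `cosine` and `recall_at_k`. The summary does not describe the retrieval protocol actually used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recall_at_k(queries, gallery, k=1):
    """Fraction of queries whose paired gallery item (same index) is ranked
    among the k most similar gallery items.

    Assumption: queries[i] and gallery[i] encode the same image or paragraph.
    """
    hits = 0
    for i, query in enumerate(queries):
        ranked = sorted(
            range(len(gallery)),
            key=lambda j: cosine(query, gallery[j]),
            reverse=True,
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)
```

For example, `queries` could hold encodings computed from paragraphs and `gallery` encodings computed from the paired images; a high recall at small k would suggest the encoding carries information shared across both modalities.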

Conclusions:

The multimodal encoding generator (MEG) effectively enhances image paragraph generation.
Regularization with MEG encoding improves paragraph captioning across all reported evaluation metrics.
The method shows promise for generating more comprehensive and contextually rich image descriptions.