Auto-Encoding and Distilling Scene Graphs for Image Captioning.

Xu Yang, Hanwang Zhang, Jianfei Cai

IEEE Transactions on Pattern Analysis and Machine Intelligence

December 3, 2020

Summary

This summary is machine-generated.

The Scene Graph Auto-Encoder (SGAE) incorporates language inductive bias into image captioning to produce more human-like descriptions. A single SGAE model reports a 129.6 CIDEr-D score on the MS-COCO Karpathy split, state of the art at the time of publication.

Area of Science:

Computer Vision
Natural Language Processing
Artificial Intelligence

Background:

Conventional encoder-decoder image captioning models often lack human-like reasoning and descriptive capabilities.
Humans rely on language inductive bias to compose collocations and draw contextual inferences, which enables richer understanding and generation.
Image captioning models therefore need a way to generate more nuanced and contextually relevant descriptions.

Purpose of the Study:

To develop a Scene Graph Auto-Encoder (SGAE) that incorporates language inductive bias into image captioning.
To enable encoder-decoder models to generate more human-like and descriptive image captions.
To transfer language inductive bias effectively across vision and language domains.

Main Methods:

Proposed the Scene Graph Auto-Encoder (SGAE) framework, which uses scene graphs to represent the structure of both images and sentences.
Employed an auto-encoding pipeline (S → G_S → D → S) that reconstructs sentences and, in doing so, learns a dictionary set (D) encoding the language prior.
Implemented a vision-language pipeline (I → G_I → D → S) that shares the dictionary (D) and uses knowledge distillation to transfer the inductive bias to an encoder-decoder captioner.
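The two pipelines above can be illustrated with a small sketch. It is not the authors' implementation: the summary does not say how features are mapped through the dictionary or which distillation loss is used. The sketch assumes that a scene-graph node feature is re-encoded as an attention-weighted mix of dictionary entries, and that distillation minimizes a KL divergence between a teacher and a student word distribution. The names `re_encode` and `kl_divergence`, the toy dictionary, and all numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def re_encode(feature, dictionary):
    """Rewrite a node feature as a convex combination of dictionary entries.

    Assumed form: the weights are softmax(<d_k, feature>) over the entries d_k.
    """
    weights = softmax([dot(entry, feature) for entry in dictionary])
    return [
        sum(w * entry[i] for w, entry in zip(weights, dictionary))
        for i in range(len(feature))
    ]


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) between two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student)
    )


# Toy shared dictionary D (three entries, two dimensions).
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical node features from a sentence graph G_S and an image graph G_I.
sentence_node = [2.0, 0.1]
image_node = [1.8, 0.3]

# Both pipelines pass through the same dictionary before decoding.
s_hat = re_encode(sentence_node, D)
i_hat = re_encode(image_node, D)

# Hypothetical next-word distributions: teacher (dictionary-based pipeline)
# and student (plain encoder-decoder captioner).
teacher_probs = [0.7, 0.2, 0.1]
student_probs = [0.5, 0.3, 0.2]
distill_loss = kl_divergence(teacher_probs, student_probs)
```

Because both `s_hat` and `i_hat` are built from the same entries of `D`, whatever the dictionary captures while reconstructing sentences is also available when the input is an image; the distillation term then pushes the student captioner toward the teacher's predictions.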

Main Results:

Achieved a new state-of-the-art 129.6 CIDEr-D score on the MS-COCO dataset (Karpathy split) with a single SGAE model.
Attained a competitive 126.6 CIDEr-D (c40) on the official MS-COCO server, comparable to ensemble models.
Demonstrated that SGAE can transfer inductive bias learned from other language corpora and remains effective in unpaired image captioning settings.

Conclusions:

SGAE effectively transfers language inductive bias to image captioning models, improving caption quality and human-likeness.
The combination of scene graph representation, a shared dictionary, and knowledge distillation is key to cross-domain bias transfer.
SGAE advances image captioning with strong single-model performance and adaptability to other language corpora and unpaired settings.