


Updated: Nov 27, 2025

Auto-Encoding and Distilling Scene Graphs for Image Captioning.

Xu Yang, Hanwang Zhang, Jianfei Cai

    IEEE Transactions on Pattern Analysis and Machine Intelligence | December 3, 2020
    Summary
    This summary is machine-generated.

    Scene Graph Auto-Encoder (SGAE) enhances image captioning by integrating language inductive bias, leading to more human-like descriptions. This approach achieves state-of-the-art results on the MS-COCO benchmark.

    Area of Science:

    • Computer Vision
    • Natural Language Processing
    • Artificial Intelligence

    Background:

    • Conventional encoder-decoder image captioning models often lack human-like reasoning and descriptive capabilities.
    • Human language utilizes inductive bias for composing collocations and contextual inferences, enabling richer understanding and generation.
    • Image captioning models therefore need to generate more nuanced and contextually relevant descriptions.

    Purpose of the Study:

    • To develop a Scene Graph Auto-Encoder (SGAE) that incorporates language inductive bias into image captioning.
    • To enable encoder-decoder models to generate more human-like and descriptive image captions.
    • To transfer language inductive bias effectively across vision and language domains.

    Main Methods:

    • Proposed Scene Graph Auto-Encoder (SGAE) framework utilizing scene graphs to represent image and sentence structures.
    • Employed an auto-encoding pipeline (S → G_S → D → S), where G_S is the sentence scene graph, to learn a dictionary set (D) that encodes language priors.
    • Implemented a vision-language pipeline (I → G_I → D → S), where G_I is the image scene graph, that shares the dictionary (D) and uses knowledge distillation to transfer inductive bias to an encoder-decoder captioner.
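To make the role of the shared dictionary (D) more concrete, here is a minimal sketch of how a scene-graph node feature might be re-encoded through a dictionary. The summary does not specify the re-encoding operation, so the softmax-attention form, the function names (`softmax`, `re_encode`), and the toy vectors are all assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def re_encode(x, dictionary):
    """Re-express feature x as an attention-weighted mix of dictionary rows.

    Assumed form: weights = softmax(D x), x_hat = sum_k weights[k] * D[k].
    """
    scores = [sum(d * v for d, v in zip(row, x)) for row in dictionary]
    weights = softmax(scores)
    x_hat = [
        sum(w * row[j] for w, row in zip(weights, dictionary))
        for j in range(len(x))
    ]
    return x_hat, weights


# Hypothetical 3-entry dictionary over 2-dimensional features.
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Toy node features from a sentence scene graph and an image scene graph.
sentence_node = [2.0, 0.1]
image_node = [1.8, 0.3]

sentence_hat, sentence_weights = re_encode(sentence_node, D)
image_hat, image_weights = re_encode(image_node, D)
```

Because both pipelines read from the same `D`, the sentence and image features are rewritten in terms of the same entries. In this sketch, that sharing is what lets a prior learned from text carry over to the image side.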

    Main Results:

    • Achieved a new state-of-the-art 129.6 CIDEr-D score on the MS-COCO dataset (Karpathy split) with a single SGAE model.
    • Attained a competitive 126.6 CIDEr-D (c40) on the official MS-COCO server, comparable to ensemble models.
    • Demonstrated that SGAE can transfer inductive bias learned from other language corpora and remains effective in unpaired image captioning settings.

    Conclusions:

    • SGAE effectively transfers language inductive bias to image captioning models, significantly improving caption quality and human-likeness.
    • The combination of scene graph representation, shared dictionary, and knowledge distillation is key to cross-domain bias transfer.
    • SGAE advances image captioning: a single model reaches state-of-the-art CIDEr-D on MS-COCO and adapts to other language corpora and unpaired settings.
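The knowledge distillation step mentioned above can be illustrated with a small sketch. The summary does not state which loss is used, so this shows one common formulation as an assumption: a temperature-softened KL divergence between the word distributions of a teacher (the language decoder of the auto-encoder) and a student (the captioner's decoder). The function names and logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over one decoding step's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a 4-word vocabulary at a single decoding step.
teacher = [2.0, 0.5, -1.0, 0.0]
student = [1.0, 1.0, -0.5, 0.2]

loss_mismatch = distillation_loss(teacher, student)
loss_match = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it pulls the captioner's word predictions toward those of the language decoder.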