Using language to learn structured appearance models for image annotation.

Michael Jamieson¹, Afsaneh Fazly, Suzanne Stevenson

¹Department of Computer Science, University of Toronto, 10 King's College Road, Room 3302, Toronto, Ontario, Canada M5S3G4. jamieson@cs.toronto.edu

IEEE Transactions on Pattern Analysis and Machine Intelligence

November 21, 2009

Summary

This summary is machine-generated.

This study introduces a new algorithm to learn object names and appearances from cluttered images and noisy captions. The method enables robust object recognition and automatic image annotation for improved retrieval.

Area of Science:

Computer Vision
Machine Learning
Natural Language Processing

Background:

Learning object names and appearances from cluttered scenes is challenging.
Existing methods struggle with noisy captions and irrelevant image features.

Purpose of the Study:

To develop a novel algorithm for simultaneously learning object names and appearances from captioned images.
To create an appearance model that captures object structure and is invariant to various transformations.

Main Methods:

A novel algorithm using feature neighborhood repetition and caption correspondence to identify object features.
A graph-based appearance model encoding spatial relationships among visual features.
An iterative language-driven perceptual grouping process to assemble object appearance models.
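The caption-correspondence idea in the methods above can be illustrated with a small sketch. It scores how well the set of images containing a candidate visual pattern matches the set of images whose captions contain a given word. The F1 overlap score, the function names `f1_correspondence` and `best_word`, and the toy captions are all assumptions made for illustration; the summary does not state which correspondence measure the authors use.

```python
from typing import Dict, List, Set


def f1_correspondence(images_with_word: Set[int],
                      images_with_pattern: Set[int]) -> float:
    """F1 overlap between images captioned with a word and images
    where a visual pattern was detected (an assumed measure)."""
    both = len(images_with_word & images_with_pattern)
    if both == 0:
        return 0.0
    precision = both / len(images_with_pattern)
    recall = both / len(images_with_word)
    return 2 * precision * recall / (precision + recall)


def best_word(captions: Dict[int, List[str]],
              images_with_pattern: Set[int]) -> str:
    """Return the caption word whose occurrences best match the pattern."""
    vocab = sorted({w for words in captions.values() for w in words})

    def score(word: str) -> float:
        with_word = {i for i, words in captions.items() if word in words}
        return f1_correspondence(with_word, images_with_pattern)

    return max(vocab, key=score)


# Hypothetical toy data: image id -> caption words.
captions = {
    0: ["dog", "park"],
    1: ["dog", "sofa"],
    2: ["cat", "sofa"],
    3: ["park", "bench"],
}
# Hypothetical detections: the pattern was found in images 0 and 1.
pattern_images = {0, 1}

print(best_word(captions, pattern_images))
```

In this toy data the pattern co-occurs with "dog" in every image where either appears, so "dog" scores highest, while words that only sometimes accompany the pattern ("park", "sofa") score lower. A noisy caption lowers the score of a correct word but need not change the ranking.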


Main Results:

Successfully learned object names and appearances from complex, cluttered scenes with noisy captions.
Developed object models invariant to translation, scale, orientation, occlusion, and minor viewpoint/articulation changes.
Enabled automatic annotation of new images using learned object models.
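One way a graph-based model can achieve the translation, scale, and orientation invariance listed above is to store only relative measurements on its edges. The sketch below is an illustration under that assumption, not the authors' formulation: the `Feature` fields, the `edge_relation` encoding, and the `transform` helper are hypothetical names, and the sketch does not address occlusion, viewpoint, or articulation.

```python
import math
from typing import NamedTuple, Tuple


class Feature(NamedTuple):
    x: float
    y: float
    scale: float
    angle: float  # dominant orientation in radians


def wrap(a: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def edge_relation(a: Feature, b: Feature) -> Tuple[float, float, float, float]:
    """Describe b relative to a using only quantities that do not change
    when the whole image is translated, rotated, or uniformly rescaled."""
    dx, dy = b.x - a.x, b.y - a.y
    dist = math.hypot(dx, dy) / a.scale           # distance in units of a's scale
    bearing = wrap(math.atan2(dy, dx) - a.angle)  # direction to b in a's frame
    rel_scale = math.log(b.scale / a.scale)       # log scale ratio
    rel_angle = wrap(b.angle - a.angle)           # orientation difference
    return dist, bearing, rel_scale, rel_angle


def transform(f: Feature, tx: float, ty: float, k: float, theta: float) -> Feature:
    """Apply a similarity transform: rotate by theta, scale by k, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return Feature(k * (c * f.x - s * f.y) + tx,
                   k * (s * f.x + c * f.y) + ty,
                   f.scale * k,
                   wrap(f.angle + theta))


a = Feature(10.0, 20.0, 2.0, 0.3)
b = Feature(40.0, 60.0, 3.0, 1.0)
before = edge_relation(a, b)

# Move, rotate, and enlarge the whole scene, then recompute the edge.
a2 = transform(a, 5.0, -7.0, 2.5, 0.8)
b2 = transform(b, 5.0, -7.0, 2.5, 0.8)
after = edge_relation(a2, b2)

invariant = all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(before, after))
print(invariant)
```

Because every edge value is a ratio or a difference taken in the first feature's own frame, the global transform cancels out, and a model learned at one position, size, and rotation can be matched at another.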

Conclusions:

The proposed method effectively addresses challenges in learning object representations from unconstrained visual data.
The learned invariant object models significantly enhance capabilities for automated image annotation and retrieval.
This approach advances the integration of language and vision for robust scene understanding.