
Using language to learn structured appearance models for image annotation.

Michael Jamieson (1), Afsaneh Fazly, Suzanne Stevenson

  • (1) Department of Computer Science, University of Toronto, 10 King's College Road, Room 3302, Toronto, Ontario, Canada M5S 3G4. jamieson@cs.toronto.edu

IEEE Transactions on Pattern Analysis and Machine Intelligence
November 21, 2009
Summary
This summary is machine-generated.

This study introduces a new algorithm to learn object names and appearances from cluttered images and noisy captions. The method enables robust object recognition and automatic image annotation for improved retrieval.

Area of Science:

  • Computer Vision
  • Machine Learning
  • Natural Language Processing

Background:

  • Learning object names and appearances from cluttered scenes is challenging.
  • Existing methods struggle with noisy captions and irrelevant image features.

Purpose of the Study:

  • To develop a novel algorithm for simultaneously learning object names and appearances from captioned images.
  • To create an appearance model that captures object structure and is invariant to various transformations.

Main Methods:

  • A novel algorithm using feature neighborhood repetition and caption correspondence to identify object features.
  • A graph-based appearance model encoding spatial relationships among visual features.
  • An iterative language-driven perceptual grouping process to assemble object appearance models.
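
The three method bullets above can be illustrated with a small sketch. This is not the authors' formulation: the scoring rule (how much more often a feature appears in images whose caption contains a word than in images whose caption does not), the threshold, and the names `correspondence_scores`, `select_features`, and `AppearanceModel` are all assumptions made for illustration. Features are represented as hypothetical integer ids with a position and a scale.

```python
from dataclasses import dataclass, field
from math import hypot


def correspondence_scores(images):
    """Score (word, feature) pairs from captioned images.

    images: list of (caption_words, feature_ids) pairs, both sets.
    Score = P(feature | word in caption) - P(feature | word absent).
    This scoring rule is an illustrative assumption.
    """
    words = set().union(*(w for w, _ in images))
    feats = set().union(*(f for _, f in images))
    scores = {}
    for word in words:
        with_word = [f for w, f in images if word in w]
        without = [f for w, f in images if word not in w]
        for feat in feats:
            p_in = sum(feat in f for f in with_word) / len(with_word)
            p_out = sum(feat in f for f in without) / len(without) if without else 0.0
            scores[(word, feat)] = p_in - p_out
    return scores


def select_features(scores, word, threshold=0.75):
    """Keep features whose score for `word` reaches the (assumed) threshold."""
    return sorted(f for (w, f), s in scores.items() if w == word and s >= threshold)


@dataclass
class AppearanceModel:
    """Graph of features; edges store distance in units of the first feature's scale."""
    word: str
    nodes: dict = field(default_factory=dict)  # feature id -> (x, y, scale)
    edges: dict = field(default_factory=dict)  # (id_a, id_b) -> normalized distance

    def add_feature(self, fid, x, y, scale):
        # Link the new feature to every feature already in the model.
        for other, (ox, oy, oscale) in self.nodes.items():
            self.edges[(other, fid)] = hypot(x - ox, y - oy) / oscale
        self.nodes[fid] = (x, y, scale)
```

Because each edge stores a distance divided by a feature scale, the stored value does not change when the whole object is shifted, rotated, or uniformly resized in the image. An iterative grouping step could call `select_features` repeatedly and add the surviving features to an `AppearanceModel` one at a time.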

Main Results:

  • Successfully learned object names and appearances from complex, cluttered scenes with noisy captions.
  • Developed object models invariant to translation, scale, orientation, occlusion, and minor viewpoint/articulation changes.
  • Enabled automatic annotation of new images using learned object models.
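
As a rough illustration of the annotation step, the sketch below labels a new image with every word whose learned model is sufficiently matched by the image's detected features. It assumes a simplified setting that is not taken from the paper: each model is reduced to a set of hypothetical feature ids, spatial structure is ignored, and the `min_fraction` cutoff is an arbitrary choice that lets a partly occluded object still be labeled.

```python
def annotate(image_features, models, min_fraction=0.6):
    """Return the sorted words whose model is matched by the image.

    image_features: set of feature ids detected in the new image.
    models: dict mapping a word to the set of feature ids in its model.
    min_fraction: assumed share of model features that must be present.
    """
    labels = []
    for word, model_feats in models.items():
        if not model_feats:
            continue  # an empty model cannot be matched
        matched = len(model_feats & image_features)
        if matched / len(model_feats) >= min_fraction:
            labels.append(word)
    return sorted(labels)
```

A full matcher would also compare the spatial relationships stored in the graph model rather than counting feature ids alone.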

Conclusions:

  • The proposed method effectively addresses challenges in learning object representations from unconstrained visual data.
  • The learned object models, which tolerate changes in position, scale, orientation, and partial occlusion, support automated image annotation and retrieval.
  • This approach advances the integration of language and vision for robust scene understanding.