Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera.

Jiatong Bao¹, Yunyi Jia², Yu Cheng³

¹Department of Hydraulic, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China. jtbao@yzu.edu.cn.

Sensors (Basel, Switzerland)

December 17, 2016

Summary

This summary is machine-generated.

This study presents a robust method for robots to understand natural language (NL) instructions for object detection using RGB-D cameras. The approach effectively grounds objects by matching NL cues with visual scene information, enabling practical robotic manipulation.

Keywords:

natural language control; natural language processing; object grounding; object recognition; robotic manipulation system; target object detection

Area of Science:

Robotics
Computer Vision
Artificial Intelligence

Background:

Natural language (NL) control offers intuitive robot interaction, but object grounding remains a significant challenge.
Enabling robots to accurately identify objects from human instructions is crucial for versatile robotic applications.

Purpose of the Study:

To develop and evaluate a method for precise object grounding in robotic manipulation using natural language instructions and RGB-D camera data.
To enhance the robot's ability to interpret and act upon complex human commands by accurately detecting target objects.

Main Methods:

A vision algorithm segments objects from RGB-D data, extracting attributes and spatial relations.
Natural language instructions are parsed into domain-specific annotations, incorporating multiple object specification cues.
A computational state estimation framework matches linguistic annotations with visual scene information to determine object grounding probabilities.
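
The matching step above can be illustrated with a minimal Python sketch. It is not the authors' state estimation framework. It assumes a much simpler scheme in which each cue in the instruction (color, category, one spatial relation) contributes an independent match score, and the per-object scores are normalized into grounding probabilities. The class names, the `MATCH` and `MISMATCH` values, and the example scene are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SceneObject:
    """One segmented object (hypothetical output of the vision stage)."""
    name: str
    color: str
    category: str
    x: float  # centroid in the camera frame; x increases to the right


@dataclass
class Annotation:
    """Parsed instruction cues (hypothetical domain-specific annotation)."""
    color: Optional[str] = None
    category: Optional[str] = None
    relation: Optional[str] = None   # "left_of" or "right_of"
    reference: Optional[str] = None  # category of the reference object


# Assumed likelihoods for a cue that agrees or disagrees with the scene.
MATCH, MISMATCH = 0.9, 0.1


def cue_score(expected: Optional[str], observed: str) -> float:
    """Score one attribute cue; an unspecified cue is uninformative."""
    if expected is None:
        return 1.0
    return MATCH if expected == observed else MISMATCH


def relation_score(obj: SceneObject, ann: Annotation,
                   scene: List[SceneObject]) -> float:
    """Score the spatial cue against every candidate reference object."""
    if ann.relation is None:
        return 1.0
    refs = [o for o in scene if o.category == ann.reference and o is not obj]
    if ann.relation == "left_of":
        holds = any(obj.x < r.x for r in refs)
    else:
        holds = any(obj.x > r.x for r in refs)
    return MATCH if holds else MISMATCH


def grounding_probabilities(scene: List[SceneObject],
                            ann: Annotation) -> Dict[str, float]:
    """Multiply cue scores per object, then normalize over the scene."""
    scores = {
        o.name: cue_score(ann.color, o.color)
        * cue_score(ann.category, o.category)
        * relation_score(o, ann, scene)
        for o in scene
    }
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


# Hypothetical scene and the instruction "the red cup left of the box".
scene = [
    SceneObject("red_cup", "red", "cup", -0.2),
    SceneObject("blue_cup", "blue", "cup", 0.3),
    SceneObject("box", "green", "box", 0.0),
]
annotation = Annotation(color="red", category="cup",
                        relation="left_of", reference="box")
probs = grounding_probabilities(scene, annotation)
target = max(probs, key=probs.get)
print(target, probs)
```

Because the scores are multiplied, a cue the instruction leaves unspecified has no effect on the ranking, so the same function handles instructions with one cue or several.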

Main Results:

The proposed method successfully grounds target objects by integrating natural language understanding with visual perception.
Quantitative evaluations on a custom RGB-D dataset demonstrate the method's effectiveness and superiority.
Experiments in natural language-controlled object manipulation and task programming showcase practical viability.

Conclusions:

The developed object grounding technique significantly improves robot comprehension of natural language instructions.
This approach facilitates more intuitive and effective human-robot interaction for manipulation and task programming.
The method proves effective and practical for real-world robotic applications.