
Related Concept Videos

Relative Motion Analysis using Rotating Axes-Problem Solving

Consider a crane whose telescopic boom rotates with an angular velocity of 0.04 rad/s and an angular acceleration of 0.02 rad/s². While rotating, the boom also extends at a constant speed of 5 m/s. The extension is measured at point D relative to the fixed point C at the other end of the boom. At the instant considered, the distance between points C and D is 60 meters.
To determine the magnitudes of the velocity and acceleration of point...
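Because C is fixed and D moves along the boom, the rotating-axes equations reduce to polar components at D: the extension contributes radial terms, the rotation contributes transverse terms, and the two couple through the Coriolis term \(2\dot r\omega\). The teaser above is truncated, so this is a sketch rather than the video's worked solution. It assumes the boom extends at a constant rate (\(\ddot r = 0\)), and the helper name `boom_tip_kinematics` is hypothetical.

```python
import math

def boom_tip_kinematics(r, r_dot, r_ddot, omega, alpha):
    """Velocity and acceleration of a point moving along a rotating boom.

    Polar components measured from the fixed pivot C:
      v_r = r_dot                 v_theta = r * omega
      a_r = r_ddot - r * omega^2  a_theta = r * alpha + 2 * r_dot * omega  (last term: Coriolis)
    Returns (|v|, |a|, (v_r, v_theta), (a_r, a_theta)).
    """
    v_r = r_dot
    v_theta = r * omega
    a_r = r_ddot - r * omega ** 2
    a_theta = r * alpha + 2.0 * r_dot * omega
    return math.hypot(v_r, v_theta), math.hypot(a_r, a_theta), (v_r, v_theta), (a_r, a_theta)

# Values from the problem statement; r_ddot = 0 assumes the extension speed stays constant.
speed, accel, v, a = boom_tip_kinematics(r=60.0, r_dot=5.0, r_ddot=0.0, omega=0.04, alpha=0.02)
print(f"|v_D| = {speed:.3f} m/s, |a_D| = {accel:.3f} m/s^2")
```

With these values the sketch gives roughly 5.55 m/s and 1.60 m/s². The Coriolis contribution (2 × 5 × 0.04 = 0.4 m/s²) is a noticeable share of the transverse acceleration, which is why it cannot be dropped.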

Virtual Work for a System of Connected Rigid Bodies

Virtual work is a powerful method for solving problems involving several connected rigid bodies. When the system is in equilibrium, the total virtual work done by the active forces and couple moments during any virtual displacement is zero. This condition allows the unknown forces, or the equilibrium position, to be calculated by giving the system a virtual displacement. To analyze such a system, first draw a free-body diagram, choose an independent coordinate that defines the configuration of the links, and sketch the deflected position that results from a positive virtual displacement.
Next,...
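For a system whose configuration is fixed by a single independent coordinate \(q\) (for example, one link angle), the equilibrium condition can be written compactly as below. This is the standard textbook statement; the symbols \(\mathbf{F}_i\), \(\mathbf{r}_i\), \(M_j\), \(\theta_j\) and \(q\) are introduced here for illustration and are not taken from the video.

```latex
\begin{aligned}
\delta U &= \sum_i \mathbf{F}_i \cdot \delta\mathbf{r}_i + \sum_j M_j\,\delta\theta_j = 0,
\qquad
\delta\mathbf{r}_i = \frac{d\mathbf{r}_i}{dq}\,\delta q,\quad
\delta\theta_j = \frac{d\theta_j}{dq}\,\delta q \\
\delta U &= \left( \sum_i \mathbf{F}_i \cdot \frac{d\mathbf{r}_i}{dq} + \sum_j M_j\,\frac{d\theta_j}{dq} \right) \delta q = 0
\;\;\Longrightarrow\;\;
\sum_i \mathbf{F}_i \cdot \frac{d\mathbf{r}_i}{dq} + \sum_j M_j\,\frac{d\theta_j}{dq} = 0
\end{aligned}
```

Because \(\delta q\) is arbitrary and nonzero, the bracketed coefficient must vanish, which yields one equilibrium equation per independent coordinate.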

Relative Motion Analysis using Rotating Axes

Consider a member AB that undergoes linear motion while point B also rotates about point A. To describe this combined movement, position vectors for both points A and B are established in a stationary reference frame.
However, to express the position of point B relative to point A, an additional frame of reference, denoted x'y', is needed. This additional frame not only translates but also rotates relative to the fixed frame, making it...
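Using a translating and rotating frame at A leads to the standard relative-motion equations \(\mathbf{v}_B = \mathbf{v}_A + \boldsymbol{\Omega}\times\mathbf{r}_{B/A} + (\mathbf{v}_{B/A})_{xyz}\) and \(\mathbf{a}_B = \mathbf{a}_A + \dot{\boldsymbol{\Omega}}\times\mathbf{r}_{B/A} + \boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times\mathbf{r}_{B/A}) + 2\boldsymbol{\Omega}\times(\mathbf{v}_{B/A})_{xyz} + (\mathbf{a}_{B/A})_{xyz}\). The sketch below evaluates them with plain Python tuples. It assumes every vector has already been written in one common set of components at the instant considered; the function name and the example numbers are illustrative, not taken from the video.

```python
def add(*vectors):
    """Component-wise sum of any number of 3-vectors."""
    return tuple(sum(components) for components in zip(*vectors))

def scale(k, v):
    """Multiply a 3-vector by a scalar."""
    return tuple(k * c for c in v)

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotating_axes_kinematics(v_A, a_A, omega, omega_dot, r_BA, v_rel, a_rel):
    """Absolute velocity and acceleration of B from a frame at A that rotates with omega.

    v_rel and a_rel are the motion of B as seen from the rotating x'y' frame.
    """
    v_B = add(v_A, cross(omega, r_BA), v_rel)
    a_B = add(a_A,
              cross(omega_dot, r_BA),            # angular-acceleration (tangential) term
              cross(omega, cross(omega, r_BA)),  # centripetal term
              scale(2.0, cross(omega, v_rel)),   # Coriolis term
              a_rel)
    return v_B, a_B

# Hypothetical numbers: A moves at 1 m/s along X, the x'y' frame spins at 2 rad/s about Z,
# and B sits 0.5 m from A along Y while sliding outward at 0.3 m/s.
v_B, a_B = rotating_axes_kinematics(
    v_A=(1.0, 0.0, 0.0), a_A=(0.0, 0.0, 0.0),
    omega=(0.0, 0.0, 2.0), omega_dot=(0.0, 0.0, 0.0),
    r_BA=(0.0, 0.5, 0.0), v_rel=(0.0, 0.3, 0.0), a_rel=(0.0, 0.0, 0.0))
print(v_B, a_B)
```

For these illustrative inputs the result is approximately v_B = (0, 0.3, 0) m/s and a_B = (−1.2, −2.0, 0) m/s², where −2.0 in the y-component is the centripetal term and −1.2 in the x-component is the Coriolis term.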

Retrieval

Retrieval is the process of getting information out of memory storage and back into conscious awareness. This ability is essential for daily tasks like brushing hair and teeth, driving to work, and performing job duties. Retrieval occurs in three ways: recall, recognition, and relearning.
Recall involves accessing information without cues, such as during an essay test, where individuals must retrieve facts and concepts from memory unaided. Another example is remembering the name of a colleague...

Three-Dimensional Force System: Problem Solving

A three-dimensional force system is a set of forces acting on a body in different directions in space, so that each force generally has components along all three coordinate axes. Such problems are common in physics and engineering, where the resultant force on the system must be calculated in order to predict or analyze the behavior of the object or structure under consideration.
To solve a three-dimensional force system, first resolve each force into its respective scalar components. Do this using...
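The procedure above (resolve each force into scalar components, then add them axis by axis) can be sketched as follows. Since the teaser is cut off before naming the resolution method, this sketch assumes one common approach: building each force from its magnitude and a unit vector along its line of action. The function names and example forces are hypothetical.

```python
import math

def components_along_line(magnitude, start, end):
    """Resolve a force of the given magnitude acting from `start` toward `end`
    into (Fx, Fy, Fz) using the unit vector u = (end - start) / |end - start|."""
    d = [e - s for s, e in zip(start, end)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0:
        raise ValueError("start and end points must differ")
    return tuple(magnitude * c / length for c in d)

def resultant(forces):
    """Sum the scalar components: R_x = sum F_x, R_y = sum F_y, R_z = sum F_z."""
    R = tuple(sum(f[i] for f in forces) for i in range(3))
    magnitude = math.sqrt(sum(c * c for c in R))
    # Direction cosines (cos alpha, cos beta, cos gamma); undefined for a zero resultant.
    direction = tuple(c / magnitude for c in R) if magnitude > 0 else None
    return R, magnitude, direction

# Hypothetical example: a 100 N force along the line from the origin to (3, 4, 0) m,
# plus a 50 N force acting straight down (-z).
F1 = components_along_line(100.0, (0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
F2 = (0.0, 0.0, -50.0)
R, R_mag, R_dir = resultant([F1, F2])
print(R, round(R_mag, 3))
```

Here F1 resolves to (60, 80, 0) N, so the resultant is (60, 80, −50) N with a magnitude of about 111.8 N.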

Depth Perception and Spatial Vision

Depth perception is the ability to perceive objects three-dimensionally. It relies on two types of cues: binocular and monocular. Binocular cues depend on the combination of images from both eyes and how the eyes work together. Since the eyes are in slightly different positions, each eye captures a slightly different image. This disparity between images, known as binocular disparity, helps the brain interpret depth. When the brain compares these images, it determines the distance to an object.

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

DSTED: decoupling temporal stabilization and discriminative enhancement for surgical workflow recognition.

International journal of computer assisted radiology and surgery·2026

Same author

PDGCN: A progressive dual-branch graph convolution network for EEG emotion recognition.

Neural networks : the official journal of the International Neural Network Society·2026

Same author

The behavior biopsy: Interpreting animal behavior as embodied, situated, and hierarchical.

Current opinion in neurobiology·2026

Same author

An interactive human PROS1 variants database provides novel insights into the genetics and phenotypes of inherited protein S deficiency.

Journal of thrombosis and haemostasis : JTH·2026

Same author

Quality formation in corn kernels during postharvest ripening: the influence of storage conditions on phenolic components and antioxidant activity.

Food chemistry·2026

Same author

Assessing Disorders of Consciousness Using Temporal Sleep Dynamics Extracted From Whole-Night PSG.

IEEE transactions on bio-medical engineering·2026

Same journal

DNA origami snaps into place.

Science robotics·2026

Same journal

A high-endurance DNA origami snap-through switch for functional nanoscale control.

Science robotics·2026

Same journal

Learning flight navigation like a honey bee.

Science robotics·2026

Same journal

Is your robot vacuum cleaner spying on you?

Science robotics·2026

Same journal

Do people feel safe in a robot's presence?

Science robotics·2026

Same journal

Stop chasing identical outcomes in HRI replication: Learn from the differences.

Science robotics·2026

Related Experiment Video

Updated: May 1, 2026

Photorealistic Learned Landscapes for Augmented Reality

Published on: June 27, 2025

A retrieval-augmented framework enabling VLM spatial awareness for object-centric robot manipulation.

Kai Chen¹, Chengkun Li¹, Chang Tu¹

¹Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China.

Science Robotics · April 29, 2026

Summary

This summary is machine-generated.

Retrieval-Augmented Manipulation (RAM) enables vision-language models to perform precise robotic tasks by grounding language in 3D object representations. This framework bridges semantic understanding and geometric execution for enhanced robot intelligence.

More Related Videos

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Haptic/Graphic Rehabilitation: Integrating a Robot into a Virtual Environment Library and Applying it to Stroke Therapy

Published on: August 8, 2011

Area of Science:

Robotics
Artificial Intelligence
Computer Vision

Background:

Vision-language models (VLMs) struggle with precise spatial reasoning for robotic manipulation.
Existing VLMs lack the intrinsic spatial intelligence for object placement and orientation tasks.

Purpose of the Study:

Introduce Retrieval-Augmented Manipulation (RAM) to bridge the semantic-to-geometric gap in robotic manipulation.
Equip general-purpose vision foundation models with spatial reasoning capabilities for complex tasks.

Main Methods:

Developed an object-centric framework (RAM) grounding abstract concepts into 3D representations.
Augmented VLMs with grounded 3D information to decompose instructions into precise subgoals.
Utilized a real-world robot for zero-shot execution of manipulation tasks.
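The summary does not describe how RAM performs retrieval or grounding, so the following is only a toy illustration of the general retrieve-then-ground idea, not the authors' method. It assumes a small hypothetical memory of object entries, each holding a text description and a direction (for example, where a handle points) in the object's own frame. Retrieval uses bag-of-words cosine similarity as a stand-in for learned embeddings, and grounding rotates the stored direction by an observed object yaw to produce a 2D subgoal. Every name here (`ObjectEntry`, `retrieve`, `ground_subgoal`) is hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class ObjectEntry:
    # Hypothetical memory record: description plus a unit direction in the object frame.
    description: str
    local_direction: tuple  # (dx, dy)

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two word-count vectors (Counter returns 0 for missing words)."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, memory):
    """Return the memory entry whose description is most similar to the query."""
    q = bag_of_words(query)
    return max(memory, key=lambda e: cosine(q, bag_of_words(e.description)))

def ground_subgoal(entry, object_xy, object_yaw, offset):
    """Rotate the retrieved object-frame direction into the world frame by the observed yaw
    and place a waypoint `offset` metres along it from the object centre."""
    dx, dy = entry.local_direction
    c, s = math.cos(object_yaw), math.sin(object_yaw)
    wx, wy = c * dx - s * dy, s * dx + c * dy
    return (object_xy[0] + offset * wx, object_xy[1] + offset * wy)

memory = [
    ObjectEntry("mug with handle", (1.0, 0.0)),
    ObjectEntry("drawer with front handle", (0.0, -1.0)),
]
entry = retrieve("grasp the mug handle", memory)
goal = ground_subgoal(entry, object_xy=(0.5, 0.2), object_yaw=math.pi / 2, offset=0.1)
print(entry.description, goal)
```

The point of the toy is the separation the summary emphasizes: a semantic step (which stored object matches the instruction) followed by a geometric step (where that object's relevant part lies in the scene).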

Main Results:

RAM successfully executed complex spatial language instructions in a zero-shot setting.
Demonstrated spatially aware manipulation from a single 2D image and adaptive replanning.
Validated generalization to unseen objects and robustness to shape variations and occlusions on the CO3D dataset.

Conclusions:

RAM provides a structured bridge between semantic intent and geometric execution for robotic systems.
This framework is a critical step toward developing more physically intelligent and general-purpose robots.
The object-centric approach enhances VLM spatial reasoning for real-world manipulation challenges.