A retrieval-augmented framework enabling VLM spatial awareness for object-centric robot manipulation.

Kai Chen¹, Chengkun Li¹, Chang Tu¹

  • ¹Department of Computer Science and Engineering, Chinese University of Hong Kong, HKSAR, China.

Science Robotics | April 29, 2026
Summary
This summary is machine-generated.

Retrieval-Augmented Manipulation (RAM) enables vision-language models to perform precise robotic tasks by grounding language in 3D object representations. This framework bridges semantic understanding and geometric execution for enhanced robot intelligence.

Area of Science:

  • Robotics
  • Artificial Intelligence
  • Computer Vision

Background:

  • Vision-language models (VLMs) struggle with precise spatial reasoning for robotic manipulation.
  • Existing VLMs lack the intrinsic spatial intelligence for object placement and orientation tasks.

Purpose of the Study:

  • Introduce Retrieval-Augmented Manipulation (RAM) to bridge the semantic-to-geometric gap in robotic manipulation.
  • Equip general-purpose vision foundation models with spatial reasoning capabilities for complex tasks.

Main Methods:

  • Developed an object-centric framework (RAM) grounding abstract concepts into 3D representations.
  • Augmented VLMs with grounded 3D information to decompose instructions into precise subgoals.
  • Utilized a real-world robot for zero-shot execution of manipulation tasks.
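
To make the retrieve-then-ground idea in the methods above more concrete, here is a minimal sketch. It is not the authors' implementation: the `ObjectEntry` library, the three-dimensional embeddings, the `front_axis` annotation, and the assumption that the observed object currently sits at the identity pose are all hypothetical, chosen only to illustrate how a retrieved object-centric 3D annotation could turn a phrase such as "point the mug toward +y" into a numeric rotation subgoal.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    """Hypothetical library entry: a feature vector plus a 3D annotation."""
    name: str
    embedding: tuple          # toy feature vector (assumed, not from the paper)
    front_axis: tuple         # canonical "front" direction in the object frame


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def retrieve(query, library):
    """Return the library entry whose embedding is most similar to the query."""
    return max(library, key=lambda entry: cosine(query, entry.embedding))


def yaw_to_align(front_axis, target_dir):
    """Signed rotation about z (radians) taking front_axis onto target_dir,
    using only the xy components of both vectors."""
    fx, fy, _ = front_axis
    tx, ty, _ = target_dir
    return math.atan2(fx * ty - fy * tx, fx * tx + fy * ty)


# Toy library with made-up values, for illustration only.
LIBRARY = [
    ObjectEntry("mug", (0.9, 0.1, 0.0), (1.0, 0.0, 0.0)),
    ObjectEntry("shoe", (0.1, 0.9, 0.2), (0.0, 1.0, 0.0)),
]

query = (0.8, 0.2, 0.1)                      # stand-in for an image feature
match = retrieve(query, LIBRARY)             # retrieval step
yaw = yaw_to_align(match.front_axis, (0.0, 1.0, 0.0))  # geometric subgoal

print(match.name, round(math.degrees(yaw), 1))
```

In this toy setup the language model would only need to name the object part ("front") and the target direction; the retrieved annotation supplies the geometry. A real system would also have to estimate the object's current pose, which this sketch skips.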

Main Results:

  • RAM successfully executed complex spatial language instructions in a zero-shot setting.
  • Demonstrated spatially aware manipulation from a single 2D image and adaptive replanning.
  • Validated generalization to unseen objects and robustness to shape variations and occlusions on the CO3D dataset.

Conclusions:

  • RAM provides a structured bridge between semantic intent and geometric execution for robotic systems.
  • This framework is a critical step toward developing more physically intelligent and general-purpose robots.
  • The object-centric approach enhances VLM spatial reasoning for real-world manipulation challenges.