MS2-CL: Multi-Scale Self-Supervised Learning for Camera to LiDAR Cross-Modal Place Recognition.

Wen Liu¹, Lei Ma¹, Xuanshun Zhuang¹

  • ¹ School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Sensors (Basel, Switzerland) | March 14, 2026
Summary
This summary is machine-generated.

This study introduces a novel method for cross-modal place recognition, enabling robots and autonomous vehicles to localize using both visual and 3D point cloud data. The approach achieves state-of-the-art performance by learning a unified embedding space, overcoming domain gaps and improving generalization.

Area of Science:

  • Robotics and Autonomous Systems
  • Computer Vision
  • Machine Learning

Background:

  • Place recognition is crucial for autonomous navigation, but cross-modal localization (e.g., visual to 3D point clouds) faces significant challenges.
  • Existing methods struggle with domain gaps, computational costs, and learning viewpoint/scale-invariant features.

Purpose of the Study:

  • To develop a robust cross-modal place recognition framework that addresses the limitations of current approaches.
  • To enable accurate visual localization within large-scale 3D point cloud maps.

Main Methods:

  • Formulated cross-modal recognition as learning a scale-invariant, unified embedding space.
  • Employed a hierarchical Swin Transformer for multi-scale feature extraction from unified 2D representations.
  • Utilized a multi-scale self-distillation paradigm for intra-modal knowledge transfer.
  • Achieved inter-modal alignment using a global contrastive loss on 'teacher' embeddings.

Keywords: Swin Transformer, autonomous driving, cross-modal place recognition, self-supervised learning
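
As a rough illustration of the inter-modal alignment step listed under Main Methods, the sketch below computes a symmetric InfoNCE-style contrastive loss between a batch of camera embeddings and a batch of LiDAR embeddings. The summary does not state the exact form of the loss, so the symmetric formulation, the temperature of 0.07, and every function name here are assumptions. Random vectors stand in for the 'teacher' embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(img_emb, pc_emb, temperature=0.07):
    # Row i of img_emb and row i of pc_emb are assumed to depict the same place.
    img = l2_normalize(img_emb)
    pc = l2_normalize(pc_emb)
    logits = img @ pc.T / temperature
    idx = np.arange(len(img))
    loss_img_to_pc = -log_softmax(logits)[idx, idx].mean()
    loss_pc_to_img = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_img_to_pc + loss_pc_to_img)

# Toy data: hypothetical embeddings, not outputs of the paper's model.
rng = np.random.default_rng(0)
pc_emb = rng.normal(size=(8, 32))
aligned_img = pc_emb + 0.01 * rng.normal(size=(8, 32))  # pairs match
shuffled_img = np.roll(pc_emb, 1, axis=0)               # pairs mismatched

loss_aligned = symmetric_info_nce(aligned_img, pc_emb)
loss_shuffled = symmetric_info_nce(shuffled_img, pc_emb)
```

In this toy setting the loss is small when each image embedding sits next to its paired point-cloud embedding and large when the pairing is shuffled, which is the behavior a contrastive alignment objective is meant to reward.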

Main Results:

  • Achieved state-of-the-art performance on KITTI and KITTI-360 datasets.
  • Demonstrated high accuracy in visual localization within 3D point cloud maps.
  • Achieved over 60% Recall@1 on KITTI-360 without fine-tuning, using a KITTI-trained model.
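
For readers unfamiliar with the Recall@1 figure quoted above, the following is a minimal sketch of how Recall@K is commonly computed for place recognition: each query embedding retrieves its K most similar database embeddings, and the query counts as a hit if any retrieved entry lies within a distance threshold of the query's true position. The 10 m threshold, the cosine-similarity ranking, and the function name are assumptions for illustration. The summary does not specify the paper's evaluation protocol, and the data below are invented.

```python
import numpy as np

def recall_at_k(query_emb, db_emb, query_pos, db_pos, k=1, max_dist=10.0):
    # Rank database entries by cosine similarity to each query.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
    sims = q @ d.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    # A query is a hit if any of its top-k matches is within max_dist metres.
    dists = np.linalg.norm(db_pos[topk] - query_pos[:, None, :], axis=2)
    hits = (dists <= max_dist).any(axis=1)
    return hits.mean()

# Toy database: four places spaced 50 m apart with orthogonal embeddings.
db_emb = np.eye(4)
db_pos = np.array([[0.0, 0.0], [50.0, 0.0], [100.0, 0.0], [150.0, 0.0]])

# Three queries; the third is most similar to the wrong place (index 3)
# although it was captured at the position of index 2.
query_emb = np.array([
    [1.0, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.4, 1.0],
])
query_pos = np.array([[1.0, 0.0], [52.0, 0.0], [100.0, 0.0]])

r1 = recall_at_k(query_emb, db_emb, query_pos, db_pos, k=1)
r2 = recall_at_k(query_emb, db_emb, query_pos, db_pos, k=2)
```

Here two of the three queries retrieve the correct place first, so `r1` is 2/3. The third query's correct place appears as its second-best match, so `r2` is 1.0.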

Conclusions:

  • The proposed method effectively bridges the domain gap between visual and 3D point cloud data for place recognition.
  • The scale-invariant unified embedding space and self-distillation approach enhance generalization and performance.
  • The framework offers a promising solution for reliable cross-modal localization in robotics and autonomous vehicles.