



Feature refinement and rethinking attention for remote sensing image captioning.

Yunpeng Li (1,2), Chengjin Tao (1,2), Meng Liu (1,2)

  • (1) The Jiangsu Province Engineering Research Center of Integrated Circuit Reliability Technology and Testing System, Wuxi University, Wuxi, 214105, China.

Scientific Reports | March 14, 2025
Summary
This summary is machine-generated.

This study introduces a novel framework for remote sensing image captioning that refines features and uses rethinking attention. The approach improves accuracy by allowing models to reconsider visual information, leading to better descriptions.

Keywords:
Feature refinement; Remote sensing image captioning; Rethinking attention mechanism; Vision-language; Visual perception


Area of Science:

  • Computer Science
  • Artificial Intelligence
  • Remote Sensing

Background:

  • Attention mechanisms are crucial for remote sensing image captioning but struggle with restrictive assumptions and weak object correlations.
  • Existing visual feature extractors can fail when object relationships are not clearly defined.

Purpose of the Study:

  • To develop an advanced framework for remote sensing image captioning that overcomes limitations of current attention-driven models.
  • To enhance the accuracy and robustness of image captioning by refining visual features and enabling a 'rethinking' attention process.

Main Methods:

  • A feature refinement module lets grid-level features interact with one another and applies a refinement gate to weaken irrelevant visual information.
  • A rethinking attention mechanism, built on a rethinking LSTM layer, lets the model focus on multiple regions when predicting a single word.
  • A confidence rectification strategy is employed to model rethinking attention and learn discriminative contextual representations.
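
The two ideas in the list above, gated refinement of grid features and a second look at the image before committing to a word, can be illustrated with a toy sketch. This is not the authors' implementation: the summary gives no equations, so everything here is an assumption made for illustration. The function names (`refine_grid_features`, `rethink_attention`), the dot-product attention used for feature interaction, the sigmoid gate, and the use of the peak first-pass weight as a "confidence" score are all hypothetical, and a plain two-pass attention stands in for the rethinking LSTM layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, regions):
    """Scaled dot-product attention of one query over a list of region vectors."""
    dim = len(query)
    weights = softmax([dot(query, r) / math.sqrt(dim) for r in regions])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context


def refine_grid_features(grid, gate_weights, gate_bias):
    """Hypothetical gated refinement: each grid cell gathers context from all
    cells, then a sigmoid gate in (0, 1) scales the cell's own features."""
    refined = []
    for cell in grid:
        _, context = attend(cell, grid)
        joint = cell + context  # list concatenation: [cell ; context]
        gate = [sigmoid(dot(row, joint) + b) for row, b in zip(gate_weights, gate_bias)]
        refined.append([g * c for g, c in zip(gate, cell)])
    return refined


def rethink_attention(query, regions):
    """Hypothetical two-pass attention: look once, fold what was seen back
    into the query, look again, then blend the two contexts."""
    first_w, first_ctx = attend(query, regions)
    second_query = [q + c for q, c in zip(query, first_ctx)]
    second_w, second_ctx = attend(second_query, regions)
    # Assumed confidence score: a sharply peaked first pass keeps more weight.
    confidence = max(first_w)
    blended = [confidence * a + (1.0 - confidence) * b
               for a, b in zip(first_ctx, second_ctx)]
    return blended, first_w, second_w


# Toy data: 6 grid cells with 4-dimensional features and random gate parameters.
random.seed(0)
dim = 4
grid = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
gate_weights = [[random.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
gate_bias = [0.0] * dim

refined = refine_grid_features(grid, gate_weights, gate_bias)
query = [random.uniform(-1, 1) for _ in range(dim)]
context, first_w, second_w = rethink_attention(query, refined)
```

Because the gate values lie strictly between 0 and 1, the refinement in this sketch can only shrink a feature, never amplify it, which mirrors the stated goal of weakening irrelevant visual information. In a trained model the gate parameters and the confidence estimate would be learned rather than random or hand-picked.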

Main Results:

  • The proposed framework demonstrated superior performance across four benchmark datasets: NWPU-Captions, RSICD, UCM-Captions, and Sydney-Captions.
  • Significant improvements were achieved, particularly on the NWPU-Captions dataset, highlighting the effectiveness of the approach.

Conclusions:

  • The feature refinement and rethinking attention framework offers a more robust and effective solution for remote sensing image captioning.
  • The model's ability to reconsider visual focus and refine features leads to more accurate and contextually rich image descriptions.