Related Concept Videos

Language and Cognition

Language serves as a bridge between ideas and communication, influencing how individuals perceive and interact with the world. Psychologists have long debated whether language shapes thought or thought shapes language. The debate gained traction through the work of Edward Sapir and Benjamin Lee Whorf in the first half of the twentieth century, whose ideas are associated with linguistic determinism, the view that language determines thought. They suggested that the vocabulary and structure of a language influence how its speakers think and perceive reality.

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Correction to: SIRT3 activation protects from nabumetone-induced mitochondrial toxicity in adult human cardiomyocytes.

Cellular and molecular life sciences : CMLS·2026

Same author

Research on the location model of emergency rescue facilities based on disaster risk-A Case study of earthquake disaster.

PloS one·2026

Same author

CMAJ : Canadian Medical Association journal = journal de l'Association medicale canadienne·2026

Same author

SHP2 as a pivotal modulator of the tumor microenvironment in gastrointestinal cancers: from mechanisms to targeted therapies.

Journal of translational medicine·2026

Same author

The cGAS-STING pathway contributes to cisplatin-induced skeletal muscle atrophy through altered proteostasis and myogenic signaling.

Cell communication and signaling : CCS·2026

Same author

Mechanical and Thermal Properties of AlN-SiC Composite Ceramics Fabricated by In Situ Reaction Hot-Pressing Sintering.

Materials (Basel, Switzerland)·2026

Same journal

RETRACTED: Zhang et al. A Novel Framework for Reconstruction and Imaging of Target Scattering Centers via Wide-Angle Incidence in Radar Networks. Sensors 2025, 25, 6802.

Sensors (Basel, Switzerland)·2026

Same journal

Enhancing Unsupervised Multi-Source Domain Adaptation for Person Re-Identification via Mixture of Experts and Graph-Based Relation.

Sensors (Basel, Switzerland)·2026

Same journal

Development of an Instrumented Glove for Palmar Pressure Assessment in Kayakers.

Sensors (Basel, Switzerland)·2026

Same journal

Development and Experimental Validation of an Autonomous IoT-Based Monitoring System for Real-Time Water Quality Assessment in the Amazon River.

Sensors (Basel, Switzerland)·2026

Same journal

Semi-Supervised Adversarial Learning Framework for Controller Area Network Bus Intrusion Detection.

Sensors (Basel, Switzerland)·2026

Same journal

Smart Optimization Method for Safety Signs in Innovative Manufacturing Environments Integrating Industrial Field IoT Sensors and Knowledge Graphs.

Sensors (Basel, Switzerland)·2026

Related Experiment Video

Updated: May 5, 2026

Eye Tracking During Visually Situated Language Comprehension: Flexibility and Limitations in Uncovering Visual Context Effects

Published on: November 30, 2018

CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained

Xiaoqing Zhao¹, Miaomiao Xu¹, Wushour Silamu¹

¹College of Computer Science and Technology, Xinjiang University, No. 777 Huarui Street, Urumqi 830017, China.

Sensors (Basel, Switzerland)

November 27, 2024

Summary

This summary is machine-generated.

This study introduces CLIP-Llama, a novel approach to Scene Text Recognition (STR) that combines CLIP with Llama2-7B. CLIP-Llama achieves state-of-the-art results on 11 STR benchmarks, supporting applications that rely on reading text in images, such as image retrieval and intelligent transportation.

Keywords:

pre-trained language model; scene text recognition; vision-language model

More Related Videos

Author Spotlight: Enhancement of Salient Object Detection for Smart Grid Applications

Published on: December 15, 2023

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Area of Science:

Computer Vision
Artificial Intelligence
Natural Language Processing

Background:

Scene Text Recognition (STR) is vital for AI applications like image retrieval and intelligent transportation.
Pre-trained vision-language models are foundational for downstream AI tasks.
CLIP recognizes both regular and irregular (e.g., curved or rotated) text robustly.

Purpose of the Study:

To introduce CLIP-Llama, a novel STR model integrating CLIP and Llama2-7B.
To enhance STR accuracy by combining visual and cross-modal information.
To establish a strong foundation for future STR research using vision-language models.

Main Methods:

Built on CLIP's image and text encoders, organized into two branches: a visual branch and a cross-modal branch.
Incorporated Llama2-7B in the cross-modal branch for refining predictions.
Employed a dual prediction and refinement decoding scheme for improved inference.
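The predict-then-refine idea in the methods above can be illustrated with a toy sketch. This is not the authors' implementation: the per-position character probabilities are hand-written values standing in for the visual branch, and the real cross-modal decoder (with Llama2-7B) is replaced by simple lexicon rescoring through a hypothetical `lm_prior` dictionary. All function names (`visual_predict`, `cross_modal_refine`, `predict_and_refine`) are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# One probability distribution over characters per output position.
# In CLIP-Llama these would come from the visual branch on top of CLIP's
# image encoder; here they are hand-written toy values (hypothetical).
CharProbs = List[Dict[str, float]]

FLOOR = 1e-6  # probability assumed for characters/words a model did not score


def visual_predict(probs: CharProbs) -> Tuple[str, float]:
    """Visual branch, greedy decoding: most likely character per position."""
    chars: List[str] = []
    log_p = 0.0
    for dist in probs:
        ch, p = max(dist.items(), key=lambda kv: kv[1])
        chars.append(ch)
        log_p += math.log(p)
    return "".join(chars), log_p


def visual_log_prob(probs: CharProbs, word: str) -> float:
    """Log-probability the visual branch assigns to a whole candidate word."""
    return sum(math.log(dist.get(ch, FLOOR)) for dist, ch in zip(probs, word))


def cross_modal_refine(
    probs: CharProbs,
    draft: str,
    lm_prior: Dict[str, float],
    lm_weight: float = 1.0,
) -> str:
    """Refinement: rescore candidates with visual evidence plus a language prior.

    `lm_prior` is a hypothetical toy stand-in for the language model
    (Llama2-7B in CLIP-Llama): a mapping from word to prior probability.
    Only candidates with the same length as the draft are considered.
    """
    candidates = [w for w in lm_prior if len(w) == len(draft)]
    if draft not in candidates:
        candidates.append(draft)  # the draft itself always stays in play

    def score(word: str) -> float:
        return visual_log_prob(probs, word) + lm_weight * math.log(
            lm_prior.get(word, FLOOR)
        )

    return max(candidates, key=score)


def predict_and_refine(probs: CharProbs, lm_prior: Dict[str, float]) -> Tuple[str, str]:
    """Dual prediction: return the visual draft and the refined output."""
    draft, _ = visual_predict(probs)
    return draft, cross_modal_refine(probs, draft, lm_prior)


# Toy example: the visual branch confuses the letter "O" with the digit "0".
probs: CharProbs = [
    {"S": 0.9, "5": 0.1},
    {"T": 0.95, "7": 0.05},
    {"0": 0.55, "O": 0.45},
    {"P": 0.9, "R": 0.1},
]
lexicon = {"STOP": 0.6, "STEP": 0.4}
print(predict_and_refine(probs, lexicon))  # ('ST0P', 'STOP')
```

Keeping both the draft and the refined string mirrors the "dual prediction" framing: the visual output remains available, and the language-informed step can only overturn it when the combined score favors another candidate, as with "ST0P" becoming "STOP" here.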

Main Results:

CLIP-Llama achieved state-of-the-art performance on 11 Scene Text Recognition benchmark tests.
Demonstrated robust capabilities in recognizing diverse text in natural images.
Showcased the effectiveness of the dual-branch architecture and Llama2-7B integration.
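The summary reports state-of-the-art results on 11 STR benchmarks without naming the metric. STR benchmarks are commonly scored by word accuracy, often after lowercasing and discarding non-alphanumeric characters. The sketch below assumes that convention, which may differ from the exact protocol used for CLIP-Llama; the example predictions are hypothetical.

```python
import re
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and keep only the 36 alphanumeric characters [0-9a-z]."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (prediction, ground truth) pairs that match after normalization."""
    items = list(pairs)
    if not items:
        raise ValueError("word accuracy is undefined for an empty test set")
    correct = sum(normalize(pred) == normalize(gt) for pred, gt in items)
    return correct / len(items)


# Hypothetical predictions on four cropped word images.
results = [
    ("STOP", "stop"),        # case is ignored        -> correct
    ("exit!", "EXIT"),       # punctuation is dropped -> correct
    ("h0tel", "hotel"),      # digit 0 vs letter o    -> wrong
    ("Main St.", "mainst"),  # space and dot dropped  -> correct
]
print(word_accuracy(results))  # 0.75
```

A word counts as correct only if every character matches, so a single confusion such as "0" for "o" makes the whole word wrong.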

Conclusions:

CLIP-Llama offers a significant advancement in Scene Text Recognition.
The model's performance highlights the potential of integrating large language models with vision-language models for STR.
This work provides a solid foundation for future research in AI-driven text recognition.