Updated: Mar 18, 2026

Deep learning-driven image captioning: Progress through transformers and large language models.

Priyanka Panchal1, Vishal Polara2, Siddaraj U3

  • 1Department of Information Technology, Madhuben and Bhanubhai Patel Institute of Technology, The Charutar Vidya Mandal (CVM) University, New Vallabh Vidya Nagar, Gujarat, India.

PLOS ONE | March 16, 2026
Summary
This summary is machine-generated.

This study introduces a deep learning model for image captioning that pairs an advanced vision transformer with a large language model (LLM) and reports improvements over existing methods. Its novel cross-attention mechanism strengthens visual-linguistic alignment, producing more human-like image descriptions.

Area of Science:

  • Artificial Intelligence
  • Computer Vision
  • Natural Language Processing

Background:

  • Traditional Convolutional Neural Network-Recurrent Neural Network (CNN-RNN) hybrids and existing transformer models have limitations in image captioning.
  • Achieving robust multimodal alignment and enhancing caption diversity remain key challenges in the field.

Purpose of the Study:

  • To propose a novel deep learning model for image captioning using an advanced vision transformer architecture and a powerful Large Language Model (LLM).
  • To improve the alignment between linguistic context and visual features through a unique cross-attention mechanism.

Main Methods:

  • Development of a novel deep learning architecture integrating a vision transformer with an LLM.
  • Implementation of a unique cross-attention mechanism for deep alignment between visual and linguistic features.
  • Extensive evaluation on benchmark datasets including MSCOCO, Flickr30K, and NoCaps.
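
The summary does not describe the internals of the cross-attention mechanism, so the following is only a minimal sketch of generic scaled dot-product cross-attention, in which text-token queries attend over image-patch features. It is not the authors' implementation: the function names, the toy vectors, and the omission of learned projection matrices and multiple heads are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(text_queries, image_keys, image_values):
    """Generic scaled dot-product cross-attention (illustrative only).

    text_queries: one d-dimensional query vector per text token
    image_keys:   one d-dimensional key vector per image patch
    image_values: one value vector per image patch
    Returns (outputs, weights): one attended vector and one
    attention distribution over patches per text token.
    """
    d = len(image_keys[0])
    value_dim = len(image_values[0])
    outputs, weights = [], []
    for q in text_queries:
        # Similarity of this text token to every image patch.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_keys]
        w = softmax(scores)
        # Weighted sum of patch values under the attention distribution.
        out = [sum(wj * v[c] for wj, v in zip(w, image_values))
               for c in range(value_dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Hypothetical toy inputs: 2 text tokens, 3 image patches.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
image_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
outputs, weights = cross_attention(text_queries, image_keys, image_values)
```

In this toy setup the first text token puts most of its attention on the first patch and the second token on the second patch, which is the kind of token-to-region alignment the bullet above refers to. A real model would apply learned query, key, and value projections before this step.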

Main Results:

  • The proposed model shows significant improvements over traditional CNN-RNN hybrids and existing transformer-based approaches.
  • It achieves state-of-the-art performance, comparable to leading methods such as GIT, BLIP-2, and CoCa, on MSCOCO, Flickr30K, and NoCaps.
  • Reported metrics on MSCOCO include BLEU-4 (0.495), METEOR (0.390), and CIDEr (1.32).
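
For readers unfamiliar with the metrics, the sketch below shows how a BLEU-4 score can be computed for a single candidate caption against a single reference: the geometric mean of clipped 1-gram to 4-gram precisions, multiplied by a brevity penalty. It is a simplified illustration under stated assumptions (whitespace tokenization, one reference, no smoothing, sentence level); the function names are hypothetical, and it is not the evaluation code behind the reported scores, which are normally computed at corpus level with multiple references.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Simplified sentence-level BLEU-4 with one reference and no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / 4)

# Hypothetical captions for illustration.
reference = "a cat sits on the mat"
exact = bleu4("a cat sits on the mat", reference)
partial = bleu4("a cat sits on a mat", reference)
short = bleu4("a dog", reference)
```

An exact match scores 1, a caption with one wrong word scores strictly between 0 and 1, and a caption too short to contain any 4-gram scores 0 under this unsmoothed variant.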

Conclusions:

  • The novel architecture sets a new performance benchmark for image captioning systems.
  • The fusion strategy proves efficient, enabling more precise, contextually rich, and human-like image descriptions.
  • This work advances multimodal AI systems, supporting United Nations Sustainable Development Goals 9 (industry, innovation and infrastructure) and 4 (quality education).