Multi-modal Language models in bioacoustics with zero-shot transfer: a case study.

Zhongqi Miao1, Benjamin Elizalde2, Soham Deshmukh2

  • 1AI for Good Lab, 1 Microsoft Way, Microsoft, Redmond, WA, 98052, USA. zhongqimiao@microsoft.com.

Scientific Reports | February 28, 2025
PubMed
Summary
This summary is machine-generated.

Multi-Modal Language Models such as CLAP show promise for AI-driven wildlife monitoring: they recognize broad sound categories without fine-tuning or additional training. This approach opens possibilities beyond traditional supervised methods in bioacoustics.

Keywords:
Artificial Intelligence; Audio-Language models; Bioacoustics; Multi-modal Language models; Wildlife conservation; Zero-shot transfer

Area of Science:

  • Bioacoustics
  • Ecoacoustics
  • Soundscape Ecology
  • Artificial Intelligence (AI)
  • Machine Learning

Background:

  • Traditional AI methods for wildlife monitoring rely on supervised learning, requiring extensive manual annotation of bioacoustic data.
  • Manual annotation is labor-intensive, costly, and demands significant domain expertise, limiting AI deployment in real-world conservation.
  • Supervised learning is restricted to predefined categories, hindering adaptability to novel or diverse acoustic environments.

Purpose of the Study:

  • To explore the potential and limitations of Multi-Modal Language Models (MMLMs) in bioacoustic applications.
  • To showcase how MMLMs can overcome challenges associated with traditional supervised learning in wildlife sound detection.
  • To evaluate the zero-shot transfer capabilities of an Audio-Language Model for bioacoustic monitoring.

Main Methods:

  • Applied the Contrastive Language-Audio Pretraining (CLAP) model, an Audio-Language Model, to eight diverse bioacoustic benchmarks.
  • Utilized simple prompt engineering to guide the CLAP model's recognition capabilities.
  • Evaluated CLAP's performance on recognizing group-level sound categories without model fine-tuning or additional training.
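
The zero-shot procedure outlined above can be illustrated with a minimal sketch. It assumes an audio-language model has already mapped an audio clip and a set of text prompts into a shared embedding space; the toy three-dimensional vectors, the prompt template, and the helper names (`build_prompts`, `zero_shot_classify`) are hypothetical placeholders, not the CLAP API or the authors' code. The clip is assigned to the label whose prompt embedding is most similar to the audio embedding.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Turn similarity scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_prompts(labels, template="This is the sound of a {}."):
    """Simple prompt engineering: wrap each label in a text template.
    The template is an illustrative assumption, not the one used in the study."""
    return [template.format(label) for label in labels]

def zero_shot_classify(audio_emb, text_embs, labels):
    """Pick the label whose prompt embedding best matches the audio embedding.
    No training or fine-tuning is involved; only similarities are compared."""
    sims = [cosine(audio_emb, t) for t in text_embs]
    probs = softmax(sims)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

labels = ["bird", "frog", "whale"]
prompts = build_prompts(labels)

# Toy embeddings standing in for the outputs of a real text and audio encoder.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_emb = [0.1, 0.9, 0.2]

predicted, probs = zero_shot_classify(audio_emb, text_embs, labels)
print(predicted)  # frog
```

In a real pipeline, the text and audio embeddings would come from the pretrained model's encoders, and changing the label list or the template changes the recognizable categories without retraining.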

Main Results:

  • CLAP effectively recognized broad categories like birds, frogs, and whales across benchmarks with zero-shot transfer, achieving performance comparable to supervised baselines.
  • Demonstrated CLAP's potential for novel tasks, including estimating relative sound distances and discovering unknown species.
  • Identified limitations, such as the inability to discern fine-grained species-level categories and the dependency on manually crafted text prompts.

Conclusions:

  • Multi-Modal Language Models, specifically Audio-Language Models like CLAP, offer a versatile and efficient alternative for bioacoustic monitoring.
  • CLAP demonstrates significant potential for zero-shot sound event detection in diverse ecological contexts, reducing reliance on manual annotation.
  • Further research is needed to address limitations in fine-grained recognition and prompt engineering for practical, real-world bioacoustic applications.