Updated: Sep 9, 2025

GATmath and GATLc: Comprehensive benchmarks for evaluating Arabic large language models.

Safa AlBallaa¹, Nora AlTwairesh¹, Abdulmalik AlSalman¹

  • ¹Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.

PLOS ONE | September 2, 2025
Summary
This summary is machine-generated.

Developing Arabic Large Language Models (LLMs) is challenging due to limited benchmarks. New datasets, GATmath and GATLc, offer large-scale reasoning and language tasks to drive progress in Arabic AI.

Area of Science:

  • Artificial Intelligence
  • Natural Language Processing
  • Computational Linguistics

Background:

  • Large Language Models (LLMs) have advanced AI, but their development requires robust evaluation.
  • Assessing Arabic LLMs is hindered by a lack of comprehensive benchmarks and evaluation tools.
  • This scarcity limits the progress and real-world application of Arabic language models.

Purpose of the Study:

  • Introduce GATmath (7k questions) and GATLc (9k questions), novel Arabic benchmarks for multitask reasoning and language understanding.
  • Provide the first large-scale, comprehensive reasoning dataset specifically designed for the Arabic language.
  • Facilitate rigorous evaluation and drive the advancement of Arabic LLMs.

Main Methods:

  • Created two large-scale Arabic datasets, GATmath and GATLc, derived from the General Aptitude Test (GAT).
  • Datasets encompass diverse categories requiring reasoning, semantic analysis, language comprehension, and mathematical problem-solving.
  • Evaluated seven prominent LLMs on these newly developed benchmarks.

Main Results:

  • The highest-performing LLM achieved only 66.9% (GATmath) and 64.3% (GATLc) accuracy.
  • These results highlight the significant difficulty posed by the GATmath and GATLc datasets.
  • Current state-of-the-art LLMs demonstrate substantial limitations in Arabic reasoning and language understanding.
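
The accuracy figures above are the share of benchmark questions a model answers correctly. A minimal sketch of that computation follows. It assumes a multiple-choice format with one gold label per question, which this summary does not state. The function name `accuracy` and the toy labels are hypothetical and are not taken from the paper or its evaluation code.

```python
def accuracy(predictions, gold):
    """Return the percentage of predictions that match the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold answer labels.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data (not from GATmath or GATLc): four questions, options A-D.
gold = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]
print(f"{accuracy(predictions, gold):.1f}%")  # prints "75.0%"
```

Under this reading, a score of 66.9% on a benchmark of roughly 7k questions would mean that about one in three questions was answered incorrectly.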

Conclusions:

  • The GATmath and GATLc datasets present a considerable challenge for existing Arabic LLMs.
  • There is substantial room for improvement in developing more capable Arabic language models.
  • These benchmarks are crucial for advancing research and development in Arabic AI.