Dynamic Micro-Batch and Token-Budget Scheduling for IoT-Scale Pipeline-Parallel LLM Inference.

Juncheol Ahn¹, Yubin Son¹, Daemin Kim¹

¹System Software Laboratory, Department of Computer Engineering, Keimyung University, Daegu 42601, Republic of Korea.

Sensors (Basel, Switzerland)

February 27, 2026

Summary

This summary is machine-generated.

We developed a runtime-adaptive scheduler for large language models (LLMs) in IoT-edge-cloud settings. This dynamic scheduling significantly reduces GPU idle time and improves throughput for LLM inference.

Keywords:

GPU scheduling; IoT cloud inference; edge computing; large language models; micro-batching; pipeline parallelism

Area of Science:

Computer Science
Artificial Intelligence
Distributed Systems

Background:

Large language models (LLMs) in IoT-edge-cloud environments handle diverse, unpredictable requests.
Pipeline-parallel LLM inference is susceptible to micro-batch imbalance and communication delays, leading to GPU idleness and Service Level Objective (SLO) violations.
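To make the link between micro-batch imbalance and GPU idleness concrete, the sketch below estimates the idle fraction of an idealized synchronous pipeline. It is a textbook-style model and not the paper's: it assumes every stage spends the same time on a given micro-batch and that communication is free. The function name `pipeline_idle_fraction` and the example durations are hypothetical.

```python
def pipeline_idle_fraction(durations, num_stages):
    """Idle fraction of an idealized pipeline (assumed model, not the paper's).

    durations[i] is the time micro-batch i needs on every stage.
    A stage starts micro-batch i once it is free and the previous
    stage has finished that micro-batch.
    """
    if not durations or num_stages < 1:
        raise ValueError("need at least one micro-batch and one stage")
    finish_prev_stage = [0.0] * len(durations)
    for _ in range(num_stages):
        finish = []
        stage_free_at = 0.0
        for i, t in enumerate(durations):
            start = max(stage_free_at, finish_prev_stage[i])
            stage_free_at = start + t
            finish.append(stage_free_at)
        finish_prev_stage = finish
    makespan = finish_prev_stage[-1]
    busy = num_stages * sum(durations)          # total useful GPU time
    return 1.0 - busy / (num_stages * makespan)  # share of GPU time spent idle


# Same total work (8 time units), 4 stages, 8 micro-batches.
balanced = pipeline_idle_fraction([1.0] * 8, num_stages=4)
imbalanced = pipeline_idle_fraction([0.25] * 7 + [6.25], num_stages=4)
print(f"balanced idle fraction:   {balanced:.3f}")
print(f"imbalanced idle fraction: {imbalanced:.3f}")
```

With equal micro-batches the result matches the familiar bubble fraction (p - 1) / (m + p - 1) for p stages and m micro-batches; a single oversized micro-batch raises the idle share even though the total work is unchanged.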

Purpose of the Study:

To propose a novel runtime-adaptive scheduler for optimizing LLM inference in resource-constrained IoT-edge-cloud settings.
To address micro-batch imbalance and communication stalls in pipeline-parallel LLM inference.

Main Methods:

Developed a scheduler that dynamically adjusts token budgets and micro-batch sizes.
Optimized the balance between prefill and decoding workloads.
Minimized pipeline bubbles under varying network and compute conditions.
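The methods listed above could be pictured roughly as follows. This is a minimal sketch of generic token-budget batching with chunked prefill and a simple proportional budget update, not the authors' scheduler. The `Request` type, the function names, the decode-first policy, and the clamp limits are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: str
    prefill_tokens: int  # prompt tokens still waiting to be processed
    decoding: bool       # True once the request is generating output tokens


def form_micro_batch(requests, token_budget):
    """Pick work for one micro-batch without exceeding token_budget tokens."""
    batch = []  # list of (req_id, tokens_scheduled)
    remaining = token_budget
    # Assumed policy: decode steps cost one token each and go first,
    # so requests that are already running keep advancing.
    for r in requests:
        if r.decoding and remaining > 0:
            batch.append((r.req_id, 1))
            remaining -= 1
    # Fill the leftover budget with prefill, chunking prompts that do not fit.
    for r in requests:
        if not r.decoding and r.prefill_tokens > 0 and remaining > 0:
            chunk = min(r.prefill_tokens, remaining)
            batch.append((r.req_id, chunk))
            remaining -= chunk
    return batch


def adapt_token_budget(budget, observed_ms, target_ms, lo=64, hi=4096):
    """Scale the budget toward a target iteration time, clamped to [lo, hi]."""
    if observed_ms <= 0:
        raise ValueError("observed_ms must be positive")
    scaled = int(budget * target_ms / observed_ms)
    return max(lo, min(hi, scaled))


requests = [
    Request("a", 0, True),
    Request("b", 300, False),
    Request("c", 500, False),
]
print(form_micro_batch(requests, token_budget=256))
print(adapt_token_budget(256, observed_ms=40.0, target_ms=20.0))
```

A budget expressed in tokens rather than in requests keeps the per-iteration cost of each micro-batch roughly comparable when prompt lengths vary widely, which is one plausible way to limit the imbalance described in the Background.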

Main Results:

Achieved up to 55% reduction in GPU idle time.
Improved throughput by up to 1.61× over existing serving systems such as vLLM and SGLang.
Improved satisfaction of Time-To-First-Token (TTFT) and Inter-Token Latency (ITL) SLOs.
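SLO satisfaction for TTFT and ITL is commonly reported as the share of requests that stay within both latency limits. The sketch below shows one such calculation. The function name, the sample measurements, and the 300 ms and 50 ms limits are hypothetical; the summary does not state which thresholds or definition the paper uses.

```python
def slo_attainment(samples, ttft_limit_ms, itl_limit_ms):
    """Fraction of requests meeting both limits.

    samples is a list of (ttft_ms, itl_ms) pairs, one per request.
    """
    if not samples:
        raise ValueError("no samples")
    met = sum(
        1
        for ttft_ms, itl_ms in samples
        if ttft_ms <= ttft_limit_ms and itl_ms <= itl_limit_ms
    )
    return met / len(samples)


# Hypothetical per-request measurements in milliseconds.
samples = [(180, 40), (450, 35), (210, 90), (150, 30)]
print(slo_attainment(samples, ttft_limit_ms=300, itl_limit_ms=50))
```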

Conclusions:

Dynamic scheduling is crucial for efficient and stable LLM inference in IoT-edge-cloud systems.
The proposed adaptive scheduler effectively mitigates performance bottlenecks in heterogeneous request environments.