Updated: Jan 29, 2026

Hardware-Oriented Approximations of Softmax and RMSNorm for Efficient Transformer Inference.

Yiwen Kang (1, 2), Dong Wang (1, 2)

  • (1) Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

Micromachines | January 28, 2026
Summary
This summary is machine-generated.

This study introduces hardware-efficient methods to accelerate Transformer inference by optimizing nonlinear operators like Softmax and RMSNorm. These techniques reduce resource costs and latency while maintaining model accuracy for large language models (LLMs).

Keywords:
FPGA, RMSNorm, Softmax, hardware acceleration, transformer inference

Area of Science:

  • Computer Engineering
  • Artificial Intelligence
  • Software Engineering

Background:

  • Transformer-based large language models (LLMs) are increasingly used in software engineering for tasks such as code generation and non-functional requirement (NFR) classification.
  • Existing research on LLM optimization primarily targets linear operations, leaving nonlinear operators underexplored.
  • Nonlinear operators such as Softmax and RMSNorm are critical for Transformer performance but are computationally expensive.
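
For reference, the two operators can be written in their standard textbook form. This is a sketch for orientation and is not copied from the paper: the symbols n (vector length), m (running maximum), g (learned gain), and \epsilon (small stabilizing constant) are notation chosen here. Subtracting m before exponentiating is the usual "safe" form of Softmax, which leaves the result unchanged while keeping every exponent at or below zero.

```latex
\[
\operatorname{softmax}(x)_i = \frac{e^{x_i - m}}{\sum_{j=1}^{n} e^{x_j - m}},
\qquad m = \max_{1 \le j \le n} x_j
\]
\[
\operatorname{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^{2} + \epsilon}} \; g_i
\]
```

Softmax needs one exponential per element plus a division, and RMSNorm needs a square root and a division. None of these maps onto the multiply-accumulate units that handle the linear layers, which is why they tend to be costly in hardware.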

Purpose of the Study:

  • To propose hardware-efficient approximation and acceleration methods for Softmax and RMSNorm operators in Transformer models.
  • To reduce resource costs and accelerate Transformer inference speed.
  • To maintain the accuracy of LLMs while optimizing hardware utilization.

Main Methods:

  • Developed a SafeSoftmax technique that uses range reduction so the exponential can be approximated and accelerated with a bipartite lookup table (LUT).
  • Optimized the bit-width configuration using Pareto frontier analysis and applied error compensation to preserve numerical accuracy.
  • Reformulated division as subtraction in the logarithmic domain using a LUT driven by a leading-one detector (LOD), and used the LOD to optimize RMSNorm for parallel computation.
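
The general idea behind the Softmax steps listed above can be sketched in software. The Python below is a minimal illustration under stated assumptions, not the authors' design: it uses a single plain table per function rather than a bipartite LUT, omits error compensation, and the table size (FRAC_BITS), the fixed-point width (FIX), and all function names are hypothetical choices made here. LOD is read as "leading-one detector", emulated with int.bit_length().

```python
import math

# All sizes below are illustrative assumptions, not values from the paper.
FRAC_BITS = 6                 # LUT address width
N = 1 << FRAC_BITS            # 64 entries per table
FIX = 16                      # fractional bits used for the fixed-point sum
EXP2_LUT = [2.0 ** (-i / N) for i in range(N)]          # 2^(-f), f in [0, 1)
LOG2_LUT = [math.log2(1.0 + i / N) for i in range(N)]   # log2(1 + r), r in [0, 1)


def to_base2_exponent(d):
    """Map e^(-d), d >= 0, to 2^(-t) and truncate t to FRAC_BITS fractional bits."""
    return int(d * math.log2(math.e) * N) / N


def exp2_neg(t):
    """Range reduction: 2^(-t) = 2^(-k) * 2^(-f), i.e. one LUT read and a shift."""
    k = int(t)                       # integer part, a right shift in hardware
    idx = int((t - k) * N)           # fractional part, truncated to a LUT address
    return math.ldexp(EXP2_LUT[idx], -k)


def log2_lod(s):
    """log2(s) for s >= 1: leading-one position plus a LUT on the mantissa bits."""
    v = int(s * (1 << FIX))          # fixed-point view of s
    p = v.bit_length() - 1           # emulates a leading-one detector (LOD)
    r = (v - (1 << p)) / (1 << p)    # bits below the leading one, r in [0, 1)
    return (p - FIX) + LOG2_LUT[int(r * N)]


def softmax_approx(xs):
    m = max(xs)                                  # safe softmax: exponents <= 0
    ts = [to_base2_exponent(m - x) for x in xs]
    s = sum(exp2_neg(t) for t in ts)             # s >= 1 because the max term is 1
    log_s = log2_lod(s)
    # e_i / s = 2^(-(t_i + log2 s)): the division becomes an addition of logs
    return [exp2_neg(t + log_s) for t in ts]
```

Calling softmax_approx([1.0, 2.0, 3.0]) returns three values close to the exact softmax. With these 64-entry tables the truncation errors add up to a few percent at most, and the outputs sum to slightly more than 1 because every truncation rounds in the same direction. Closing that kind of gap is presumably what the bit-width tuning and error compensation in the second bullet are for.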

Main Results:

  • Implemented an FPGA-based pipelined accelerator demonstrating low operator-level latency and power consumption.
  • Achieved significant reductions in hardware resource usage.
  • Preserved model accuracy despite the approximations and accelerations applied to Softmax and RMSNorm.

Conclusions:

  • The proposed hardware-efficient methods effectively accelerate Transformer inference by optimizing critical nonlinear operators.
  • The FPGA-based accelerator offers a practical solution for deploying LLMs with reduced resource footprints and improved performance.
  • This work highlights the potential of hardware-level optimizations for nonlinear operators in advancing LLM applications.