Hardware-Oriented Approximations of Softmax and RMSNorm for Efficient Transformer Inference.

Yiwen Kang¹,², Dong Wang¹,²

¹Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

January 28, 2026

Summary

This summary is machine-generated.

This study introduces hardware-efficient methods to accelerate Transformer inference by optimizing nonlinear operators like Softmax and RMSNorm. These techniques reduce resource costs and latency while maintaining model accuracy for large language models (LLMs).

Keywords:

FPGA, RMSNorm, Softmax, hardware acceleration, transformer inference

Area of Science:

Computer Engineering
Artificial Intelligence
Software Engineering

Background:

Transformer-based large language models (LLMs) are increasingly used in software engineering for tasks such as code generation and non-functional requirement (NFR) classification.
Existing research on LLM optimization primarily targets linear operations, leaving nonlinear operators underexplored.
Nonlinear operators such as Softmax and RMSNorm are critical for Transformer performance but are computationally expensive.
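
For reference, the standard textbook definitions of the two operators are given below; they are not quoted from the paper. The symbols are notation assumed here: x is an input vector of length n, m is its maximum element, g is a learned gain vector, and ε is a small stabilizing constant. Subtracting m is the usual "safe" form of Softmax, which keeps every exponent at or below zero.

```latex
\mathrm{Softmax}(x)_i = \frac{e^{x_i - m}}{\sum_{j=1}^{n} e^{x_j - m}},
\qquad m = \max_{j} x_j

\mathrm{RMSNorm}(x)_i = g_i \, \frac{x_i}{\sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^{2} + \varepsilon}}
```

The exponential, the division, and the reciprocal square root in these formulas are the operations that map poorly onto simple fixed-point hardware.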

Purpose of the Study:

To propose hardware-efficient approximation and acceleration methods for Softmax and RMSNorm operators in Transformer models.
To reduce resource costs and accelerate Transformer inference speed.
To maintain the accuracy of LLMs while optimizing hardware utilization.

Main Methods:

Developed a SafeSoftmax technique that applies range reduction to enable bipartite lookup table (LUT) approximation and acceleration.
Optimized bit-width configuration using Pareto frontier analysis and applied error compensation for numerical accuracy.
Reformulated division as logarithmic subtraction using a LUT driven by a leading-one detector (LOD), and used the LOD to optimize RMSNorm for parallel computation.
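
The ideas in the list above can be illustrated with a minimal software sketch. It is not the authors' implementation: it uses plain single-table LUTs instead of bipartite tables, floating-point arithmetic instead of a hardware pipeline, and the table width `LUT_BITS`, the fixed-point width `FIX_BITS`, and all function names are hypothetical choices made for illustration. It shows range reduction (splitting a base-2 exponent into an integer shift and a fractional LUT lookup) and division rewritten as a subtraction of base-2 logarithms, where the logarithm comes from a leading-one position plus a LUT.

```python
import math

LOG2_E = math.log2(math.e)
LUT_BITS = 6    # assumed LUT address width (illustrative)
FIX_BITS = 16   # assumed fractional bits for the fixed-point sum (illustrative)

# Single-table stand-ins (NOT bipartite tables): 2**(-f) and log2(1 + f), f in [0, 1)
EXP2_NEG_LUT = [2.0 ** (-i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]
LOG2_LUT = [math.log2(1.0 + i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]


def exp2_neg(u):
    """Approximate 2**(-u) for u >= 0 by range reduction.

    The integer part of u becomes a right shift; the fractional part
    is truncated to LUT_BITS bits and looked up in a table.
    """
    k = int(u)                                # integer part -> shift amount
    idx = int((u - k) * (1 << LUT_BITS))      # truncated fractional part
    return math.ldexp(EXP2_NEG_LUT[idx], -k)  # LUT value * 2**(-k)


def log2_lod(s):
    """Approximate log2(s) with a leading-one detector plus a LUT.

    Assumes s >= 2**(-FIX_BITS) so the fixed-point value is nonzero.
    """
    n = int(s * (1 << FIX_BITS))              # fixed-point integer
    p = n.bit_length() - 1                    # position of the leading one
    idx = ((n - (1 << p)) << LUT_BITS) >> p   # top mantissa bits below it
    return (p - FIX_BITS) + LOG2_LUT[idx]


def softmax_approx(xs):
    """Safe softmax: subtract the max, then divide via log subtraction."""
    m = max(xs)
    ts = [(m - x) * LOG2_E for x in xs]       # exp(x - m) == 2**(-t), t >= 0
    s = sum(exp2_neg(t) for t in ts)          # denominator, always >= 1
    log2_s = log2_lod(s)
    # e_i / s == 2**(-(t_i + log2(s))): a subtraction in the log domain
    return [exp2_neg(t + log2_s) for t in ts]
```

Calling `softmax_approx([1.0, 2.0, 3.0, -1.0])` returns values close to the exact softmax. With 6-bit tables the outputs carry a relative error of a few percent and do not sum to exactly 1, which is the kind of accuracy-versus-resource trade-off that a bit-width search would tune.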

Main Results:

Implemented an FPGA-based pipelined accelerator demonstrating low operator-level latency and power consumption.
Achieved significant reductions in hardware resource usage.
Preserved model accuracy despite the approximations and accelerations applied to Softmax and RMSNorm.
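
A quick software check of how much error a LOD-plus-LUT approximation introduces into RMSNorm can be sketched as follows. This is not the paper's evaluation and makes no claim about its results: the reciprocal square root is computed here as 2^(-0.5·log2(ms)) with single-table LUTs, and the widths `FIX` and `BITS`, the `eps` default, and every function name are assumptions chosen for illustration.

```python
import math

FIX = 16   # assumed fractional bits of the fixed-point mean square
BITS = 6   # assumed LUT address width

LOG2_LUT = [math.log2(1.0 + i / (1 << BITS)) for i in range(1 << BITS)]
EXP2_LUT = [2.0 ** (i / (1 << BITS)) for i in range(1 << BITS)]


def log2_lod(v):
    """Approximate log2(v): leading-one position plus a mantissa LUT."""
    n = max(1, int(v * (1 << FIX)))           # fixed-point integer, at least 1
    p = n.bit_length() - 1                    # position of the leading one
    idx = ((n - (1 << p)) << BITS) >> p       # top mantissa bits below it
    return (p - FIX) + LOG2_LUT[idx]


def exp2_lut(u):
    """Approximate 2**u for any real u: integer shift plus fractional LUT."""
    k = math.floor(u)
    idx = min(int((u - k) * (1 << BITS)), (1 << BITS) - 1)  # clamp rounding edge
    return math.ldexp(EXP2_LUT[idx], k)


def rmsnorm_approx(xs, gains, eps=1e-6):
    """RMSNorm with 1/sqrt(ms) evaluated as 2**(-0.5 * log2(ms))."""
    ms = sum(x * x for x in xs) / len(xs) + eps
    inv_rms = exp2_lut(-0.5 * log2_lod(ms))
    return [g * x * inv_rms for g, x in zip(gains, xs)]


def rmsnorm_ref(xs, gains, eps=1e-6):
    """Floating-point reference used only for comparison."""
    ms = sum(x * x for x in xs) / len(xs) + eps
    return [g * x / math.sqrt(ms) for g, x in zip(gains, xs)]


def max_rel_error(xs, gains):
    """Largest relative deviation of the approximation from the reference."""
    approx = rmsnorm_approx(xs, gains)
    ref = rmsnorm_ref(xs, gains)
    return max(abs(a - r) / abs(r) for a, r in zip(approx, ref) if r != 0.0)
```

Running `max_rel_error` over representative activation vectors gives a rough operator-level error figure. Whether that error is acceptable at the model level still has to be measured on the end task, as the study reports doing for its own design.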

Conclusions:

The proposed hardware-efficient methods effectively accelerate Transformer inference by optimizing critical nonlinear operators.
The FPGA-based accelerator offers a practical solution for deploying LLMs with reduced resource footprints and improved performance.
This work highlights the potential of hardware-level optimizations for nonlinear operators in advancing LLM applications.