Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection.

Jia Xu^1,2,3, Han Pu^1,2, Dong Wang^1,2

¹Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

January 25, 2025

Summary

This summary is machine-generated.

This study introduces novel hardware acceleration techniques for sparse neural networks, significantly improving energy efficiency and reducing latency. The optimized accelerator achieved high throughput with low power consumption on an FPGA.

Keywords:

FPGA; cache memory; deep convolutional neural network; heterogeneous computing; high-level synthesis

Area of Science:

Computer Engineering
Artificial Intelligence
Hardware Acceleration

Background:

Deep convolutional neural network (DCNN) acceleration is crucial for AI, but general-purpose devices lack efficiency for sparse models.
Existing neural network accelerators struggle to combine high efficiency, low latency, and low power consumption.
Sparse neural network acceleration is an active research area with potential for further optimization.

Purpose of the Study:

To investigate and propose three key techniques for hardware acceleration of sparse neural networks.
To enhance energy efficiency by designing specialized circuits that eliminate computations on zero values in sparse kernels.
To improve off-chip memory access efficiency and reduce latency in convolutional neural network accelerators.

Main Methods:

Developed a specialized computational circuit to detect and skip zero-value computations in sparse convolutional kernels.
Utilized a Vitis HLS compiler optimization plugin to enhance on-chip bandwidth utilization for data access patterns.
Proposed a shared feature map cache with a hash-based indexing algorithm for efficient asynchronous convolution, reducing on-chip memory usage.
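
The zero-skipping idea in the first method can be illustrated in software. The following is a minimal sketch, not the authors' circuit: the function name `conv2d`, the `skip_zeros` flag, and the toy kernel are all hypothetical, and the multiply-accumulate (MAC) count only stands in for the work a hardware unit would perform.

```python
def conv2d(x, w, skip_zeros):
    """Valid 2D cross-correlation of x with kernel w.

    Returns the output map and the number of multiply-accumulates performed.
    When skip_zeros is True, kernel taps equal to zero are never multiplied.
    """
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    # Select the kernel taps to process. A hardware design would make this
    # selection in circuitry; a Python list is used here for illustration.
    taps = [(i, j, w[i][j])
            for i in range(kh) for j in range(kw)
            if w[i][j] != 0 or not skip_zeros]
    out = [[0] * ow for _ in range(oh)]
    for i, j, v in taps:
        for r in range(oh):
            for c in range(ow):
                out[r][c] += v * x[r + i][c + j]
    return out, len(taps) * oh * ow


# Toy 5x5 input and a 3x3 kernel with 2 non-zero weights out of 9 (assumed values).
x = [[r * 5 + c for c in range(5)] for r in range(5)]
w = [[0, 1, 0],
     [0, 0, 0],
     [2, 0, 0]]

dense, dense_macs = conv2d(x, w, skip_zeros=False)
sparse, sparse_macs = conv2d(x, w, skip_zeros=True)
print(dense == sparse, dense_macs, sparse_macs)  # True 81 18
```

Both calls produce the same output map, but the zero-skipping path performs 18 MACs instead of 81, which is the kind of saving a sparse accelerator aims to convert into energy efficiency.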

Main Results:

The specialized circuit enhanced energy efficiency by eliminating zero-value computations.
Compiler optimizations improved off-chip memory access efficiency and data handling.
The shared cache design enabled efficient asynchronous convolution, conserving on-chip resources.
ResNet50 inference on an Intel Arria 10 1150GX FPGA achieved 497 GOPS throughput (1579 GOPS equivalent) at 22 W power consumption.
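
The throughput and power figures above can be combined into rough energy-efficiency numbers. The values computed below are derived here by simple division and are not reported in the summary; the helper name `gops_per_watt` is hypothetical, and reading the "equivalent" figure as throughput that also counts skipped zero-value operations is an assumption.

```python
def gops_per_watt(gops, watts):
    """Energy efficiency as throughput divided by power."""
    return gops / watts


measured_gops = 497.0     # reported throughput
equivalent_gops = 1579.0  # reported "equivalent" throughput
power_w = 22.0            # reported power consumption

print(round(gops_per_watt(measured_gops, power_w), 2))    # 22.59
print(round(gops_per_watt(equivalent_gops, power_w), 2))  # 71.77
print(round(equivalent_gops / measured_gops, 2))          # 3.18
```

Under that assumption, the ratio of about 3.18 between the two throughput figures would correspond to the share of operations avoided by skipping zeros.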

Conclusions:

The proposed techniques significantly improve the efficiency of sparse neural network accelerators while reducing their latency and power consumption.
The optimized accelerator demonstrates high performance and energy efficiency for DCNN inference tasks.
This work provides a viable solution for deploying efficient neural network hardware acceleration.