



Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection.

Jia Xu (1,2,3), Han Pu (1,2), Dong Wang (1,2)

  • (1) Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

Micromachines | January 25, 2025
Summary
This summary is machine-generated.

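This study introduces three hardware acceleration techniques for sparse neural networks: a compute circuit that skips zero-valued kernel weights, compiler-assisted optimization of memory access, and a shared feature map cache with hash-based indexing. Running ResNet50 inference on an Intel Arria 10 1150GX FPGA, the optimized accelerator reached 497 GOPS (1579 GOPS equivalent) at 22 W.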

Keywords:
FPGA, cache memory, deep convolutional neural network, heterogeneous computing, high-level synthesis


Area of Science:

  • Computer Engineering
  • Artificial Intelligence
  • Hardware Acceleration

Background:

  • Deep convolutional neural network (DCNN) acceleration is crucial for AI, but general-purpose devices lack efficiency for sparse models.
  • Existing neural network accelerators struggle to combine high efficiency, low latency, and low power consumption.
  • Sparse neural network acceleration is an active research area with potential for further optimization.

Purpose of the Study:

  • To investigate and propose three key techniques for hardware acceleration of sparse neural networks.
  • To enhance energy efficiency by designing specialized circuits that eliminate computations on zero values in sparse kernels.
  • To improve off-chip memory access efficiency and reduce latency in convolutional neural network accelerators.
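
The zero-skipping goal above can be illustrated in software. The following is a minimal Python sketch, not the authors' circuit: it assumes a single-channel, stride-1, unpadded convolution, and the helper names `compress_kernel` and `sparse_conv2d` are hypothetical. The kernel is compressed to its nonzero weights once, and multiply-accumulate (MAC) operations are issued only for those weights.

```python
def compress_kernel(kernel):
    """Return (row, col, weight) triples for the nonzero weights only."""
    return [
        (r, c, w)
        for r, row in enumerate(kernel)
        for c, w in enumerate(row)
        if w != 0
    ]


def sparse_conv2d(feature_map, kernel):
    """Stride-1, unpadded 2-D cross-correlation that skips zero weights.

    Returns the output map and the number of MACs actually performed.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    nonzeros = compress_kernel(kernel)  # done once per kernel
    out = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for i in range(out_h):
        for j in range(out_w):
            acc = 0
            for r, c, w in nonzeros:  # zero weights never reach this loop
                acc += w * feature_map[i + r][j + c]
                macs += 1
            out[i][j] = acc
    return out, macs


# Illustrative data (assumed, not from the paper): a 3x3 kernel with 2 nonzeros.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, -1],
]
output, macs = sparse_conv2d(feature_map, kernel)
dense_macs = len(output) * len(output[0]) * 9  # what a dense loop would issue
print(output, macs, dense_macs)
```

In this toy case the sparse loop performs 8 MACs where a dense loop would perform 36. A hardware design would also have to store and decode the weight positions, which this sketch does not model.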

Main Methods:

  • Developed a specialized computational circuit to detect and skip zero-value computations in sparse convolutional kernels.
  • Utilized a Vitis HLS compiler optimization plugin to enhance on-chip bandwidth utilization for data access patterns.
  • Proposed a shared feature map cache with a hash-based indexing algorithm for efficient asynchronous convolution, reducing on-chip memory usage.
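
To show why hash-based indexing can help when several compute units share one banked cache, here is a small Python sketch. The summary does not specify the authors' hash function, so the XOR-folding hash, the four-bank configuration, and the names `modulo_bank`, `xor_fold_bank`, and `count_conflicts` are all assumptions chosen for illustration. The sketch counts how many same-cycle requests collide on a bank under plain modulo interleaving versus the hashed mapping.

```python
BANK_BITS = 2                 # assumed: 4 banks
NUM_BANKS = 1 << BANK_BITS


def modulo_bank(addr):
    """Plain interleaving: the low address bits select the bank."""
    return addr % NUM_BANKS


def xor_fold_bank(addr):
    """Illustrative hash: XOR together all BANK_BITS-wide address slices."""
    bank = 0
    while addr:
        bank ^= addr & (NUM_BANKS - 1)
        addr >>= BANK_BITS
    return bank


def count_conflicts(addresses, bank_fn):
    """Requests that would stall in one cycle.

    Each bank serves one request per cycle, so every additional request
    mapped to the same bank counts as one conflict.
    """
    per_bank = {}
    for addr in addresses:
        bank = bank_fn(addr)
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return sum(n - 1 for n in per_bank.values())


# Four compute units reading with a stride equal to the bank count,
# a pattern that defeats plain interleaving.
requests = [0, 4, 8, 12]
print("modulo conflicts:  ", count_conflicts(requests, modulo_bank))
print("xor-fold conflicts:", count_conflicts(requests, xor_fold_bank))
```

For this strided pattern the modulo mapping sends every request to bank 0, while the XOR-folded mapping spreads them across all four banks. No fixed hash removes conflicts for every access pattern, so a real design still needs arbitration for the collisions that remain.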

Main Results:

  • The specialized circuit enhanced energy efficiency by eliminating zero-value computations.
  • Compiler optimizations improved off-chip memory access efficiency and data handling.
  • The shared cache design enabled efficient asynchronous convolution, conserving on-chip resources.
  • ResNet50 inference on an Intel Arria 10 1150GX FPGA achieved 497 GOPS throughput (1579 GOPS equivalent) at 22 W power consumption.
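
The reported figures can be turned into energy-efficiency numbers with simple arithmetic. The short Python sketch below uses only the three values stated above; the variable and function names are illustrative.

```python
# Figures reported in the results above.
THROUGHPUT_GOPS = 497.0   # measured throughput
EQUIVALENT_GOPS = 1579.0  # reported "equivalent" throughput
POWER_W = 22.0            # reported power consumption


def per_watt(gops, watts):
    """Energy efficiency in GOPS per watt."""
    return gops / watts


measured_eff = per_watt(THROUGHPUT_GOPS, POWER_W)
equivalent_eff = per_watt(EQUIVALENT_GOPS, POWER_W)
ratio = EQUIVALENT_GOPS / THROUGHPUT_GOPS

print(f"measured:   {measured_eff:.1f} GOPS/W")
print(f"equivalent: {equivalent_eff:.1f} GOPS/W")
print(f"equivalent / measured: {ratio:.2f}x")
```

This gives about 22.6 GOPS/W measured and about 71.8 GOPS/W equivalent. The summary does not define the equivalent figure. If it is read as the dense operation count that the sparse hardware stands in for (an assumption), the ratio of roughly 3.2 would be the effective gain from skipping zero-valued work.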

Conclusions:

  • The proposed techniques improve the efficiency of sparse neural network accelerators while reducing latency and power consumption.
  • The optimized accelerator demonstrates high performance and energy efficiency for DCNN inference tasks.
  • This work provides a viable solution for deploying efficient neural network hardware acceleration.