



Updated: May 31, 2025


Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection

Jia Xu1,2,3, Han Pu1,2, Dong Wang1,2

  • 1Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

Micromachines | January 25, 2025
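Summary
This summary is machine-generated.

This study introduces new hardware acceleration techniques for sparse neural networks that significantly improve energy efficiency and reduce latency. The optimized accelerator achieves high throughput at low power consumption on an FPGA.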

Keywords:
FPGA; cache memory; deep convolutional neural network; heterogeneous computing; high-level synthesis.


Scientific fields:

  • Computer engineering
  • Artificial intelligence
  • Hardware accelerators

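Background:

  • Accelerating deep convolutional neural networks (DCNNs) is essential for artificial intelligence, but general-purpose devices handle sparse models inefficiently.
  • Existing neural network accelerators struggle to combine high efficiency, low latency, and minimal power consumption.
  • Sparse neural network acceleration is an active research area with room for further optimization.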

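Study objectives:

  • Investigate and propose three key techniques for hardware acceleration of sparse neural networks.
  • Improve energy efficiency by designing a dedicated circuit that eliminates computations on zero values in sparse kernels.
  • Improve off-chip memory access efficiency and reduce latency in convolutional neural network accelerators.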
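
The zero-skipping objective can be illustrated in software. The sketch below is not the authors' circuit; it is a minimal NumPy model, with hypothetical function names, that compares a dense 2D convolution (cross-correlation form) against one that iterates only over nonzero kernel weights and counts the multiply-accumulate (MAC) operations each performs.

```python
import numpy as np

def conv2d_dense(x, w):
    """Valid-mode 2D cross-correlation that multiplies every weight, zeros included."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    macs = 0
    for i in range(kh):
        for j in range(kw):
            out += w[i, j] * x[i:i + oh, j:j + ow]
            macs += oh * ow  # one MAC per output pixel for this weight
    return out, macs

def conv2d_zero_skip(x, w):
    """Same result, but only nonzero weights are visited (software analogue of zero skipping)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    macs = 0
    for i, j in zip(*np.nonzero(w)):  # indices of nonzero weights only
        out += w[i, j] * x[i:i + oh, j:j + ow]
        macs += oh * ow
    return out, macs

# Illustrative data (assumed, not from the paper): 6x6 input, 3x3 kernel with 2 nonzero weights.
x = np.arange(36, dtype=np.float64).reshape(6, 6)
w = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0]])
dense_out, dense_macs = conv2d_dense(x, w)
sparse_out, sparse_macs = conv2d_zero_skip(x, w)
print(dense_macs, sparse_macs, np.allclose(dense_out, sparse_out))
```

With this kernel the zero-skipping version performs 32 MACs instead of 144 while producing the same output. A hardware design would additionally need to keep its compute units busy despite the irregular work, which this model does not capture.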

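Main methods:

  • Developed a dedicated computation circuit that detects and skips zero-value computations in sparse convolution kernels.
  • Used a Vitis HLS compiler optimization plug-in to improve on-chip bandwidth utilization for data access patterns.
  • Proposed a shared feature-map cache with a hash-based indexing algorithm that enables efficient asynchronous convolution while reducing on-chip memory usage.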
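
The summary does not describe the hash function used by the shared cache, so the following is only a rough sketch of why hashing can help when several compute units read a multi-bank cache in the same cycle. It assumes a power-of-two number of banks and compares plain modulo bank selection with an XOR-fold hash; both functions and the conflict model are illustrative assumptions, not the paper's algorithm.

```python
from collections import Counter

def bank_modulo(addr, num_banks):
    """Baseline: the low address bits select the bank."""
    return addr % num_banks

def bank_xor_fold(addr, num_banks):
    """Assumed hash: XOR together successive log2(num_banks)-bit groups of the address."""
    if num_banks < 2 or num_banks & (num_banks - 1):
        raise ValueError("num_banks must be a power of two >= 2")
    bits = num_banks.bit_length() - 1
    mask = num_banks - 1
    h = 0
    while addr:
        h ^= addr & mask
        addr >>= bits
    return h

def conflict_cycles(addrs, num_banks, bank_fn):
    """Cycles to serve one group of parallel reads if each bank serves one read per cycle."""
    per_bank = Counter(bank_fn(a, num_banks) for a in addrs)
    return max(per_bank.values())

# Eight parallel reads with stride 8 (an assumed access pattern) on an 8-bank cache.
addrs = [8 * k for k in range(8)]
print(conflict_cycles(addrs, 8, bank_modulo))
print(conflict_cycles(addrs, 8, bank_xor_fold))
```

For this strided pattern, modulo selection sends all eight reads to the same bank, while the XOR-fold hash spreads them across all eight banks. No single hash is conflict-free for every pattern (for example, addresses 0 and 9 collide under this XOR fold), so the choice depends on the access patterns the accelerator actually generates.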

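Main results:

  • The dedicated circuit improves energy efficiency by eliminating zero-value computations.
  • The compiler optimization improves off-chip memory access efficiency and data handling.
  • The shared cache design enables efficient asynchronous convolution and saves on-chip resources.
  • For ResNet50 inference on an Intel Arria 10 1150GX FPGA, the accelerator reached a throughput of 497 GOPS (an equivalent throughput of 1579 GOPS) at a power consumption of 22 W.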
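
The reported throughput and power figures imply the following energy efficiencies. This is plain arithmetic on the numbers above; the summary does not define how the "equivalent" throughput is computed, so the last ratio is only the quotient of the two reported values.

```latex
\begin{align*}
\frac{497\ \text{GOPS}}{22\ \text{W}} &\approx 22.6\ \text{GOPS/W} \\
\frac{1579\ \text{GOPS}}{22\ \text{W}} &\approx 71.8\ \text{GOPS/W} \\
\frac{1579\ \text{GOPS}}{497\ \text{GOPS}} &\approx 3.18
\end{align*}
```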

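Conclusions:

  • The proposed techniques significantly improve the efficiency, latency, and power consumption of sparse neural network accelerators.
  • The optimized accelerator delivers high performance and high energy efficiency on DCNN inference tasks.
  • This work offers a practical solution for deploying efficient neural network hardware acceleration.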