Abstract
As an emerging State Space Model (SSM), the Mamba model draws inspiration from the architecture of Recurrent Neural Networks (RNNs) and significantly enhances the global receptive field and feature extraction capability of object detection models. Compared with traditional Convolutional Neural Networks (CNNs) and Transformers, Mamba handles complex scale variations and multi-view interference more effectively, making it particularly suitable for object detection in dynamic environments such as fire detection scenarios. To improve visual fire detection, this paper proposes an efficient fire detection algorithm built on the YOLOv9 architecture that combines a Mamba-based attention mechanism with several key techniques. First, this paper presents an efficient attention mechanism, the Efficient Mamba Attention (EMA) module. Unlike existing self-attention mechanisms, EMA integrates adaptive average pooling with an SSM module and avoids computing pairwise associations across all positions: input features are first reduced in dimensionality by adaptive average pooling, and the state-update mechanism of the SSM module then enhances feature representation and optimizes information flow. Second, to address the limitations of SSMs in local feature modeling, this study incorporates the ConvNeXtV2 module into the backbone network, improving the model's ability to capture fine-grained local details and thereby strengthening its overall representation capability. Additionally, a dynamic non-monotonic focusing mechanism and a distance penalty strategy are employed to refine the loss function, substantially improving bounding-box regression accuracy. Experimental results demonstrate the superior performance of the proposed method in fire detection tasks: the model runs at 71 FPS and achieves an [Formula: see text] of 91.0% on the large-scale fire dataset and 87.2% on the small-scale fire dataset. Compared with existing methods, the proposed approach maintains high detection performance while offering significant computational efficiency advantages.
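The abstract only describes the EMA design at a high level: adaptive average pooling reduces the spatial resolution, and an SSM-style state update then refines the pooled features in place of all-pairs self-attention. The snippet below is a minimal, hypothetical PyTorch sketch of that idea; the class name EfficientMambaAttention, the pooled_size parameter, the diagonal recurrence, and the sigmoid gating are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the Efficient Mamba Attention (EMA) idea described above:
# adaptive average pooling shrinks the spatial grid, a simple diagonal
# state-space recurrence (a stand-in for the paper's SSM block, whose exact
# parameterization is not given here) updates features along the pooled
# sequence, and the result is broadcast back as a gating map.
# All module and parameter names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EfficientMambaAttention(nn.Module):
    def __init__(self, channels: int, pooled_size: int = 8):
        super().__init__()
        self.pooled_size = pooled_size
        # Learnable pieces of a toy linear state-space model:
        # h_t = a * h_{t-1} + (1 - a) * B(x_t),  y_t = C(h_t).
        self.decay_logit = nn.Parameter(torch.zeros(channels))  # a = sigmoid(decay_logit)
        self.in_proj = nn.Linear(channels, channels)
        self.out_proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # 1) Dimensionality reduction: pool to a small grid instead of
        #    attending over every spatial position.
        pooled = F.adaptive_avg_pool2d(x, self.pooled_size)      # (b, c, s, s)
        seq = pooled.flatten(2).transpose(1, 2)                  # (b, s*s, c)
        u = self.in_proj(seq)
        # 2) Sequential state update (the SSM-style recurrence).
        a = torch.sigmoid(self.decay_logit)                      # per-channel decay in (0, 1)
        state = torch.zeros(b, c, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(u.shape[1]):
            state = a * state + (1.0 - a) * u[:, t]
            outputs.append(state)
        y = self.out_proj(torch.stack(outputs, dim=1))           # (b, s*s, c)
        # 3) Broadcast the pooled map back to the input resolution and use it
        #    as a multiplicative gate over the original features.
        gate = y.transpose(1, 2).reshape(b, c, self.pooled_size, self.pooled_size)
        gate = torch.sigmoid(F.interpolate(gate, size=(h, w), mode="nearest"))
        return x * gate


if __name__ == "__main__":
    feat = torch.randn(2, 64, 40, 40)
    print(EfficientMambaAttention(64)(feat).shape)  # torch.Size([2, 64, 40, 40])
```

Because the recurrence runs over the pooled grid (here 8×8 = 64 steps) rather than every spatial position, the cost stays fixed as the input resolution grows, which is the efficiency argument the abstract makes against full self-attention.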
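The "dynamic non-monotonic focusing mechanism and distance penalty strategy" wording matches the ingredients of a Wise-IoU v3-style bounding-box loss. The sketch below shows how such a loss could be assembled under that assumption; the function names, the hyper-parameters alpha and delta, and the externally supplied iou_mean running average are illustrative choices rather than the paper's exact formulation.

```python
# Hedged sketch of a bounding-box regression loss combining a distance penalty
# with a dynamic non-monotonic focusing coefficient, in the spirit of Wise-IoU v3.
# Hyper-parameters (alpha, delta) and the iou_mean running average are assumptions.
import torch


def box_iou_terms(pred, target):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2). Returns IoU,
    squared centre distance, and squared diagonal of the enclosing box."""
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # Centre distance and enclosing-box diagonal drive the distance penalty.
    dist2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) / 2) ** 2 \
          + ((pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) / 2) ** 2
    diag2 = (torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])) ** 2 \
          + (torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])) ** 2
    return iou, dist2, diag2


def wiou_v3_style_loss(pred, target, iou_mean, alpha=1.9, delta=3.0):
    iou, dist2, diag2 = box_iou_terms(pred, target)
    l_iou = 1.0 - iou
    # Distance penalty: boxes whose centres are far apart are penalised more.
    r_dist = torch.exp(dist2 / (diag2.detach() + 1e-7))
    # Dynamic non-monotonic focusing: the gradient gain peaks for
    # ordinary-quality boxes and shrinks for both very good and very poor
    # ones (beta is the outlier degree, detached from the graph).
    beta = l_iou.detach() / (iou_mean + 1e-7)
    r_focus = beta / (delta * alpha ** (beta - delta))
    return (r_focus * r_dist * l_iou).mean()


if __name__ == "__main__":
    pred = torch.tensor([[10., 10., 60., 60.], [0., 0., 20., 20.]])
    gt = torch.tensor([[12., 8., 58., 62.], [5., 5., 30., 30.]])
    print(wiou_v3_style_loss(pred, gt, iou_mean=0.4))
```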