A hybrid model based on transformer and Mamba for enhanced sequence modeling

  • Jiangxi Academy of Sciences, Institute of Energy, Nanchang, 330029, Jiangxi, China. zhuxiaocui@jxas.ac.cn.
Scientific Reports

Abstract

State Space Models (SSMs) have made remarkable strides in language modeling in recent years. With the introduction of Mamba, these models have garnered increased attention, often surpassing Transformers in specific areas. Nevertheless, despite Mamba's unique strengths, Transformers remain essential due to their strong representational capabilities and proven effectiveness. In this paper, we propose a novel model that effectively integrates the strengths of both Transformers and Mamba. Specifically, our model uses a Transformer encoder for encoding and a Mamba-based decoder for decoding. We introduce a feature fusion technique that combines the features produced by the encoder with the hidden states generated by the decoder. This approach successfully merges the advantages of the Transformer and Mamba, resulting in enhanced performance. Comprehensive experiments across various language tasks demonstrate that our proposed model consistently achieves competitive results, outperforming existing baselines.
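To make the described architecture concrete, the following is a minimal PyTorch sketch of the overall idea: a Transformer encoder, an SSM-style decoder standing in for Mamba, and a fusion step that mixes encoder features into the decoder's hidden states. The class names, the gated cross-attention fusion, the simplified SSM block (used so the sketch runs without the CUDA-only `mamba_ssm` package), and all hyperparameters are illustrative assumptions, not the authors' released implementation; the paper's exact fusion mechanism and block design may differ.

```python
# Hedged sketch of a Transformer-encoder / Mamba-style-decoder hybrid with
# encoder-decoder feature fusion. All names and choices here are assumptions.
import torch
import torch.nn as nn


class SimpleSSMBlock(nn.Module):
    """Stand-in for a Mamba block: causal depthwise conv + gated projection.
    Used only so this sketch runs without the `mamba_ssm` CUDA package."""

    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4, padding=3, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        residual = x
        x = self.norm(x)
        # causal depthwise convolution over the sequence dimension
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return residual + self.proj(h * torch.sigmoid(self.gate(x)))


class HybridEncoderDecoder(nn.Module):
    """Transformer encoder + SSM-style decoder with encoder/decoder feature fusion."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_enc=6, n_dec=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_enc)
        self.decoder = nn.ModuleList([SimpleSSMBlock(d_model) for _ in range(n_dec)])
        # Fusion: cross-attention from decoder hidden states to encoder features,
        # combined through a learned gate (an assumed mechanism, see lead-in).
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse_gate = nn.Linear(2 * d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        enc = self.encoder(self.embed(src_ids))        # encoder features
        h = self.embed(tgt_ids)
        for block in self.decoder:
            h = block(h)                               # decoder hidden states
        ctx, _ = self.cross_attn(h, enc, enc)          # attend to encoder output
        gate = torch.sigmoid(self.fuse_gate(torch.cat([h, ctx], dim=-1)))
        fused = gate * h + (1.0 - gate) * ctx          # fuse the two streams
        return self.lm_head(fused)


if __name__ == "__main__":
    model = HybridEncoderDecoder()
    src = torch.randint(0, 32000, (2, 16))
    tgt = torch.randint(0, 32000, (2, 12))
    print(model(src, tgt).shape)                       # torch.Size([2, 12, 32000])
```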