AUTHOR=Asriani Euis, Muchtadi-Alamsyah Intan, Purwarianti Ayu TITLE=Real block-circulant matrices and DCT-DST algorithm for transformer neural network JOURNAL=Frontiers in Applied Mathematics and Statistics VOLUME=9 YEAR=2023 URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2023.1260187 DOI=10.3389/fams.2023.1260187 ISSN=2297-4687 ABSTRACT=
In the encoding and decoding process of transformer neural networks, a weight matrix-vector multiplication occurs in each multi-head attention and feed-forward sublayer. Choosing an appropriate weight matrix and algorithm can improve transformer performance, especially for machine translation tasks. In this study, we investigate the use of real block-circulant matrices in a transformer, together with an alternative to the commonly used fast Fourier transform (FFT) algorithm, namely the discrete cosine transform–discrete sine transform (DCT-DST) algorithm. We explore three transformer models that combine real block-circulant matrices with different algorithms. We start by generating two orthogonal matrices,