AUTHOR=Yu Zhongjiang , Yang Shaoping , Li Zhongtai , Li Ligang , Luo Hui , Yang Fan TITLE=LogMS: a multi-stage log anomaly detection method based on multi-source information fusion and probability label estimation JOURNAL=Frontiers in Physics VOLUME=12 YEAR=2024 URL=https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2024.1401857 DOI=10.3389/fphy.2024.1401857 ISSN=2296-424X ABSTRACT=

Introduction: Log anomaly detection is essential for monitoring and maintaining the normal operation of systems. With the rapid development and maturation of deep learning technologies, deep learning-based log anomaly detection has become a prominent research area. However, existing methods primarily concentrate on directly detecting log data in a single stage using specific anomaly information, such as log sequential information or log semantic information. This leads to a limited understanding of log data, resulting in low detection accuracy and poor model robustness.

Methods: To tackle this challenge, we propose LogMS, a multi-stage log anomaly detection method based on multi-source information fusion and probability label estimation. Before anomaly detection, the logs undergo parsing and vectorization to capture semantic information. Subsequently, we propose a multi-source information fusion-based long short-term memory (MSIF-LSTM) network for the initial stage of anomaly log detection. By fusing semantic information, sequential information, and quantitative information, MSIF-LSTM enhances the anomaly detection capability. Furthermore, we introduce a probability label estimation-based gate recurrent unit (PLE-GRU) network, which leverages easily obtainable normal log labels to construct pseudo-labeled data and train a GRU for further detection. PLE-GRU enhances the detection capability from the perspective of label information. To ensure the overall efficiency of the LogMS, the second-stage will only be activated when anomalies are not detected in the first stage.

Results and Discussion: Experimental results demonstrate that LogMS outperforms baseline models across various log anomaly detection datasets, exhibiting superior performance in robustness testing.