The final, formatted version of the article will be published soon.
ORIGINAL RESEARCH article
Front. Artif. Intell.
Sec. Pattern Recognition
Volume 7 - 2024
doi: 10.3389/frai.2024.1499913
Interpreting CNN Models for Musical Instrument Recognition Using Multi-Spectrogram Heatmap Analysis: A Preliminary Study
Provisionally accepted
Auckland University of Technology, Auckland, New Zealand
Musical instrument recognition is a critical component of music information retrieval (MIR), focused on identifying and classifying instruments from audio recordings. This task is challenging due to the complex and variable nature of musical signals. In this study, we employ convolutional neural networks (CNNs) to analyze the contributions of six spectrogram representations (STFT, Log-Mel, MFCC, Chroma, Spectral Contrast, and Tonnetz) to the classification of ten different musical instruments using samples from the NSynth database. Our methodology combines visual heatmap analysis with statistical metrics such as Difference Mean, KL Divergence, JS Divergence, and Earth Mover's Distance to assess feature importance and model interpretability. The results highlight the strengths and limitations of each spectrogram type in capturing the distinctive features of different instruments, providing insights for optimizing future musical instrument recognition models.
Keywords: musical instrument recognition, music information retrieval, convolutional neural networks (CNNs), spectrogram analysis, feature maps, heatmaps, pattern recognition, feature extraction
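As an illustration of the statistical metrics named in the abstract, the following is a minimal sketch of how two heatmaps might be compared. It is not the authors' implementation: the function names, the normalization step, the reading of "Difference Mean" as the mean absolute element-wise difference, and the use of a flattened one-dimensional Earth Mover's Distance are all assumptions made for this example.

```python
import math


def flatten(heatmap):
    """Flatten a 2-D heatmap (list of rows) into a single list."""
    return [v for row in heatmap for v in row]


def normalize(heatmap, eps=1e-12):
    """Turn a heatmap into a probability distribution (assumed preprocessing).

    A small eps keeps every bin positive so the logarithms below are defined.
    """
    flat = [abs(v) + eps for v in flatten(heatmap)]
    total = sum(flat)
    return [v / total for v in flat]


def difference_mean(a, b):
    """Mean absolute element-wise difference (assumed definition)."""
    fa, fb = flatten(a), flatten(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def earth_movers_distance(p, q):
    """1-D EMD over flattened bin indices: sum of |CDF_p - CDF_q|.

    Simplification: flattening ignores the 2-D layout of the heatmap.
    """
    running, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi
        total += abs(running)
    return total


# Hypothetical 2x2 heatmaps, used only to exercise the functions.
heatmap_a = [[0.1, 0.4], [0.3, 0.2]]
heatmap_b = [[0.25, 0.25], [0.25, 0.25]]

p, q = normalize(heatmap_a), normalize(heatmap_b)
print("Difference Mean:", difference_mean(heatmap_a, heatmap_b))
print("KL(p || q):", kl_divergence(p, q))
print("JS(p, q):", js_divergence(p, q))
print("EMD(p, q):", earth_movers_distance(p, q))
```

KL divergence is asymmetric, so the order of the two heatmaps matters, whereas JS divergence and Earth Mover's Distance give the same value in either direction. A faithful two-dimensional Earth Mover's Distance would account for the spatial layout of the heatmap, which this flattened version does not.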
Received: 22 Sep 2024; Accepted: 03 Dec 2024.
Copyright: © 2024 Chen, Ghobakhloua and Narayanan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Rujia Chen, Auckland University of Technology, Auckland, New Zealand
Ajit Narayanan, Auckland University of Technology, Auckland, New Zealand
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.