AUTHOR=Li Sheng , Li Jiyi , Cao Yang TITLE=Phantom in the opera: adversarial music attack for robot dialogue system JOURNAL=Frontiers in Computer Science VOLUME=6 YEAR=2024 URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2024.1355975 DOI=10.3389/fcomp.2024.1355975 ISSN=2624-9898 ABSTRACT=

This study explores the vulnerability of robot dialogue systems' automatic speech recognition (ASR) module to adversarial music attacks. Specifically, we explore music as a natural camouflage for such attacks. We propose a novel method to hide ghost speech commands in a music clip by slightly perturbing its raw waveform. We apply our attack on an industry-popular ASR model, namely the time-delay neural network (TDNN), widely used for speech and speaker recognition. Our experiment demonstrates that adversarial music crafted by our attack can easily mislead industry-level TDNN models into picking up ghost commands with high success rates. However, it sounds no different from the original music to the human ear. This reveals a serious threat by adversarial music to robot dialogue systems, calling for effective defenses against such stealthy attacks.