
ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
Volume 8 - 2025 | doi: 10.3389/frai.2025.1527299
This article is part of the Research Topic: Deep Neural Network Architectures and Reservoir Computing

A Novel Approach to Indian Bird Species Identification: Employing Visual-Acoustic Fusion Techniques for Improved Classification Accuracy

Provisionally accepted
  • VIT University, Vellore, Tamil Nadu, India

The final, formatted version of the article will be published soon.

    Accurate identification of bird species is essential for monitoring biodiversity, analyzing ecological patterns, assessing population health, and guiding conservation efforts. Birds serve as vital indicators of environmental change, making species identification critical for habitat protection and understanding ecosystem dynamics. With over 1,300 species, India's avifauna presents significant challenges due to morphological and acoustic similarities among species. For bird monitoring, recent work often uses acoustic sensors to collect bird sounds and an automated classification system to recognize the species. Building such a system with traditional machine learning requires manual feature extraction before model training, whereas recent advances in deep learning make automatic feature extraction possible. This study presents a novel approach that uses visual-acoustic fusion to improve species identification accuracy. We employ a Deep Convolutional Neural Network (DCNN) to extract features from bird images and a Long Short-Term Memory (LSTM) network to analyze bird calls. By integrating these modalities early in the classification process, our method significantly improves performance compared to traditional methods that rely on either data type alone or use late fusion strategies. Testing on the iBC53 (Indian Bird Call) dataset demonstrates an accuracy of 94%, highlighting the effectiveness of our multi-modal fusion approach.
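
The distinction between early and late fusion drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors below are random stand-ins for DCNN and LSTM outputs, the layer weights are untrained, and the names `IMG_DIM`, `AUD_DIM`, `random_layer`, `early_fusion_predict`, and `late_fusion_predict` are hypothetical. The class count of 53 is an assumption suggested only by the dataset name iBC53.

```python
import math
import random

NUM_CLASSES = 53   # assumption: suggested by the dataset name iBC53, not stated in the abstract
IMG_DIM = 8        # hypothetical image feature size (stand-in for a DCNN output)
AUD_DIM = 4        # hypothetical audio feature size (stand-in for an LSTM output)


def softmax(scores):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output score per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights; a real system would learn these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def early_fusion_predict(img_feat, aud_feat, layer):
    """Concatenate both feature vectors, then classify the joint vector once."""
    fused = list(img_feat) + list(aud_feat)
    weights, bias = layer
    return softmax(linear(weights, bias, fused))


def late_fusion_predict(img_feat, aud_feat, img_layer, aud_layer):
    """Classify each modality separately, then average the probabilities."""
    p_img = softmax(linear(img_layer[0], img_layer[1], img_feat))
    p_aud = softmax(linear(aud_layer[0], aud_layer[1], aud_feat))
    return [(a + b) / 2.0 for a, b in zip(p_img, p_aud)]


rng = random.Random(0)
img_feat = [rng.random() for _ in range(IMG_DIM)]
aud_feat = [rng.random() for _ in range(AUD_DIM)]

early_probs = early_fusion_predict(
    img_feat, aud_feat, random_layer(rng, NUM_CLASSES, IMG_DIM + AUD_DIM))
late_probs = late_fusion_predict(
    img_feat, aud_feat,
    random_layer(rng, NUM_CLASSES, IMG_DIM),
    random_layer(rng, NUM_CLASSES, AUD_DIM))
```

In the early-fusion path a single classifier sees both modalities at once, so it can in principle learn interactions between image and call features. In the late-fusion path each classifier sees only one modality, and the modalities meet only when the output probabilities are averaged.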

    Keywords: Birds Identification, species classification, Visual-acoustic data, Fusion technique, Deep CNN

    Received: 13 Nov 2024; Accepted: 29 Jan 2025.

    Copyright: © 2025 Gavali and Jamal Mohammed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Saira Banu Jamal Mohammed, VIT University, Vellore, 632 014, Tamil Nadu, India

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.