Bioacoustics for Tiny Hardware


Challenge results

More details can be found on the task page:

Task description page

Team results

Name                 | Technical Report | Average Precision | Model Size (bytes) | RAM Usage (bytes) | Preprocessing Time (ms) | Model Time (ms) | Total Time (ms)
Londono_enriched     | Londono2025      | 0.98              | 16600              | 26844             | 176.11                  | 24.42           | 200.53
Londono_non_enriched | Londono2025      | 0.94              | 16600              | 26844             | 176.22                  | 24.42           | 200.53
Christian_Walter     | Walter2025       | 0.74              | 7020               | 6744              | 1.62                    | 16.45           | 18.07
Oguamanam_team       | Oguamanam2025    | 0.97              | 4192               | 18354             | 34.28                   | 7.39            | 41.67
Oguamanam_team_flux  | Oguamanam2025    | 0.97              | 1848               | 780               | 16.54                   | 0.18            | 16.72
Toby_Martin          | Martin2025       | 0.99              | 7776               | 20924             | 73.42                   | 15.16           | 88.58
Naveen_Dhar          | Dhar_1_2025      | 0.74              | 12920              | 30328             | 64.84                   | 112.78          | 177.62
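For readers combining the timing columns: Total Time is the preprocessing and model latencies added together. A minimal check in Python, using the Londono_enriched row (variable names are ours, not the challenge's):

```python
# Total Time (ms) = Preprocessing Time + Model Time, illustrated with the
# Londono_enriched row of the results table above.
preprocessing_ms = 176.11
model_ms = 24.42
total_ms = round(preprocessing_ms + model_ms, 2)
print(total_ms)  # 200.53
```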

Technical reports

Convolutional Gated Recurrent Network with Knowledge Distillation for Resource-Constrained Bioacoustics

Dhar, Naveen
High Tech High Mesa, San Diego, USA

Abstract

The shift toward “tiny” machine learning enables the deployment of sophisticated audio classification models directly on power-efficient hardware, offering numerous benefits: immediate threat detection, lower latency for conservation insight, and vastly increased operational lifetimes in remote environments. However, these advantages come with constraints on memory, computational power, and energy consumption, necessitating unique approaches in both model architecture and feature extraction, in an environment where traditional deep learning methods, however accurate, are impractical. A pipeline utilizing knowledge distillation and a custom Convolutional Recurrent Neural Network (CRNN) was developed for resource-constrained bioacoustics and evaluated on Task 3 of the 2025 BioDCASE challenge: detecting Yellowhammer bunting vocalizations in near and far-field environments using a lightweight model. The proposed network was built upon the BioDCASE baseline system, utilizing MobileNet convolution and depthwise convolution blocks, which preceded a Gated Recurrent Unit (GRU), giving the name “MobileGRU”. A larger “MobileGRU” was used as a teacher, and a smaller version was used as a student during knowledge distillation. Due to the nature of the task, ways to improve or maintain performance while maintaining or lowering architecture size, such as data augmentation, weight pruning, and adjustment of feature creation parameters, were given attention. The proposed “MobileGRU” CRNN achieved a 0.993 average precision on the provided validation dataset and a 34KB file size when quantized, demonstrating the potential for efficient machine learning.
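The teacher–student setup the abstract describes can be sketched as a distillation loss in the style of Hinton et al.: the student is trained against the teacher's temperature-softened outputs blended with the hard labels. This is a minimal NumPy sketch under assumed hyperparameters (temperature, blend weight), not the report's exact training code:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha blends the soft (teacher) and hard (label) cross-entropy terms."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # cross-entropy against the softened teacher distribution
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1)
    # ordinary cross-entropy against the hard labels
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T**2 rescales the soft-target term, as in the standard formulation
    return np.mean(alpha * (T ** 2) * soft + (1 - alpha) * hard)
```

A student whose logits match the teacher's incurs a lower loss than one that disagrees, which is what drives the small model toward the large model's behaviour.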

PDF

Convolutional Neural Network with Knowledge Distillation for Resource-Constrained Bioacoustics

Dhar, Naveen
High Tech High Mesa, San Diego, USA

Abstract

The shift toward “tiny” machine learning enables the deployment of sophisticated audio classification models directly on power-efficient hardware, offering numerous benefits: immediate threat detection, lower latency for conservation insight, and vastly increased operational lifetimes in remote environments. However, these advantages come with constraints on memory, computational power, and energy consumption, necessitating unique approaches in both model architecture and feature extraction, in an environment where traditional deep learning methods, however accurate, are impractical. A pipeline utilizing knowledge distillation and a custom Convolutional Neural Network (CNN) was developed for resource-constrained bioacoustics and evaluated on Task 3 of the 2025 BioDCASE challenge: detecting Yellowhammer bunting vocalizations in near and far-field environments using a lightweight model. The proposed network was built upon the BioDCASE baseline system, utilizing MobileNet convolution and depthwise convolution blocks, giving the name “SlimCNN”. A larger “SlimCNN” was used as a teacher, and a smaller version was used as a student during knowledge distillation. Due to the nature of the task, ways to improve or maintain performance while maintaining or lowering architecture size, such as data augmentation, weight pruning, and adjustment of feature creation parameters, were given attention. The proposed “SlimCNN” achieved a 0.994 average precision on the provided validation dataset and a 13KB file size when quantized, demonstrating the potential for efficient machine learning.
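The size savings from the MobileNet-style blocks these reports rely on come from factorising one standard K×K convolution into a per-channel K×K depthwise step plus a 1×1 pointwise mix. A small parameter-count comparison, using an illustrative layer shape rather than the report's actual configuration:

```python
# Parameter counts (weights only, no biases) for one conv layer.
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise (k*k per input channel) + pointwise (1x1 channel mix)
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 16, 32)        # 4608
sep = depthwise_separable_params(3, 16, 32)  # 144 + 512 = 656
print(std, sep, round(std / sep, 1))
```

For this illustrative 3×3, 16-to-32-channel layer the factorised form needs roughly 7× fewer parameters, which is why the quantized models above fit in a few kilobytes.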

PDF

BioDCASE Task 3: TinyML Models for Bird Classification

Espitia Londoño, Sebastián
Human-Environment Research (HER), La Salle Campus Barcelona — Universitat Ramon Llull, Spain

Abstract

This technical report describes our participation in Task 3 of BioDCASE 2025. The task consists of identifying Yellowhammer birds, found throughout Europe. It is tackled with two models, one trained with augmented data and another trained without any augmented data, so that the behavior of the two models can be compared. The model demonstrates high confidence in its predictions, with 95.96% of outputs classified as high-confidence decisions. The non-enriched model achieves superior feature learning while maintaining high confidence levels, suggesting a better balance between certainty and complexity capture. Its enhanced clustering quality, combined with more calibrated uncertainty levels, positions it as the more robust choice for applications requiring both reliable predictions and meaningful internal representations. The trade-off in prediction consistency appears acceptable given the substantial improvements in representation quality.
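The "enriched" (augmented) training condition the report compares against can be sketched as simple waveform-level enrichment, for example additive background noise at a target SNR plus a random circular time shift. The SNR value and shift range below are illustrative assumptions, not the submission's actual recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(wave, snr_db=20.0, max_shift=1600):
    """Add Gaussian noise at snr_db and apply a random circular time shift."""
    noise = rng.standard_normal(wave.shape)
    sig_pow = np.mean(wave ** 2) + 1e-12
    noise_pow = np.mean(noise ** 2)
    # scale the noise so signal power / noise power matches the target SNR
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return np.roll(wave + scale * noise, rng.integers(-max_shift, max_shift + 1))

# 1 s of a 440 Hz tone at 16 kHz standing in for a clean training clip
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = augment(clean)
```

Training one model on `clean` clips and another on `augment(clean)` clips is the kind of enriched/non-enriched comparison the report makes.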

PDF

BioDCASE 2025 Challenge Technical Report

Martin, Toby

Abstract

This project focuses on detecting the presence of bird vocalisations, specifically those of the yellowhammer (Emberiza citrinella). The task is a binary classification problem: each audio clip either contains a bird call or it doesn’t. The initial model architecture was based on a MobileNet-style convolutional neural network (CNN). While it achieved reasonably high accuracy and was relatively compact, it was still larger than necessary for this domain-specific task and not well suited for deployment on resource-constrained devices like embedded microcontrollers or edge AI systems. The objective of this exercise was to optimise the model to reduce inference memory and computational requirements, and to maintain or improve performance on the validation set. This document describes the process undertaken to improve the model, focusing on reducing its size and computational load while either preserving or improving accuracy. The rationale behind each modification is explained, the validation methodology is presented, and the results of experiments comparing the original and optimised versions are given.
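One common size-reduction step for models like this is magnitude-based weight pruning: zero the smallest-magnitude weights so the stored model compresses well and, with sparse kernels, computes faster. A minimal sketch with an assumed sparsity level (the report's exact optimisation steps are in the PDF):

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero the smallest `sparsity` fraction of weights by absolute value."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only weights above the cutoff
    return weights * mask

w = np.array([[0.9, -0.05],
              [0.02, -0.7]])
pruned = prune_by_magnitude(w, sparsity=0.5)  # small entries zeroed
```

The pruned tensor has the same shape, so it drops into the existing network; only the near-zero weights are removed.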

PDF

Targeted Feature Extraction for Bird Sound Classification Systems

Oguamanam, Ifeanyi and Machowski, Lucas and Simic, Marija and Krishnan, Sri
Signal Analysis Research (SAR) Group, Dept. Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Canada

Abstract

Convolutional neural networks with a small computational footprint are deployed on tiny hardware for a binary classification task of “yellowhammer” bird sounds. We use a simple network structure to minimize the number of parameters and computations as much as possible. Our submission model for this task achieved a classification accuracy of 91.35% on the validation set. Further, the inference time of the deployed model on the tiny hardware, including feature extraction, was 20.34 ms.
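The on-device feature-extraction stage that dominates several entries' preprocessing time can be sketched as framed, windowed log power spectra computed with an FFT. The frame length and hop below are illustrative assumptions, not the submission's actual parameters:

```python
import numpy as np

def log_power_frames(wave, frame_len=256, hop=128):
    """Split `wave` into overlapping Hann-windowed frames of log power spectra."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spectra + 1e-10)  # small offset avoids log(0)

# 1 s of noise at 16 kHz standing in for an audio clip
feats = log_power_frames(np.random.default_rng(1).standard_normal(16000))
print(feats.shape)  # (124, 129): frames x rfft bins
```

The resulting frames-by-bins array is the kind of 2-D input a tiny CNN classifier consumes.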

PDF

Efficient Convolutional Neural Network for Bioacoustics in Tiny Hardware

Walter, Christian
Computational Medicine, University of Veterinary Medicine, Vienna, Austria

Abstract

Convolutional neural networks with a small computational footprint are deployed on tiny hardware for a binary classification task of “yellowhammer” bird sounds. We use a simple network structure to minimize the number of parameters and computations as much as possible. Our submission model for this task achieved a classification accuracy of 91.35% on the validation set. Further, the inference time of the deployed model on the tiny hardware, including feature extraction, was 20.34 ms.

PDF