Bird Counting


Task 6 Description

Coordinators

Aki Härmä

Maastricht University

Emre Argın

Maastricht University

Aysenur Arslan-Dogan

Maastricht University

This task is part of the BioDCASE Challenge 2026 and focuses on estimating bird abundance from acoustic recordings collected in zoo aviaries. The goal is to develop robust methods for counting individuals of a target species in realistic, multi-species acoustic environments where ground-truth population sizes are known.

The main leaderboard ranks systems based solely on target-species abundance estimation performance. Participants may optionally extend their methods to additional species; a secondary leaderboard showcases generalist systems capable of estimating multiple species simultaneously, but this does not contribute to final rankings.

Scientific Context

Passive acoustic monitoring (PAM) has transformed biodiversity research by enabling continuous, non-invasive observation of vocalizing animals across vast spatial and temporal scales. Deep learning models such as BirdNET and PERCH can now reliably detect and classify bird species from audio recordings, achieving performance comparable to expert human listeners. However, a critical gap remains between species detection and population monitoring: knowing which species are present is ecologically valuable, but conservation and management decisions often require knowing how many individuals are present.

The transition from detection to abundance estimation is methodologically challenging. A single bird may vocalize hundreds of times per hour, while another individual of the same species remains silent. Overlapping calls from multiple individuals can merge into indistinguishable acoustic events. Call rates vary with time of day, breeding stage, weather conditions, and social context. Traditional approaches require either individual identification through acoustic fingerprinting — feasible only for species with highly distinctive individual signatures — or independent estimation of per-capita call rates, which demands substantial additional fieldwork.

Zoo aviaries offer a unique opportunity to advance this research frontier. Unlike wild populations where true abundance is unknown or estimated with considerable uncertainty, zoo records provide exact population counts that can serve as ground-truth labels. The controlled yet naturalistic aviary environment produces acoustic conditions representative of real-world monitoring scenarios: multiple species vocalizing simultaneously, environmental noise from visitors and infrastructure, and natural behavioral variation across time. This combination of ecological realism and ground-truth availability creates an ideal testbed for developing and validating abundance estimation methods that can subsequently transfer to wild population monitoring.

Task Description

Given a collection of audio fragments from an aviary, participants must estimate the number of individuals of each target species present in that aviary. Three target species are designated for evaluation:

  • Greater flamingo (Phoenicopterus roseus) — present in 4 aviaries with populations of 52, 52, 107, and 161 individuals
  • Red-billed quelea (Quelea quelea) — present in 2 aviaries with populations of 61 and 153 individuals
  • Hadada ibis (Bostrychia hagedash) — present in 2 aviaries with populations of 4 and 6 individuals

Participants receive 6 audio collections, each representing recordings from a distinct aviary with a different number of target-species individuals. Each collection contains 11,000–36,000 short audio fragments (3-second clips at 48 kHz) extracted from continuous recordings spanning 2–3 days. The fragments capture the temporal variation in vocal activity, including dawn choruses, midday lulls, and periods of silence.

The aviaries contain 2–12 bird species that vocalize concurrently, requiring participants to either explicitly identify the target species or develop methods robust to multi-species acoustic mixtures. The target species were selected based on sufficient vocal activity and presence across multiple aviaries with varying population sizes.

This task is intentionally open-ended in terms of methodology. The problem does not prescribe a particular machine learning paradigm. Participants may frame it as they see fit — whether as a regression problem, a count prediction task, a clustering problem, a density estimation challenge, or something else entirely. What matters is the final output: an integer estimate of how many individuals are present.

Species detection accuracy is not evaluated; only the final abundance estimate matters. Participants are free to use pre-trained models (BirdNET, PERCH, etc.) including our provided detection models (ARIA), train custom detectors, or bypass detection entirely.

Key challenges

  • Flock-calling species: Greater flamingos vocalize synchronously in large groups. When many individuals call within the same 3-second window, raw detection counts saturate — the acoustic scene becomes a continuous chorus rather than a sequence of distinguishable events. The per-individual detection rate decreases as population grows.
  • Sparse calibration data: With only 6 aviaries (and 2–4 data points per target species), models must generalize from very few examples. Standard supervised learning approaches with train/test splits are impractical.
  • Multi-species environments: Each aviary contains multiple co-occurring species with overlapping frequency ranges and calling times. Target species detections must be distinguished from acoustically similar co-residents.
  • Population range: Target populations span two orders of magnitude (4 to 161 individuals), requiring methods that work across scales without species-specific tuning.
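The flock-calling saturation effect can be illustrated with a simple toy model. The curve and its parameter k below are purely hypothetical and are not the baseline's correction method; they only show why inverting a saturating detection-vs-population relationship may be necessary for large flocks.

```python
import math

def expected_detections(n_birds: float, k: float = 60.0) -> float:
    """Illustrative (hypothetical) saturation curve: detections per
    window grow sub-linearly with population. k is the population
    scale at which saturation becomes pronounced."""
    return k * (1.0 - math.exp(-n_birds / k))

def invert_saturation(detections: float, k: float = 60.0) -> float:
    """Invert the curve to recover a population estimate from an
    observed (saturated) detection level. Requires detections < k."""
    return -k * math.log(1.0 - detections / k)

# Round trip: 120 birds yield a detection level well below 120,
# and inverting that level recovers the original population.
d = expected_detections(120)
print(round(invert_saturation(d)))  # 120
```

Under such a curve, raw detection counts understate large populations (here, 120 birds produce roughly 52 expected detections), which is the pattern visible in the flamingo baseline results.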

Dataset

Development Set

The development dataset contains 140,899 audio files across 6 aviaries recorded at European zoos using passive acoustic monitoring equipment during spring and summer. Each aviary was recorded continuously for 7–11 days; the released dataset includes a curated subset of 2–3 representative days per aviary, selected to minimize distributional distortion of key acoustic features while keeping the dataset size manageable.

All audio files are WAV, 16-bit PCM, sampled at 48 kHz, with a duration of approximately 3 seconds. The files represent consecutive, non-overlapping segments extracted from continuous recordings.
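Because every fragment is plain 16-bit PCM WAV, the standard-library `wave` module is enough to inspect a file; a minimal sketch (the path is hypothetical):

```python
import wave

def fragment_info(path: str) -> dict:
    """Read header info from one development-set fragment
    (expected: WAV, 16-bit PCM, 48 kHz, ~3 s, per the task spec)."""
    with wave.open(path, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "bit_depth": 8 * wf.getsampwidth(),
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }

# e.g. fragment_info("dev_aviary_1/chunk_000/rec_d1_00_01_49.wav")
```

A sanity check like this is a cheap way to confirm no fragments were corrupted during download before launching a long inference run.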

Aviary         Days   Audio files   Target species                Target population
------------------------------------------------------------------------------------
dev_aviary_1     3      12,627      Red-billed quelea             153
dev_aviary_2     3      25,569      Greater flamingo, Hadada ibis 107, 6
dev_aviary_3     3      11,879      Red-billed quelea             61
dev_aviary_4     3      36,340      Greater flamingo, Hadada ibis 161, 4
dev_aviary_5     2      19,363      Greater flamingo              52
dev_aviary_6     3      35,121      Greater flamingo              52

Table 1. Summary of development set aviaries.

Note: dev_aviary_5 and dev_aviary_6 are two separate recording sessions from the same physical location with the same bird population, captured on different dates under different acoustic conditions. They are treated as independent data points.

Each aviary also contains non-target species (2–12 per aviary, 28 species total). The complete species inventory with population counts is provided in metadata/ground_truth.csv. This information may be useful for systems that model inter-species acoustic interactions or use co-occurrence patterns.

The development dataset is available on Hugging Face:

BioDCASE 2026 Task 6 — Bird Counting Dataset

Evaluation Set

To be released on 1 June 2026. Participants will receive audio collections from held-out aviaries without population labels. Systems must output a single integer estimate per (aviary, target species) pair.

The evaluation aviaries will exhibit the same realistic acoustic complexity as the development set:

  • Overlapping vocalizations: Multiple individuals often call simultaneously
  • Variable call rates: Vocal activity fluctuates across hours and days
  • Environmental noise: Background sounds from aviary infrastructure and ambient sources
  • Multi-species mixtures: Non-target species vocalize concurrently

Structure

Audio files are organized into aviary directories with chunk subdirectories for file management:

BioDCASE2026_Bird_Counting/
├── dev_aviary_1/
│   ├── chunk_000/
│   │   ├── rec_d1_00_00_45.750000.wav
│   │   ├── rec_d1_00_01_49.wav
│   │   └── ...
│   ├── chunk_001/
│   │   └── ...
│   └── ...
├── dev_aviary_2/
│   └── ...
├── ...
├── dev_aviary_6/
│   └── ...
└── metadata/
    ├── ground_truth.csv
    └── recording_info.csv

Filenames follow the pattern rec_{day}_{HH}_{MM}_{SS}[.ffffff].wav, where {day} is an anonymized day identifier (d1, d2, d3) and the remainder encodes the time of day. The chunk subdirectories have no acoustic significance — they exist purely for file management, and all chunks within an aviary should be treated as a single collection.
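The filename pattern can be parsed with a small regular expression to recover the day identifier and time of day, e.g. for ordering fragments or computing hourly activity profiles. A minimal sketch:

```python
import re
from datetime import time

# Matches rec_{day}_{HH}_{MM}_{SS}[.ffffff].wav
FNAME_RE = re.compile(
    r"rec_(?P<day>d\d+)_(?P<h>\d{2})_(?P<m>\d{2})_(?P<s>\d{2})"
    r"(?:\.(?P<us>\d{1,6}))?\.wav$"
)

def parse_fragment_name(name: str) -> tuple[str, time]:
    """Return (anonymized day id, time of day) for one fragment."""
    m = FNAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    us = int(m["us"].ljust(6, "0")) if m["us"] else 0
    return m["day"], time(int(m["h"]), int(m["m"]), int(m["s"]), us)

print(parse_fragment_name("rec_d1_00_00_45.750000.wav"))
# ('d1', datetime.time(0, 0, 45, 750000))
```

The optional fractional-seconds group handles both example filenames shown in the directory tree above.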

Baselines

The baseline system implements a two-stage pipeline: species detection followed by detection-count regression. Full code, together with a diagram of the pipeline flow, is available in the baseline repository.

Stage 1: Species Detection

Two detection packages are provided:

  • ARIA — a fusion ensemble combining BirdNET and PERCH v2 foundation models: pip install aria-inference
  • BirdNET-only — customized BirdNET detection tailored to this task: pip install aria-inference-birdnet

In the baseline solution repository, the reported birdnet_detections were generated with the default BirdNET model for simplicity, while the ARIA detections were generated with the ARIA inference package. Participants are encouraged to also consider the customized BirdNET-only ARIA package (aria-inference-birdnet), as it has been adapted specifically for this dataset and may provide a stronger starting point than the default BirdNET configuration. For instructions on running the default BirdNET model, please refer to the official BirdNET repository.

Both produce per-segment species detections with confidence scores.

Stage 2: Feature Extraction and Population Estimation

The feature builder extracts 80+ features per (aviary, target species) pair from the detection output, including detection-count statistics (confidence-weighted rate, bout rate), temporal structure (bout duration, inter-bout gaps, active hours), occupancy metrics, and optionally scikit-maad acoustic indices (ACI, NDSI, nROI, etc.) with positive-minus-background contrast features, flock-calling indicators, and adaptive frequency band selection.
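As an illustration of one such feature, a confidence-weighted rate (CWR) can be computed as the sum of target-species detection confidences per hour of audio. This is a sketch under that assumed definition; the baseline repository's exact feature formula may differ.

```python
def confidence_weighted_rate(confidences, n_segments, seg_len_s=3.0):
    """Confidence-weighted detection rate: summed detection
    confidences per hour of audio. Assumed definition for
    illustration; see the baseline repository for the real one.

    confidences: confidence scores of target-species detections
    n_segments: total number of ~3 s fragments in the collection
    """
    hours = n_segments * seg_len_s / 3600.0
    return sum(confidences) / hours

# Example: 5,000 detections with mean confidence 0.6 across the
# 25,569 fragments of dev_aviary_2 (~21.3 hours of audio).
cwr = confidence_weighted_rate([0.6] * 5000, n_segments=25569)
```

Normalizing by recording duration makes the feature comparable across aviaries whose collections span different numbers of days.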

Five estimation models are provided, each being the best performer for at least one (target species, detection format) combination:

Model                    Best for (ARIA)      Best for (BirdNET)
-----------------------------------------------------------------
flock_corrected_cwr      Greater flamingo
sim_weighted_cwr                              Greater flamingo
linear_coeff_bout_rate   Hadada ibis          Hadada ibis
linear_coeff_cwr         Red-billed quelea
adaptive_band_contrast                        Red-billed quelea

Table 2. Baseline models and their best-performing species/detector combinations.

All models use leave-one-out (LOO) cross-validation. Three models (flock_corrected_cwr, sim_weighted_cwr, linear_coeff_cwr) work with detection-only features; the other two (linear_coeff_bout_rate, adaptive_band_contrast) additionally require acoustic features computed from the original audio.
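With only 2–4 calibration points per species, LOO validation means fitting on all aviaries but one and predicting the held-out one. A minimal sketch, assuming a hypothetical one-parameter linear model count ≈ k·feature fitted by least squares (not the baseline's actual estimators):

```python
def loo_errors(features, counts):
    """Leave-one-out absolute errors for a one-parameter linear
    model count ~ k * feature. Illustrates the validation scheme
    only; the baseline's models are more elaborate."""
    errors = []
    for i in range(len(features)):
        # Fit k on all aviaries except the held-out one.
        xs = [x for j, x in enumerate(features) if j != i]
        ys = [y for j, y in enumerate(counts) if j != i]
        k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        # Predict the held-out aviary and record the absolute error.
        errors.append(abs(k * features[i] - counts[i]))
    return errors

# Two quelea aviaries (hypothetical feature values): each fold
# calibrates k on one aviary and predicts the other.
errs = loo_errors([10.2, 25.5], [61, 153])
```

With two data points, LOO reduces to calibrating on one aviary and testing on the other, which is why the quelea and ibis baselines can only be trusted as far as two-point extrapolation allows.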

Baseline Results (ARIA detections)

Species              Best model               Aviary          True   Predicted   Error
---------------------------------------------------------------------------------------
Greater flamingo     flock_corrected_cwr      dev_aviary_2     107      116       +9
Greater flamingo     flock_corrected_cwr      dev_aviary_4     161       97      -64
Greater flamingo     flock_corrected_cwr      dev_aviary_5      52       57       +5
Greater flamingo     flock_corrected_cwr      dev_aviary_6      52       66      +14
Hadada ibis          linear_coeff_bout_rate   dev_aviary_2       6        6       +0
Hadada ibis          linear_coeff_bout_rate   dev_aviary_4       4        4       +0
Red-billed quelea    linear_coeff_cwr         dev_aviary_1     153      153       +0
Red-billed quelea    linear_coeff_cwr         dev_aviary_3      61       61       +0

Table 3. Baseline results with ARIA detections — combined MAE: 11.50, RMSE: 23.45, R²: 0.8279, MAPE: 10.6%.

Baseline Results (BirdNET detections)

Species              Best model               Aviary          True   Predicted   Error
---------------------------------------------------------------------------------------
Greater flamingo     sim_weighted_cwr         dev_aviary_2     107      108       +1
Greater flamingo     sim_weighted_cwr         dev_aviary_4     161       55      -106
Greater flamingo     sim_weighted_cwr         dev_aviary_5      52       74      +22
Greater flamingo     sim_weighted_cwr         dev_aviary_6      52       67      +15
Hadada ibis          linear_coeff_bout_rate   dev_aviary_2       6        7       +1
Hadada ibis          linear_coeff_bout_rate   dev_aviary_4       4        3       -1
Red-billed quelea    adaptive_band_contrast   dev_aviary_1     153      152       -1
Red-billed quelea    adaptive_band_contrast   dev_aviary_3      61       61       +0

Table 4. Baseline results with BirdNET detections — combined MAE: 18.38, RMSE: 38.65, R²: 0.5325, MAPE: 22.5%.

The baseline performs strongly for Red-billed quelea and Hadada ibis, where detection-derived features remain closely related to true population size. Greater flamingo remains the primary challenge due to flock-calling synchronization, where many individuals vocalize simultaneously within the same detection window, reducing the usefulness of raw detection counts as population increases. Improving flamingo estimation is therefore a key opportunity for participants.

Evaluation

Ranking Metric

Mean Absolute Error (MAE) computed across all (aviary, target species) data points in the evaluation set. Lower is better. Systems are ranked by their MAE on the held-out evaluation aviaries.

Supplementary Metrics

Additional metrics will be reported alongside MAE but will not be used for ranking:

  • Root Mean Squared Error (RMSE): penalizes large individual errors more heavily than MAE.
  • Coefficient of determination (R²): measures how well predictions track the variance in true population sizes. Note that R² depends on variance in the true values and can be misleading for species with few data points or similar population sizes.
  • Mean Absolute Percentage Error (MAPE): scale-independent error expressed as a percentage of the true count, useful for comparing performance across species with different population scales.
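The four metrics follow their standard definitions; the sketch below implements them and reproduces the Table 3 (ARIA) summary from its per-aviary true and predicted counts.

```python
import math

def evaluation_metrics(true, pred):
    """MAE (ranking metric) plus the supplementary RMSE, R2, MAPE."""
    n = len(true)
    errs = [p - t for t, p in zip(true, pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean_t = sum(true) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 * sum(abs(e) / t for t, e in zip(true, errs)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

# Per-aviary values from Table 3 (ARIA baseline).
true = [107, 161, 52, 52, 6, 4, 153, 61]
pred = [116, 97, 57, 66, 6, 4, 153, 61]
print(evaluation_metrics(true, pred))
# Matches the reported MAE 11.50, RMSE 23.45, R2 0.8279,
# MAPE 10.6% (up to rounding).
```

Note how the single large flamingo error (161 true vs. 97 predicted) dominates RMSE while contributing only its absolute value to MAE, which is why the two metrics rank such failure modes differently.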

Optional Secondary Leaderboard

Participants may optionally submit population estimates for non-target species present in each aviary (listed in metadata/ground_truth.csv). A secondary leaderboard will showcase generalist systems but will not contribute to the main ranking.

Submission

General BioDCASE instructions, which specify submission naming conventions and the final report template, are available on the BioDCASE submissions page.

Deliverables

Please submit a .zip file containing the following:

Predictions (predictions_{lastname}.csv): A CSV file with columns aviary_id, species, predicted_count containing one row per (aviary, target species) pair. Example:

aviary_id,species,predicted_count
dev_aviary_1,Red-billed quelea,153
dev_aviary_2,Greater flamingo,107
dev_aviary_2,Hadada ibis,6
dev_aviary_3,Red-billed quelea,61
dev_aviary_4,Greater flamingo,161
dev_aviary_4,Hadada ibis,4
dev_aviary_5,Greater flamingo,52
dev_aviary_6,Greater flamingo,52
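A predictions file in this format can be written with the standard-library csv module; the estimates and the lastname placeholder below are illustrative only.

```python
import csv

# Hypothetical estimates; replace with your system's outputs and
# substitute your own last name in the filename.
predictions = [
    ("dev_aviary_1", "Red-billed quelea", 153),
    ("dev_aviary_2", "Greater flamingo", 107),
    ("dev_aviary_2", "Hadada ibis", 6),
]

with open("predictions_lastname.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["aviary_id", "species", "predicted_count"])
    writer.writerows(predictions)
```

Writing through the csv module rather than string concatenation keeps the species names (which contain spaces and hyphens) correctly quoted if that ever becomes necessary.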

Code: Source code for the submitted system, sufficient to reproduce the results.

Report (.pdf): Final report describing the method (see submissions page for details).

Citation

If you use the baseline biodcase-population-estimation repository, please cite the baseline software.

@software{ml4biodiversity2026baseline,
  author       = {Arg{\i}n, Emre and H{\"a}rm{\"a}, Aki and Arslan-Dogan, Aysenur},
  title        = {{BioDCASE 2026 Bird Counting Baseline: Avian Population Estimation
                   from Passive Acoustic Recordings}},
  year         = {2026},
  publisher    = {GitHub},
  url          = {https://github.com/ml4biodiversity/biodcase-population-estimation},
  version      = {1.0.0},
}

If you use the BioDCASE 2026 Bird Counting dataset, please also cite the dataset.

@dataset{ml4biodiversity2026dataset,
  author       = {Arg{\i}n, Emre and H{\"a}rm{\"a}, Aki and Arslan-Dogan, Aysenur},
  title        = {{BioDCASE 2026 Bird Counting: Avian Population Estimation
                   from Passive Acoustic Recordings}},
  year         = {2026},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/datasets/Emreargin/BioDCASE2026_Bird_Counting},
}

If you use ARIA detections or build on the ARIA methodology, please also cite the ARIA paper.

@inproceedings{argincosta2026aria,
  author       = {Arg{\i}n, Emre and Amado Pereira da Costa, Bernardo and
                  H{\"a}rm{\"a}, Aki and Arslan-Dogan, Aysenur},
  title        = {{ARIA: Acoustic Recognition for Inventory in Aviaries}},
  booktitle    = {Proceedings of the IEEE World Congress on Computational Intelligence
                  (WCCI) / International Joint Conference on Neural Networks (IJCNN)},
  year         = {2026},
  note         = {Accepted, to appear},
}

Support

If you have questions please use the BioDCASE Google Groups community forum, or contact the task organizers at: aysenur.arslan-dogan@maastrichtuniversity.nl