Bird Counting


Task 6 Description

Coordinators

Aki Härmä

Maastricht University

Emre Argın

Maastricht University

Aysenur Arslan-Dogan

Maastricht University

This task is part of the BioDCASE Challenge 2026 and focuses on estimating bird abundance from acoustic recordings collected in zoo aviaries. The goal is to develop robust methods for counting individuals of a target species in realistic, multi-species acoustic environments where ground-truth population sizes are known.

The main leaderboard ranks systems based solely on target-species abundance estimation performance. Participants may optionally extend their methods to additional species; a secondary leaderboard showcases generalist systems capable of estimating multiple species simultaneously, but this does not contribute to final rankings.

Scientific Context

Passive acoustic monitoring (PAM) has transformed biodiversity research by enabling continuous, non-invasive observation of vocalizing animals across vast spatial and temporal scales. Deep learning models such as BirdNET and PERCH can now reliably detect and classify bird species from audio recordings, achieving performance comparable to expert human listeners. However, a critical gap remains between species detection and population monitoring: knowing which species are present is ecologically valuable, but conservation and management decisions often require knowing how many individuals are present.

The transition from detection to abundance estimation is methodologically challenging. A single bird may vocalize hundreds of times per hour, while another individual of the same species remains silent. Overlapping calls from multiple individuals can merge into indistinguishable acoustic events. Call rates vary with time of day, breeding stage, weather conditions, and social context. Traditional approaches require either individual identification through acoustic fingerprinting — feasible only for species with highly distinctive individual signatures — or independent estimation of per-capita call rates, which demands substantial additional fieldwork.

Zoo aviaries offer a unique opportunity to advance this research frontier. Unlike wild populations where true abundance is unknown or estimated with considerable uncertainty, zoo records provide exact population counts that can serve as ground-truth labels. The controlled yet naturalistic aviary environment produces acoustic conditions representative of real-world monitoring scenarios: multiple species vocalizing simultaneously, environmental noise from visitors and infrastructure, and natural behavioral variation across time. This combination of ecological realism and ground-truth availability creates an ideal testbed for developing and validating abundance estimation methods that can subsequently transfer to wild population monitoring.

Task Description

Given a collection of audio fragments from an aviary, participants must estimate the number of individuals of each target species present in that aviary. Three target species are designated for evaluation:

  • Greater flamingo (Phoenicopterus roseus) — present in 4 aviaries with populations of 52, 52, 107, and 161 individuals
  • Red-billed quelea (Quelea quelea) — present in 2 aviaries with populations of 61 and 153 individuals
  • Hadada ibis (Bostrychia hagedash) — present in 2 aviaries with populations of 4 and 6 individuals

Participants receive 6 audio collections, each representing recordings from a distinct aviary with a different number of target-species individuals. Each collection contains 11,000–36,000 short audio fragments (3-second clips at 48 kHz) extracted from continuous recordings spanning 2–3 days. The fragments capture the temporal variation in vocal activity, including dawn choruses, midday lulls, and periods of silence.

The aviaries contain 2–12 bird species that vocalize concurrently, requiring participants to either explicitly identify the target species or develop methods robust to multi-species acoustic mixtures. The target species were selected based on sufficient vocal activity and presence across multiple aviaries with varying population sizes.

This task is intentionally open-ended in terms of methodology. The problem does not prescribe a particular machine learning paradigm. Participants may frame it as they see fit — whether as a regression problem, a count prediction task, a clustering problem, a density estimation challenge, or something else entirely. What matters is the final output: an integer estimate of how many individuals are present.

Species detection accuracy is not evaluated; only the final abundance estimate matters. Participants are free to use pre-trained models (BirdNET, PERCH, etc.) including our provided detection models (ARIA), train custom detectors, or bypass detection entirely.

Key challenges

  • Flock-calling species: Greater flamingos vocalize synchronously in large groups. When many individuals call within the same 3-second window, raw detection counts saturate — the acoustic scene becomes a continuous chorus rather than a sequence of distinguishable events. The per-individual detection rate decreases as population grows.
  • Sparse calibration data: With only 6 aviaries (and 2–4 data points per target species), models must generalize from very few examples. Standard supervised learning approaches with train/test splits are impractical.
  • Multi-species environments: Each aviary contains multiple co-occurring species with overlapping frequency ranges and calling times. Target species detections must be distinguished from acoustically similar co-residents.
  • Population range: Target populations span two orders of magnitude (4 to 161 individuals), requiring methods that work across scales without species-specific tuning.
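The flock-calling saturation effect can be illustrated with a simple toy model. The curve and its parameter k below are purely hypothetical and are not the baseline's correction method; they only show why inverting a saturating detection-vs-population relationship may be necessary for large flocks.

```python
import math

def expected_detections(n_birds: float, k: float = 60.0) -> float:
    """Illustrative (hypothetical) saturation curve: detections per
    window grow sub-linearly with population. k is the population
    scale at which saturation becomes pronounced."""
    return k * (1.0 - math.exp(-n_birds / k))

def invert_saturation(detections: float, k: float = 60.0) -> float:
    """Invert the curve to recover a population estimate from an
    observed (saturated) detection level. Requires detections < k."""
    return -k * math.log(1.0 - detections / k)

# Round trip: 120 birds yield a detection level well below 120,
# and inverting that level recovers the original population.
d = expected_detections(120)
print(round(invert_saturation(d)))  # 120
```

Under such a curve, raw detection counts understate large populations (here, 120 birds produce roughly 52 expected detections), which is the pattern visible in the flamingo baseline results.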

Dataset

Development Set

The development dataset contains 140,899 audio files across 6 aviaries recorded at European zoos using passive acoustic monitoring equipment during spring and summer. Each aviary was recorded continuously for 7–11 days; the released dataset includes a curated subset of 2–3 representative days per aviary, selected to minimize distributional distortion of key acoustic features while keeping the dataset size manageable.

All audio files are WAV, 16-bit PCM, sampled at 48 kHz, with a duration of approximately 3 seconds. The files represent consecutive, non-overlapping segments extracted from continuous recordings.
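Because every fragment is plain 16-bit PCM WAV, the standard-library `wave` module is enough to inspect a file; a minimal sketch (the path is hypothetical):

```python
import wave

def fragment_info(path: str) -> dict:
    """Read header info from one development-set fragment
    (expected: WAV, 16-bit PCM, 48 kHz, ~3 s, per the task spec)."""
    with wave.open(path, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "bit_depth": 8 * wf.getsampwidth(),
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }

# e.g. fragment_info("dev_aviary_1/chunk_000/rec_d1_00_01_49.wav")
```

A sanity check like this is a cheap way to confirm no fragments were corrupted during download before launching a long inference run.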

Aviary         Days   Audio files   Target species                Target population
------------------------------------------------------------------------------------
dev_aviary_1     3      12,627      Red-billed quelea             153
dev_aviary_2     3      25,569      Greater flamingo, Hadada ibis 107, 6
dev_aviary_3     3      11,879      Red-billed quelea             61
dev_aviary_4     3      36,340      Greater flamingo, Hadada ibis 161, 4
dev_aviary_5     2      19,363      Greater flamingo              52
dev_aviary_6     3      35,121      Greater flamingo              52

Table 1. Summary of development set aviaries.

Note: dev_aviary_5 and dev_aviary_6 are two separate recording sessions from the same physical location with the same bird population, captured on different dates under different acoustic conditions. They are treated as independent data points.

Each aviary also contains non-target species (2–12 per aviary, 28 species total). The complete species inventory with population counts is provided in metadata/ground_truth.csv. This information may be useful for systems that model inter-species acoustic interactions or use co-occurrence patterns.

The development dataset is available on Hugging Face:

BioDCASE 2026 Task 6 — Bird Counting Dataset

Evaluation Set

To be released on 1 June 2026. Participants will receive audio collections from held-out aviaries without population labels. Systems must output a single integer estimate per (aviary, target species) pair.

The evaluation aviaries will exhibit the same realistic acoustic complexity as the development set:

  • Overlapping vocalizations: Multiple individuals often call simultaneously
  • Variable call rates: Vocal activity fluctuates across hours and days
  • Environmental noise: Background sounds from aviary infrastructure and ambient sources
  • Multi-species mixtures: Non-target species vocalize concurrently

Structure

Audio files are organized into aviary directories with chunk subdirectories for file management:

BioDCASE2026_Bird_Counting/
├── dev_aviary_1/
│   ├── chunk_000/
│   │   ├── rec_d1_00_00_45.750000.wav
│   │   ├── rec_d1_00_01_49.wav
│   │   └── ...
│   ├── chunk_001/
│   │   └── ...
│   └── ...
├── dev_aviary_2/
│   └── ...
├── ...
├── dev_aviary_6/
│   └── ...
└── metadata/
    ├── ground_truth.csv
    └── recording_info.csv

Filenames follow the pattern rec_{day}_{HH}_{MM}_{SS}[.ffffff].wav, where {day} is an anonymized day identifier (d1, d2, d3) and the remainder encodes the time of day. The chunk subdirectories have no acoustic significance — they exist purely for file management, and all chunks within an aviary should be treated as a single collection.
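The filename pattern can be parsed with a small regular expression to recover the day identifier and time of day, e.g. for ordering fragments or computing hourly activity profiles. A minimal sketch:

```python
import re
from datetime import time

# Matches rec_{day}_{HH}_{MM}_{SS}[.ffffff].wav
FNAME_RE = re.compile(
    r"rec_(?P<day>d\d+)_(?P<h>\d{2})_(?P<m>\d{2})_(?P<s>\d{2})"
    r"(?:\.(?P<us>\d{1,6}))?\.wav$"
)

def parse_fragment_name(name: str) -> tuple[str, time]:
    """Return (anonymized day id, time of day) for one fragment."""
    m = FNAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    us = int(m["us"].ljust(6, "0")) if m["us"] else 0
    return m["day"], time(int(m["h"]), int(m["m"]), int(m["s"]), us)

print(parse_fragment_name("rec_d1_00_00_45.750000.wav"))
# ('d1', datetime.time(0, 0, 45, 750000))
```

The optional fractional-seconds group handles both example filenames shown in the directory tree above.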

Baselines

The baseline system implements a two-stage pipeline: species detection followed by detection-count regression. Full code, together with a diagram of the pipeline flow, is available in the baseline repository.

Stage 1: Species Detection

Two detection packages are provided:

  • ARIA — a fusion ensemble combining BirdNET and PERCH v2 foundation models: pip install aria-inference
  • BirdNET-only — customized BirdNET detection tailored to this task: pip install aria-inference-birdnet

In the baseline solution repository, the reported birdnet_detections were generated with the default BirdNET model for simplicity, while the ARIA detections were generated with the ARIA inference package. Participants are encouraged to also consider the customized BirdNET-only ARIA package (aria-inference-birdnet), as it has been adapted specifically for this dataset and may provide a stronger starting point than the default BirdNET configuration. For instructions on running the default BirdNET model, please refer to the official BirdNET repository.

Both produce per-segment species detections with confidence scores.

Stage 2: Feature Extraction and Population Estimation

The feature builder extracts 80+ features per (aviary, target species) pair from the detection output, including detection-count statistics (confidence-weighted rate, bout rate), temporal structure (bout duration, inter-bout gaps, active hours), occupancy metrics, and optionally scikit-maad acoustic indices (ACI, NDSI, nROI, etc.) with positive-minus-background contrast features, flock-calling indicators, and adaptive frequency band selection.
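As an illustration of one such feature, a confidence-weighted rate (CWR) can be computed as the sum of target-species detection confidences per hour of audio. This is a sketch under that assumed definition; the baseline repository's exact feature formula may differ.

```python
def confidence_weighted_rate(confidences, n_segments, seg_len_s=3.0):
    """Confidence-weighted detection rate: summed detection
    confidences per hour of audio. Assumed definition for
    illustration; see the baseline repository for the real one.

    confidences: confidence scores of target-species detections
    n_segments: total number of ~3 s fragments in the collection
    """
    hours = n_segments * seg_len_s / 3600.0
    return sum(confidences) / hours

# Example: 5,000 detections with mean confidence 0.6 across the
# 25,569 fragments of dev_aviary_2 (~21.3 hours of audio).
cwr = confidence_weighted_rate([0.6] * 5000, n_segments=25569)
```

Normalizing by recording duration makes the feature comparable across aviaries whose collections span different numbers of days.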

Five estimation models are provided, each being the best performer for at least one (target species, detection format) combination:

Model                    Best for (ARIA)      Best for (BirdNET)
-----------------------------------------------------------------
flock_corrected_cwr      Greater flamingo
sim_weighted_cwr                              Greater flamingo
linear_coeff_bout_rate   Hadada ibis          Hadada ibis
linear_coeff_cwr         Red-billed quelea
adaptive_band_contrast                        Red-billed quelea

Table 2. Baseline models and their best-performing species/detector combinations.

All models use leave-one-out (LOO) cross-validation. Three models (flock_corrected_cwr, sim_weighted_cwr, linear_coeff_cwr) work with detection-only features; the other two (linear_coeff_bout_rate, adaptive_band_contrast) additionally require acoustic features computed from the original audio.
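With only 2–4 calibration points per species, LOO validation means fitting on all aviaries but one and predicting the held-out one. A minimal sketch, assuming a hypothetical one-parameter linear model count ≈ k·feature fitted by least squares (not the baseline's actual estimators):

```python
def loo_errors(features, counts):
    """Leave-one-out absolute errors for a one-parameter linear
    model count ~ k * feature. Illustrates the validation scheme
    only; the baseline's models are more elaborate."""
    errors = []
    for i in range(len(features)):
        # Fit k on all aviaries except the held-out one.
        xs = [x for j, x in enumerate(features) if j != i]
        ys = [y for j, y in enumerate(counts) if j != i]
        k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        # Predict the held-out aviary and record the absolute error.
        errors.append(abs(k * features[i] - counts[i]))
    return errors

# Two quelea aviaries (hypothetical feature values): each fold
# calibrates k on one aviary and predicts the other.
errs = loo_errors([10.2, 25.5], [61, 153])
```

With two data points, LOO reduces to calibrating on one aviary and testing on the other, which is why the quelea and ibis baselines can only be trusted as far as two-point extrapolation allows.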

Baseline Results (ARIA detections)

Species              Best model               Aviary          True   Predicted   Error
---------------------------------------------------------------------------------------
Greater flamingo     flock_corrected_cwr      dev_aviary_2     107      116       +9
Greater flamingo     flock_corrected_cwr      dev_aviary_4     161       97      -64
Greater flamingo     flock_corrected_cwr      dev_aviary_5      52       57       +5
Greater flamingo     flock_corrected_cwr      dev_aviary_6      52       66      +14
Hadada ibis          linear_coeff_bout_rate   dev_aviary_2       6        6       +0
Hadada ibis          linear_coeff_bout_rate   dev_aviary_4       4        4       +0
Red-billed quelea    linear_coeff_cwr         dev_aviary_1     153      153       +0
Red-billed quelea    linear_coeff_cwr         dev_aviary_3      61       61       +0

Table 3. Baseline results with ARIA detections — combined MAE: 11.50, RMSE: 23.45, R²: 0.8279, MAPE: 10.6%.

Baseline Results (BirdNET detections)

Species              Best model               Aviary          True   Predicted   Error
---------------------------------------------------------------------------------------
Greater flamingo     sim_weighted_cwr         dev_aviary_2     107      108       +1
Greater flamingo     sim_weighted_cwr         dev_aviary_4     161       55      -106
Greater flamingo     sim_weighted_cwr         dev_aviary_5      52       74      +22
Greater flamingo     sim_weighted_cwr         dev_aviary_6      52       67      +15
Hadada ibis          linear_coeff_bout_rate   dev_aviary_2       6        7       +1
Hadada ibis          linear_coeff_bout_rate   dev_aviary_4       4        3       -1
Red-billed quelea    adaptive_band_contrast   dev_aviary_1     153      152       -1
Red-billed quelea    adaptive_band_contrast   dev_aviary_3      61       61       +0

Table 4. Baseline results with BirdNET detections — combined MAE: 18.38, RMSE: 38.65, R²: 0.5325, MAPE: 22.5%.

The baseline performs strongly for Red-billed quelea and Hadada ibis, where detection-derived features remain closely related to true population size. Greater flamingo remains the primary challenge due to flock-calling synchronization, where many individuals vocalize simultaneously within the same detection window, reducing the usefulness of raw detection counts as population increases. Improving flamingo estimation is therefore a key opportunity for participants.

Evaluation

Ranking Metric

Mean Absolute Error (MAE) computed across all (aviary, target species) data points in the evaluation set. Lower is better. Systems are ranked by their MAE on the held-out evaluation aviaries.

Supplementary Metrics

Additional metrics will be reported alongside MAE but will not be used for ranking:

  • Root Mean Squared Error (RMSE): penalizes large individual errors more heavily than MAE.
  • Coefficient of determination (R²): measures how well predictions track the variance in true population sizes. Note that R² depends on variance in the true values and can be misleading for species with few data points or similar population sizes.
  • Mean Absolute Percentage Error (MAPE): scale-independent error expressed as a percentage of the true count, useful for comparing performance across species with different population scales.
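The four metrics follow their standard definitions; the sketch below implements them and reproduces the Table 3 (ARIA) summary from its per-aviary true and predicted counts.

```python
import math

def evaluation_metrics(true, pred):
    """MAE (ranking metric) plus the supplementary RMSE, R2, MAPE."""
    n = len(true)
    errs = [p - t for t, p in zip(true, pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean_t = sum(true) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    r2 = 1.0 - ss_res / ss_tot
    mape = 100.0 * sum(abs(e) / t for t, e in zip(true, errs)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

# Per-aviary values from Table 3 (ARIA baseline).
true = [107, 161, 52, 52, 6, 4, 153, 61]
pred = [116, 97, 57, 66, 6, 4, 153, 61]
print(evaluation_metrics(true, pred))
# Matches the reported MAE 11.50, RMSE 23.45, R2 0.8279,
# MAPE 10.6% (up to rounding).
```

Note how the single large flamingo error (161 true vs. 97 predicted) dominates RMSE while contributing only its absolute value to MAE, which is why the two metrics rank such failure modes differently.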

Optional Secondary Leaderboard

Participants may optionally submit population estimates for non-target species present in each aviary (listed in metadata/ground_truth.csv). A secondary leaderboard will showcase generalist systems but will not contribute to the main ranking.

Submission

General BioDCASE instructions, which specify submission naming conventions and the final report template, are available on the BioDCASE submissions page.

Deliverables

Please submit a .zip file containing the following:

Predictions (predictions_{lastname}.csv): A CSV file with columns aviary_id, species, predicted_count containing one row per (aviary, target species) pair. Example:

aviary_id,species,predicted_count
dev_aviary_1,Red-billed quelea,153
dev_aviary_2,Greater flamingo,107
dev_aviary_2,Hadada ibis,6
dev_aviary_3,Red-billed quelea,61
dev_aviary_4,Greater flamingo,161
dev_aviary_4,Hadada ibis,4
dev_aviary_5,Greater flamingo,52
dev_aviary_6,Greater flamingo,52
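A predictions file in this format can be written with the standard-library csv module; the estimates and the lastname placeholder below are illustrative only.

```python
import csv

# Hypothetical estimates; replace with your system's outputs and
# substitute your own last name in the filename.
predictions = [
    ("dev_aviary_1", "Red-billed quelea", 153),
    ("dev_aviary_2", "Greater flamingo", 107),
    ("dev_aviary_2", "Hadada ibis", 6),
]

with open("predictions_lastname.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["aviary_id", "species", "predicted_count"])
    writer.writerows(predictions)
```

Writing through the csv module rather than string concatenation keeps the species names (which contain spaces and hyphens) correctly quoted if that ever becomes necessary.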

Code: Source code for the submitted system, sufficient to reproduce the results.

Report (.pdf): Final report describing the method (see submissions page for details).

Citation

If you use the baseline biodcase-population-estimation repository, please cite the baseline software.

@software{ml4biodiversity2026baseline,
  author       = {Arg{\i}n, Emre and H{\"a}rm{\"a}, Aki and Arslan-Dogan, Aysenur},
  title        = {{BioDCASE 2026 Bird Counting Baseline: Avian Population Estimation
                   from Passive Acoustic Recordings}},
  year         = {2026},
  publisher    = {GitHub},
  url          = {https://github.com/ml4biodiversity/biodcase-population-estimation},
  version      = {1.0.0},
}

If you use the BioDCASE 2026 Bird Counting dataset, please also cite the dataset.

@dataset{ml4biodiversity2026dataset,
  author       = {Arg{\i}n, Emre and H{\"a}rm{\"a}, Aki and Arslan-Dogan, Aysenur},
  title        = {{BioDCASE 2026 Bird Counting: Avian Population Estimation
                   from Passive Acoustic Recordings}},
  year         = {2026},
  publisher    = {Hugging Face},
  url          = {https://huggingface.co/datasets/Emreargin/BioDCASE2026_Bird_Counting},
}

If you use ARIA detections or build on the ARIA methodology, please also cite the ARIA paper.

@inproceedings{argincosta2026aria,
  author       = {Arg{\i}n, Emre and Amado Pereira da Costa, Bernardo and
                  H{\"a}rm{\"a}, Aki and Arslan-Dogan, Aysenur},
  title        = {{ARIA: Acoustic Recognition for Inventory in Aviaries}},
  booktitle    = {Proceedings of the IEEE World Congress on Computational Intelligence
                  (WCCI) / International Joint Conference on Neural Networks (IJCNN)},
  year         = {2026},
  note         = {Accepted, to appear},
}

Support

If you have questions please use the BioDCASE Google Groups community forum, or contact the task organizers at: aysenur.arslan-dogan@maastrichtuniversity.nl