Background
Perivascular spaces (PVS) form a brain-wide network of channels surrounding arterioles, capillaries, and venules that facilitate the movement of cerebrospinal fluid (CSF) and contribute to the clearance of metabolic and neurotoxic waste products from the brain. PVS are dynamic structures that can dilate or constrict, becoming detectable or undetectable in vivo on MRI. When visible, they typically appear as small ovoid or tubular structures with CSF-like signal intensity—hyperintense on T2-weighted images and hypointense on T1-weighted and FLAIR scans. Persistent enlargement of PVS to the point of MRI visibility is generally considered pathological and is regarded as one of the earliest structural indicators of cerebrovascular dysfunction and impaired brain waste clearance. Because disrupted metabolic clearance is a common feature of many proteinopathies and dementias, it has been hypothesised—and in some cases demonstrated—that such alterations may contribute to downstream cognitive impairment.
Challenging quantification
The growing recognition of PVS as a non-invasive imaging marker of compromised brain health has stimulated the development of computational methods for their automated quantification and monitoring. Broadly, two categories of approaches have been proposed for MRI-based PVS segmentation: non-machine learning methods and machine learning methods. Non-machine learning approaches can achieve high sensitivity but typically require careful parameter tuning and extensive post-processing to control false positives. Machine learning methods have the potential to overcome some of these limitations, provided that models are trained on sufficiently large, diverse, and well-annotated datasets. In practice, however, such datasets are rarely available; many models are therefore trained on limited data, which restricts their ability to generalise to unseen datasets.
This limitation was highlighted in the MICCAI 2024 PVS Segmentation Challenge (DOI: 10.48550/arXiv.2512.18197), where models trained on a shared dataset performed well on similar data but exhibited substantial performance degradation on previously unseen sites, with Dice similarity coefficients dropping to zero in some cases. These findings illustrate a broader issue in medical image segmentation: models often perform well in-distribution but fail when applied to heterogeneous real-world data.
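For readers unfamiliar with the metric, the Dice similarity coefficient mentioned above measures voxel-wise overlap between a predicted and a reference mask. A minimal sketch (the variable names and toy masks are illustrative, not taken from the challenge code):

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient between two binary masks.
    1.0 means perfect overlap; 0.0 means no overlap at all."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy example: a model that predicts nothing on an unseen site
# scores a Dice of exactly zero, the failure mode described above.
truth = np.zeros((4, 4), bool)
truth[1:3, 1:3] = True
empty_pred = np.zeros((4, 4), bool)
print(dice(truth, truth))       # 1.0
print(dice(empty_pred, truth))  # 0.0
```

A Dice of zero thus indicates that the model found no true PVS voxels at all on the unseen data, not merely that it was imprecise.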
New avenues
One of the main reasons deep learning algorithms struggle to generalise to unseen datasets is the scarcity of large, diverse, and well-annotated training data—resources that are extremely time-consuming to produce. As this limitation is unlikely to be resolved in the near term, methods capable of compensating for the lack of heterogeneous labelled data are essential.
Domain randomisation offers a promising solution. By using procedural image generation models conditioned on segmentation maps and driven by randomised parameters, domain randomisation produces highly diverse synthetic training data. This diversity encourages models to learn domain-independent representations that are robust to variations in acquisition protocols, scanners, and populations. A recent study by members of the organising team demonstrated that domain randomisation can achieve accurate out-of-sample PVS segmentation and, in some cases, outperform existing techniques (DOI: 10.1101/2025.10.22.25337423). However, median area-under-the-precision-recall-curve values remained below 0.7, indicating substantial room for further improvement.
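To make the idea concrete, the following is a deliberately simplified sketch of a domain-randomised image generator, assuming only NumPy. It is not the baseline pipeline distributed with the challenge; the parameter ranges and corruption steps are illustrative assumptions. Given a label map, it samples a random mean intensity per label (so no fixed tissue contrast survives training), then applies a smooth multiplicative bias field and additive noise:

```python
import numpy as np

def randomise_image(label_map, rng):
    """Generate one synthetic training image from a segmentation map.
    Each label receives a randomly sampled mean intensity, and the
    result is corrupted with a coarse multiplicative bias field and
    Gaussian noise, mimicking scanner and protocol variability."""
    image = np.zeros(label_map.shape, dtype=float)
    for lab in np.unique(label_map):
        # Random contrast: the network cannot rely on any particular
        # intensity profile, encouraging domain-independent features.
        mean, std = rng.uniform(0.0, 1.0), rng.uniform(0.01, 0.1)
        mask = label_map == lab
        image[mask] = rng.normal(mean, std, size=int(mask.sum()))
    # Smooth bias field: low-resolution noise upsampled by repetition.
    coarse = rng.uniform(0.8, 1.2,
                         size=[max(1, s // 8) for s in label_map.shape])
    bias = np.kron(coarse, np.ones((8,) * label_map.ndim))
    bias = bias[tuple(slice(0, s) for s in label_map.shape)]
    image = image * bias + rng.normal(0.0, rng.uniform(0.0, 0.05),
                                      size=image.shape)
    return np.clip(image, 0.0, None)

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(32, 32, 32))  # toy 3-D label map
img = randomise_image(labels, rng)
```

Repeatedly calling the generator with fresh random parameters yields an effectively unlimited stream of training images whose appearance varies far more than any single acquisition protocol, which is the mechanism behind the robustness described above.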
Challenge purpose
This challenge aims to identify the most effective domain randomisation strategies for achieving robust out-of-sample PVS segmentation. Rather than benchmarking deep learning architectures alone, we also compare domain randomisation strategies.
Evaluation will be conducted entirely out-of-sample using MRI data acquired at 1.5T, 3T, and 7T with both isotropic and anisotropic resolutions. The test datasets originate from cohorts spanning a wide spectrum of clinical conditions, including normal cognition, sleep deprivation, post-COVID syndrome, hypertensive arteriopathy, cerebral amyloid angiopathy, heart failure, mild cognitive impairment, Parkinson’s disease, dementia with Lewy bodies, and Alzheimer’s disease. This heterogeneous evaluation design ensures that successful methods demonstrate robustness across scanners, populations, and clinical contexts.
What to expect
All teams will begin with a provided baseline package that includes:
- a deep learning segmentation architecture
- a functional baseline domain randomisation pipeline
- a Dockerisation template
Participants are expected to replace both the deep learning model and the domain randomisation component with their own implementations, train the model locally, and submit the Dockerised pipeline. All methods will be executed by the organisers within a unified evaluation framework and tested strictly out-of-sample across multiple MRI cohorts.
Models must segment individual PVS across the entire brain—including supratentorial white matter and the basal ganglia—while reliably distinguishing them from confounding structures and artefacts such as white matter hyperintensities, lacunes, intracerebral haemorrhage, other lesion types, and imaging artefacts including noise, Gibbs ringing, and motion.
At the same time, models must remain robust to substantial sources of variability, including imaging sequence (T1-weighted vs T2-weighted), voxel resolution (isotropic vs anisotropic), MRI field strength (1.5T, 3T, and 7T), scanner vendor, and heterogeneous patient populations with varying disease burden.
The challenge therefore provides a rigorous testbed for evaluating how domain-randomised training enables models to generalise across sites, scanners, and populations.
What not to expect
No imaging data, labels, or derivatives will be distributed to participants at any point. Teams will only receive baseline code, model architecture, and documentation. All submissions will be run internally in a secure environment, meaning no participant or third party will have access to the data during or after the challenge.
Impact
Through this challenge, we aim to address one of the most persistent barriers to clinical translation in medical imaging: the limited ability of deep learning models to generalise reliably across scanners, institutions, and patient populations. By shifting the focus from architectural innovation alone to synthetic data generation, the challenge reframes a long-standing constraint in medical image analysis, the scarcity of annotated data, as an opportunity for methodological innovation.
Evaluation on highly heterogeneous datasets will also allow us to better understand when and why segmentation algorithms fail, helping to identify biases, guide methodological improvements, and support the development of more reliable clinical tools. Ultimately, robust PVS segmentation may advance research into cerebral small vessel disease and impaired waste clearance mechanisms.
How to participate?
Teams may participate in two ways: by taking part directly in the domain randomisation challenge, or by submitting a previously developed method (machine learning-based or non-machine learning-based).
Teams must submit their PVS segmentation methods as Docker containers that can be executed within a controlled evaluation environment. Docker templates will be provided at the start of the challenge and made available on the challenge website.
Unlike most challenges, no imaging data will be shared with participating teams. Instead, the organisers will execute all submitted methods within a unified framework and evaluate their performance strictly out-of-sample across multiple cohorts with diverse imaging characteristics and clinical conditions.
At the conclusion of the challenge, a benchmark paper summarising the results will be prepared. Up to three members from each participating team may be included as co-authors.
Participation information
Teams may submit Dockerised PVS segmentation methods until the Docker submission deadline (see Important Dates).
Both machine learning and non-machine learning approaches are eligible, provided that the method targets PVS segmentation.
The leaderboard will be updated after each valid submission. Each team may submit up to two entries during the submission period. Submissions that fail to execute successfully will not be counted.
Teams are also invited to submit a late-breaking abstract to the special session (up to 5 pages including references; see https://aiih.cc/paper-submission/).
AIiH Special Session
We will hold a special session at AIiH 2026 to present and discuss the challenge results, with the aim of identifying which PVS segmentation methods truly generalise to unseen, heterogeneous MRI data. The session will begin with a presentation by the organising team summarising the challenge design, test datasets, evaluation procedures, and main findings. This will be followed by presentations from the top-performing teams, who will be invited to present their methodological strategies and key insights.
Post-AIiH challenge paper
Following the session, the organising team will coordinate a benchmark manuscript describing the challenge design, participating methods, and key findings. The paper will present the benchmarking results, highlight methodological insights, and discuss implications for robust and generalisable PVS segmentation across heterogeneous MRI datasets.
Participating teams will be invited to contribute short descriptions of their methods. Up to three members from each team may be listed as co-authors, provided they contribute to the preparation of the manuscript and approve the final version.
Important dates
- Release of resources: 20 March 2026
- Registration period: 20 March 2026 – 10 May 2026
- Docker submission window: 20 March 2026 – 10 June 2026
- Notification of top three teams: 15 June 2026
- Late-breaking abstract submission deadline: 30 June 2026
- Late-breaking author registration deadline: 17 July 2026
- Announcement of challenge results: 26 – 28 August 2026
Organisers
- Jose Bernal (Contact person), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, jose.bernal@fau.de
- Sumeet Dash, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, sumeet.dash@fau.de
- Yuan Cao, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany, yuan.cao@fau.de
- Maria del C. Valdés Hernández, The University of Edinburgh, United Kingdom, M.Valdes-Hernan@ed.ac.uk
- Roberto Duarte Coello, The University of Edinburgh, United Kingdom, rduarte@exseed.ed.ac.uk
- Joanna M. Wardlaw, The University of Edinburgh, United Kingdom, Joanna.Wardlaw@ed.ac.uk
- Stefanie Schreiber, Otto-von-Guericke Universität Magdeburg, Germany, stefanie.schreiber@med.ovgu.de
- Hendrik Mattern, Otto-von-Guericke Universität Magdeburg, Germany, hendrik.mattern@ovgu.de
- Katja Neumann, Otto-von-Guericke Universität Magdeburg, Germany, katja.neumann@med.ovgu.de
- Patrick Müller, Otto-von-Guericke Universität Magdeburg, Germany, patrick.mueller@med.ovgu.de
- Daniel Behme, Otto-von-Guericke Universität Magdeburg, Germany, daniel.behme@med.ovgu.de
- Eric Einspänner, Otto-von-Guericke Universität Magdeburg, Germany, eric.einspaenner@med.ovgu.de
- Serena Tang, University of California San Francisco, United States, serenatang@berkeley.edu
- Duygu Tosun, University of California San Francisco, United States, duygu.tosun@ucsf.edu
- Bianca Besteher, Jena University Hospital, Germany, Bianca.Besteher@med.uni-jena.de
- Merel M. van der Thiel, Maastricht University Medical Center, Netherlands, merel.vanderthiel@maastrichtuniversity.nl
- Jacobus F. A. Jansen, Maastricht University Medical Center, Netherlands, jacobus.jansen@mumc.nl
- Eva M. van Heese, Amsterdam University Medical Centre, Netherlands, e.vanheese@amsterdamumc.nl
- Max A. Laansma, Amsterdam University Medical Centre, Netherlands, m.laansma@amsterdamumc.nl