Making a Vascular Model Repository — Design Decisions, Metadata, and Workflows

Building a reproducible, standardized vascular model repository with traceability, quality control, and support for automated workflows.

Introduction

In many of my recent projects on patient-specific vascular modeling — from segmentation to simulation — I’ve run into the same problem over and over: managing the data.
Not just storing it, but doing so in a way that’s traceable, reproducible, and easy for others (and algorithms) to use.

That challenge became the motivation for creating a vascular model repository (VMR) — a structured system for organizing, versioning, and sharing vascular data with clear metadata and consistent workflows. What follows is an overview of the design decisions and practical trade-offs involved in building it.


Traceability and Persistent Identifiers

One of the most important principles is traceability.
Once data enters the repository, its full history should stay visible — including updates or replacements over time.

Each dataset gets a persistent identifier (DOI), ensuring that anyone citing or using it can always trace back to the exact version referenced.
When updates are made, the old versions remain archived and linked through a clear version history.
That version lineage is what gives the data real scientific credibility.
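
To make the lineage idea concrete, here is a minimal sketch of what a version-aware metadata record and a lineage walk could look like in Python. The field names (doi, supersedes, superseded_by) and the placeholder DOIs are illustrative assumptions, not a fixed schema.

```python
# Hypothetical per-version metadata record; field names are illustrative.
record_v2 = {
    "dataset": "CCTA001_LAD_CT",
    "version": "v2",
    "doi": "10.xxxx/vmr.ccta001.v2",        # persistent identifier for this version
    "supersedes": "10.xxxx/vmr.ccta001.v1",  # link back to the archived v1
    "superseded_by": None,                   # set if a v3 ever replaces this one
}

def version_lineage(records, doi):
    """Walk the 'supersedes' links to reconstruct a dataset's version history."""
    by_doi = {r["doi"]: r for r in records}
    chain = []
    while doi in by_doi:
        chain.append(by_doi[doi])
        doi = by_doi[doi]["supersedes"]  # None ends the walk
    return chain  # newest first
```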


Naming Structure and Standardization

Naming conventions sound trivial until you try to merge data from multiple sources — hospitals, labs, and research groups, each with their own systems.

The repository uses a standardized naming pattern to keep things consistent:
{StudyName}{PatientID}_{Region}_{Modality}_{Version}
Example: CCTA001_LAD_CT_v2

But older datasets don’t always fit. For those, we wrote mapping scripts that automatically translate legacy names into the new format.
It’s a constant balance — enforcing standards without breaking compatibility with existing archives.
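
As a rough sketch of what those scripts do, the snippet below validates names against the pattern and rewrites one hypothetical legacy style into it. The legacy rule shown is invented for illustration; real mappings depend on each source archive.

```python
import re

# Matches e.g. CCTA001_LAD_CT_v2: study+patient, region, modality, version.
NAME_RE = re.compile(
    r"^(?P<study>[A-Za-z]+)(?P<patient>\d{3})"
    r"_(?P<region>[A-Za-z0-9]+)_(?P<modality>[A-Za-z]+)_v(?P<version>\d+)$"
)

def is_valid_name(name: str) -> bool:
    return NAME_RE.match(name) is not None

# Hypothetical legacy patterns -> canonical names; real rules are per-archive.
LEGACY_RULES = [
    (re.compile(r"^pt(\d{3})-(\w+)-ct$", re.IGNORECASE),
     lambda m: f"CCTA{m.group(1)}_{m.group(2).upper()}_CT_v1"),
]

def map_legacy_name(name: str) -> str:
    """Translate a legacy name into the standard pattern, if a rule matches."""
    if is_valid_name(name):
        return name  # already canonical
    for pattern, rewrite in LEGACY_RULES:
        m = pattern.match(name)
        if m:
            return rewrite(m)
    raise ValueError(f"No mapping rule for legacy name: {name}")
```

For example, map_legacy_name("pt001-lad-ct") returns CCTA001_LAD_CT_v1, while names that already follow the convention pass through unchanged.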


Hosting and Backup

Reliability starts with where the data lives.
The repository runs on a dedicated research server with backups, so no single point of failure can wipe anything out.

We also plan to experiment with cloud-based object storage (S3-compatible) for long-term archiving, which would make scaling and collaboration much easier while keeping version control intact.
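
If we go that route, the upload side could be as small as the sketch below, using boto3 against an S3-compatible endpoint. The endpoint URL, bucket name, and key layout are placeholders, and the sketch assumes object versioning is enabled on the bucket.

```python
import boto3

# S3-compatible endpoint (e.g. MinIO or Ceph); URL and bucket are placeholders.
s3 = boto3.client("s3", endpoint_url="https://storage.example.org")
BUCKET = "vmr-archive"  # hypothetical bucket with object versioning enabled

def archive_model(local_path: str, dataset_name: str) -> None:
    """Upload one model file; with bucket versioning on, old objects are retained."""
    key = f"datasets/{dataset_name}/{local_path.rsplit('/', 1)[-1]}"
    s3.upload_file(local_path, BUCKET, key)

# Usage: archive one model file under its dataset's prefix.
archive_model("CCTA001_LAD_CT_v2.vtp", "CCTA001_LAD_CT")
```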


Workflow for Adding New Data

Adding data should be simple, not chaotic.
The submission process follows three main steps:

  1. Upload — contributors upload their model files (segmentations, meshes, or simulation results).
  2. Metadata Form — they fill in key details: study name, region, modality, preprocessing steps, and license.
  3. Validation Check — automated scripts confirm that the files and metadata follow the required format and naming rules.

This keeps the workflow lightweight, but structured enough to guarantee completeness and consistency.
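
To illustrate step 3, a validation script could be as simple as diffing the metadata form against a set of required fields. The field names below mirror the form, but the exact schema is an assumption.

```python
import json

# Required fields mirroring the metadata form; the exact schema is assumed.
REQUIRED_FIELDS = {"study_name", "region", "modality", "preprocessing", "license"}

def validate_submission(metadata_path: str) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    with open(metadata_path) as f:
        meta = json.load(f)
    missing = sorted(REQUIRED_FIELDS - meta.keys())
    return [f"missing metadata field: {name}" for name in missing]
```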


Quality Control

Every dataset should go through quality control (QC) before it’s officially added: at minimum, the same format, naming, and metadata checks run at submission, applied once more before publication.

Contributors should also be able to run the same QC scripts locally before submitting, which helps maintain a uniform quality standard across contributions.
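
For mesh submissions, a local QC script might look roughly like this sketch, which leans on the third-party trimesh library as one plausible backend. The specific checks are illustrative rather than the repository’s official QC suite.

```python
import sys
import trimesh  # third-party library, used here as one plausible mesh backend

def qc_mesh(path: str) -> list[str]:
    """Run basic geometry checks on a surface mesh and report any failures."""
    issues = []
    mesh = trimesh.load(path, force="mesh")
    if len(mesh.faces) == 0:
        issues.append("mesh has no faces")
    if not mesh.is_watertight:
        issues.append("mesh is not watertight")
    return issues

if __name__ == "__main__":
    for problem in qc_mesh(sys.argv[1]):
        print(f"QC FAIL: {problem}")
```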


Automated and AI-Based Validation

Manual review doesn’t scale forever.
That’s where AI-assisted validation comes in — lightweight automated checks that flag potential issues before a human reviewer ever sees the submission.

These don’t replace human review but act as an early filter, keeping the process fast while maintaining trust in the data.
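
A lightweight version of such a check needs no deep learning at all: a plain statistical outlier test on summary features of a submission. The feature (surface area) and the z-score threshold below are assumptions for illustration.

```python
import numpy as np

def is_outlier(value: float, historical: list[float], z_max: float = 3.0) -> bool:
    """Flag a feature value whose z-score against accepted datasets is extreme."""
    hist = np.asarray(historical, dtype=float)
    mu, sigma = hist.mean(), hist.std()
    if sigma == 0:
        return False  # no spread in the history, nothing to compare against
    return abs(value - mu) / sigma > z_max

# Example: surface areas (cm^2) of previously accepted models (made-up numbers).
accepted_areas = [12.1, 11.8, 12.5, 13.0, 12.2]
print(is_outlier(25.0, accepted_areas))  # True: worth a human look
```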


Data Structure and ML Compatibility

A big design goal was making data not just organized, but usable.
Consistent folder hierarchies, standardized formats (.nii.gz, .vtp, .stl), and structured JSON metadata mean that ML or coding workflows can plug in directly.

With this structure, training pipelines can automatically parse metadata, load the correct models, and run experiments — a major step toward reproducible research at scale.
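
As a sketch of what “plug in directly” means, a training pipeline could index the repository by walking the folder hierarchy and parsing each dataset’s metadata JSON. The layout assumed here, one metadata.json per dataset folder next to its model files, is illustrative.

```python
import json
from pathlib import Path

MODEL_SUFFIXES = {".vtp", ".stl"}  # .nii.gz is handled separately below

def index_repository(root: str, modality: str | None = None):
    """Yield (model_path, metadata) pairs for every dataset under root.

    Assumes one metadata.json per dataset folder, stored next to the
    model files it describes.
    """
    for meta_file in Path(root).rglob("metadata.json"):
        meta = json.loads(meta_file.read_text())
        if modality and meta.get("modality") != modality:
            continue
        for model in meta_file.parent.iterdir():
            if model.suffix in MODEL_SUFFIXES or model.name.endswith(".nii.gz"):
                yield model, meta

# Example: load every CT dataset for a training run.
for path, meta in index_repository("vmr/", modality="CT"):
    print(path, meta.get("region"))
```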


Looking Ahead

This repository isn’t just a storage project; it’s an attempt to make vascular data reliable, transparent, and usable.
Having structured, traceable datasets doesn’t just help with reproducibility — it builds trust and speeds up collaboration across groups.

Next steps include: