Architecture
This page describes the internal structure of fsatlas for contributors and users who want to understand how the pipeline works.
Project Layout
```text
src/fsatlas/
├── __init__.py          # Package version
├── cli/
│   ├── __init__.py
│   └── main.py          # Click CLI: extract, aggregate, list-atlases, download, generate-lut
├── atlases/
│   ├── __init__.py
│   ├── catalog.yaml     # Built-in atlas definitions (31 atlases)
│   ├── *_labels.tsv     # Bundled LUT files for volumetric atlases
│   └── registry.py      # AtlasRegistry, AtlasSpec, CustomAtlasSpec
└── core/
    ├── __init__.py
    ├── aggregate.py     # BIDS CSV discovery + wide-format aggregation
    ├── bids.py          # BIDS path construction for bids output layout
    ├── command.py       # run_command() subprocess wrapper (10 min timeout)
    ├── environment.py   # FreeSurferEnv, SubjectPaths
    ├── extract.py       # FreeSurfer command runners + .stats file parsers
    ├── formats.py       # Format handler registry (annot, nifti, dlabel_gii, gca)
    ├── lut.py           # LookupTable: from_tsv, from_annot, merge_measures
    └── pipeline.py      # run_extraction() orchestrator + FlatWriter/BidsWriter
```
Data Flow
```mermaid
flowchart TD
    A[CLI: fsatlas extract] --> B[Resolve atlas\nAtlasRegistry]
    A --> C[Discover subjects\nFreeSurferEnv]
    B --> D[Pipeline: run_extraction]
    C --> D
    D --> E[Load LUT once\nLookupTable]
    E --> F{Per subject}
    F --> G[Validate subject\nSubjectPaths]
    G --> H[Get format handler\nFORMAT_HANDLERS]
    H --> I[handler.transfer\natlas → subject space]
    I --> J[handler.extract\nrun FreeSurfer stats]
    J --> K[LUT.merge_measures\nwide-format DataFrame]
    K --> L{OutputWriter}
    L -->|flat| M[Accumulate → atlas.tsv]
    L -->|bids| N[Per-subject CSV\nBIDS tree]
```
Module Descriptions
cli/main.py — Command-Line Interface
Built with Click. Provides five user-facing commands:
- extract — Main command; orchestrates the full pipeline.
- aggregate — Combines BIDS-layout per-subject CSVs into one wide-format table.
- list-atlases — Displays the atlas catalog in a Rich table.
- download — Pre-downloads atlases to the cache.
- generate-lut — Extracts the embedded colour table from .annot files into a reusable LUT TSV.
Global options:
- --freesurfer-license-file — Path to FreeSurfer license.txt (also accepts FS_LICENSE env var).
Responsibilities:
- Calls FreeSurferEnv.detect() to find and validate the FreeSurfer installation.
- Calls AtlasRegistry to resolve the atlas (catalog lookup or custom file path).
- Discovers subjects from $SUBJECTS_DIR, -s flags, or --subjects-file.
- Delegates to run_extraction().
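The subject-discovery step can be pictured as a small helper. This is a sketch only. The name resolve_subjects is hypothetical, and its precedence order (explicit -s flags first, then --subjects-file, then a scan of $SUBJECTS_DIR) is an assumption rather than documented fsatlas behaviour. The real CLI also filters the directory scan through FreeSurferEnv.list_subjects().

```python
from __future__ import annotations

from pathlib import Path


def resolve_subjects(
    subjects_dir: Path,
    cli_subjects: tuple[str, ...] = (),
    subjects_file: Path | None = None,
) -> list[str]:
    """Hypothetical helper: choose which subject IDs to process."""
    # Assumed precedence 1: explicit -s flags.
    if cli_subjects:
        return list(cli_subjects)
    # Assumed precedence 2: one ID per line in --subjects-file (blank lines skipped).
    if subjects_file is not None:
        return [line.strip() for line in subjects_file.read_text().splitlines() if line.strip()]
    # Fallback: every directory under $SUBJECTS_DIR (fsatlas additionally
    # keeps only directories that pass FreeSurferEnv.list_subjects()).
    return sorted(p.name for p in subjects_dir.iterdir() if p.is_dir())
```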
atlases/registry.py — Atlas Registry
Three classes:
AtlasSpec — A catalog atlas entry loaded from catalog.yaml. Key properties:
- cache_dir — ~/.cache/fsatlas/atlases/{name}/
- is_downloaded — checks if files are present in cache
- get_file(key) — returns path to a named file in cache
- labels_tsv_path — path to the LUT TSV in cache
- download(force=False, env=None) — downloads all atlas files + LUT
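The sketch below shows how the cache-related properties might fit together. It assumes that get_file() resolves keys through the catalog's files mapping (key to file name) and that is_downloaded simply checks that every listed file exists. AtlasSpecSketch and its cache_root parameter are hypothetical. cache_root exists only so the example can point at a temporary directory instead of ~/.cache.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class AtlasSpecSketch:
    """Hypothetical, trimmed-down stand-in for AtlasSpec's cache properties."""
    name: str
    files: dict[str, str] = field(default_factory=dict)  # key -> file name, as in catalog.yaml
    cache_root: Path = Path.home() / ".cache" / "fsatlas" / "atlases"

    @property
    def cache_dir(self) -> Path:
        # ~/.cache/fsatlas/atlases/{name}/
        return self.cache_root / self.name

    def get_file(self, key: str) -> Path:
        # Path to a named file inside the cache (KeyError if the key is unknown).
        return self.cache_dir / self.files[key]

    @property
    def is_downloaded(self) -> bool:
        # Treat the atlas as cached only when every listed file is present.
        return all(self.get_file(k).is_file() for k in self.files)
```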
CustomAtlasSpec — A user-provided atlas. Stores paths; auto-detects format from extension.
AtlasRegistry — Loads catalog.yaml and manages the catalog:
- get(name) — looks up an AtlasSpec by ID
- list_atlases() — returns all AtlasSpec entries
- download(name, force, env) — downloads and caches atlas + LUT
- Static factory methods for custom atlases:
  - from_custom_surface(lh_annot, rh_annot, labels_tsv=None)
  - from_custom_volumetric(nifti_path, labels_tsv=None)
  - from_custom_dlabel_gii(lh_dlabel, rh_dlabel, labels_tsv=None)
  - from_custom_gca(gca_path, labels_tsv=None)
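Extension-based format detection for custom atlases could look like the sketch below. The function name detect_format and the suffix table are assumptions, built from the four formats handled in core/formats.py. The code matches with str.endswith on the whole file name because Path.suffix returns only the last part (".gz" for atlas.nii.gz).

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical suffix table mirroring the four formats handled by core/formats.py.
_SUFFIX_TO_FORMAT = [
    (".dlabel.gii", "dlabel_gii"),
    (".nii.gz", "nifti"),
    (".nii", "nifti"),
    (".annot", "annot"),
    (".gca", "gca"),
]


def detect_format(path: str | Path) -> str:
    """Map a custom atlas file to a format ID (case-insensitive)."""
    name = Path(path).name.lower()
    for suffix, fmt in _SUFFIX_TO_FORMAT:
        if name.endswith(suffix):  # handles multi-part suffixes like .nii.gz
            return fmt
    raise ValueError(f"Unrecognised atlas file extension: {name}")
```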
Type alias:
atlases/catalog.yaml — Atlas Definitions
A YAML file bundled with the package. Each entry specifies:
```yaml
- name: schaefer100-7
  family: Schaefer2018
  description: "100-parcel 7-network Schaefer 2018 atlas"
  format: annot
  space: fsaverage
  source_url: "https://..."
  files:
    lh_annot: lh.Schaefer2018_100Parcels_7Networks_order.annot
    rh_annot: rh.Schaefer2018_100Parcels_7Networks_order.annot
  labels_tsv: labels.tsv
  citation: "Schaefer et al. 2018"
  bids_name: schaefer100x7
```
For FreeSurfer built-ins (desikan, destrieux, dkt, aseg): no source_url; the atlas files already exist in each processed subject's directory (the .annot files in label/, aseg.mgz in mri/), so nothing is downloaded.
For atlases not publicly downloadable (e.g. Brainnetome): local_source_dir is used instead of source_url.
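A catalog entry can supply its files in one of three ways, and a small classifier makes the cases explicit. The function atlas_source and its return values are hypothetical, and the order of the checks is an assumption.

```python
def atlas_source(entry: dict) -> str:
    """Hypothetical: classify where a catalog entry's files come from."""
    if "source_url" in entry:
        return "download"  # fetched into ~/.cache/fsatlas/atlases/{name}/
    if "local_source_dir" in entry:
        return "local"     # not publicly downloadable; read from local_source_dir
    return "subject"       # FreeSurfer built-in: already in each subject's directory
```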
core/environment.py — FreeSurfer Environment
FreeSurferEnv:
- detect(subjects_dir=None) — class method; reads FREESURFER_HOME, validates installation, detects FS version
- subjects_dir — resolves SUBJECTS_DIR
- list_subjects() — returns valid subject directories (must have lh.white + aseg.mgz)
- fsaverage_dir — path to fsaverage in the FS installation
- find_subject(subject_id) — returns SubjectPaths for a given subject
SubjectPaths:
- Wraps a single subject directory
- Properties: surf_dir, label_dir, mri_dir, stats_dir
- annot_path(hemi, annot_name) — path to {hemi}.{annot_name}.annot
- aseg_mgz, norm_mgz, talairach_xfm, talairach_m3z, sphere_reg — key file paths
- validate() — checks 8 essential files exist; returns list of missing files
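The listing rule can be sketched as follows. The sketch assumes the standard recon-all layout, in which lh.white lives in surf/ and aseg.mgz lives in mri/. The function names are hypothetical. The full eight-file list used by validate() is not reproduced because this page does not name those files.

```python
from pathlib import Path

# The two files list_subjects() requires, as paths relative to a recon-all subject directory.
REQUIRED_FOR_LISTING = ("surf/lh.white", "mri/aseg.mgz")


def missing_files(subject_dir: Path, required=REQUIRED_FOR_LISTING) -> list[str]:
    """Return the required relative paths that do not exist (empty list = valid)."""
    return [rel for rel in required if not (subject_dir / rel).is_file()]


def list_subjects(subjects_dir: Path) -> list[str]:
    """Sketch of FreeSurferEnv.list_subjects(): keep directories that pass the check."""
    return sorted(
        p.name for p in subjects_dir.iterdir()
        if p.is_dir() and not missing_files(p)
    )
```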
core/lut.py — Lookup Table
LookupTable — canonical mapping from integer region index to label + hemisphere. Every atlas requires a LUT; it drives the output schema.
Constructors:
- from_tsv(path) — loads TSV; auto-detects 3 formats (header row, two-column headerless, name-only); auto-infers hemisphere from region name
- from_annot(lh_annot, rh_annot) — extracts embedded colour table from .annot files via nibabel
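The three TSV layouts and the hemisphere inference can be sketched as below. Several details are assumptions:

- the header column names index and name,
- numbering name-only files from 1,
- the token rules (lh/left, rh/right) used to guess the hemisphere.

The real from_tsv() may use different rules, and it returns a LookupTable rather than tuples.

```python
from __future__ import annotations

import re


def infer_hemi(name: str) -> str | None:
    """Guess the hemisphere from tokens such as 'lh', 'Left', 'RH' (heuristic)."""
    tokens = {t.lower() for t in re.split(r"[-_.\s]+", name) if t}
    if tokens & {"lh", "left"}:
        return "lh"
    if tokens & {"rh", "right"}:
        return "rh"
    return None  # midline or non-lateralised structure


def parse_lut_text(text: str) -> list[tuple[int, str, str | None]]:
    """Return (index, name, hemi) rows from one of three assumed TSV layouts."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    first = [cell.strip() for cell in rows[0]]
    if first[0].isdigit():
        # Two-column headerless: "<index>\t<name>"
        pairs = [(int(r[0]), r[1].strip()) for r in rows]
    elif len(first) >= 2:
        # Header row: locate the (assumed) "index" and "name" columns by name.
        header = [c.lower() for c in first]
        i, n = header.index("index"), header.index("name")
        pairs = [(int(r[i]), r[n].strip()) for r in rows[1:]]
    else:
        # Name-only: number regions 1..N in file order.
        pairs = [(k, r[0].strip()) for k, r in enumerate(rows, start=1)]
    return [(idx, name, infer_hemi(name)) for idx, name in pairs]
```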
Key methods:
- to_tsv(path) — write LUT to TSV
- to_ctab() — generate FreeSurfer colour table (.ctab) for mri_segstats
- merge_measures(raw_stats, subject_id, tiv, measure_map, join_on) — joins raw extracted stats onto the LUT; returns wide-format DataFrame
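The key idea of merge_measures is a left join: the LUT, not the stats file, decides which rows exist. The sketch below uses plain dicts so it runs without pandas. With pandas, the equivalent is a merge with how="left". Several details are hypothetical:

- the parameter shapes,
- the subject_id column name,
- the assumption that raw rows already carry the join column.

```python
import math


def merge_measures_sketch(lut_rows, raw_stats, subject_id, tiv, measure_map, join_on="label"):
    """Left-join raw stats onto the LUT; regions without stats get NaN measures.

    lut_rows:    list of dicts with at least "index", "label", "hemi"
    raw_stats:   list of dicts that include the join_on key plus raw measure columns
    measure_map: raw column name -> output column name
    """
    by_key = {row[join_on]: row for row in raw_stats}
    out = []
    for region in lut_rows:  # iterate the LUT so every region yields a row
        hit = by_key.get(region[join_on], {})
        row = {"subject_id": subject_id, **region}
        for raw_col, out_col in measure_map.items():
            row[out_col] = hit.get(raw_col, math.nan)
        row["tiv_mm3"] = tiv
        out.append(row)
    return out
```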
Output schema (one row per region per subject):
core/formats.py — Format Handlers
Pattern: an abstract base class, AtlasFormatHandler, with four concrete implementations.
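```python
from __future__ import annotations  # TransferResult etc. are defined elsewhere in fsatlas

from abc import ABC, abstractmethod

from pandas import DataFrame


class AtlasFormatHandler(ABC):
    format_id: str     # "annot", "nifti", "dlabel_gii", "gca"
    measure_map: dict  # raw column → output column names
    join_on: str       # "label" or "index"

    @abstractmethod
    def transfer(self, atlas, subject, env, overwrite) -> TransferResult: ...

    @abstractmethod
    def extract(self, atlas, subject, env, transfer_result, force) -> tuple[DataFrame, float]: ...  # (raw stats, tiv)
```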
TransferResult carries the paths to transferred atlas files:
- Surface: {lh: Path, rh: Path}
- Volumetric: {volume: Path}
Concrete handlers:
| Handler | Format | Transfer command | Extract command | Measures |
|---|---|---|---|---|
| AnnotHandler | .annot | mri_surf2surf | mris_anatomical_stats | 9 cortical |
| NiftiHandler | .nii/.nii.gz | mri_vol2vol | mri_segstats | 7 volumetric |
| DlabelGiiHandler | .dlabel.gii | mris_convert + mri_surf2surf | via AnnotHandler | 9 cortical |
| GcaHandler | .gca | mri_ca_label | mri_segstats | 7 volumetric |
Handler registry:
```python
FORMAT_HANDLERS = {
    "annot": AnnotHandler(),
    "nifti": NiftiHandler(),
    "dlabel_gii": DlabelGiiHandler(),
    "gca": GcaHandler(),
}
```
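The registry keeps the pipeline format-agnostic. It looks up a handler by format ID and calls the same two methods for every format. The sketch below shows that dispatch with a fake handler that builds paths instead of running mri_surf2surf. The names Handler, FakeAnnotHandler, HANDLERS and process_subject are hypothetical, and so are the simplified method signatures.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TransferResult:
    # Surface atlases fill "lh"/"rh"; volumetric atlases fill "volume".
    paths: dict[str, Path] = field(default_factory=dict)


class Handler(ABC):
    format_id: str

    @abstractmethod
    def transfer(self, atlas: str, subject: str) -> TransferResult: ...

    @abstractmethod
    def extract(self, atlas: str, subject: str, result: TransferResult) -> list[dict]: ...


class FakeAnnotHandler(Handler):
    """Hypothetical stand-in that builds paths instead of running FreeSurfer."""
    format_id = "annot"

    def transfer(self, atlas, subject):
        return TransferResult({h: Path(subject, "label", f"{h}.{atlas}.annot") for h in ("lh", "rh")})

    def extract(self, atlas, subject, result):
        return [{"hemi": h, "path": str(p)} for h, p in result.paths.items()]


HANDLERS: dict[str, Handler] = {"annot": FakeAnnotHandler()}


def process_subject(atlas: str, fmt: str, subject: str) -> list[dict]:
    # The caller only knows the handler interface, never the format details.
    try:
        handler = HANDLERS[fmt]
    except KeyError:
        raise ValueError(f"No handler registered for format {fmt!r}") from None
    result = handler.transfer(atlas, subject)
    return handler.extract(atlas, subject, result)
```

Adding a format then amounts to writing one subclass and adding one registry entry.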
core/extract.py — Stats Extraction
Low-level runners called by the format handlers:
- _run_anatomical_stats(subject, hemi, atlas_name, annot_path, env) → .stats path
- _run_segstats(subject, atlas_name, vol_path, ctab_path, env) → .stats path
Parsers:
- _parse_cortical_stats_file(path) → DataFrame (StructName + 9 measure columns)
- _parse_segstats_file(path) → DataFrame (10 columns from mri_segstats output)
- _parse_etiv_from_header(path) → eTIV float
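FreeSurfer .stats files are plain text. A "# ColHeaders" line names the columns, data rows are whitespace-separated, and eTIV, when present, appears on a "# Measure EstimatedTotalIntraCranialVol" header line. Below is a stdlib-only sketch that parses both mris_anatomical_stats and mri_segstats output. The name parse_stats_text is hypothetical, and the real parsers return DataFrames.

```python
from __future__ import annotations


def parse_stats_text(text: str) -> tuple[list[dict], float | None]:
    """Parse FreeSurfer .stats text into row dicts plus eTIV (None if absent)."""
    columns: list[str] = []
    etiv = None
    rows = []
    for line in text.splitlines():
        if line.startswith("# ColHeaders"):
            columns = line.split()[2:]  # drop the "#" and "ColHeaders" tokens
        elif line.startswith("# Measure EstimatedTotalIntraCranialVol"):
            # Layout: "# Measure <name>, <short>, <description>, <value>, <unit>"
            etiv = float(line.split(",")[3])
        elif line.strip() and not line.startswith("#") and columns:
            row = {}
            for col, raw in zip(columns, line.split()):
                try:
                    row[col] = float(raw)
                except ValueError:
                    row[col] = raw  # text columns such as StructName
            rows.append(row)
    return rows, etiv
```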
core/command.py — Subprocess Wrapper
Sets FREESURFER_HOME and SUBJECTS_DIR in the subprocess environment. Runs with a 10-minute timeout. Raises RuntimeError on non-zero exit with captured stderr.
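Below is a sketch of such a wrapper built on subprocess.run. Its timeout argument raises subprocess.TimeoutExpired when the limit is hit. The signature (a command list plus the two directories) is an assumption; the real run_command() may take a FreeSurferEnv instead.

```python
import os
import subprocess
from pathlib import Path

TIMEOUT_S = 10 * 60  # the documented 10-minute limit


def run_command(cmd: list[str], freesurfer_home: Path, subjects_dir: Path) -> str:
    """Sketch of core/command.py's run_command(); the real signature may differ."""
    env = os.environ.copy()
    env["FREESURFER_HOME"] = str(freesurfer_home)
    env["SUBJECTS_DIR"] = str(subjects_dir)
    proc = subprocess.run(
        cmd,
        env=env,
        capture_output=True,
        text=True,
        timeout=TIMEOUT_S,  # raises subprocess.TimeoutExpired when exceeded
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {proc.returncode}:\n{proc.stderr.strip()}"
        )
    return proc.stdout
```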
core/pipeline.py — Orchestrator
Single public function:
```python
def run_extraction(
    atlas: AnyAtlasSpec,
    subjects: list[str],
    env: FreeSurferEnv,
    output_dir: Path,
    force: bool = False,
    output_layout: str = "flat",  # "flat" or "bids"
) -> dict[str, Path]: ...
```
For each atlas run, run_extraction:
1. Loads the LUT once (shared across all subjects).
2. Initialises an OutputWriter (FlatWriter or BidsWriter).
3. For each subject: validate → transfer → extract → merge with LUT → write.
4. Catches per-subject errors; logs to {atlas}_failures.csv.
FlatWriter — accumulates all subjects in memory; writes a single {atlas}.tsv at the end.
BidsWriter — writes a per-subject CSV for each structure (cortical/subcortical) inside a BIDS-like directory tree.
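Per-subject error isolation and the flat output can be sketched with the standard library. The function run_all, the process callable, and the keys of the returned dict are hypothetical. Only the {atlas}.tsv and {atlas}_failures.csv file names come from the description above.

```python
import csv
from pathlib import Path


def run_all(subjects, process, output_dir: Path, atlas: str) -> dict[str, Path]:
    """Hypothetical orchestration loop: per-subject isolation plus a failure log.

    process(subject) stands in for validate -> transfer -> extract -> merge and
    returns a list of row dicts (one per region).
    """
    all_rows, failures = [], []
    for subject in subjects:
        try:
            all_rows.extend(process(subject))
        except Exception as exc:  # one bad subject must not abort the whole run
            failures.append({"subject": subject, "error": str(exc)})

    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    if all_rows:
        # FlatWriter-style: accumulate in memory, write one TSV at the end.
        tsv = output_dir / f"{atlas}.tsv"
        with tsv.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(all_rows[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(all_rows)
        outputs["table"] = tsv
    if failures:
        log = output_dir / f"{atlas}_failures.csv"
        with log.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=["subject", "error"])
            writer.writeheader()
            writer.writerows(failures)
        outputs["failures"] = log
    return outputs
```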
core/aggregate.py — BIDS Aggregation
Combines per-subject BIDS CSV outputs (from BidsWriter) into a single wide-format DataFrame without invoking FreeSurfer.
Key functions:
- discover_atlases(bids_dir) — scans for atlas-* directories under sub-*/[ses-*/]anat/; returns sorted atlas names
- discover_csv_files(bids_dir, atlas, structure, subjects) — globs for CSV files matching the BIDS naming pattern; supports optional structure and subject filtering
- _parse_entities_from_path(csv_path) — extracts sub, ses, atlas, structure from a BIDS filename
- _standardize_dataframe(df, structure, atlas, session) — keeps universal ID columns + known measure columns + tiv_mm3; injects session, atlas, structure metadata; drops atlas-specific extras (e.g. hemi, name); renames gray_matter_volume_mm3 → volume_mm3
- aggregate(bids_dir, atlas, structures, subjects) — discovers, reads, standardises, and concatenates all matching CSVs
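Entity parsing can be done with a single regular expression over the file name. The pattern below is an assumption based on the filename scheme documented for build_bids_path. BIDS labels are alphanumeric, which is why each group matches [a-zA-Z0-9]+.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical pattern for sub-{sub}[_ses-{ses}]_atlas-{atlas}_structure-{structure}.csv
_BIDS_CSV = re.compile(
    r"^sub-(?P<sub>[a-zA-Z0-9]+)"
    r"(?:_ses-(?P<ses>[a-zA-Z0-9]+))?"
    r"_atlas-(?P<atlas>[a-zA-Z0-9]+)"
    r"_structure-(?P<structure>[a-zA-Z0-9]+)\.csv$"
)


def parse_entities_from_path(csv_path) -> dict | None:
    """Return {sub, ses, atlas, structure} (ses may be None), or None if no match."""
    m = _BIDS_CSV.match(Path(csv_path).name)
    return m.groupdict() if m else None
```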
Output schema (one row per subject × region):
Cortical and subcortical rows are stacked; columns not applicable to a structure type are NaN.
core/bids.py — BIDS Path Construction
- parse_bids_entities(subject_id) — extracts sub/ses labels from BIDS-formatted subject IDs; falls back to sanitising non-BIDS IDs.
- build_bids_path(output_dir, entities, atlas_bids_name, structure) — builds the full output path:

```text
{output_dir}/sub-{sub}/[ses-{ses}/]anat/atlas-{atlas}/sub-{sub}[_ses-{ses}]_atlas-{atlas}_structure-{structure}.csv
```
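Both helpers can be sketched in a few lines. The signatures are simplified; the real build_bids_path takes an entities object rather than separate sub/ses arguments. The sanitising rule for non-BIDS IDs (dropping every non-alphanumeric character) is an assumption.

```python
from __future__ import annotations

import re
from pathlib import Path


def parse_bids_entities(subject_id: str) -> tuple[str, str | None]:
    """Split 'sub-01_ses-A' into ('01', 'A'); sanitise anything else (assumed rule)."""
    m = re.fullmatch(r"sub-([A-Za-z0-9]+)(?:_ses-([A-Za-z0-9]+))?", subject_id)
    if m:
        return m.group(1), m.group(2)
    # Non-BIDS ID: keep only alphanumerics so the label is BIDS-legal.
    return re.sub(r"[^A-Za-z0-9]", "", subject_id), None


def build_bids_path(output_dir: Path, sub: str, ses: str | None, atlas: str, structure: str) -> Path:
    """Assemble the documented layout; the session level is omitted when ses is None."""
    parts = [f"sub-{sub}"] + ([f"ses-{ses}"] if ses else [])
    stem = "_".join(parts + [f"atlas-{atlas}", f"structure-{structure}"])
    return Path(output_dir, *parts, "anat", f"atlas-{atlas}", f"{stem}.csv")
```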
Caching Strategy
| Cache location | What is cached | Cleared by |
|---|---|---|
| ~/.cache/fsatlas/atlases/ | Downloaded atlas files + LUT TSVs | fsatlas download --force |
| {subject}/label/{hemi}.{atlas}.annot | Transferred surface atlas | --force flag |
| {subject}/mri/atlas/{atlas}.nii.gz | Transferred volumetric atlas | --force flag |
Design Decisions
Why wide format?
Wide-format output (one row per region per subject, measures as columns) is immediately usable for common analysis patterns — filtering by region, pivoting to a subjects × regions matrix, or computing ratios. The LUT drives the schema, ensuring every region appears as a row even if the FreeSurfer command returned no data (those cells are NaN).
Why the format-handler pattern?
Each atlas format (.annot, .nii, .dlabel.gii, .gca) requires different FreeSurfer commands for transfer and extraction. Encapsulating this in a handler class keeps pipeline.py format-agnostic and makes adding new formats straightforward.
Why FreeSurfer CLI wrappers instead of Python bindings?
FreeSurfer's Python bindings are incomplete and version-specific. Wrapping the CLI commands keeps fsatlas compatible with any FreeSurfer 8.x installation, and every invocation is an ordinary command line that can be logged, inspected, and rerun by hand when debugging.
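Why nearest-neighbor interpolation for volumetric atlases?
Atlas labels are integers. Trilinear or sinc interpolation would produce fractional values that map to no valid label, or, after rounding, to a label that belongs to neither neighboring region. Nearest-neighbor interpolation preserves label integrity: every output voxel carries a label that exists in the input atlas.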
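A one-dimensional example makes the problem concrete. The helper below is illustrative only, since fsatlas delegates resampling to mri_vol2vol. It samples a row of label IDs halfway between two regions.

```python
def resample_1d(labels: list[int], x: float, mode: str) -> float:
    """Sample a 1-D label array at fractional position x (0 <= x < len(labels) - 1)."""
    i = int(x)   # left neighbour
    t = x - i    # fractional distance towards the right neighbour
    if mode == "nearest":
        return labels[i] if t < 0.5 else labels[i + 1]
    if mode == "linear":
        return (1 - t) * labels[i] + t * labels[i + 1]
    raise ValueError(f"unknown mode: {mode}")


labels = [10, 10, 18, 18]                   # two adjacent regions, IDs 10 and 18
print(resample_1d(labels, 1.5, "linear"))   # 14.0: an ID belonging to neither region
print(resample_1d(labels, 1.5, "nearest"))  # 18: always an ID present in the input
print(resample_1d(labels, 1.25, "linear"))  # 12.0
```

Here the linear result is an exact integer, so rounding would not flag it. Any voxel at the boundary would silently be assigned to an unrelated label.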
Why affine (talairach) registration for NIfTI atlases?
FreeSurfer's recon-all computes talairach.xfm for every subject as part of the standard workflow. Using it requires no extra processing. For atlases requiring non-linear registration, register externally and pass the native-space result as a custom volumetric atlas. The GCA handler uses talairach.m3z (non-linear).