Medical Image Segmentation Research
A cleaned research codebase built around MSCAF-TransUNet, a Multi-Scale CNN Attention Fusion extension of the hybrid R50-ViT TransUNet encoder, then packaged for reproducible Google Colab runs.
Overview
The project starts from the hybrid ResNet-50 plus ViT TransUNet encoder-decoder pipeline for medical image segmentation.
The main modification is not a full redesign. It is a targeted encoder update that injects CNN attention at carefully chosen scales before or during hidden feature fusion.
The design goals are to keep the codebase small enough for fast research iteration, and to make the experiment easy to rerun on Google Colab with resume checkpoints and Google Drive asset caching.
Attention Update
Extract multi-scale CNN features from the ResNet branch before patch projection.
Apply residual channel-spatial attention to selected CNN scales.
Bridge the refined CNN features into the hidden representation used by the transformer.
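The refinement step above can be illustrated with a minimal NumPy sketch of residual channel-spatial attention. This is only the gating-and-residual pattern on a single `(C, H, W)` feature map; the actual modules use learned convolution and pooling weights, which are omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_channel_spatial_attention(feat):
    """Refine a CNN feature map of shape (C, H, W) with a channel gate,
    then a spatial gate, and add the result back to the input (residual).
    Parameter-free sketch: real modules learn these gates."""
    # Channel attention: global average pool -> one gate per channel.
    channel_gate = sigmoid(feat.mean(axis=(1, 2)))      # shape (C,)
    refined = feat * channel_gate[:, None, None]

    # Spatial attention: channel-wise mean map -> one gate per pixel.
    spatial_gate = sigmoid(refined.mean(axis=0))        # shape (H, W)
    refined = refined * spatial_gate[None, :, :]

    # Residual connection keeps the original signal intact.
    return feat + refined

x = np.random.randn(64, 28, 28).astype(np.float32)
y = residual_channel_spatial_attention(x)
assert y.shape == x.shape
```

The residual add is what makes the intervention safe to insert at multiple scales: when the gates saturate near zero the block degrades toward identity rather than destroying the skip features.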
pre_hidden: Attention is applied on a selected CNN scale and fused into the hidden feature before patch embedding. This keeps the intervention narrow and controlled. Typical setup: 1/8.
cnn_fusion: Attention is applied after selected CNN stages, the skip features are refined, and the selected scales are fused back into the hidden feature together. Current default: 1/8, 1/4, 1/2.
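The scale strings above are fractions of the input resolution. A hypothetical parsing helper (the name `parse_attention_scales` is illustrative, not from the codebase) makes the two configurations concrete:

```python
def parse_attention_scales(spec):
    """Parse a scale string such as "1/8,1/4,1/2" into fractional
    resolutions (relative to the input) where attention is applied."""
    scales = []
    for token in spec.split(","):
        num, den = token.strip().split("/")
        scales.append(int(num) / int(den))
    return scales

# pre_hidden: a single scale, fused before patch embedding.
assert parse_attention_scales("1/8") == [0.125]
# cnn_fusion: several scales, fused back into the hidden feature together.
assert parse_attention_scales("1/8,1/4,1/2") == [0.125, 0.25, 0.5]
```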
Results
Mean Dice and mean HD95 are reported for the current multi-scale CNN attention fusion configuration, both from the same evaluation run.
Compared with the original TransUNet paper, MSCAF-TransUNet is stronger on HD95 and on the Pancreas, Liver, Spleen, and Stomach classes, while remaining slightly lower on overall Dice.
Reproducibility
Use the setup notebook once to cache the Synapse dataset and pretrained R50-ViT-B/16 checkpoint to Google Drive.
Use the Colab research notebook to rebuild the repo, train MSCAF-TransUNet, evaluate it, and export checkpoints and metrics.
Training checkpoints are saved every epoch and can resume after a runtime interruption, which matters for long Colab sessions.
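A minimal sketch of the epoch-level resume bookkeeping, using a JSON file and a hypothetical `CKPT_PATH`; the real trainer additionally saves model and optimizer state with PyTorch and writes to Google Drive:

```python
import json
import os

CKPT_PATH = "mscaf_checkpoint.json"  # hypothetical filename for this sketch

def resume_epoch():
    """Return the epoch to start from: 0 on a fresh run,
    otherwise the last completed epoch plus one."""
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return json.load(f)["epoch"] + 1
    return 0

def save_epoch(epoch):
    """Record the completed epoch so an interrupted Colab runtime
    can pick the loop back up instead of restarting from scratch."""
    with open(CKPT_PATH, "w") as f:
        json.dump({"epoch": epoch}, f)
```

Because the checkpoint is rewritten every epoch, the worst case after a disconnect is repeating a single epoch.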
Resume granularity: epoch-level.
Repository
The repository was reduced to the core research pieces only: model code, dataset loader, split metadata, and Colab-first reproducibility notebooks. AWS deployment assets, helper ops scripts, and unrelated local tooling were removed.
- datasets/: dataset package and Synapse loader
- networks/: TransUNet + CNN attention modules
- splits/: train and test split metadata
- notebooks/: Colab setup and experiment flows
- train.py: experiment entrypoint
- test.py: evaluation entrypoint
- trainer.py: epoch-level resume logic