Medical Image Segmentation Research
A cleaned research codebase built around MSCAF-TransUNet, a Multi-Scale CNN Attention Fusion extension of the hybrid R50-ViT TransUNet encoder, then packaged for reproducible Google Colab runs.
Overview
The project starts from the hybrid ResNet-50 plus ViT TransUNet encoder-decoder pipeline for medical image segmentation.
The main modification is not a full redesign. It is a targeted encoder update that injects CNN attention at carefully chosen scales before or during hidden feature fusion.
The design goals are to keep the codebase small enough for fast research iteration, and to make the experiment easy to rerun on Google Colab with resume checkpoints and Google Drive asset caching.
Attention Update
Extract multi-scale CNN features from the ResNet branch before patch projection.
Apply residual channel-spatial attention to selected CNN scales.
Bridge the refined CNN features into the hidden representation used by the transformer.
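The refinement step above can be illustrated with a minimal NumPy sketch of residual channel-spatial attention. This is only the gating-and-residual pattern on a single `(C, H, W)` feature map; the actual modules use learned convolution and pooling weights, which are omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_channel_spatial_attention(feat):
    """Refine a CNN feature map of shape (C, H, W) with a channel gate,
    then a spatial gate, and add the result back to the input (residual).
    Parameter-free sketch: real modules learn these gates."""
    # Channel attention: global average pool -> one gate per channel.
    channel_gate = sigmoid(feat.mean(axis=(1, 2)))      # shape (C,)
    refined = feat * channel_gate[:, None, None]

    # Spatial attention: channel-wise mean map -> one gate per pixel.
    spatial_gate = sigmoid(refined.mean(axis=0))        # shape (H, W)
    refined = refined * spatial_gate[None, :, :]

    # Residual connection keeps the original signal intact.
    return feat + refined

x = np.random.randn(64, 28, 28).astype(np.float32)
y = residual_channel_spatial_attention(x)
assert y.shape == x.shape
```

The residual add is what makes the intervention safe to insert at multiple scales: when the gates saturate near zero the block degrades toward identity rather than destroying the skip features.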
pre_hidden: Attention is applied on a selected CNN scale and fused into the hidden feature before patch embedding. This keeps the intervention narrow and controlled. Typical setup: 1/8.
cnn_fusion: Attention is applied after selected CNN stages, the skip features are refined, and the selected scales are fused back into the hidden feature together. Current default: 1/8, 1/4, 1/2.
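The scale strings above are fractions of the input resolution. A hypothetical parsing helper (the name `parse_attention_scales` is illustrative, not from the codebase) makes the two configurations concrete:

```python
def parse_attention_scales(spec):
    """Parse a scale string such as "1/8,1/4,1/2" into fractional
    resolutions (relative to the input) where attention is applied."""
    scales = []
    for token in spec.split(","):
        num, den = token.strip().split("/")
        scales.append(int(num) / int(den))
    return scales

# pre_hidden: a single scale, fused before patch embedding.
assert parse_attention_scales("1/8") == [0.125]
# cnn_fusion: several scales, fused back into the hidden feature together.
assert parse_attention_scales("1/8,1/4,1/2") == [0.125, 0.25, 0.5]
```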
Results
Mean Dice and mean HD95 are reported for the current multi-scale CNN attention fusion configuration, both from the same evaluation run.
Compared with the original TransUNet paper, MSCAF-TransUNet is stronger on HD95 and on the Pancreas, Liver, Spleen, and Stomach classes, while remaining slightly lower on overall Dice.
Reproducibility
Use the setup notebook once to cache the Synapse dataset and pretrained R50-ViT-B/16 checkpoint to Google Drive.
Use the Colab research notebook to rebuild the repo, train MSCAF-TransUNet, evaluate it, and export checkpoints and metrics.
Training checkpoints are saved every epoch and can resume after a runtime interruption, which matters for long Colab sessions.
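A minimal sketch of the epoch-level resume bookkeeping, using a JSON file and a hypothetical `CKPT_PATH`; the real trainer additionally saves model and optimizer state with PyTorch and writes to Google Drive:

```python
import json
import os

CKPT_PATH = "mscaf_checkpoint.json"  # hypothetical filename for this sketch

def resume_epoch():
    """Return the epoch to start from: 0 on a fresh run,
    otherwise the last completed epoch plus one."""
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH) as f:
            return json.load(f)["epoch"] + 1
    return 0

def save_epoch(epoch):
    """Record the completed epoch so an interrupted Colab runtime
    can pick the loop back up instead of restarting from scratch."""
    with open(CKPT_PATH, "w") as f:
        json.dump({"epoch": epoch}, f)
```

Because the checkpoint is rewritten every epoch, the worst case after a disconnect is repeating a single epoch.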
Resume granularity: epoch-level.
Repository
The repository was reduced to the core research pieces only: model code, dataset loader, split metadata, and Colab-first reproducibility notebooks. AWS deployment assets, helper ops scripts, and unrelated local tooling were removed.
- datasets/: dataset package and Synapse loader
- networks/: TransUNet + CNN attention modules
- splits/: train and test split metadata
- notebooks/: Colab setup and experiment flows
- train.py: experiment entrypoint
- test.py: evaluation entrypoint
- trainer.py: epoch-level resume logic