U-Mamba2 and U-Mamba2-SSL

Background

Cone-Beam Computed Tomography (CBCT) is a widely used imaging modality in dentistry. It provides comprehensive 3D volumetric information and excellent visualization of the orofacial region, including the jaws, teeth, and nerves. Accurate segmentation of individual anatomical structures in CBCT images is crucial in applications such as dental diagnosis, treatment, and surgical planning. However, manual segmentation of CBCT scans requires specialized training and is extremely time-consuming, because the scans are three-dimensional, high-resolution volumes with a massive number of voxels and vary widely from one scan to another, making it impractical to scale up in practice. Thus, there is a strong demand for robust and efficient CBCT segmentation algorithms, as well as for effective semi-supervised approaches that learn from limited labeled data while leveraging a large amount of unlabeled CBCT scans, to improve the accuracy and efficiency of dental care and ultimately lead to better patient outcomes.

CBCT image with labeled anatomies mask (ToothFairy3)

U-Mamba2

We propose U-Mamba2, a hybrid CNN-SSD architecture for 3D image segmentation. U-Mamba2 extends the previous U-Mamba model by leveraging the Mamba2 SSD framework, which simplifies the Mamba architecture by imposing stronger constraints on the structure of the hidden space. Mamba2 introduced several architectural changes to enable tensor and sequence parallelism, providing a significant speedup without compromising performance. Similar to U-Mamba, U-Mamba2 can effectively extract local spatial features via CNN and capture global long-range dependencies with Mamba2. We implement interactive click prompts with cross-attention blocks and incorporate several pieces of domain knowledge to address key challenges of dental anatomy segmentation in CBCT.

U-Mamba2 is inspired by U-Mamba and integrates the strengths of U-Net and Mamba2 to efficiently capture global information. U-Mamba2 follows a structure similar to U-Net, with a symmetric encoder-decoder architecture that extracts image features across multiple scales. Residual connections between the encoder and decoder blocks at each stage facilitate the fusion of low-level and high-level features. As convolutional operations are inherently localized, we leverage Mamba2 to enhance the vanilla U-Net’s limited capability to model global long-range dependencies in images by treating the features as long sequences. Like Mamba, Mamba2 scales linearly with sequence length, but it leverages the SSD framework to constrain the internal recurrent structure and uses matrix multiplication instead of selective scan, thereby improving efficiency through parallelism.
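The contrast between a sequential scan and a matrix-multiplication form can be illustrated with a toy example. The sketch below is not the U-Mamba2 or Mamba2 implementation: it assumes a single channel, a scalar per-step decay `a`, and hypothetical names (`ssm_recurrent`, `ssm_matrix`). It flattens a small 3D feature map into a sequence and shows that the recurrence and the equivalent lower-triangular matrix product give the same output.

```python
import numpy as np

def ssm_recurrent(x, a, B, C):
    """Sequential scan: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t."""
    T, N = B.shape
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        h = a[t] * h + B[t] * x[t]
        y[t] = C[t] @ h
    return y

def ssm_matrix(x, a, B, C):
    """Same map written as y = M x, where
    M[i, j] = (C_i . B_j) * a_{j+1} * ... * a_i for j <= i, and 0 otherwise."""
    log_a = np.cumsum(np.log(a))  # assumes every a_t > 0
    # decay[i, j] = exp(log_a[i] - log_a[j]) = product of a_k for k in (j, i]
    decay = np.exp(log_a[:, None] - log_a[None, :])
    M = np.tril((C @ B.T) * decay)  # keep only the causal (j <= i) part
    return M @ x

rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 3, 4))  # toy single-channel 3D feature map (D, H, W)
x = feat.reshape(-1)                   # treat the voxels as one long sequence
T, N = x.size, 4                       # sequence length and (toy) state size
a = rng.uniform(0.5, 0.99, T)          # input-dependent decay in a real model
B = rng.standard_normal((T, N))
C = rng.standard_normal((T, N))

y_scan = ssm_recurrent(x, a, B, C)
y_mat = ssm_matrix(x, a, B, C)
out = y_mat.reshape(feat.shape)        # back to the 3D layout for the next conv block
```

Building the full T-by-T matrix as done here costs quadratic time and memory, so it only serves to show the equivalence; practical SSD implementations split the sequence into chunks so that the overall cost stays linear in sequence length.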

Furthermore, we incorporate several pieces of domain knowledge into the model design to address key challenges of dental anatomy segmentation in CBCT, within the scope of the ToothFairy3 challenge. These include label smoothing of related anatomies, a weighted loss for tiny structures, left-right mirroring augmentation, and post-processing based on structure volume.
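Two of these ideas can be sketched in a few lines. The code below is one plausible reading, not the authors' implementation: the label ids, the choice of left-right axis, the label swap after flipping, and the voxel thresholds are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical (left, right) label id pairs; NOT the ToothFairy3 label map.
LEFT_RIGHT_PAIRS = [(1, 2), (3, 4)]

def mirror_left_right(image, labels, lr_axis=0, pairs=LEFT_RIGHT_PAIRS):
    """Flip along the assumed left-right axis and swap paired left/right label ids."""
    image_m = np.flip(image, axis=lr_axis).copy()
    flipped = np.flip(labels, axis=lr_axis)
    labels_m = flipped.copy()
    for left, right in pairs:
        # Index with the un-swapped array so the two assignments do not interfere.
        labels_m[flipped == left] = right
        labels_m[flipped == right] = left
    return image_m, labels_m

def drop_small_structures(pred, min_voxels):
    """Reset a predicted class to background (0) if its total volume is below its threshold."""
    out = pred.copy()
    for cls, threshold in min_voxels.items():
        mask = pred == cls
        if 0 < mask.sum() < threshold:
            out[mask] = 0
    return out

# Toy volume: a full slab of class 1 at one end, a single voxel of class 2 at the other.
labels = np.zeros((4, 2, 2), dtype=np.int64)
labels[0] = 1
labels[3, 0, 0] = 2
image = np.arange(16, dtype=np.float32).reshape(4, 2, 2)

image_m, labels_m = mirror_left_right(image, labels)
cleaned = drop_small_structures(labels_m, min_voxels={1: 2, 2: 2})
```

After mirroring, the slab that was labeled 1 sits at the opposite end and carries label 2, so left and right structures stay anatomically consistent; the single leftover voxel of class 1 falls under the threshold and is removed by the volume filter.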

U-Mamba2-SSL

We extend the implementation of U-Mamba2 to semi-supervised learning and propose U-Mamba2-SSL. At a high level, the U-Mamba2-SSL framework consists of three training stages. The first stage pre-trains the U-Mamba2 model with reconstruction objectives. The second stage combines a supervised loss on the labeled data with an unsupervised consistency regularization loss on the unlabeled data. The third and final stage adds pseudo labeling to the training objectives.

To exploit the vast amount of unlabeled CBCT data, we first pre-train U-Mamba2 with the disruptive autoencoder on all provided data. Then, the second training stage involves using the labeled data for supervised learning and the unlabeled data for unsupervised learning via consistency regularization techniques in the input and feature spaces. Lastly, the final stage introduces the pseudo labeling method to the training procedure of the previous stage, with a lower loss weight to further optimize the model weights.
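How the loss terms of the second and third stages might be combined can be sketched as follows. This is a simplified illustration under stated assumptions, not the U-Mamba2-SSL training code: the function names, the mean-squared consistency term on two perturbed input views, the confidence threshold, and the loss weights are all hypothetical, and feature-space consistency and the stage-one reconstruction objective are left out.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(probs, target, mask=None):
    """Mean voxel-wise cross-entropy; probs is (K, ...), target holds class ids of shape (...)."""
    p_true = np.take_along_axis(probs, target[None], axis=0)[0]
    nll = -np.log(np.clip(p_true, 1e-8, 1.0))
    if mask is None:
        return float(nll.mean())
    return float((nll * mask).sum() / max(mask.sum(), 1))

def consistency_loss(probs_a, probs_b):
    """Mean squared difference between predictions for two perturbed views of one scan."""
    return float(((probs_a - probs_b) ** 2).mean())

def pseudo_label_loss(probs_ref, probs_student, threshold=0.9):
    """Cross-entropy against argmax pseudo labels, restricted to confident voxels."""
    pseudo = probs_ref.argmax(axis=0)
    confident = probs_ref.max(axis=0) >= threshold
    return cross_entropy(probs_student, pseudo, mask=confident)

def total_loss(stage, sup, cons=0.0, pseudo=0.0, w_cons=1.0, w_pseudo=0.1):
    """Stage 2: supervised + consistency. Stage 3: adds a down-weighted pseudo-label term."""
    if stage == 2:
        return sup + w_cons * cons
    if stage == 3:
        return sup + w_cons * cons + w_pseudo * pseudo
    raise ValueError("stage 1 (reconstruction pre-training) uses a separate objective")

rng = np.random.default_rng(0)
logits_a = rng.standard_normal((3, 4, 4, 4))  # K=3 classes on a toy 4x4x4 volume
logits_b = logits_a + 0.1 * rng.standard_normal(logits_a.shape)  # second perturbed view
pa, pb = softmax(logits_a), softmax(logits_b)
target = rng.integers(0, 3, size=(4, 4, 4))   # stands in for labels of a labeled scan

sup = cross_entropy(pa, target)
cons = consistency_loss(pa, pb)
pl = pseudo_label_loss(pa, pb, threshold=0.5)
loss_stage2 = total_loss(2, sup, cons)
loss_stage3 = total_loss(3, sup, cons, pl)
```

In a real training loop the supervised term would be computed on labeled scans and the other two terms on unlabeled scans; the toy example reuses one volume for all three only to keep the sketch short.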

Experimental Results

We visualize qualitative comparisons between the ground truth and our model’s predictions for the scans with the highest and lowest Dice scores, shown in the top and bottom rows respectively, on our internal validation set of the ToothFairy3 challenge (U-Mamba2) and the validation set of the STSR 2025 challenge (U-Mamba2-SSL).

U-Mamba2

For ToothFairy3, we observe that in most cases, U-Mamba2 produces precise segmentation predictions, showcasing the effectiveness of incorporating dental domain knowledge into the model design. Furthermore, we observe that U-Mamba2 can accurately localize the three tiny structures (the left and right incisive nerves and the lingual nerve), producing visually acceptable segmentations. In the worst-scoring case, where the scan is degraded by image artifacts caused by metallic objects, the errors are primarily false positives around the image edge or confusion between the actual tooth and a crown or implant, underscoring U-Mamba2’s robustness under noisy conditions.

Qualitative results of U-Mamba2 on the validation set of Task 1. The 3D render and a representative 2D slice are shown for: (Top) the best scoring case and (Bottom) the worst scoring case. Panels: Ground Truth and Prediction.

U-Mamba2-SSL

For STSR 2025, we observe that our method can accurately differentiate between the tooth and the different classes of pulp and root canal. The failure cases of our method typically stem from an inability to precisely predict the thickness and the length or extent of the pulp. The model also struggles with limited field-of-view (LFOV) CBCT scans, where it predicts more false positives around the image edges.

Qualitative results of U-Mamba2-SSL on the internal validation set. The 3D render and a representative 2D slice are shown for: (Top) the best scoring case and (Bottom) the worst scoring case. Panels: Ground Truth and Prediction.

Ablation Study

We analyze the effectiveness of our method by studying the effect of the incorporated dental knowledge for U-Mamba2, and of the three semi-supervised learning stages for U-Mamba2-SSL.

Label Smoothing	Weighted Loss	L/R Mirroring	Dice	HD95	Dice (ILN)	HD95 (ILN)
❌	❌	❌	0.867	42.36	0.617	38.41
✅	❌	❌	0.872	40.74	0.628	38.15
❌	✅	❌	0.870	41.31	0.635	37.99
❌	❌	✅	0.871	41.20	0.642	36.48
✅	✅	✅	0.873	41.08	0.646	35.21

Ablation Study of U-Mamba2 for the internal validation set of Task 1. ILN indicates the metrics for the left and right incisive nerves and the lingual nerve.
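For readers unfamiliar with the Dice column, the per-class Dice score is twice the overlap between prediction and ground truth divided by the sum of their volumes. The sketch below uses a hypothetical helper name and toy data; the official challenge evaluation may handle empty classes and averaging across classes and scans differently.

```python
import numpy as np

def dice_score(pred, target, cls):
    """Dice = 2 * |P and T| / (|P| + |T|) for one class id."""
    p = pred == cls
    t = target == cls
    denom = p.sum() + t.sum()
    if denom == 0:
        # Class absent from both volumes; conventions for this case vary.
        return float("nan")
    return float(2.0 * np.logical_and(p, t).sum() / denom)

target = np.zeros((4, 4, 4), dtype=np.int64)
target[:2] = 1                        # ground truth: 32 voxels of class 1
pred = np.zeros_like(target)
pred[:1] = 1                          # prediction: 16 voxels, all inside the ground truth

partial = dice_score(pred, target, cls=1)    # 2 * 16 / (16 + 32)
perfect = dice_score(target, target, cls=1)
```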

Pre-training	Consistency Regularisation	Pseudo Labeling	Dice	NSD	mIoU	IA	Average
❌	❌	❌	0.965	0.998	0.930	0.464	0.839
✅	❌	❌	0.967	0.998	0.937	0.731	0.908
✅	✅	❌	0.967	0.999	0.935	0.736	0.910
✅	✅	✅	0.967	0.999	0.935	0.738	0.910

Ablation Study of U-Mamba2-SSL for the validation set of STSR 2025.

Challenge Results

We won all three segmentation challenges in the ODIN 2025 workshop at MICCAI 2025, including Tasks 1 and 2 of the ToothFairy3 challenge, and Task 1 of the STSR 2025 challenge.

Leaderboard: ToothFairy 3; Task 1 in STSR 2025.

Presentation Slides

BibTeX

@inproceedings{u-mamba2,
   title={U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT}, 
   author={Zhi Qin Tan and Xiatian Zhu and Owen Addison and Yunpeng Li},
   year={2025},
   booktitle={Medical Image Computing and Computer Assisted Intervention (MICCAI)},
   maintitle={Workshop on Oral and Dental Image Analysis (ODIN)},
}

@inproceedings{u-mamba2-ssl,
   title={U-Mamba2-SSL for Semi-Supervised Tooth and Pulp Segmentation in CBCT}, 
   author={Zhi Qin Tan and Xiatian Zhu and Owen Addison and Yunpeng Li},
   year={2025},
   booktitle={Medical Image Computing and Computer Assisted Intervention (MICCAI)},
   maintitle={Workshop on Oral and Dental Image Analysis (ODIN)},
}

Scaling State Space Models for Dental Anatomy Segmentation and Semi-supervised Tooth and Pulp Segmentation in CBCT
