SceneSAM: Integrating 2D Labels for Weakly Supervised 3D Scene Understanding


Julius Koerner (Technical University of Munich), Dogu Tamgac (Technical University of Munich), David Rozenberszki (Technical University of Munich)
The 35th British Machine Vision Conference

Abstract

Reconstructing class-agnostic segmented 3D scenes is challenging because both high-quality 3D reconstruction and object-level annotation demand significant hardware and human resources. We therefore propose SceneSAM, an efficient, weakly supervised 3D method that reconstructs and re-renders in-the-wild, room-scale scenes with class-agnostic instance masks from a single, unaligned video stream. We use a hierarchical grid-based representation for implicit fields as the 3D representation and rely on the Segment Anything Model (SAM) for the class-agnostic instance annotations. Our method trains an order of magnitude faster than previous state-of-the-art methods while preserving highly detailed segmentation masks and without relying on any closed-vocabulary model. To supervise independent video frames with consistent masks, we also introduce a novel self-consistent video segmentation algorithm based on 3D-grounded instance proposals. Finally, our approach is agnostic to video registration: it can be used both with and without camera poses, and it saves a significant amount of additional computation by replacing the industry-standard COLMAP optimization with minimal loss in reconstruction quality. We evaluate our method on both synthetic and real-world datasets and show that efficient and robust scene reconstruction is possible in both the color and instance domains within reasonable time constraints.
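The abstract does not spell out how masks from independent frames are made consistent, so the following is only an illustrative sketch of the general problem, not the authors' algorithm. It assumes that a reference label map is available for each frame (for example, instance proposals rendered from the 3D representation into that view) and relabels the per-frame segment IDs by greedy one-to-one IoU matching. The function name `relabel_to_reference`, the `iou_threshold` parameter, and the convention that 0 means "no instance" are all hypothetical choices made for this example.

```python
import numpy as np


def relabel_to_reference(frame_ids, reference_ids, iou_threshold=0.5):
    """Relabel per-frame instance IDs so they agree with a reference map.

    Both inputs are integer arrays of the same shape; 0 means "no instance".
    Frame segments with no sufficiently overlapping reference segment
    receive fresh IDs that do not occur in the reference map.
    """
    frame_labels = [int(f) for f in np.unique(frame_ids) if f != 0]

    # Collect the IoU of every overlapping (frame segment, reference segment) pair.
    pairs = []
    for f in frame_labels:
        f_mask = frame_ids == f
        for r in np.unique(reference_ids[f_mask]):
            if r == 0:
                continue
            r_mask = reference_ids == r
            inter = np.logical_and(f_mask, r_mask).sum()
            union = np.logical_or(f_mask, r_mask).sum()
            pairs.append((inter / union, f, int(r)))

    # Greedy one-to-one matching, highest IoU first.
    assigned, used = {}, set()
    for iou, f, r in sorted(pairs, reverse=True):
        if iou < iou_threshold:
            break
        if f in assigned or r in used:
            continue
        assigned[f] = r
        used.add(r)

    # Unmatched frame segments get new IDs above the reference range.
    next_id = int(reference_ids.max()) + 1
    out = np.zeros_like(frame_ids)
    for f in frame_labels:
        if f not in assigned:
            assigned[f] = next_id
            next_id += 1
        out[frame_ids == f] = assigned[f]
    return out
```

Greedy matching is used here only to keep the sketch dependency-free; an optimal assignment (for example, the Hungarian method) would be a natural alternative when many segments overlap.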

Citation

@inproceedings{Koerner_2024_BMVC,
author    = {Julius Koerner and Dogu Tamgac and David Rozenberszki},
title     = {SceneSAM: Integrating 2D Labels for Weakly Supervised 3D Scene Understanding},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0933.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
