Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection


Christian Fruhwirth-Reisinger (Graz University of Technology), Wei Lin (Johannes Kepler University Linz), Dušan Malić (Graz University of Technology), Horst Bischof (Graz University of Technology), Horst Possegger (Graz University of Technology)
The 35th British Machine Vision Conference

Abstract

Accurate 3D object detection in LiDAR point clouds is crucial for autonomous driving systems. To achieve state-of-the-art performance, the supervised training of detectors requires large amounts of human-annotated data, which is expensive to obtain and restricted to predefined object categories. To mitigate manual labeling efforts, recent unsupervised object detection approaches generate class-agnostic pseudo-labels for moving objects, which subsequently serve as a supervision signal to bootstrap a detector. Despite promising results, these approaches neither provide class labels nor generalize well to static objects. Furthermore, they are mostly restricted to data containing multiple drives from the same scene or images from a precisely calibrated and synchronized camera setup. To overcome these limitations, we propose a vision-language-guided unsupervised 3D detection approach that operates exclusively on LiDAR point clouds. We transfer CLIP knowledge to classify point clusters of static and moving objects, which we discover by exploiting the inherent spatio-temporal information of LiDAR point clouds for clustering, tracking, as well as box and label refinement. Our approach outperforms state-of-the-art unsupervised 3D object detectors on the Waymo Open Dataset (+23 AP 3D) and Argoverse 2 (+7.9 AP 3D) and provides class labels not solely based on object size assumptions, marking a significant advancement in the field.
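
The abstract mentions discovering object candidates by clustering LiDAR points. As a rough illustration of that general idea only, the sketch below groups nearby 3D points by naive region growing and fits an axis-aligned box to each group. It is not the authors' pipeline: the function names, the `radius` and `min_points` parameters, and the toy points are all assumptions made for this example, and the tracking, refinement, and CLIP-based classification steps are not shown.

```python
from collections import deque
from math import dist

def euclidean_clusters(points, radius=0.5, min_points=3):
    """Group 3D points by naive O(n^2) region growing (hypothetical helper).

    Two points end up in the same cluster if they are connected by a chain
    of neighbors that are each at most `radius` apart. Groups with fewer
    than `min_points` members are discarded as noise.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        members = [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        if len(members) >= min_points:
            clusters.append(sorted(members))
    return sorted(clusters)

def axis_aligned_box(points, members):
    """Return (min_corner, max_corner) of the points indexed by `members`."""
    xs, ys, zs = zip(*(points[i] for i in members))
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

# Toy data (assumed for illustration): two compact groups and one stray point.
points = [
    (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.1, 0.0),  # group A
    (5.0, 5.0, 0.0), (5.2, 5.1, 0.0), (5.4, 5.0, 0.2),  # group B
    (20.0, 0.0, 0.0),                                   # isolated point
]
clusters = euclidean_clusters(points)
boxes = [axis_aligned_box(points, c) for c in clusters]
```

On this toy input, the two compact groups are kept as clusters and the isolated point is dropped. A practical system would likely use a spatial index instead of the quadratic neighbor search and oriented rather than axis-aligned boxes.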

Citation

@inproceedings{Fruhwirth-Reisinger_2024_BMVC,
author    = {Christian Fruhwirth-Reisinger and Wei Lin and Dušan Malić and Horst Bischof and Horst Possegger},
title     = {Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0545.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
