Training-Free Zero-Shot Semantic Segmentation with LLM Refinement


Yuantian Huang (CyberAgent, Inc.), Satoshi Iizuka (University of Tsukuba), Kazuhiro Fukui (University of Tsukuba)
The 35th British Machine Vision Conference

Abstract

Semantic segmentation models are predominantly based on supervised or unsupervised learning, both of which require substantial effort in annotation or training. In this study, we present a novel framework that leverages multiple pre-trained foundation models to perform semantic segmentation on previously unseen images without any additional training. Our framework first uses image recognition models to convert an input image into textual information. This text is then passed to a Large Language Model (LLM), which predicts which classes are present in the image. The labels predicted by the LLM are subsequently processed by an open-set detection and segmentation model to produce the final segmentation. To ensure that the class information is precisely aligned with the intended context, we incorporate both a pre-refinement and a post-refinement procedure using the LLM. The segmentation model is further modified to accept both bounding boxes and point prompts, resulting in higher accuracy than the original usage, which accepts only bounding boxes as input. Our proposed framework achieves training-free zero-shot semantic segmentation, requiring only the input image and a set of target classes, customizable for different scenarios, as inputs. Experiments indicate that the proposed framework performs semantic segmentation effectively across various datasets. Notably, our results surpass those of existing unsupervised models despite the absence of any training procedure.
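
The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the names `Stages`, `Detection`, `run_pipeline`, and `box_center` are hypothetical, each stage is a placeholder callable standing in for a pre-trained model, and both the exact role of the two LLM refinement steps and the use of the box center as a point prompt are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Point = Tuple[float, float]              # (x, y)


@dataclass
class Detection:
    label: str
    box: Box


@dataclass
class Stages:
    """Placeholder callables; each one stands in for a pre-trained model."""
    describe: Callable[[Any], str]                              # image -> text
    llm_select: Callable[[str, Sequence[str]], List[str]]       # text, targets -> labels
    detect: Callable[[Any, Sequence[str]], List[Detection]]     # open-set detection
    llm_filter: Callable[[List[Detection], Sequence[str]], List[Detection]]
    segment: Callable[[Any, Box, List[Point]], Any]             # box + points -> mask


def box_center(box: Box) -> Point:
    """Assumed way to derive a point prompt: the center of the box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def run_pipeline(image: Any, target_classes: Sequence[str],
                 stages: Stages) -> List[Tuple[str, Any]]:
    # 1. Turn the image into text with recognition models.
    text = stages.describe(image)
    # 2. Ask the LLM which target classes are present
    #    (treated here as the pre-refinement step; an assumption).
    labels = stages.llm_select(text, target_classes)
    if not labels:
        return []
    # 3. Open-set detection restricted to the predicted labels.
    detections = stages.detect(image, labels)
    # 4. Let the LLM re-check the detected labels
    #    (treated here as the post-refinement step; an assumption).
    detections = stages.llm_filter(detections, target_classes)
    # 5. Prompt the segmentation model with both a box and points.
    results = []
    for det in detections:
        points = [box_center(det.box)]
        results.append((det.label, stages.segment(image, det.box, points)))
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular captioning, detection, or segmentation library; no additional training step appears anywhere in the flow, which is the point the abstract emphasizes.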

Citation

@inproceedings{Huang_2024_BMVC,
author    = {Yuantian Huang and Satoshi Iizuka and Kazuhiro Fukui},
title     = {Training-Free Zero-Shot Semantic Segmentation with LLM Refinement},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0601.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
