Towards Generative Class Prompt Learning for Fine-grained Visual Recognition


Soumitri Chattopadhyay (University of North Carolina at Chapel Hill), Sanket Biswas (Computer Vision Center, Universitat Autonoma de Barcelona), Emanuele Vivoli (Universidad Autonoma de Barcelona ), Josep Llados (Computer Vision Center, Universitat Autonoma de Barcelona)
The 35th British Machine Vision Conference

Abstract

Although foundational vision-language models (VLMs) have proven to be very successful for various semantic discrimination tasks, they still struggle to perform faithfully for fine-grained categorization. Moreover, foundational models trained on one domain do not generalize well on a different domain without fine-tuning. We attribute these to the limitations of the VLM's semantic representations and attempt to improve their fine-grained visual awareness using generative modeling. Specifically, we propose two novel methods: Generative Class Prompt Learning (GCPL) and Contrastive Multi-class Prompt Learning (CoMPLe). Utilizing text-to-image diffusion models, GCPL significantly improves the visio-linguistic synergy in class embeddings by conditioning on few-shot exemplars with learnable class prompts. CoMPLe builds on this foundation by introducing a contrastive learning component that encourages inter-class separation during the generative optimization process. Our empirical results demonstrate that such a generative class prompt learning approach substantially outperform existing methods, offering a better alternative to few shot image recognition challenges. The source code will be made available at: https://github.com/soumitri2001/GCPL.

Citation

@inproceedings{Chattopadhyay_2024_BMVC,
author    = {Soumitri Chattopadhyay and Sanket Biswas and Emanuele Vivoli and Josep Llados},
title     = {Towards Generative Class Prompt Learning for Fine-grained Visual Recognition},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0200.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).

Imprint | Data Protection