Textual Attention RPN for Open-Vocabulary Object Detection


Tae-Min Choi (Korea Institute of Science and Technology), Inug Yoon (Korea Advanced Institute of Science and Technology), Jong-Hwan Kim (Korea Advanced Institute of Science and Technology), Juyoun Park (Korea Institute of Science and Technology (KIST))
The 35th British Machine Vision Conference

Abstract

Open-vocabulary object detection (OVD) is a computer vision task that detects and classifies objects from categories not seen during training. While recent OVD methods primarily focus on aligning region embeddings with vision-language pre-trained models such as CLIP for classification, object detection also requires effective localization. However, existing methods often rely on a proposal generator that is biased toward the categories seen in the training data, which creates a bottleneck for performance improvement. To address this challenge, we introduce the Textual Attention Region Proposal Network (TA-RPN). This network enhances proposal generation by fusing visual features with textual features from the CLIP text encoder, using pixel-wise attention to integrate the two across the entire image space. Our approach also incorporates prompt learning to optimize the textual features for better localization. Evaluated on the COCO and LVIS benchmarks, TA-RPN outperforms existing state-of-the-art methods, demonstrating its effectiveness in detecting novel object categories.
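To make the idea of pixel-wise text attention more concrete, the sketch below shows one generic way a feature map can attend to a set of text embeddings: every spatial location computes scaled dot-product weights over K class-prompt embeddings, takes the weighted sum as a text context vector, and adds it back to its own visual feature. This is an illustrative assumption, not the TA-RPN architecture from the paper. The function name `pixel_text_attention`, the absence of learned query/key/value projections, and the residual-addition fusion are all hypothetical choices made for brevity, and the text embeddings stand in for whatever (prompt-learned) CLIP text features the real model uses.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pixel_text_attention(
    feature_map: List[List[Vector]],  # H x W grid of D-dim visual features
    text_embeddings: List[Vector],    # K text (class-prompt) embeddings, each D-dim
) -> List[List[Vector]]:
    """Hypothetical pixel-wise cross-attention: every pixel attends to all K text embeddings.

    Assumption: no learned projections; fusion is a plain residual addition.
    """
    dim = len(text_embeddings[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product, as in standard attention
    fused = []
    for row in feature_map:
        fused_row = []
        for pixel in row:
            # Attention weights of this pixel over the K text embeddings (sum to 1).
            weights = softmax([dot(pixel, t) * scale for t in text_embeddings])
            # Text context: attention-weighted sum of the text embeddings.
            context = [
                sum(w * t[d] for w, t in zip(weights, text_embeddings))
                for d in range(dim)
            ]
            # Residual fusion: add the text context to the original visual feature.
            fused_row.append([p + c for p, c in zip(pixel, context)])
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # Toy 1 x 2 feature map with D = 2 and two orthogonal "class" embeddings.
    feature_map = [[[1.0, 0.0], [0.0, 1.0]]]
    text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    for row in pixel_text_attention(feature_map, text_embeddings):
        print(row)
```

In this toy setting each pixel is pulled most strongly toward the text embedding it already resembles, which is the intuition behind letting text features highlight object-like regions before proposals are scored. A practical detector would apply learned projections and multiple heads on batched tensors rather than nested Python lists.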

Citation

@inproceedings{Choi_2024_BMVC,
author    = {Tae-Min Choi and Inug Yoon and Jong-Hwan Kim and Juyoun Park},
title     = {Textual Attention RPN for Open-Vocabulary Object Detection},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0085.pdf}
}


