Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss


Zhi Cai (Beijing University of Aeronautics and Astronautics), Songtao Liu (Megvii Technology Inc.), Guodong Wang (Beijing University of Aeronautics and Astronautics), Zeming Li (ByteDance), Zheng Ge (Megvii Technology Inc.), Xiangyu Zhang (Megvii Technology), Di Huang (Beihang University)
The 35th British Machine Vision Conference

Abstract

DETR has set up a simple end-to-end pipeline for object detection by formulating the task as a set prediction problem, showing promising potential. Despite its notable advancements, this paper identifies two key forms of misalignment within the model: classification-regression misalignment and cross-layer target misalignment. Both issues impede DETR's convergence and degrade its overall performance. To tackle both issues simultaneously, we introduce a novel loss function, termed Align Loss, designed to resolve the discrepancy between the two tasks. Align Loss guides the optimization of DETR through a joint quality metric, strengthening the connection between classification and regression. Furthermore, it incorporates an exponential down-weighting term to facilitate a smooth transition from positive to negative samples. Align-DETR also employs many-to-one matching for supervision of intermediate layers, akin to the design of $\mathcal{H}$-DETR, which enhances robustness against instability. We conducted extensive experiments, yielding highly competitive results. Notably, our method achieves $49.3\%~(+0.6)$ AP over the $\mathcal{H}$-DETR baseline with the ResNet-50 backbone. It also sets a new state of the art, reaching $50.5\%$ AP in the 1$\times$ setting and $51.7\%$ AP in the 2$\times$ setting, surpassing several strong competitors. Our code is available at https://github.com/FelixCaae/AlignDETR
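
The abstract names two ingredients of Align Loss, a joint quality metric and an exponential down-weighting term, without giving formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes the quality metric is a geometric blend of classification score and IoU, and that the down-weighting decays with a candidate's rank among the predictions matched to the same object. The function names and the parameters `alpha` and `tau` are hypothetical and chosen only for illustration; the linked repository holds the actual definition.

```python
import math


def align_target(score, iou, rank, alpha=0.25, tau=1.5):
    """Illustrative soft classification target (assumed form, not from the abstract).

    score: predicted class probability in [0, 1]
    iou:   IoU between the predicted box and its matched ground truth, in [0, 1]
    rank:  0-based rank of this prediction among those matched to the same object
    alpha, tau: hypothetical hyperparameters for the blend and the decay rate
    """
    # Joint quality: high only when classification and localization are both good.
    quality = (score ** alpha) * (iou ** (1.0 - alpha))
    # Exponential down-weighting: lower-ranked matches drift smoothly toward negatives.
    weight = math.exp(-rank / tau)
    return weight * quality


def soft_bce(score, target, eps=1e-7):
    """Binary cross-entropy against a soft target in [0, 1]."""
    s = min(max(score, eps), 1.0 - eps)  # clamp to keep the logarithms finite
    return -(target * math.log(s) + (1.0 - target) * math.log(1.0 - s))


if __name__ == "__main__":
    # Three predictions matched to one object, ordered from best to worst.
    for rank, (s, u) in enumerate([(0.9, 0.85), (0.7, 0.80), (0.4, 0.60)]):
        t = align_target(s, u, rank)
        print(rank, round(t, 3), round(soft_bce(s, t), 3))
```

Under these assumptions, the top-ranked match keeps a target close to its quality while later matches receive progressively smaller targets, which is one way to obtain the smooth positive-to-negative transition the abstract describes.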

Citation

@inproceedings{Cai_2024_BMVC,
author    = {Zhi Cai and Songtao Liu and Guodong Wang and Zeming Li and Zheng Ge and Xiangyu Zhang and Di Huang},
title     = {Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0211.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
