STPose: 6D object pose estimation network based on sparse attention and cross-layer connection


Shihao Chen (Wuhan University), Xiaobing Li (Guangxi University), Keduo Yan (Guangxi University), Yong Li (Guangxi University), Dongxu Gao (University of Portsmouth)
The 35th British Machine Vision Conference

Abstract

6D object pose estimation provides accurate and rich coordinate information for robots grasping target objects, but deploying these algorithms in industry often requires keeping costs low. In this paper, we propose STPose, a transformer-based pose estimation network that uses only RGB images as input. Our network builds on PoET and introduces a sparse attention method and an encoder cross-layer connection method to reduce the model's parameter count while improving convergence efficiency. Since no prior work has applied this technique to power-industry environments, we also propose a system for easily and automatically producing labeled pose estimation datasets. Using this system, we produce the RCV dataset, a pose estimation dataset targeting power device tools. STPose achieves the best results among the algorithms studied on the RCV dataset and outperforms PoET (the state-of-the-art RGB-only method) by 2.4% on the challenging YCB-V dataset. We also conduct an experimental analysis of the RCV dataset's features and difficulties. The project is available for public use at https://github.com/Agatha7k/STPose.
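
To illustrate why sparse attention can lower computational cost, the following is a minimal, generic sketch of top-k sparse attention in plain Python. It is not the mechanism used in STPose, which this page does not describe; the function name `topk_sparse_attention` and the top-k selection rule are assumptions chosen purely for illustration. Each query attends to only its k highest-scoring keys, so the softmax and the weighted sum run over k entries rather than over all keys.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """Hypothetical helper: each query attends only to its k best-scoring keys.

    queries, keys: lists of equal-length float vectors.
    values: list of float vectors, one per key.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
            for key in keys
        ]
        # Keep only the indices of the k highest-scoring keys.
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Normalise over the kept keys only; all other keys get zero weight.
        weights = softmax([scores[i] for i in keep])
        out = [0.0] * len(values[0])
        for w, i in zip(weights, keep):
            for j, v in enumerate(values[i]):
                out[j] += w * v
        outputs.append(out)
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
result = topk_sparse_attention(queries, keys, values, k=1)
```

With `k=1` the query attends only to its best-matching key, so `result` is simply that key's value vector; setting `k` to the number of keys recovers ordinary dense attention.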

Citation

@inproceedings{Chen_2024_BMVC,
author    = {Shihao Chen and Xiaobing Li and Keduo Yan and Yong Li and Dongxu Gao},
title     = {STPose: 6D object pose estimation network based on sparse attention and cross-layer connection},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0611.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
