Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation


Cheng Chen (University of Electronic Science and Technology of China), Jiang Liu (Southwest Jiaotong University), Liaoyuan Zeng (University of Electronic Science and Technology of China), Fang Duan (University of Bath), Sean McGrath (University of Limerick), Tian Dan (University of Electronic Science and Technology of China)
The 35th British Machine Vision Conference

Abstract

In 3D human pose estimation, the effective use of temporal and spatial information is key. Transformers have shown considerable potential in this field. However, existing models often use basic temporal position embeddings, which restricts their ability to fully exploit temporal information. Additionally, although body information such as bone lengths is known in some cases, current networks do not incorporate this prior, which limits estimation accuracy. To address these issues, we propose a transformer-based network for 3D human pose estimation that uses cross-attention with Rotary Position Embedding (RoPE). The network combines RoPE with a window mechanism, allowing flexible inference across varying sequence lengths while maintaining strong relative position awareness. Furthermore, we feed a bone length prior into the network and use cross-attention to integrate bone constraints into 3D pose estimation. Experiments show that including bone length information and using longer sequences significantly reduces estimation errors and improves the continuity of pose sequences. Notably, our approach surpasses state-of-the-art methods, showcasing the benefits of incorporating bone priors and advanced position embedding into 3D human pose estimation.
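
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the standard RoPE formulation (rotating consecutive feature pairs by position-dependent angles) and a pose given as a list of 3D joint coordinates. The function names `rope`, `dot`, and `bone_lengths`, the `base` parameter, and the toy skeleton are hypothetical choices made for this example; the window mechanism and the cross-attention layers are not shown.

```python
import math


def rope(vec, pos, base=10000.0):
    """Apply standard rotary position embedding to one feature vector.

    Each consecutive pair (vec[i], vec[i+1]) is rotated by the angle
    pos * base**(-i / d), where d is the (even) feature dimension.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))


def bone_lengths(joints, bones):
    """Euclidean length of each (parent, child) bone in a 3D pose.

    `joints` is a list of (x, y, z) tuples; `bones` is a list of index
    pairs. Both layouts are assumptions made for this sketch.
    """
    return [math.dist(joints[p], joints[c]) for p, c in bones]


# Toy query/key features for two frames (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# The score depends only on the frame offset (here 4), not on the
# absolute frame indices, which is the relative-position property.
score_a = dot(rope(q, 3), rope(k, 7))
score_b = dot(rope(q, 10), rope(k, 14))

# Bone lengths for a two-joint toy skeleton.
lengths = bone_lengths([(0.0, 0.0, 0.0), (0.0, 3.0, 4.0)], [(0, 1)])
```

Because the rotated dot product depends only on the offset between frame indices, the same weights can in principle be applied to sequences of different lengths, which is the property the abstract relies on for flexible inference. The bone lengths computed this way are the kind of per-subject quantity that could serve as a prior input.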

Citation

@inproceedings{Chen_2024_BMVC,
author    = {Cheng Chen and Jiang Liu and Liaoyuan Zeng and Fang Duan and Sean McGrath and Tian Dan},
title     = {Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0692.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
