PatchRot: Self-Supervised Training of Vision Transformers by Rotation Prediction


Sachin Chhabra (Arizona State University), Hemanth Venkateswara (Georgia State University), Baoxin Li (Arizona State University)
The 35th British Machine Vision Conference

Abstract

Vision Transformers require a large amount of labeled data to outperform convolutional neural networks, and annotating a dataset of that size is expensive. Self-supervised learning techniques alleviate this problem by learning features comparable to those obtained with supervised learning, without requiring labels. In this paper, we propose PatchRot, a self-supervised technique tailored to leverage the inherent properties of Vision Transformers. PatchRot rotates images and image patches and trains the network to predict the rotation angles. Through this process, the network learns to extract both global image features and patch-level features. Our extensive experiments on diverse datasets demonstrate that PatchRot training yields feature representations that outperform those obtained through supervised learning and baseline methods.
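
The rotation pretext task described in the abstract can be illustrated with a minimal sketch of how inputs and labels might be generated. This is not the authors' implementation: the abstract does not state the angle set, the patch size, or whether image-level and patch-level rotations are applied to the same input, so the four multiples of 90 degrees, the non-overlapping square patches, and the helper names rotate_image and rotate_patches are all assumptions made for illustration.

```python
import numpy as np

# Assumed label set; the abstract does not list the rotation angles.
ANGLES = (0, 90, 180, 270)


def rotate_image(image, k):
    """Rotate an H x W x C image by k * 90 degrees counter-clockwise."""
    return np.rot90(image, k=k, axes=(0, 1))


def rotate_patches(image, patch_size, rng):
    """Rotate each non-overlapping square patch independently.

    Returns the modified image and a (rows, cols) grid of rotation labels,
    where label i means the patch was rotated by ANGLES[i] degrees.
    """
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    labels = rng.integers(0, len(ANGLES), size=(rows, cols))
    out = image.copy()
    for r in range(rows):
        for c in range(cols):
            ys, xs = r * patch_size, c * patch_size
            patch = image[ys:ys + patch_size, xs:xs + patch_size]
            # Square patches keep their shape under 90-degree rotations.
            out[ys:ys + patch_size, xs:xs + patch_size] = np.rot90(
                patch, k=int(labels[r, c]), axes=(0, 1)
            )
    return out, labels


# Hypothetical usage on a random 32 x 32 RGB image with 8 x 8 patches.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
image_label = int(rng.integers(0, len(ANGLES)))   # target for the whole image
rotated_image = rotate_image(image, image_label)
patched_image, patch_labels = rotate_patches(image, patch_size=8, rng=rng)
```

In a setup like this, a network would be trained as a classifier over the assumed angle set, with image_label as the target for a whole-image prediction and patch_labels as per-patch targets. How PatchRot actually combines the two objectives is described in the paper itself, not in this sketch.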

Citation

@inproceedings{Chhabra_2024_BMVC,
  author    = {Sachin Chhabra and Hemanth Venkateswara and Baoxin Li},
  title     = {PatchRot: Self-Supervised Training of Vision Transformers by Rotation Prediction},
  booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
  publisher = {BMVA},
  year      = {2024},
  url       = {https://papers.bmvc2024.org/0391.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
