Beyond Static and Dynamic Quantization - Hybrid Quantization of Vision Transformers


Piotr Kluska (International Business Machines), Florian Scheidegger (International Business Machines), A. Cristiano I. Malossi (International Business Machines), Enrique S. Quintana-Orti (Universidad Politecnica de Valencia)
The 35th British Machine Vision Conference

Abstract

Vision Transformers excel in computer vision. However, deploying these models on edge devices is challenging due to their high memory and computational requirements. Furthermore, they suffer from outliers in the activation maps at inference. Dynamic quantization generally attains the best accuracy but adds quantization and dequantization overhead. Static quantization reduces model latency, but it usually degrades accuracy. We propose a hybrid quantization technique that selects, for each linear layer, either static or dynamic quantization based on the signal-to-noise ratio, using a floating-point model as a reference. Our method attains the reduced latency and memory usage of static quantization while improving model accuracy with minimal compute overhead. For DeiT3-S/16/224, DeiT3-B/16/384, and DeiT3-L/16/224, we improve top-1 accuracy by up to 1%, 4%, and 11.5%, respectively, over static INT8 quantization. At the same time, we reach average speedups of 1.87×, 1.88×, and 1.69×, respectively, over dynamic INT8 quantization on the ImageNet1K dataset.
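
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the symmetric per-tensor INT8 scheme, the 30 dB threshold, and the names quantize_int8, snr_db, and choose_mode are all assumptions made for illustration. The sketch fake-quantizes a layer's activations with a scale fixed at calibration time (static), measures the signal-to-noise ratio against the floating-point values, and falls back to dynamic quantization when that ratio is too low.

```python
import math

def quantize_int8(values, scale):
    """Symmetric INT8 fake-quantization: round, clamp to [-127, 127], map back to float."""
    return [max(-127, min(127, round(v / scale))) * scale for v in values]

def snr_db(reference, approx):
    """Signal-to-noise ratio in dB of `approx` against the floating-point `reference`."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)

def choose_mode(calib_acts, eval_acts, threshold_db=30.0):
    """Pick 'static' or 'dynamic' for one layer (hypothetical rule and threshold).

    The static scale comes from calibration data only, as in static
    quantization. If it reproduces the floating-point activations well
    enough on held-out data, keep the layer static; otherwise mark it dynamic.
    """
    static_scale = max(abs(v) for v in calib_acts) / 127.0 or 1.0
    static_snr = snr_db(eval_acts, quantize_int8(eval_acts, static_scale))
    mode = "static" if static_snr >= threshold_db else "dynamic"
    return mode, static_snr

# Toy activations for two layers that share the same calibration range.
calib = [-1.0, -0.5, 0.25, 0.75, 1.0]
in_range = [-0.9, 0.1, 0.4, 0.8]        # stays within the calibrated range
with_outlier = [-0.9, 0.1, 0.4, 20.0]   # contains an inference-time outlier

print(choose_mode(calib, in_range))
print(choose_mode(calib, with_outlier))
```

In this toy setup the in-range layer is kept static, while the layer with an outlier is marked dynamic because the calibrated scale clips the outlier and the SNR collapses. A dynamic layer would instead recompute its scale from each incoming tensor, which is where the extra quantization overhead mentioned in the abstract comes from.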

Citation

@inproceedings{Kluska_2024_BMVC,
author    = {Piotr Kluska and Florian Scheidegger and A. Cristiano I. Malossi and Enrique S. Quintana-Orti},
title     = {Beyond Static and Dynamic Quantization - Hybrid Quantization of Vision Transformers},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0568.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
