Kernel Representation for Dynamic Networks


Yichen Zhou (Sea Group), Teck Khim Ng (National University of Singapore)
The 35th British Machine Vision Conference

Abstract

Dynamic convolution enhances model capacity by combining multiple kernels based on input features, offering significant improvements over traditional convolution in various vision tasks without substantially increasing computational complexity. However, it relies solely on the current input features to generate kernels, overlooking the kernel generation process in previous layers. This results in sub-optimal kernel generation that limits the representational power of dynamic networks. To address this issue, we propose a separate yet coupled network that learns a layer-wise kernel representation. The kernel representation, together with the feature representation, is fed to a small network that generates the kernels. It is updated layer by layer based on the kernel representation from the previous layer and the new feature representation of the current layer. To further complement the learning network, the initial kernel representation is derived from low-frequency image features, and the final output kernel representation is concatenated with the feature representation for classification. Extensive experimental results show that the proposed kernel representation improves network capacity and brings a noticeable accuracy boost for various backbone architectures, e.g. +2.5% to +3.9% on ResNets, +3.8% to +6.0% on MobileNets, +0.8% to +6.3% on vision transformers and +1.8% on PoolFormers.
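
The data flow described in the abstract can be illustrated with a toy 1D sketch in plain Python. This is not the authors' architecture: the blending rule in `update_representation`, the mean/max pooling in `pool`, the box filter in `low_frequency_init`, and the linear projection `proj` are all hypothetical stand-ins chosen for brevity. It only shows the general idea of mixing candidate kernels with weights derived from a representation that is carried from layer to layer, rather than from the current features alone.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conv1d(signal, kernel):
    """Valid 1D cross-correlation of a signal with a kernel."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

def pool(signal):
    """Toy feature representation (assumption): mean and max of the signal."""
    return [sum(signal) / len(signal), max(signal)]

def low_frequency_init(signal, width=3):
    """Assumed initial kernel representation: pooled box-filtered input."""
    return pool(conv1d(signal, [1.0 / width] * width))

def update_representation(prev_rep, feat, momentum=0.5):
    """Hypothetical update: blend the previous kernel representation
    with the feature representation of the current layer."""
    return [momentum * p + (1.0 - momentum) * f for p, f in zip(prev_rep, feat)]

def mix_kernels(kernels, weights):
    """Dynamic convolution step: weighted sum of K candidate kernels."""
    return [sum(w * k[i] for w, k in zip(weights, kernels))
            for i in range(len(kernels[0]))]

def forward(signal, layers, rep):
    """Run all layers, carrying the kernel representation along.

    Each layer is (kernels, proj): K candidate kernels and a K x len(rep)
    projection that maps the representation to K mixing scores.
    """
    for kernels, proj in layers:
        rep = update_representation(rep, pool(signal))
        scores = [sum(w * r for w, r in zip(row, rep)) for row in proj]
        signal = conv1d(signal, mix_kernels(kernels, softmax(scores)))
    # Final features and kernel representation are concatenated for a classifier.
    return signal, rep, pool(signal) + rep

# Example with arbitrary toy values.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
layer = ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [[0.1, 0.2], [0.3, -0.1]])
output, final_rep, classifier_input = forward(signal, [layer, layer],
                                              low_frequency_init(signal))
```

In a conventional dynamic convolution, the mixing scores would be computed from `pool(signal)` alone at each layer; here they depend on `rep`, which accumulates information from earlier layers.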

Citation

@inproceedings{Zhou_2024_BMVC,
author    = {Yichen Zhou and Teck Khim Ng},
title     = {Kernel Representation for Dynamic Networks},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0417.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
