Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures


Kuluhan Binici (National University of Singapore), Weiming Wu (National University of Singapore), Tulika Mitra (National University of Singapore)
The 35th British Machine Vision Conference

Abstract

Knowledge distillation (KD) is a model compression method that entails training a compact student model to emulate the performance of a more complex teacher model. However, the architectural capacity gap between the two models limits the effectiveness of knowledge transfer. Addressing this issue, previous works focused on customizing teacher-student pairs to improve compatibility, a computationally expensive process that needs to be repeated every time either model changes. Hence, these methods are impractical when a teacher model has to be compressed into different student models for deployment on multiple hardware devices with distinct resource constraints. In this work, we propose {\em Generic Teacher Network (GTN)}, a one-off KD-aware training to create a generic teacher capable of effectively transferring knowledge to any student model sampled from a given finite pool of architectures. To this end, we represent the student pool as a weight-sharing supernet and condition our generic teacher to align with the capacities of various student architectures sampled from this supernet. Experimental evaluation shows that our method both improves overall KD effectiveness and amortizes the minimal additional training cost of the generic teacher across students in the pool.

Citation

@inproceedings{Binici_2024_BMVC,
author    = {Kuluhan Binici and Weiming Wu and Tulika Mitra},
title     = {Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0297.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).

Imprint | Data Protection