Leveraging Inductive Bias in ViT for Medical Image Diagnosis


Jungmin Ha (Kookmin University), Euihyun-yoon (Kookmin University), Sungsik Kim (Kookmin University), Jinkyu Kim (Korea University), Jaekoo Lee (Kookmin University)
The 35th British Machine Vision Conference

Abstract

Recent advances in attention-based models have raised expectations for automated diagnosis applications in computer vision because of their high performance. However, attention-based models tend to lack some of the inherent assumptions about images, known as inductive biases, that convolution-based models possess. Herein, we customize a vision transformer (ViT) model to improve performance on limited medical image data by exploiting locality inductive biases. Specifically, using the ViT model as a backbone, we propose shift window attention (SWA), deformable attention (DA), and a convolutional block attention module (CBAM) to leverage the convolutional neural networks' inductive bias towards locality, thereby improving the modeling of both the global and local context of the lesion. To evaluate the effectiveness and efficiency of our proposed method, we use various well-known, publicly available medical image datasets, such as HAM10000, MURA, ISIC 2018, and CVC-ClinicDB, for classification or dense prediction tasks. Experimental results show that our method significantly outperforms other state-of-the-art alternatives. Furthermore, we use Grad-CAM++ to qualitatively visualize the image regions to which the network attends. Our code will be publicly available upon publication.
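As a rough illustration of the locality idea behind window-based attention, the sketch below partitions a small grid of patch indices into local windows, then repeats the partition after a cyclic shift so that neighbouring windows exchange patches. This is not the authors' implementation: the shift-and-partition scheme follows the general shifted-window pattern popularized by Swin Transformer, and the function names, the 4x4 grid, and the window size of 2 are assumptions chosen only for readability.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid of patch indices up and to the left by `shift` cells."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def partition_windows(grid, window):
    """Split a 2D grid into non-overlapping window x window groups.

    Each group lists the patch indices that would attend to one another
    under window-restricted (local) attention.
    """
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    windows = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            windows.append(
                [grid[r][c]
                 for r in range(r0, r0 + window)
                 for c in range(c0, c0 + window)]
            )
    return windows


# Hypothetical 4x4 grid of patch indices 0..15 (assumption for illustration).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular windows: each patch only sees its 2x2 neighbourhood.
regular = partition_windows(grid, 2)

# Shifted windows: the same partition applied after a cyclic shift of 1,
# so patches from different regular windows now share a window.
shifted = partition_windows(cyclic_shift(grid, 1), 2)
```

Alternating the regular and shifted partitions across layers keeps attention local within each layer while still letting information cross window boundaries, which is one way to introduce a convolution-like locality bias into a ViT backbone.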

Citation

@inproceedings{Ha_2024_BMVC,
author    = {Jungmin Ha and Euihyun-yoon and Sungsik Kim and Jinkyu Kim and Jaekoo Lee},
title     = {Leveraging Inductive Bias in ViT for Medical Image Diagnosis},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0670.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
