Multi-Modal Information Bottleneck Attribution with Cross-Attention Guidance


Pauline Bourigault (Imperial College London), Emmanuelle Bourigault (University of Oxford), Danilo Mandic (Imperial College London)
The 35th British Machine Vision Conference

Abstract

Advancing interpretable machine learning, particularly at the intersection of vision and language, requires transparent and comprehensible model decisions. This work introduces Cross-Attention M2IB (CA-M2IB), an enhancement to the Multi-modal Information Bottleneck (M2IB) attribution method that integrates cross-attention mechanisms. It targets the core challenge of improving the interpretability of vision-language pretrained models, such as CLIP, by fostering more discerning and relevant latent representations. CA-M2IB filters and retains essential information across modalities, leveraging cross-attention to dynamically focus on pertinent visual and textual features for any given context. In evaluations using CLIP as an example, CA-M2IB demonstrates improvements in attribution accuracy and interpretability over existing attribution methods, including gradient-based, perturbation-based, attention-based, and the information-theoretic M2IB methods. By providing a more nuanced understanding of model decisions, CA-M2IB offers a promising avenue for deploying vision-language models in critical domains such as healthcare.
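To make the idea of cross-attention guiding an information bottleneck more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes generic scaled dot-product attention with text tokens as queries and image patches as keys, and a noise-blending bottleneck of the form z = m * x + (1 - m) * eps. The function names, the toy dimensions, and the rule that turns attention weights into a mask are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_weights(text_tokens, image_patches):
    # Generic scaled dot-product attention: text tokens are queries,
    # image patches are keys. Returns (num_tokens, num_patches).
    d = text_tokens.shape[-1]
    scores = text_tokens @ image_patches.T / np.sqrt(d)
    return softmax(scores, axis=-1)

def bottleneck(features, mask, rng):
    # Assumed bottleneck form: keep a fraction `mask` of each feature and
    # replace the rest with noise matched to the feature mean and std.
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    eps = rng.normal(size=features.shape) * sigma + mu
    return mask * features + (1.0 - mask) * eps

rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(4, 8))     # toy stand-in for text embeddings
image_patches = rng.normal(size=(6, 8))   # toy stand-in for patch embeddings

attn = cross_attention_weights(text_tokens, image_patches)

# Hypothetical guidance rule: average attention over text tokens and
# rescale so the most attended patch passes through unchanged.
patch_relevance = attn.mean(axis=0)
mask = (patch_relevance / patch_relevance.max())[:, None]

z = bottleneck(image_patches, mask, rng)
```

In this toy setup, patches that receive more attention from the text keep more of their original signal, while weakly attended patches are mostly replaced by noise. In an actual attribution method the mask would typically be optimised rather than fixed, so this sketch only illustrates the data flow.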

Citation

@inproceedings{Bourigault_2024_BMVC,
author    = {Pauline Bourigault and Emmanuelle Bourigault and Danilo Mandic},
title     = {Multi-Modal Information Bottleneck Attribution with Cross-Attention Guidance},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0064.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
