RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance


Avideep Mukherjee (Indian Institute of Technology Kanpur), Soumya Banerjee (IIT Kanpur), Piyush Rai (IIT Kanpur), Vinay P Namboodiri (University of Bath)
The 35th British Machine Vision Conference

Abstract

Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, which results in enormous model sizes and makes them unsuitable for deployment on resource-constrained devices. Block-wise generation holds considerable promise for designing compact (parameter-efficient) deep generative models, since the model is responsible for generating only one block instead of the whole image at once. However, block-wise generation is also considerably challenging, as it requires ensuring coherence across the generated blocks. We design a retrieval-augmented generation (RAG) approach and use the corresponding blocks of the images retrieved by the RAG module to condition both the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation. While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models. We report substantive experiments that validate the proposed solution to the coherence problem and demonstrate the effectiveness of our approach in terms of both compact model size and excellent generation quality.
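The retrieval-conditioned, block-wise setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each image (or LDM latent) is split into a regular grid of non-overlapping square blocks and that retrieval is a nearest-neighbour search by cosine similarity over precomputed embeddings. The function names `split_into_blocks`, `retrieve_neighbors`, and `block_conditioning` are hypothetical. The sketch shows only how the matching block of each retrieved image could be gathered as conditioning for generating one block; the diffusion denoiser itself is omitted.

```python
import numpy as np


def split_into_blocks(image: np.ndarray, block: int) -> np.ndarray:
    """Split an (H, W, C) array into non-overlapping block x block tiles.

    Returns shape (num_blocks, block, block, C), ordered row-major over the
    block grid. Assumes H and W are divisible by `block` (an assumption of
    this sketch, not a detail taken from the paper).
    """
    h, w, c = image.shape
    assert h % block == 0 and w % block == 0, "image size must be divisible by block"
    tiles = image.reshape(h // block, block, w // block, block, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (grid_rows, grid_cols, block, block, C)
    return tiles.reshape(-1, block, block, c)


def retrieve_neighbors(query_emb: np.ndarray, db_embs: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical retriever: indices of the k most cosine-similar database embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]


def block_conditioning(query_emb, db_embs, db_images, block_idx, block, k):
    """Gather block `block_idx` from each of the k retrieved images.

    The stacked result, shape (k, block, block, C), stands in for what a
    block-wise denoiser might receive as conditioning for that block position.
    """
    neighbors = retrieve_neighbors(query_emb, db_embs, k)
    return np.stack([split_into_blocks(db_images[i], block)[block_idx] for i in neighbors])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db_images = rng.random((10, 32, 32, 3))  # toy "database" of 10 images
    db_embs = rng.random((10, 16))           # toy embeddings, one per image
    cond = block_conditioning(db_embs[3], db_embs, db_images, block_idx=5, block=8, k=4)
    print(cond.shape)  # (4, 8, 8, 3)
```

Gathering the same block index from every retrieved neighbour means the model generating block i sees how similar images fill that position. Under the assumptions above, this is one plausible way such conditioning could encourage coherence across blocks that are generated separately.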

Citation

@inproceedings{Mukherjee_2024_BMVC,
author    = {Avideep Mukherjee and Soumya Banerjee and Piyush Rai and Vinay P Namboodiri},
title     = {RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0218.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
