LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps


Andrey Palaev (Innopolis University), Adil Khan (University of Hull), Syed M Ahsan Kazmi (University of the West of England, Bristol)
The 35th British Machine Vision Conference

Abstract

Advances in text-to-image synthesis have produced powerful generative models capable of creating realistic images from textual prompts. However, precise control over image attributes remains challenging, especially at the instance level. Existing methods offer some control through fine-tuning or auxiliary information, but they are often limited in flexibility and accuracy. To address these challenges, we propose a pipeline for instance-level image manipulation that combines Large Language Models (LLMs), open-vocabulary detectors, and the cross-attention maps and intermediate activations of the diffusion U-Net. Our method detects objects that are both mentioned in the prompt and present in the generated image, so that individual instances can be manipulated precisely. Incorporating cross-attention maps keeps the manipulated image coherent while controlling object positions. The approach requires no fine-tuning and no auxiliary inputs such as masks or bounding boxes.
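
As a rough illustration of how a detector box and a cross-attention map might be combined to localise one instance, the sketch below thresholds a token's attention map inside a bounding box. It is not the paper's implementation: the function name instance_mask, the relative threshold, and the toy 4x4 map are all assumptions made for illustration, and a real pipeline would take the map from the diffusion U-Net and the box from an open-vocabulary detector.

```python
from typing import List, Tuple

# (x0, y0, x1, y1), half-open, in attention-map coordinates (assumed convention)
Box = Tuple[int, int, int, int]


def instance_mask(
    attn: List[List[float]], box: Box, rel_threshold: float = 0.5
) -> List[List[bool]]:
    """Keep pixels inside `box` whose attention is at least
    `rel_threshold` times the peak attention found inside the box.

    Hypothetical helper for illustration only.
    """
    height, width = len(attn), len(attn[0])
    x0, y0, x1, y1 = box
    # Clamp the box to the map so out-of-range boxes cannot index past the edges.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)

    mask = [[False] * width for _ in range(height)]
    if x0 >= x1 or y0 >= y1:
        return mask  # empty box: nothing to select

    peak = max(attn[y][x] for y in range(y0, y1) for x in range(x0, x1))
    if peak <= 0:
        return mask  # the token does not attend to this region

    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[y][x] = attn[y][x] >= rel_threshold * peak
    return mask


# Toy attention map for a single prompt token (values are made up).
attn_map = [
    [0.0, 0.1, 0.0, 0.9],
    [0.1, 0.8, 0.6, 0.0],
    [0.0, 0.7, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
mask = instance_mask(attn_map, box=(1, 1, 3, 3))
```

In this toy case the strong response at the top-right corner is ignored because it lies outside the box, which is the point of pairing the two signals: the box separates instances, and the attention map refines the region within it.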

Citation

@inproceedings{Palaev_2024_BMVC,
author    = {Andrey Palaev and Adil Khan and Syed M Ahsan Kazmi},
title     = {LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0457.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
