Learning Object Placement via Convolution Scoring Attention


Yibin Wang (Fudan University), Yuchao Feng (Westlake University), Jianwei Zheng (Zhejiang University of Technology)
The 35th British Machine Vision Conference

Abstract

Object placement aims to determine credible locations and sizes for foreground objects within a given background. Despite its apparent simplicity, this emerging field faces two pressing challenges: the difficulty of sufficiently capturing multi-scale semantic interactions between background and foreground, and the underutilization of valuable prior knowledge. In this paper, we present CSANet to alleviate these challenges. CSANet builds on pyramid pooling and introduces two pyramid-pooling-based components: Convolution Scoring-based Union-Attention (PCSUA) and Self-Attention (PCSSA). PCSUA, integrated into the generator, efficiently models latent feature interplay between objects and scenes at multiple scales. Specifically, it first applies pyramid pooling to capture features across varied receptive fields simultaneously. Its CSUA module then modifies the attention mechanism by eliminating the query, merging the keys from both images, and using convolutions for attention score learning and Hadamard products for feature extraction. This process efficiently facilitates self- and cross-feature interactions for more harmonious object placement. On the supervision side, PCSSA captures priors from the ground truth through multi-scale and high-level feature aggregation, steering the generator toward more credible object locations and sizes. Extensive experiments demonstrate that CSANet generates credible object placements efficiently without sacrificing diversity. Code is available at https://github.com/CodeGoat24/CSANet.
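
The attention scheme described in the abstract (no query, keys merged from both images, convolution-learned scores, Hadamard-product feature extraction) can be illustrated with a small NumPy sketch. This is not the authors' implementation; the official code is at the repository linked above. Everything here is an assumption made for illustration: the function names, the pooling scales, equal-sized foreground and background feature maps, the use of a 1x1 convolution (written as a channel-wise matrix product) for scoring, and softmax normalisation over tokens.

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling of a (C, H, W) array.
    Assumes H and W are divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def pyramid_tokens(x, scales=(1, 2, 4)):
    """Pool at several scales and flatten every level to (C, N_s) tokens,
    concatenated along the token axis (hypothetical pyramid layout)."""
    c = x.shape[0]
    return np.concatenate([avg_pool(x, k).reshape(c, -1) for k in scales], axis=1)

def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def union_attention(fg, bg, w_score, w_value, scales=(1, 2, 4)):
    """Query-free attention over merged foreground/background tokens.

    fg, bg:            (C, H, W) feature maps (assumed same shape)
    w_score, w_value:  (C, C) weights of two 1x1 convolutions (assumed)
    returns:           (C, N) attended features, N = tokens from both images
    """
    # Merge keys from both images into one token sequence.
    keys = np.concatenate(
        [pyramid_tokens(fg, scales), pyramid_tokens(bg, scales)], axis=1
    )
    # A 1x1 convolution is a per-token linear map across channels;
    # scores are learned directly from the keys, with no query involved.
    scores = softmax(w_score @ keys, axis=1)
    values = w_value @ keys
    # Hadamard (element-wise) product replaces the usual score-value matmul.
    return scores * values

rng = np.random.default_rng(0)
fg = rng.standard_normal((4, 8, 8))
bg = rng.standard_normal((4, 8, 8))
w_s = rng.standard_normal((4, 4))
w_v = rng.standard_normal((4, 4))
out = union_attention(fg, bg, w_s, w_v)
```

With 8x8 inputs and scales 1, 2 and 4, each image contributes 64 + 16 + 4 = 84 tokens, so `out` has shape `(4, 168)`. Because the scores come from a convolution over the keys and are combined element-wise with the values, the cost grows linearly in the number of tokens rather than quadratically as in standard dot-product attention, which is one plausible reading of the efficiency claim.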

Citation

@inproceedings{Wang_2024_BMVC,
author    = {Yibin Wang and Yuchao Feng and Jianwei Zheng},
title     = {Learning Object Placement via Convolution Scoring Attention},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0165.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
