Multi-modal Crowd Counting via Modal Emulation


Chenhao Wang (Harbin Institute of Technology), Xiaopeng Hong (Harbin Institute of Technology), Zhiheng Ma (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Yupeng Wei (Harbin Institute of Technology), Yabin Wang (Xi'an Jiaotong University), Xiaopeng Fan (Harbin Institute of Technology)
The 35th British Machine Vision Conference

Abstract

Multi-modal crowd counting is a crucial task that uses multi-modal cues to estimate the number of people in crowded scenes. To overcome the gap between different modalities, we propose a two-pass multi-modal crowd-counting framework based on modal emulation that enables efficient modal emulation, alignment, and fusion. The framework consists of two key components: a multi-modal inference pass and a cross-modal emulation pass. The former uses a hybrid cross-modal attention module to extract global and local information and achieve efficient multi-modal fusion. The latter uses attention prompting to coordinate different modalities and enhance multi-modal alignment. We also introduce a modality alignment module that uses an efficient modal consistency loss to align the outputs of the two passes and bridge the semantic gap between modalities. Extensive experiments on both RGB-Thermal and RGB-Depth counting datasets demonstrate the framework's superior performance compared to previous methods.
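
The abstract does not give the form of the modal consistency loss, so the following is only an illustrative sketch of the general idea of aligning the outputs of two passes. It assumes each pass produces a 2-D map of the same shape and that the penalty is a mean squared difference; the function name `consistency_loss` and the toy values are hypothetical and not taken from the paper.

```python
from typing import List

Map2D = List[List[float]]


def consistency_loss(inference_out: Map2D, emulation_out: Map2D) -> float:
    """Mean squared difference between two equally shaped 2-D maps.

    Assumption: MSE stands in for the paper's modal consistency loss,
    whose actual definition is not stated in the abstract.
    """
    if len(inference_out) != len(emulation_out):
        raise ValueError("maps must have the same number of rows")

    total = 0.0
    count = 0
    for row_a, row_b in zip(inference_out, emulation_out):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1

    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


# Toy outputs of the two passes (hypothetical values).
inference_out = [[0.0, 0.5], [0.25, 0.25]]
emulation_out = [[0.0, 0.5], [0.25, 0.75]]
loss = consistency_loss(inference_out, emulation_out)
```

In this toy case the two maps differ in a single cell, so the loss is small but non-zero; it drops to zero when both passes agree everywhere, which is the behaviour an alignment term of this kind is meant to encourage.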

Citation

@inproceedings{Wang_2024_BMVC,
author    = {Chenhao Wang and Xiaopeng Hong and Zhiheng Ma and Yupeng Wei and Yabin Wang and Xiaopeng Fan},
title     = {Multi-modal Crowd Counting via Modal Emulation},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0115.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
