The 35th British Machine Vision Conference 2024: Wednesday, 27th November

BMVC conference papers, supplementary material and video presentations can be found at: BMVC Papers

BMVC workshop papers can be found at: BMVC Workshop Papers

Keynote - Laura Sevilla

09:00 - 10:00

Title: Frontiers of Video Understanding

Abstract: Video Understanding is a fundamental skill of intelligent systems. From autonomous robots to virtual assistants, understanding the world in motion is necessary to be able to move and interact with it. The last few years have seen amazing improvements in Video Understanding research. Still there is a remarkable gap between the almost uncanny performance of models in other modalities such as language and still images, and the performance of video. In this talk I will discuss what I believe are the current barriers for video, including efficiency, a tricky relationship with language and finding the right tasks. For each of these topics I will discuss both my recent work on them, as well as what I believe are interesting directions that I hope can be inspiring for the community.

https://laurasevilla.me/

Room: M1

Poster Sessions

10:00 - 11:45 / 15:15 - 17:00

10:00 - 11:45

Papers Presented

16	Region-based Entropy Separation for One-shot Test-Time Adaptation	Kodai Kawamura, Shunya Yamagami, Go Irie
28	SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters	Shohei Tanaka, Hao Wang, Yoshitaka Ushiku
31	COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation	Munish Monga, Sachin Kumar Giroh, Ankit Jha, Mainak Singha, Biplab Banerjee, Jocelyn Chanussot
32	Can CLIP help CLIP in learning 3D?	Cristian Sbrolli, Matteo Matteucci
38	Linear Calibration Approach to Knowledge-free Group Robust Classification	Ryota Ishizaki, Shunya Yamagami, Yuta Goto, Go Irie
70	Advancing Anomaly Detection: The IDW dataset and MC algorithm	Alexander D. J. Taylor, Jonathan James Morrison, Phillip Tregidgo, Neill D. F. Campbell
104	MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression	Sudip Das, Kaixin Xu, Nushrat Hussain, Ziyuan Zhao, Arindam Das, Weisi Lin, Ujjwal Bhattacharya
111	Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients	Maximilian Krahn, Michele Sasdelli, Frances Fengyi Yang, Vladislav Golyanik, Juho Kannala, Tat-Jun Chin, Tolga Birdal
115	Multi-modal Crowd Counting via Modal Emulation	Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan
135	Acoustic-based 3D human pose estimation robust to human position	Yusuke Oumi, Yuto Shibata, Go Irie, Akisato Kimura, Yoshimitsu Aoki, Mariko Isogawa
137	InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth	Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann
216	Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning	Masane Fuchi, Tomohiro Takagi
218	RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance	Avideep Mukherjee, Soumya Banerjee, Piyush Rai, Vinay P. Namboodiri
228	Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction	Ziyang Ren, Ping Wei, Haowen Tang, Huan Li, Jin Yang
257	Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery	Jordan Lam
308	Effective Message Hiding with Order-Preserving Mechanisms	Gao Yu, Xuchong QIU, Zihan Ye
329	Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training	Ameera Bawazir, Kebin Wu, Wenbin LI
362	Into the Fog: Evaluating Robustness of Multiple Object Tracking	Nadezda Kirillova, Muhammad Jehanzeb Mirza, Horst Bischof, Horst Possegger
369	Benchmarking and Optimizing Federated Learning with Hardware-related Metrics	Kai Pan, Yapeng Tian, Yinhe Han, Yiming Gan
432	Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution	Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim
482	Unsupervised Hashing Network with Hyper Quantization Tree	Sungeun Kim, Jongbin Ryu
534	Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning	Francesco Girlanda, Olga V. Demler, bjoern menze, Neda Davoudi
546	Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs	Linghong Yao, Denis Hadjivelichkov, Andromachi Maria Delfaki, Yuanchang Liu, Brooks Paige, Dimitrios Kanoulas
563	As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP	Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr
566	SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models	Xiangyu Chen, Jing Liu, Ye Wang, Pu Perry Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino
579	Neural Collapse Inspired Contrastive Continual Learning	Antoine Montmaur, Nicolas Larue, Ngoc-Son Vu
611	STPose: 6D object pose estimation network based on sparse attention and cross-layer connection	Shihao Chen, Xiaobing Li, Keduo Yan, Yong Li, Dongxu Gao
619	Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection	Ching-Yi Lai, Chiou-ting Hsu, Chih-Chung Hsu, Chia-Wen Lin
627	GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt	Yufei Gao, Bin Fu, Lei Shi, Chengming Liu, yucheng shi
659	GN-FR: Generalizable Neural Radiance Fields for Flare Removal	Gopi Raju Matta, Rahul Siddartha, RONGALI SIMHACHALA VENKATA GIRISH, Sumit Sharma, Kaushik Mitra
678	Content and Style Aware Audio-Driven Facial Animation	QINGJU LIU, Hyeongwoo Kim, Gaurav Bharaj
680	May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels	Monica Millunzi, Lorenzo Bonicelli, Angelo Porrello, Jacopo Credi, Petter N. Kolm, Simone Calderara
681	On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models	Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Khan
689	AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning	Jayateja Kalla, Soma Biswas
695	Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies	Marcella Astrid, Enjie Ghorbel, Djamila Aouada
721	Sign Stitching: A Novel Approach to Sign Language Production	Harry Walsh, Ben Saunders, Richard Bowden
731	AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance	Pushpendu Ghosh, Aniket Joshi, Soumyajit Chowdhury, Promod Yenigalla
753	Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning	Hoàng-Ân Lê, Paul Berg, Minh Tan Pham
882	A Multimodal Network on Handwritten Chinese Character Error Correction	Haizhao Sun, Yu Ning, Xu Ji, Chuang Zhang, Ming Wu
885	Efficient Data Source Relevance Quantification for Multi-Source Neural Networks	Jakob Gawlikowski, Nina Maria Gottschling
887	Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models	Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen
103	Prompting Diffusion Representations for Cross-Domain Semantic Segmentation	Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Nikolay Marin, Luc Van Gool
200	Towards Generative Class Prompt Learning for Fine-grained Visual Recognition	Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Llados
406	When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection	Adam Goodge, Bryan Hooi, Wee Siong Ng
615	Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation	Nathan Louis, Mahzad Khoshlessan, Jason J Corso
754	Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization	Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Room: Hall 2

15:15 - 17:00

Papers Presented

14	Efficiency-preserving Scene-adaptive Object Detection	Zekun Zhang, Vu Quang Truong, Minh Hoai
114	Key-point Guided Deformable Image Manipulation Using Diffusion Model	Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-yun Kim, Young-Min Kim, hyeonjik lee, Hyuksool Kwon, Hyeonmin Bae
416	Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance	Pankhi Kashyap, Pavni Tandon, Sunny Gupta, Abhishek Tiwari, Ritwik Kulkarni, Kshitij Sharad Jadhav
517	Interpretable Long-term Action Quality Assessment	Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert
545	Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection	Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger
23	Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation	Ban Chen, Xin Jin, LONG HAI WU, Jie Chen, Ilhyun Cho, Cheul-hee Hahm
34	TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation	Jack Saunders, Vinay P. Namboodiri
47	Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances	Haoting He, Yaochen Li, Yutong Wang, Gaojie Li, Wei Guo, Runlin Zou
74	ControlDreamer: Stylized 3D Generation with Multi-View ControlNet	Yeongtak Oh, Jooyoung Choi, Yongsung Kim, Minjun Park, Chaehun Shin, Sungroh Yoon
102	Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes	Donghao Zhou, Jialin Li, Jinpeng Li, Jiancheng Huang, Qiang Nie, Yong Liu, Bin-Bin Gao, Qiong Wang, Pheng-Ann Heng, Guangyong Chen
108	MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds	Ziqiang Dang, Tianxing Fan, Boming Zhao, Xujie Shen, 王磊, Guofeng Zhang, Zhaopeng Cui
140	Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space	Junho Lee, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee
147	MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion	Angel Villar-Corrales, Moritz Austermann, Sven Behnke
180	JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation	Sai Tanmay Reddy Chakkera, Aggelina Chatziagapi, Dimitris Samaras
184	Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization	Roisin Luo, Alexandru Drimbarean, James McDermott, Colm O'Riordan
211	Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss	Zhi Cai, Songtao Liu, Guodong Wang, Zeming Li, Zheng Ge, Xiangyu Zhang, Di Huang
245	Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network	Yamin Mao, Zhihua Liu, Weiming Li, SoonYong Cho, Qiang Wang, Xiaoshuai Hao
288	PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition	Anandavardhan Hegde, Sudha Velusamy, Narayan Kothari, Aman Bahuguna, Apnesh Rawat, Hema Sathiamurthy, Ankit Raja
307	Discovering an Image-Adaptive Coordinate System for Photography Processing	Ziteng Cui, Lin Gu, Tatsuya Harada
318	Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection	Ying Zhang, Yuezun Li, Bo Peng, Jiaran Zhou, Huiyu Zhou, Junyu Dong
323	Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition	Myeong-Yeon Yi, DongJae Lee, Naeun Ko, Yonghyun Jeong, Sang-goo Lee, Seunggyu Chang
335	SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning	Hao Chen, Jiaze Wang, Ziyu Guo, Jinpeng Li, Donghao Zhou, Bian Wu, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng
352	DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation	Raquel Vidaurre, Elena Garces, Dan Casas
384	Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)	Zane Durante, Robathan Harries, Edward Vendrow, Zelun Luo, Yuta Kyuragi, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli
388	ACIL: Active Class Incremental Learning for Image Classification	Aditya Bhattacharya, Debanjan Goswami, Shayok Chakraborty
414	NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning	Sree Rama Vamsidhar S, Gorthi Rama Krishna Sai Subrahmanyam
421	Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow	Canran Li, Dongnan Liu, Weidong Cai
426	Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition	Tuyen Tran, Thao Minh Le, Duy Hung Tran, Truyen Tran
433	Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning	Dongyoung Kim, Jeong-Gun Lee, WonSook Lee
448	Learning to Project for Cross-Task Knowledge Distillation	Dylan Auty, Roy Miles, Benedikt Kolbeinsson, Krystian Mikolajczyk
493	Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition	Xinxu Lin, Mingxuan Liu, Kezhuo Liu, Hong Chen
499	MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders	Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan
505	FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging	Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray
524	A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction	Dragos Costea, Alina Marcu, Marius Leordeanu
537	Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework	Liuyuan Wen
599	VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection	Changkang Li, Yalong Jiang
657	G3FA: Geometry-guided GAN for Face Animation	Alireza Javanmardi, Alain Pagani, Didier Stricker
723	ControlEdit: A MultiModal Local Clothing Image Editing Method	Di Cheng, Yingjie Shi, sun shixin, JiaFu Zhang, weijing wang, YULiu
727	Optimising Diffusion Models for Histopathology Image Synthesis	Victoria Porter, Richard Gault, Stephanie G Craig, Jacqueline James
746	Adapting MIMO video restoration networks to low latency constraints	Valéry Dewil, Zhe Zheng, Arnaud Barral, Lara Raad, Nao Nicolas, Ioannis Cassagne, Jean-michel Morel, Gabriele Facciolo, Bruno Galerne, Pablo Arias
755	PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition	Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley
790	FILS: Self-Supervised Video Feature Prediction In Semantic Language Space	Mona Ahmadian, Frank Guerin, Andrew Gilbert
900	Direct-Sum Approach to Integrate Losses Via Classifier Subspace	Takumi Kobayashi
911	A simple Color Correction Matrix for RAW Reconstruction	Anqi Liu, Shiyi Mu, Shugong Xu

Room: Hall 2

Doctoral Consortium

10:00 - 13:00

Chairs: Richard Menzies and George Killick	10:00 - 10:15	Fatemeh Amerehi	Toward Comprehensive Neural Network Robustness
	10:15 - 10:30	Zahra Babaiee	Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels
	10:30 - 10:45	Jack Saunders	Style and Speech in Facial Animation
	10:45 - 11:00	Muhammad Akhtar Munir	Exploring Advanced Calibration Loss Techniques for Vision-Language Models
	11:00 - 11:15	Break	Break
	11:15 - 11:30	Filippos Gouidis	Recognizing object states by combining data-driven and symbolic methods
	11:30 - 11:45	Remco Royen	Addressing labelling, complexity, latency, and scalability in deep learning-based processing of point clouds
	11:45 - 12:15	Speaker: Md. Mostafa Kamal Sarker (Technovative Solutions LTD)	Dr Sarker is the Lead AI Research Scientist at Technovative Solutions LTD (TVS) and a Visiting Fellow at the University of Oxford. He's an expert in artificial intelligence, computer vision, and deep learning. His research has significantly impacted clinical AI, biomedical image analysis, and digital healthcare, evident in his 40+ peer-reviewed publications. At BMVC2024, he'll share his valuable insights and guide aspiring researchers on transitioning from academia to industry and discuss the exciting opportunities this path offers.
	12:15 - 13:00	Mentor Session
	Room: M2

Workshop Sessions

09:00 - 18:00

09:00 - 18:00	Robust Recognition in the Open World https://rrow2024.github.io Room: M3
14:00 - 18:00	DIFA: Deep Learning-based Image Fusion and Its Applications https://difa2024.github.io Room: M2

Oral Session - Machine Vision in Challenging Scenarios

11:45 - 13:00

Chair: Amey Pore	11:45	103	Prompting Diffusion Representations for Cross-Domain Semantic Segmentation Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Nikolay Marin, Luc Van Gool
	12:00	200	Towards Generative Class Prompt Learning for Fine-grained Visual Recognition Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Llados
	12:15	406	When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection Adam Goodge, Bryan Hooi, Wee Siong Ng
	12:30	615	Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation Nathan Louis, Mahzad Khoshlessan, Jason J Corso
	12:45	754	Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
	Room: M1

Oral Session - Image Quality Algorithms

14:00 - 15:15

Chair: Jefersson A. dos Santos	14:00	14	Efficiency-preserving Scene-adaptive Object Detection Zekun Zhang, Vu Quang Truong, Minh Hoai
	14:15	114	Key-point Guided Deformable Image Manipulation Using Diffusion Model Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-yun Kim, Young-Min Kim, hyeonjik lee, Hyuksool Kwon, Hyeonmin Bae
	14:30	416	Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance Pankhi Kashyap, Pavni Tandon, Sunny Gupta, Abhishek Tiwari, Ritwik Kulkarni, Kshitij Sharad Jadhav
	14:45	517	Interpretable Long-term Action Quality Assessment Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert
	15:00	545	Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger
	Room: M1