Schedule

BMVC conference papers, supplementary material and video presentations can be found at: BMVC Papers

BMVC workshop papers can be found at: BMVC Workshop Papers

Keynote - Laura Sevilla
09:00 - 10:00
09:00 - 10:00 Title: Frontiers of Video Understanding

Abstract: Video Understanding is a fundamental skill of intelligent systems. From autonomous robots to virtual assistants, understanding the world in motion is necessary to move through it and interact with it. The last few years have seen amazing improvements in Video Understanding research. Still, there is a remarkable gap between the almost uncanny performance of models in other modalities, such as language and still images, and their performance on video. In this talk I will discuss what I believe are the current barriers for video, including efficiency, a tricky relationship with language, and finding the right tasks. For each of these topics I will discuss my recent work, as well as what I believe are interesting directions that I hope can inspire the community.

https://laurasevilla.me/

Room: M1
Poster Sessions
10:00 - 11:45 / 15:15 - 17:00
10:00 - 11:45
Papers Presented
16 Region-based Entropy Separation for One-shot Test-Time Adaptation Kodai Kawamura, Shunya Yamagami, Go Irie
28 SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters Shohei Tanaka, Hao Wang, Yoshitaka Ushiku
31 COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation Munish Monga, Sachin Kumar Giroh, Ankit Jha, Mainak Singha, Biplab Banerjee, Jocelyn Chanussot
32 Can CLIP help CLIP in learning 3D? Cristian Sbrolli, Matteo Matteucci
38 Linear Calibration Approach to Knowledge-free Group Robust Classification Ryota Ishizaki, Shunya Yamagami, Yuta Goto, Go Irie
70 Advancing Anomaly Detection: The IDW dataset and MC algorithm Alexander D. J. Taylor, Jonathan James Morrison, Phillip Tregidgo, Neill D. F. Campbell
104 MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression Sudip Das, Kaixin Xu, Nushrat Hussain, Ziyuan Zhao, Arindam Das, Weisi Lin, Ujjwal Bhattacharya
111 Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients Maximilian Krahn, Michele Sasdelli, Frances Fengyi Yang, Vladislav Golyanik, Juho Kannala, Tat-Jun Chin, Tolga Birdal
115 Multi-modal Crowd Counting via Modal Emulation Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan
135 Acoustic-based 3D human pose estimation robust to human position Yusuke Oumi, Yuto Shibata, Go Irie, Akisato Kimura, Yoshimitsu Aoki, Mariko Isogawa
137 InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann
216 Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning Masane Fuchi, Tomohiro Takagi
218 RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance Avideep Mukherjee, Soumya Banerjee, Piyush Rai, Vinay P. Namboodiri
228 Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction Ziyang Ren, Ping Wei, Haowen Tang, Huan Li, Jin Yang
257 Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery Jordan Lam
308 Effective Message Hiding with Order-Preserving Mechanisms Gao Yu, Xuchong QIU, Zihan Ye
329 Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training Ameera Bawazir, Kebin Wu, Wenbin LI
362 Into the Fog: Evaluating Robustness of Multiple Object Tracking Nadezda Kirillova, Muhammad Jehanzeb Mirza, Horst Bischof, Horst Possegger
369 Benchmarking and Optimizing Federated Learning with Hardware-related Metrics Kai Pan, Yapeng Tian, Yinhe Han, Yiming Gan
432 Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim
482 Unsupervised Hashing Network with Hyper Quantization Tree Sungeun Kim, Jongbin Ryu
534 Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning Francesco Girlanda, Olga V. Demler, Bjoern Menze, Neda Davoudi
546 Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs Linghong Yao, Denis Hadjivelichkov, Andromachi Maria Delfaki, Yuanchang Liu, Brooks Paige, Dimitrios Kanoulas
563 As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr
566 SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models Xiangyu Chen, Jing Liu, Ye Wang, Pu Perry Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino
579 Neural Collapse Inspired Contrastive Continual Learning Antoine Montmaur, Nicolas Larue, Ngoc-Son Vu
611 STPose: 6D object pose estimation network based on sparse attention and cross-layer connection Shihao Chen, Xiaobing Li, Keduo Yan, Yong Li, Dongxu Gao
619 Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection Ching-Yi Lai, Chiou-ting Hsu, Chih-Chung Hsu, Chia-Wen Lin
627 GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt Yufei Gao, Bin Fu, Lei Shi, Chengming Liu, Yucheng Shi
659 GN-FR: Generalizable Neural Radiance Fields for Flare Removal Gopi Raju Matta, Rahul Siddartha, RONGALI SIMHACHALA VENKATA GIRISH, Sumit Sharma, Kaushik Mitra
678 Content and Style Aware Audio-Driven Facial Animation QINGJU LIU, Hyeongwoo Kim, Gaurav Bharaj
680 May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels Monica Millunzi, Lorenzo Bonicelli, Angelo Porrello, Jacopo Credi, Petter N. Kolm, Simone Calderara
681 On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Khan
689 AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning Jayateja Kalla, Soma Biswas
695 Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies Marcella Astrid, Enjie Ghorbel, Djamila Aouada
721 Sign Stitching: A Novel Approach to Sign Language Production Harry Walsh, Ben Saunders, Richard Bowden
731 AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance Pushpendu Ghosh, Aniket Joshi, Soumyajit Chowdhury, Promod Yenigalla
753 Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning Hoàng-Ân Lê, Paul Berg, Minh Tan Pham
882 A Multimodal Network on Handwritten Chinese Character Error Correction Haizhao Sun, Yu Ning, Xu Ji, Chuang Zhang, Ming Wu
885 Efficient Data Source Relevance Quantification for Multi-Source Neural Networks Jakob Gawlikowski, Nina Maria Gottschling
887 Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen
103 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Nikolay Marin, Luc Van Gool
200 Towards Generative Class Prompt Learning for Fine-grained Visual Recognition Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Llados
406 When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection Adam Goodge, Bryan Hooi, Wee Siong Ng
615 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation Nathan Louis, Mahzad Khoshlessan, Jason J Corso
754 Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Room: Hall 2
15:15 - 17:00
Papers Presented
14 Efficiency-preserving Scene-adaptive Object Detection Zekun Zhang, Vu Quang Truong, Minh Hoai
114 Key-point Guided Deformable Image Manipulation Using Diffusion Model Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-yun Kim, Young-Min Kim, Hyeonjik Lee, Hyuksool Kwon, Hyeonmin Bae
416 Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance Pankhi Kashyap, Pavni Tandon, Sunny Gupta, Abhishek Tiwari, Ritwik Kulkarni, Kshitij Sharad Jadhav
517 Interpretable Long-term Action Quality Assessment Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert
545 Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger
23 Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation Ban Chen, Xin Jin, LONG HAI WU, Jie Chen, Ilhyun Cho, Cheul-hee Hahm
34 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation Jack Saunders, Vinay P. Namboodiri
47 Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances Haoting He, Yaochen Li, Yutong Wang, Gaojie Li, Wei Guo, Runlin Zou
74 ControlDreamer: Stylized 3D Generation with Multi-View ControlNet Yeongtak Oh, Jooyoung Choi, Yongsung Kim, Minjun Park, Chaehun Shin, Sungroh Yoon
102 Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes Donghao Zhou, Jialin Li, Jinpeng Li, Jiancheng Huang, Qiang Nie, Yong Liu, Bin-Bin Gao, Qiong Wang, Pheng-Ann Heng, Guangyong Chen
108 MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds Ziqiang Dang, Tianxing Fan, Boming Zhao, Xujie Shen, 王 磊, Guofeng Zhang, Zhaopeng Cui
140 Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space Junho Lee, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee
147 MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion Angel Villar-Corrales, Moritz Austermann, Sven Behnke
180 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation Sai Tanmay Reddy Chakkera, Aggelina Chatziagapi, Dimitris Samaras
184 Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization Roisin Luo, Alexandru Drimbarean, James McDermott, Colm O'Riordan
211 Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss Zhi Cai, Songtao Liu, Guodong Wang, Zeming Li, Zheng Ge, Xiangyu Zhang, Di Huang
245 Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network Yamin Mao, Zhihua Liu, Weiming Li, SoonYong Cho, Qiang Wang, Xiaoshuai Hao
288 PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition Anandavardhan Hegde, Sudha Velusamy, Narayan Kothari, Aman Bahuguna, Apnesh Rawat, Hema Sathiamurthy, Ankit Raja
307 Discovering an Image-Adaptive Coordinate System for Photography Processing Ziteng Cui, Lin Gu, Tatsuya Harada
318 Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection Ying Zhang, Yuezun Li, Bo Peng, Jiaran Zhou, Huiyu Zhou, Junyu Dong
323 Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition Myeong-Yeon Yi, DongJae Lee, Naeun Ko, Yonghyun Jeong, Sang-goo Lee, Seunggyu Chang
335 SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning Hao Chen, Jiaze Wang, Ziyu Guo, Jinpeng Li, Donghao Zhou, Bian Wu, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng
352 DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation Raquel Vidaurre, Elena Garces, Dan Casas
384 Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) Zane Durante, Robathan Harries, Edward Vendrow, Zelun Luo, Yuta Kyuragi, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli
388 ACIL: Active Class Incremental Learning for Image Classification Aditya Bhattacharya, Debanjan Goswami, Shayok Chakraborty
414 NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning Sree Rama Vamsidhar S, Gorthi Rama Krishna Sai Subrahmanyam
421 Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow Canran Li, Dongnan Liu, Weidong Cai
426 Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition Tuyen Tran, Thao Minh Le, Duy Hung Tran, Truyen Tran
433 Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning Dongyoung Kim, Jeong-Gun Lee, WonSook Lee
448 Learning to Project for Cross-Task Knowledge Distillation Dylan Auty, Roy Miles, Benedikt Kolbeinsson, Krystian Mikolajczyk
493 Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition Xinxu Lin, Mingxuan Liu, Kezhuo Liu, Hong Chen
499 MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan
505 FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging Mohammed Talha Alam, Raza Imam, Mohsen Guizani, Fakhri Karray
524 A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction Dragos Costea, Alina Marcu, Marius Leordeanu
537 Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework Liuyuan Wen
599 VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection Changkang Li, Yalong Jiang
657 G3FA: Geometry-guided GAN for Face Animation Alireza Javanmardi, Alain Pagani, Didier Stricker
723 ControlEdit: A MultiModal Local Clothing Image Editing Method Di Cheng, Yingjie Shi, sun shixin, JiaFu Zhang, weijing wang, YULiu
727 Optimising Diffusion Models for Histopathology Image Synthesis Victoria Porter, Richard Gault, Stephanie G Craig, Jacqueline James
746 Adapting MIMO video restoration networks to low latency constraints Valéry Dewil, Zhe Zheng, Arnaud Barral, Lara Raad, Nao Nicolas, Ioannis Cassagne, Jean-Michel Morel, Gabriele Facciolo, Bruno Galerne, Pablo Arias
755 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition Chenhongyi Yang, Zehui Chen, Miguel Espinosa, Linus Ericsson, Zhenyu Wang, Jiaming Liu, Elliot J. Crowley
790 FILS: Self-Supervised Video Feature Prediction In Semantic Language Space Mona Ahmadian, Frank Guerin, Andrew Gilbert
900 Direct-Sum Approach to Integrate Losses Via Classifier Subspace Takumi Kobayashi
911 A simple Color Correction Matrix for RAW Reconstruction Anqi Liu, Shiyi Mu, Shugong Xu
Room: Hall 2
Doctoral Consortium
10:00 - 13:00
Chairs: Richard Menzies and George Killick
10:00 - 10:15 Fatemeh Amerehi Toward Comprehensive Neural Network Robustness
10:15 - 10:30 Zahra Babaiee Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels
10:30 - 10:45 Jack Saunders Style and Speech in Facial Animation
10:45 - 11:00 Muhammad Akhtar Munir Exploring Advanced Calibration Loss Techniques for Vision-Language Models
11:00 - 11:15 Break
11:15 - 11:30 Filippos Gouidis Recognizing object states by combining data-driven and symbolic methods
11:30 - 11:45 Remco Royen Addressing labelling, complexity, latency, and scalability in deep learning-based processing of point clouds
11:45 - 12:15 Speaker: Md. Mostafa Kamal Sarker (Technovative Solutions LTD)
Dr Sarker is the Lead AI Research Scientist at Technovative Solutions LTD (TVS) and a Visiting Fellow at the University of Oxford. He's an expert in artificial intelligence, computer vision, and deep learning. His research has significantly impacted clinical AI, biomedical image analysis, and digital healthcare, evident in his 40+ peer-reviewed publications. At BMVC2024, he'll share his valuable insights and guide aspiring researchers on transitioning from academia to industry and discuss the exciting opportunities this path offers.
12:15 - 13:00 Mentor Session
Room: M2
Workshop Sessions
09:00 - 18:00
09:00 - 18:00 Robust Recognition in the Open World

https://rrow2024.github.io
Room: M3
14:00 - 18:00 DIFA: Deep Learning-based Image Fusion and Its Applications

https://difa2024.github.io
Room: M2
Oral Session - Machine Vision in Challenging Scenarios
11:45 - 13:00
Chair: Amey Pore
11:45 103
Prompting Diffusion Representations for Cross-Domain Semantic Segmentation
Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Nikolay Marin, Luc Van Gool
12:00 200
Towards Generative Class Prompt Learning for Fine-grained Visual Recognition
Soumitri Chattopadhyay, Sanket Biswas, Emanuele Vivoli, Josep Llados
12:15 406
When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
Adam Goodge, Bryan Hooi, Wee Siong Ng
12:30 615
Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
Nathan Louis, Mahzad Khoshlessan, Jason J Corso
12:45 754
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Room: M1
Oral Session - Image Quality Algorithms
14:00 - 15:15
Chair: Jefersson A. dos Santos
14:00 14
Efficiency-preserving Scene-adaptive Object Detection
Zekun Zhang, Vu Quang Truong, Minh Hoai
14:15 114
Key-point Guided Deformable Image Manipulation Using Diffusion Model
Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-yun Kim, Young-Min Kim, Hyeonjik Lee, Hyuksool Kwon, Hyeonmin Bae
14:30 416
Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance
Pankhi Kashyap, Pavni Tandon, Sunny Gupta, Abhishek Tiwari, Ritwik Kulkarni, Kshitij Sharad Jadhav
14:45 517
Interpretable Long-term Action Quality Assessment
Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert
15:00 545
Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger
Room: M1
