Schedule
Keynote - Laura Sevilla
09:00 - 10:00
Title: Frontiers of Video Understanding

Abstract: Video Understanding is a fundamental skill of intelligent systems. From autonomous robots to virtual assistants, understanding the world in motion is necessary to be able to move and interact with it. The last few years have seen amazing improvements in Video Understanding research. Still, there is a remarkable gap between the almost uncanny performance of models in other modalities, such as language and still images, and their performance on video. In this talk I will discuss what I believe are the current barriers for video, including efficiency, a tricky relationship with language, and finding the right tasks. For each of these topics I will discuss both my recent work and what I believe are interesting directions that I hope can inspire the community.

https://laurasevilla.me/

Room: M1
Poster Sessions
10:00 - 11:45 / 15:15 - 17:00
10:00 - 11:45
Papers Presented
28 SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters Hao Wang, Shohei Tanaka, Yoshitaka Ushiku
31 COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation Ankit Jha, Biplab Banerjee, Jocelyn Chanussot, Mainak Singha, Munish Monga, Sachin Kumar Giroh
32 Can CLIP help CLIP in learning 3D? Cristian Sbrolli, Matteo Matteucci
70 Advancing Anomaly Detection: The IDW dataset and MC algorithm Jonathan James Morrison, Phillip Tregidgo, Alexander D. J. Taylor, Neill D. F. Campbell
104 MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression Arindam Das, Kaixin Xu, Nushrat Hussain, Sudip Das, Ujjwal Bhattacharya, Weisi Lin, Ziyuan Zhao
111 Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients Frances Fengyi Yang, Juho Kannala, Maximilian Krahn, Michele Sasdelli, Tolga Birdal, Vladislav Golyanik, Tat-Jun Chin
115 Multi-modal Crowd Counting via Modal Emulation Chenhao Wang, Xiaopeng Fan, Xiaopeng Hong, Yabin Wang, Yupeng Wei, Zhiheng Ma
135 Acoustic-based 3D human pose estimation robust to human position Akisato Kimura, Go Irie, Mariko Isogawa, Yoshimitsu Aoki, Yusuke Oumi, Yuto Shibata
137 InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth Chin-Cheng Hsu, Cho-Ying Wu, Jing-Wen Chen, Quankai Gao, Te-Lin Wu, Ulrich Neumann
216 Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning Masane Fuchi, Tomohiro Takagi
218 RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance Avideep Mukherjee, Piyush Rai, Soumya Banerjee, Vinay P Namboodiri
228 Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction Haowen Tang, Huan Li, Jin Yang, Ping Wei, Ziyang Ren
257 Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery Jordan Lam
308 Effective Message Hiding with Order-Preserving Mechanisms Gao Yu, Xuchong QIU, Zihan Ye
329 Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training Kebin Wu, Wenbin LI, Ameera Ali Bawazir
362 Into the Fog: Evaluating Robustness of Multiple Object Tracking Horst Bischof, Horst Possegger, Muhammad Jehanzeb Mirza, Nadezda Kirillova
369 Benchmarking and Optimizing Federated Learning with Hardware-related Metrics Kai Pan, Yapeng Tian, Yiming Gan, Yinhe Han
432 Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution Dao Duy Hung, Dinh Phu Tran, Daeyoung Kim
482 Unsupervised Hashing Network with Hyper Quantization Tree Jongbin Ryu, Sungeun Kim
534 Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning Francesco Girlanda, Neda Davoudi, Olga V. Demler, Bjoern Menze
546 Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs Andromachi Maria Delfaki, Brooks Paige, Denis Hadjivelichkov, Dimitrios Kanoulas, Linghong Yao, Yuanchang Liu
563 As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP Anjun Hu, Francesco Pinto, Jindong Gu, Konstantinos Kamnitsas, Philip Torr
566 SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models Guanghui Wang, Jing Liu, Matthew Brand, Toshiaki Koike-Akino, Xiangyu Chen, Ye Wang, Pu Perry Wang
579 Neural Collapse Inspired Contrastive Continual Learning Antoine Montmaur, Ngoc-Son Vu, Nicolas Larue
611 STPose: 6D object pose estimation network based on sparse attention and cross-layer connection Dongxu Gao, Keduo Yan, Shihao Chen, Xiaobing Li, Yong Li
619 Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection Chia-Wen Lin, Chih-Chung Hsu, Ching-Yi Lai, Chiou-ting Hsu
627 GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt Bin Fu, Chengming Liu, Lei Shi, Yufei Gao, Yucheng Shi
659 GN-FR: Generalizable Neural Radiance Fields for Flare Removal Gopi Raju Matta, Kaushik Mitra, Rongali Simhachala Venkata Girish, Sumit Sharma, Rahul Siddartha
678 Content and Style Aware Audio-Driven Facial Animation Gaurav Bharaj, Hyeongwoo Kim, Qingju Liu
680 May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels Angelo Porrello, Jacopo Credi, Lorenzo Bonicelli, Monica Millunzi, Petter N. Kolm, Simone Calderara
681 On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models Asif Hanif, Fahad Shahbaz Khan, Hashmat Shadab Malik, Mohammad Yaqub, Muzammal Naseer, Numan Saeed, Salman Khan
689 AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning Jayateja Kalla, Soma Biswas
695 Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies Djamila Aouada, Enjie Ghorbel, Marcella Astrid
721 Sign Stitching: A Novel Approach to Sign Language Production Ben Saunders, Harry Walsh, Richard Bowden
731 AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance Aniket Joshi, Promod Yenigalla, Pushpendu Ghosh, Soumyajit Chowdhury
753 Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning Hoàng-Ân Lê, Paul Berg, Minh Tan Pham
882 A Multimodal Network on Handwritten Chinese Character Error Correction Chuang Zhang, Haizhao Sun, Ming Wu, Yu Ning, jixv
885 Efficient Data Source Relevance Quantification for Multi-Source Neural Networks Jakob Gawlikowski, Nina Maria Gottschling
887 Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models Bin Fu, Jialin Li, Qiyang Wan, Ruiping Wang, Xilin Chen
103 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation Han Sun, Julio Delgado Mangas, Luc Van Gool, Martin Danelljan, Nikolay Marin, Rui Gong
200 Towards Generative Class Prompt Learning for Fine-grained Visual Recognition Emanuele Vivoli, Josep Llados, Sanket Biswas, Soumitri Chattopadhyay
406 When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection Adam Goodge, Bryan Hooi, Wee Siong Ng
615 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation Jason J Corso, Mahzad Khoshlessan, Nathan Louis
754 Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Davide Caffagni, Lorenzo Baraldi, Marcella Cornia, Nicholas Moratelli, Rita Cucchiara
Room: Hall 2
15:15 - 17:00
Papers Presented
14 Efficiency-preserving Scene-adaptive Object Detection Vu Quang Truong, Zekun Zhang, Minh Hoai
114 Key-point Guided Deformable Image Manipulation Using Diffusion Model Guil Jung, Hyeonmin Bae, Hyuksool Kwon, Myeong-Gee Kim, Seok-Hwan Oh, Young-Min Kim, Hyeonjik Lee, Sang-yun Kim
416 Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance Abhishek Tiwari, Kshitij Sharad Jadhav, Pankhi Kashyap, Pavni Tandon, Ritwik Kulkarni, Sunny Gupta
517 Interpretable Long-term Action Quality Assessment Andrew Gilbert, Anthony Adeyemi-Ejeye, Wanqing Li, Xinran Liu, Xu Dong
545 Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger
23 Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation Ban Chen, Cheul-hee Hahm, Ilhyun Cho, Jie Chen, Long Hai Wu, Xin Jin
34 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation Jack Saunders, Vinay P Namboodiri
47 Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances Gaojie Li, Haoting He, Runlin Zou, Wei Guo, Yaochen Li, Yutong Wang
74 ControlDreamer: Stylized 3D Generation with Multi-View ControlNet Chaehun Shin, Jooyoung Choi, Minjun Park, Sungroh Yoon, Yeongtak Oh, Yongsung Kim
102 Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes Bin-Bin Gao, Donghao Zhou, Guangyong Chen, Jialin Li, Jiancheng Huang, Jinpeng Li, Pheng-Ann Heng, Qiang Nie, Qiong Wang, Yong Liu
108 MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds Boming Zhao, Guofeng Zhang, Tianxing Fan, Zhaopeng Cui, Ziqiang Dang, 王 磊, Xujie Shen
140 Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space Jeongwoo Shin, Joonseok Lee, Junho Lee, Seongsu Ha, Seung Woo Ko
147 MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion Angel Villar-Corrales, Moritz Austermann, Sven Behnke
180 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation Aggelina Chatziagapi, Dimitris Samaras, Sai Tanmay Reddy Chakkera
184 Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization Alexandru Drimbarean, Colm O'Riordan, James McDermott, Roisin Luo
211 Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss Di Huang, Guodong Wang, Songtao Liu, Xiangyu Zhang, Zeming Li, Zheng Ge, Zhi Cai
245 Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network Qiang Wang, SoonYong Cho, Weiming Li, Xiaoshuai Hao, Yamin Mao, Zhihua Liu
288 PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition Aman Bahuguna, Anandavardhan Hegde, Ankit Raja, Apnesh Rawat, Hema Sathiamurthy, Narayan Kothari, Sudha Velusamy
307 Discovering an Image-Adaptive Coordinate System for Photography Processing Lin Gu, Tatsuya Harada, Ziteng Cui
318 Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection Bo Peng, Huiyu Zhou, Jiaran Zhou, Junyu Dong, Ying Zhang, Yuezun Li
323 Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition DongJae Lee, Myeong-Yeon Yi, Naeun Ko, Sang-goo Lee, Seunggyu Chang, Yonghyun Jeong
335 SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning Bian Wu, Chenyong Guan, Donghao Zhou, Guangyong Chen, Hao Chen, Jiaze Wang, Jinpeng Li, Pheng-Ann Heng, Ziyu Guo
352 DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation Dan Casas, Elena Garces, Raquel Vidaurre
384 Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) Edward Vendrow, Ehsan Adeli, Kazuki Kozuka, Li Fei-Fei, Robathan Harries, Yuta Kyuragi, Zane Durante, Zelun Luo
388 ACIL: Active Class Incremental Learning for Image Classification Aditya Bhattacharya, Debanjan Goswami, Shayok Chakraborty
414 NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning Sree Rama Vamsidhar S, Gorthi Rama Krishna Sai Subrahmanyam
421 Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow Canran Li, Dongnan Liu, Weidong Cai
426 Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition Duy Hung Tran, Thao Minh Le, Truyen Tran, Tuyen Tran
433 Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning Dongyoung Kim, Jeong-Gun Lee, WonSook Lee
448 Learning to Project for Cross-Task Knowledge Distillation Benedikt Kolbeinsson, Dylan Auty, Krystian Mikolajczyk, Roy Miles
493 Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition Hong Chen, Mingxuan Liu, Xinxu Lin, Kezhuo Liu
499 MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders Bin Wen, Deng Huang, Haosen Yang, Hongxun Yao, Jiannan Wu, Xiatian Zhu, Yi Jiang, Zehuan Yuan
505 FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging Fakhri Karray, Mohammed Talha Alam, Mohsen Guizani, Raza Imam
524 A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction Alina Marcu, Dragos Costea, Marius Leordeanu
537 Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework Liuyuan Wen
599 VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection Changkang Li, Yalong Jiang
657 G3FA: Geometry-guided GAN for Face Animation Alain Pagani, Alireza Javanmardi, Didier Stricker
723 ControlEdit: A MultiModal Local Clothing Image Editing Method Di Cheng, JiaFu Zhang, YULiu, Yingjie Shi, Sun Shixin, Weijing Wang
727 Optimising Diffusion Models for Histopathology Image Synthesis Jacqueline James, Richard Gault, Stephanie G Craig, Victoria Porter
746 Adapting MIMO video restoration networks to low latency constraints Valéry Dewil, Zhe Zheng, Arnaud Barral, Lara Raad, Nao Nicolas, Ioannis Cassagne, Jean-Michel Morel, Gabriele Facciolo, Bruno Galerne, Pablo Arias
755 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition Chenhongyi Yang, Elliot J. Crowley, Jiaming Liu, Linus Ericsson, Miguel Espinosa, Zehui Chen, Zhenyu Wang
790 FILS: Self-Supervised Video Feature Prediction In Semantic Language Space Andrew Gilbert, Frank Guerin, Mona Ahmadian
900 Direct-Sum Approach to Integrate Losses Via Classifier Subspace Takumi Kobayashi
911 A simple Color Correction Matrix for RAW Reconstruction Anqi Liu, Shugong Xu, Shiyi Mu
Room: Hall 2
Doctoral Consortium
10:00 - 13:00
Chair: Richard Menzies and George Killick
10:00 - 10:15 Fatemeh Amerehi Toward Comprehensive Neural Network Robustness
10:15 - 10:30 Zahra Babaiee Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels
10:30 - 10:45 Jack Saunders Style and Speech in Facial Animation
10:45 - 11:00 Muhammad Akhtar Munir Exploring Advanced Calibration Loss Techniques for Vision-Language Models
11:00 - 11:15 Break
11:15 - 11:30 Filippos Gouidis Recognizing object states by combining data-driven and symbolic methods
11:30 - 11:45 Remco Royen Addressing labelling, complexity, latency, and scalability in deep learning-based processing of point clouds
11:45 - 12:15 Speaker: Md. Mostafa Kamal Sarker (Technovative Solutions LTD)
Dr Sarker is the Lead AI Research Scientist at Technovative Solutions LTD (TVS) and a Visiting Fellow at the University of Oxford. He is an expert in artificial intelligence, computer vision, and deep learning. His research has significantly impacted clinical AI, biomedical image analysis, and digital healthcare, as reflected in his 40+ peer-reviewed publications. At BMVC 2024, he will share his insights, guide aspiring researchers on transitioning from academia to industry, and discuss the opportunities this path offers.
12:15 - 13:00 Mentor Session
Room: M2
Workshop Sessions
09:00 - 18:00
09:00 - 18:00 Robust Recognition in the Open World

https://rrow2024.github.io
Room: M3
14:00 - 18:00 DIFA: Deep Learning-based Image Fusion and Its Applications

https://difa2024.github.io
Room: M2
Oral Session - Machine Vision in Challenging Scenarios
11:45 - 13:00
Chair: Amey Pore
11:45 103
Prompting Diffusion Representations for Cross-Domain Semantic Segmentation
Han Sun, Julio Delgado Mangas, Luc Van Gool, Martin Danelljan, Nikolay Marin, Rui Gong
12:00 200
Towards Generative Class Prompt Learning for Fine-grained Visual Recognition
Emanuele Vivoli, Josep Llados, Sanket Biswas, Soumitri Chattopadhyay
12:15 406
When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
Adam Goodge, Bryan Hooi, Wee Siong Ng
12:30 615
Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
Jason J Corso, Mahzad Khoshlessan, Nathan Louis
12:45 754
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Davide Caffagni, Lorenzo Baraldi, Marcella Cornia, Nicholas Moratelli, Rita Cucchiara
Room: M1
Oral Session - Image Quality Algorithms
14:00 - 15:15
Chair: Jefersson A. dos Santos
14:00 14
Efficiency-preserving Scene-adaptive Object Detection
Vu Quang Truong, Zekun Zhang, Minh Hoai
14:15 114
Key-point Guided Deformable Image Manipulation Using Diffusion Model
Guil Jung, Hyeonmin Bae, Hyuksool Kwon, Myeong-Gee Kim, Seok-Hwan Oh, Young-Min Kim, Hyeonjik Lee, Sang-yun Kim
14:30 416
Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance
Abhishek Tiwari, Kshitij Sharad Jadhav, Pankhi Kashyap, Pavni Tandon, Ritwik Kulkarni, Sunny Gupta
14:45 517
Interpretable Long-term Action Quality Assessment
Andrew Gilbert, Anthony Adeyemi-Ejeye, Wanqing Li, Xinran Liu, Xu Dong
15:00 545
Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger
Room: M1