The 35th British Machine Vision Conference 2024: Tuesday, 26th November

BMVC conference papers, supplementary material and video presentations can be found at: BMVC Papers

BMVC workshop papers can be found at: BMVC Workshop Papers

Keynote - Mubarak Shah

09:00 - 10:00


Title: Privacy Preservation and Bias Mitigation in Human Action Recognition

Abstract: Advances in action recognition have enabled a wide range of real-world applications, e.g. elderly person monitoring systems, autonomous vehicles, and sports analysis. As these techniques are deployed in the real world, two important issues have emerged: privacy and bias. Most of these video understanding applications involve extensive computation, for which a user needs to share the video data with a cloud computation server, and in doing so also shares private visual information such as gender, skin color, clothing, and background objects. Therefore, there is a pressing need for privacy-preserving action recognition. Beyond privacy protection, bias in video understanding can lead to unfair and incorrect decision making. Action recognition models may predict specific actions based on gender stereotypes, such as associating a perceived female subject with hands near her face as applying makeup or brushing hair, even with nothing in hand, or they may suffer from background bias (i.e., inferring actions based on background cues) and foreground bias (i.e., relying on subject appearance). In this talk, I will present our recent work on Privacy Preservation and Bias Mitigation in human action recognition.

https://www.crcv.ucf.edu/person/mubarak-shah/

Room: M1

Keynote Session - Salauddin Sohag

14:00 - 14:30


Title: Technovative Solutions

Abstract: In this keynote, Sohag will chart the company's journey, recounting major innovations, achievements, and the challenges it has navigated to reach its current position. Sohag will also connect Technovative Solutions' advancements to emerging industry trends, illustrating how the company is strategically positioned to address current and future market demands through cutting-edge solutions and agile methodologies.

https://technovativesolutions.co.uk/

Room: M1

Poster Sessions

10:00 - 11:45 / 15:45 - 17:30

10:00 - 11:45

Papers Presented

9	Federated Learning for Face Recognition via Intra-subject Self-supervised Learning	Hansol Kim, Hoyeol Choi, Youngjun Kwak
42	Spatial-Temporal NAS for Fast Surgical Segmentation	Matthew Lee, Felix John Samuel Bragman, Ricardo Sanchez-Matilla, Imanol Luengo, Danail Stoyanov
76	SagaGAN: Style Applied using Gram matrix Attribution based on StarGAN v2	Yongseon Yoo, Seonggyu Kim, Jong-Min Lee
136	PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows	Joaquim Comas Martínez, Antonia Alomar, Adria Ruiz, Federico Sukno
145	Privacy-preserving datasets by capturing feature distributions with Conditional VAEs	Francesco Di Salvo, David Tafler, Sebastian Doerrich, Christian Ledig
201	Infrared and Visible Image Fusion Using Multi-level Adaptive Fractional Differential	Kang Zhang, Xinnian Guo
205	From Black-box to Label-only: a Plug-and-Play Attack Network for Model Inversion	Huan Bao, Kaimin Wei, Yao Chen, Hanting Hou, Jinpeng Chen, Yongdong Wu
287	A Super-pixel-based Approach to the Stable Interpretation of Neural Networks	Shizhan Gong, Jingwei Zhang, Qi Dou, Farzan Farnia
330	Towards Better Zero-Shot Anomaly Detection under Distribution Shift with CLIP	Jiyao Gao, Chengxin He, Lei Duan, Jie Zuo
339	FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection	zhangyangxiang, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong
346	Backdoor Defense through Self-Supervised and Generative Learning	Ivan Sabolic, Ivan Grubišić, Siniša Šegvić
375	A Novel Divide and Merge Approach for Improved Classification of Functional Data	Wei Zhao, Xiao-Jun Zeng, Chengdong Shi, Ching-Hsun Tseng, Yue Chang
420	Layout Free Scene Graph to Image Generation	Rameshwar Mishra, A. Subramanyam
427	Lightweight Human Pose Estimation with Enhanced Knowledge Review	Hao Xu, Shengye Yan, Wei Zheng
440	Explaining Multi-modal Large Language Models by Analyzing their Vision Perception	Loris Giulivi, Giacomo Boracchi
457	LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps	Andrey Palaev, Adil Khan, Syed M Ahsan Kazmi
492	Multimodal base distributions in conditional flow matching generative models	Shane Josias, Willie Brink
508	Semantic Image Synthesis of Anime Characters Based on Conditional Generative Adversarial Networks	Xuhui Zhu, Feng Jiang, Jing Wen, Yi Wang, Qiang Gao
510	ML-2SN: A Hybrid Two-Stream System for Sitting Posture Detection	Kehang Jia, Gaorui Zhang, Yixuan Yang, Guangwei Huang, Penghuan Wang, Cheng Cheng
532	Input-dependent Input-Prompts for Adapting Frozen Vision Transformers	Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M Asano
595	A Prototype Unit for Image De-raining using Time-Lapse Data	Jaehoon Cho, Minjung Yoo, Jini Yang, Sunok Kim
606	VEMIC: View-aware Entropy model for Multi-view Image Compression	Susmija Jabbireddy, Davit Soselia, Max Ehrlich, Christopher Metzler, Amitabh Varshney
609	Guidance-base Diffusion Models for Improving Photoacoustic Image Quality	Tatsuhiro Eguchi, Shumpei Takezaki, Mihoko Shimano, Takayuki Yagi, Ryoma Bise
686	TransHuPR: Cross-View Fusion Transformer for Human Pose Estimation Using mmWave Radar	Niraj Prakash Kini, Ruey-Horng Shiue, Ryan Chandra, Wen-Hsiao Peng, Ching-Wen Ma, Jenq-Neng Hwang
692	Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation	Cheng Chen, Jiang Liu, Liaoyuan Zeng, Fang Duan, Sean McGrath, Tian Dan
707	QUD: Unsupervised Knowledge Distillation for Deep Face Recognition	Jan Niklas Kolf, Naser Damer, Fadi Boutros
736	Rectifying Shortcut Learning through Cellular Differentiation in Deep Learning Neurons	Hongjing Niu, Hanting Li, Guoping Wu, Bin Li, Feng Zhao
738	CosFairNet: A Parameter-Space based Approach for Bias Free Learning	Rajeev Ranjan Dwivedi, Priyadarshini Kumari, Vinod K. Kurmi
740	Frequency Decomposition to Tap the Potential of Single Domain for Generalization	Hongjing Niu, Qingyue Yang, Pengfei Xia, Wei Zhang, Bin Li, Feng Zhao
745	Task-Related Feature Enhancement Network for Neuronal Morphology Classification	Chunli Sun, Feng Zhao
769	Flexible Graph Convolutional Network for 3D Human Pose Estimation	Abu Taib Mohammed Shahjahan, Abdessamad Ben Hamza
828	Multi-Scale Semantic Enrichment and Dual Angular Margin Contrast for Few-Shot Class Incremental Learning	Riya Verma, Sukhendu Das
833	Anomaly Detection Based on Semi-Formula Driven Pre-training Dataset to Represent Subtle Difference and Anomaly Score	Hiroki Kobayashi, Naoki Murakami, Naoto Hiramatsu, Takahiro Suzuki, Manabu Hashimoto
853	Budget-aware Dynamic Spatially Adaptive Inference	Georgios Zampokas, Christos-Savvas Bouganis, Dimitris Tzovaras
857	Enhancing Radiology Report Generation: The Impact of Locally Grounded Vision and Language Training	Sergio Sanchez Santiesteban, Muhammad Awais, Yi-Zhe Song, Josef Kittler
865	APTPose: Anatomy-aware Pre-Training for 3D Human Pose Estimation	Qing-Wen Yang, Kai-Wen Duan, Ting-Yi Lu, Kevin Lin, Cheng-Yen Yang, Lijuan Wang, Jenq-Neng Hwang, Shang-Hong Lai
866	A Deep Belief Network Approach to Scalable Compression of Light Field Data for Auto-Stereoscopic Displays	Sally Khaidem, Mansi Sharma
902	Knowledge Distillation with Global Filters for Efficient Human Pose Estimation	Kaushik Bhargav Sivangi, Fani Deligianni
922	UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters	Kovvuri Sai Gopal Reddy, Saran Bodduluri, A. Mudit Adityaja, Saurabh Shigwan, Nitin Kumar, Snehasis Mukherjee
584	ATLANTIS: A Framework for Automated Targeted Language-guided Augmentation Training for Robust Image Search	Inderjeet Singh, Roman Vainshtein, Alon Zolfi, Asaf Shabtai, Tu Bui, Jonathan Brokman, Omer Hofman, Fumiyoshi Kasahara, Kentaro Tsuji, Hisashi Kojima
670	Leveraging Inductive Bias in ViT for Medical Image Diagnosis	Jungmin Ha, Euihyun-yoon, Sungsik Kim, Jinkyu Kim, Jaekoo Lee
863	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning	Emanuele Frascaroli, Aniello Panariello, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
913	Examining the Threat Landscape: Foundation Models and Model Theft	Ankita Raj, Deepankar Varma, Chetan Arora
967	CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation	Jianyu Zhao, Wei Quan, Bogdan Matuszewski

Room: Hall 2

15:45 - 17:30

Papers Presented

328	DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning	Dino Ienco, Cassio Fraga Dantas
133	MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM	Ren-Wu Li, Wenjing Ke, Dong Li, Lu Tian, Emad Barsoum
787	MSA2Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation	Sina Ghorbani Kolahi, Seyed Kamal Chaharsooghi, Toktam Khatibi, Afshin Bozorgpour, Reza Azad, Moein Heidari, Ilker Hacihaliloglu, Dorit Merhof
188	A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging	Peichao Li, Oscar MacCormac, Jonathan Shapey, Tom Vercauteren
113	Text Removal In E-Commerce Images: A Comparison Of Inpainting Methods	Hiya Roy, Bjorn Stenger
486	DAVINCI: A Single-Stage Architecture for Constrained CAD Sketch Inference	Ahmet Serdar Karadeniz, Dimitrios Stefanos Mallis, Nesryne Mejri, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
568	Beyond Static and Dynamic Quantization - Hybrid Quantization of Vision Transformers	Piotr Kluska, Florian Scheidegger, Cristiano Malossi, Enrique S. Quintana-Orti
597	FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model	Yuanwei Li, Elizaveta Ivanova, Martins Bruveris
299	RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields	Mihnea-Bogdan Jurca, Remco Royen, Ion Giosan, Adrian Munteanu
303	MixMask: Revisiting Masking Strategy for Siamese ConvNets	Kirill Vishniakov, Eric P. Xing, Zhiqiang Shen
305	PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization	Hasib Zunair, Abdessamad Ben Hamza
317	EIANet: A Novel Domain Adaptation Approach to Maximize Class Distinction with Neural Collapse Principles	Zicheng Pan, Xiaohan Yu, Yongsheng Gao
358	Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning	Muhammad Salman Ali, Maryam Qamar, Sung-Ho Bae, Enzo Tartaglione
361	Seg-HGNN: Unsupervised and Light-Weight Image Segmentation with Hyperbolic Graph Neural Networks	Debjyoti Mondal, Rahul Mishra, Chandan Kumar Pandey
374	Text-Guided Mixup Towards Long-Tailed Image Categorization	Richard Franklin, Jiawei Yao, Deyang Zhong, Qi Qian, Juhua Hu
391	PatchRot: Self-Supervised Training of Vision Transformers by Rotation Prediction	Sachin Chhabra, Hemanth Venkateswara, Baoxin Li
425	GLCM-Adapter: Global-Local Content Matching for Few-shot CLIP Adaptation	Shuo Wang, Xieenlong, Jinda Lu, Jinghan Li, Yanbin Hao
437	Difflare: Removing Image Lens Flare with Latent Diffusion Models	Tianwen Zhou, Qihao Duan, Zitong Yu
452	Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty	Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao
572	Multi-Scope Representation Learning for Causal Relation Discovery with new Challenging Datasets	Jiageng Zhu, Hanchen Xie, Jianhua Wu, Mohamed E. Hussein, Mahyar Khayatkhoei, Jiazhi Li, Wael AbdAlmageed
577	AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field	Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng
650	Drawing Insights: Sequential Representation Learning in Comics	Sam Titarsolej, Neil Cohn, Nanne Van Noord
775	SAE: Single Architecture Ensemble Neural Networks	Martin Ferianc, Hongxiang Fan, Miguel R. D. Rodrigues
859	Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes	Dmitry Demidov, Abduragim Shtanchaev, Mihail Minkov Mihaylov, Mohammad Almansoori
878	Learning conditionally untangled latent spaces using Fixed Point Iteration	Victor Enescu, Hichem Sahbi
945	Gaussian Splatting in Mirrors: Reflection-aware Rendering via Virtual Camera Optimization	Zihan Wang, Shuzhe Wang, Matias Turkulainen, Junyuan Fang, Juho Kannala
949	On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods	Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten
212	InPer: Whole-Process Domain Generalization via Intervention and Perturbation	Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Xinghao Ding, Yue Huang
933	SceneSAM: Integrating 2D Labels for Weakly Supervised 3D Scene Understanding	Julius Koerner, Dogu Tamgac, David Rozenberszki
936	PV-SLAM: Panoptic Visual SLAM with Loop Closure and Online Bundle Adjustment	Ashok Bandyopadhyay, Pranjal Baranwal, Arijit Sur, Rajeev UP
957	Putting the Segment Anything Model to the Test with 3D Knee MRI - A Comparison with State-of-the-Art Performance	Oliver Mills, Nishant Ravikumar, Philip G Conaghan, Samuel D Relton
998	MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies	Jinhui Yi, Yanan Luo, Marion Deichmann, Gabriel Schaaf, Juergen Gall
929	TrakAthlete4D: Multi-View On-Field Player Position Tracking in Sports	Nitish Agarwal, Steven Cadavid
932	Spatiotemporal Vision Transformer for Weakly Supervised Dense Prediction of Dynamic Brain Maps	Behnam Kazemivash, Armin Iraji, Sergey M. Plis, Vince Calhoun
939	Deep Learning for GPS-Denied SAR Image Focusing and Vehicle Trajectory Estimation	Christopher Beam, Andrew R. Willis, Kevin M Brink
947	Layer-wise Learning of CNNs by Self-tuning Learning Rate and Early Stopping at Each Layer	Melika Sadeghi Tabrizi, Ali Karimi, Ahmad Kalhor, Babak N Araabi, Mona Ahmadian
954	Beyond Face Matching: A Facial Traits based Privacy Score for Synthetic Face Datasets	Roberto Leyva, Praveen Selvaraj, Andrew Elliott, Gregory Epiphaniou, Carsten Maple
986	Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning	Abdullah Alchihabi, Marzi Heidari, Yuhong Guo
991	iHAST: Integrating Hybrid Attention for Super-Resolution in Spatial Transcriptomics	Xi Li, Jing Zhang, Ziheng Duan, Yi Dai, Siwei Xu
1020	Recovering SLAM Tracking Lost by Trifocal Pose Estimation using GPU-HC++	Chiang-Heng Chien, Ahmad Abdelfattah, Benjamin Kimia
927	GazeHELL: Gaze Estimation with Hybrid Encoders and Localised Losses with weighing	Shubham Dokania, Vasudev Singh, Shuaib Ahmed
977	Improving Multimodal Learning with Multi-Loss Gradient Modulation	Konstantinos Kontras, Christos Chatzichristos, Matthew B. Blaschko, Maarten De Vos
987	Guided Attention for Interpretable Motion Captioning	Karim Radouane, Julien Lagarde, Sylvie Ranwez, Andon Tchechmedjiev
213	Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis	Theodoros Kouzelis, Emmanouil Plitsis, Mihalis Nicolaou, Yannis Panagakis
959	SR+Codec: a Benchmark of Super-Resolution for Video Compression Bitrate Reduction	Evgeney Bogatyrev, Ivan Molodetskikh, Dmitriy S. Vatolin
1013	Open-Vocabulary Temporal Action Localization using Multimodal Guidance	Akshita Gupta, Aditya Arora, Sanath Narayan, Salman Khan, Fahad Khan, Graham W. Taylor

Room: Hall 2

Oral Session - Real World Applications

11:45 - 13:00

Chair: Carlos Moreno-Garcia

11:45	584	ATLANTIS: A Framework for Automated Targeted Language-guided Augmentation Training for Robust Image Search	Inderjeet Singh, Roman Vainshtein, Alon Zolfi, Asaf Shabtai, Tu Bui, Jonathan Brokman, Omer Hofman, Fumiyoshi Kasahara, Kentaro Tsuji, Hisashi Kojima
12:00	670	Leveraging Inductive Bias in ViT for Medical Image Diagnosis	Jungmin Ha, Euihyun-yoon, Sungsik Kim, Jinkyu Kim, Jaekoo Lee
12:15	863	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning	Emanuele Frascaroli, Aniello Panariello, Pietro Buzzega, Lorenzo Bonicelli, Angelo Porrello, Simone Calderara
12:30	1020	Recovering SLAM Tracking Lost by Trifocal Pose Estimation using GPU-HC++	Chiang-Heng Chien, Ahmad Abdelfattah, Benjamin Kimia
12:45	967	CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation	Jianyu Zhao, Wei Quan, Bogdan Matuszewski

Room: M1

Industrial Session (Sponsored by Technovative Solutions)

14:30 - 15:45

Chair: Chaitanya Kaul

14:30	486	DAVINCI: A Single-Stage Architecture for Constrained CAD Sketch Inference	Ahmet Serdar Karadeniz, Dimitrios Stefanos Mallis, Nesryne Mejri, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
14:45	787	MSA2Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation	Sina Ghorbani Kolahi, Seyed Kamal Chaharsooghi, Toktam Khatibi, Afshin Bozorgpour, Reza Azad, Moein Heidari, Ilker Hacihaliloglu, Dorit Merhof
15:00	188	A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging	Peichao Li, Oscar MacCormac, Jonathan Shapey, Tom Vercauteren
15:15	113	Text Removal In E-Commerce Images: A Comparison Of Inpainting Methods	Hiya Roy, Bjorn Stenger
15:30	328	DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning	Dino Ienco, Cassio Fraga Dantas

Room: M1