The 35th British Machine Vision Conference 2024: Accepted Papers

Below is the list of accepted papers for BMVC 2024. Congratulations! You will receive an email with further information and the next steps soon!

If your paper is not listed, it has been rejected. We understand how disappointing it can be to have a paper rejected—we’ve all been there. We hope the feedback from the reviews (when you receive the email) will provide valuable insights for revising the work and that you will consider resubmitting it in the future.

This year, BMVC received 1020 submissions, of which 264 papers were accepted. Each paper had 3 reviews, including a meta-review. All papers were discussed among the reviewers and the assigned Area Chairs (ACs). Meta-reviews were verified by our Programme Chairs (PCs). All this was done while preserving author anonymity and avoiding domain conflicts.


ID	Title
9	Federated Learning for Face Recognition via Intra-subject Self-supervised Learning
12	CLIP Adaptation by Intra-Modal Overlap Reduction
14	Efficiency-preserving Scene-adaptive Object Detection
15	Sequential Amodal Segmentation via Cumulative Occlusion Learning
16	Region-based Entropy Separation for One-shot Test-Time Adaptation
18	MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation
19	Few-shot Multispectral Segmentation with Representations Generated by Reinforcement Learning
22	HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images
23	Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation
25	AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation
26	Improving Depth Gradient Continuity in Transformers: A Comparative Study on Monocular Depth Estimation with CNN
28	SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
31	COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation
32	Can CLIP help CLIP in learning 3D?
33	Self-Supervised Real-World Denoising by Jointly Learning Visible and Invisible Noise
34	TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation
37	DRAFT: Direct Radiance Fields Editing with Composable Operations
38	Linear Calibration Approach to Knowledge-free Group Robust Classification
39	HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction
41	Local Implicit Wavelet Transformer for Arbitrary-Scale Super-Resolution
42	Spatial-Temporal NAS for Fast Surgical Segmentation
43	Learning to Segment Publicly Accessible Green Spaces with Visual and Semantic Data
45	D³Nav: Data-Driven Driving Agents for Autonomous Vehicles in Unstructured Traffic
46	FFR-UNet: Feature Filter-Refinement UNet for Medical Image Segmentation
47	Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances
53	NCA-Morph: Medical Image Registration with Neural Cellular Automata
54	InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning
60	Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer
64	Multi-Modal Information Bottleneck Attribution with Cross-Attention Guidance
66	Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models
70	Advancing Anomaly Detection: The IDW dataset and MC algorithm
74	ControlDreamer: Stylized 3D Generation with Multi-View ControlNet
76	SagaGAN: Style Applied using Gram matrix Attribution based on StarGAN v2
77	PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images
85	Textual Attention RPN for Open-Vocabulary Object Detection
100	Painterly Image Harmonization via Bi-Transformation with Dynamic Kernels
101	Interactive Image Segmentation with Temporal Information Augmented
102	Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes
103	Prompting Diffusion Representations for Cross-Domain Semantic Segmentation
104	MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression
108	MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds
111	Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
113	Text Removal In E-Commerce Images: A Comparison Of Inpainting Methods
114	Key-point Guided Deformable Image Manipulation Using Diffusion Model
115	Multi-modal Crowd Counting via Modal Emulation
116	Enhancing Adversarial Robustness and Combating Uncertainty Bias in Transductive Zero-Shot Learning: A Framework of Pseudo-Bidirectional Alignment
133	MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM
135	Acoustic-based 3D human pose estimation robust to human position
136	PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows
137	InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
140	Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
142	Recovering Global Data Distribution Locally in Federated Learning
145	Privacy-preserving datasets by capturing feature distributions with Conditional VAEs
147	MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion
150	AISE: Adaptive Input Sampling for Explanation of Black-box Models
152	Retinex-Inspired Cooperative Game Through Multi-Level Feature Fusion for Robust, Universal Image Restoration
164	Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds
165	Learning Object Placement via Convolution Scoring Attention
166	Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection
168	Topology-preserving Adversarial Training for Alleviating Natural Accuracy Degradation
180	JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation
183	Hierarchical Prompt Learning for Scene Graph Generation
184	Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
185	Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
188	A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging
199	A Revisit to the Decoder for Camouflaged Object Detection
200	Towards Generative Class Prompt Learning for Fine-grained Visual Recognition
201	Infrared and Visible Image Fusion Using Multi-level Adaptive Fractional Differential
203	S³-Match: Common-View Aligned Image Matching via Self-Supervised Keypoint Selection
205	From Black-box to Label-only: a Plug-and-Play Attack Network for Model Inversion
207	Feature Splatting for Better Novel View Synthesis with Low Overlap
210	BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation
211	Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
212	InPer: Whole-Process Domain Generalization via Intervention and Perturbation
213	Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis
215	AttEntropy: On the Generalization Ability of Supervised Semantic Segmentation Transformers to New Objects in New Domains
216	Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
217	GeoFormer: A Multi-Polygon Segmentation Transformer
218	RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
223	AUPIMO: Redefining Anomaly Localization Benchmarks with High Speed and Low Tolerance
227	Cost-Sensitive Learning for Long-Tail Temporal Action Segmentation
228	Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction
240	SAM Helps SSL: Mask-guided Attention Bias for Self-supervised Learning
245	Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network
249	Transferable Learned Image Compression-Resistant Adversarial Perturbations
250	Deep Unfolding Network with Spatial-spectral Perception Enhanced for Pan-sharpening
256	IncreLM: Incremental 3D Line Mapping
257	Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery
262	Toward Highly Efficient Semantic-Guided Machine Vision for Low-Light Object Detection
263	Improving Object Detection via Local-global Contrastive Learning
267	Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds
287	A Super-pixel-based Approach to the Stable Interpretation of Neural Networks
288	PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition
290	Are Sparse Neural Networks Better Hard Sample Learners?
295	MxT: Mamba x Transformer for Image Inpainting
297	Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures
299	RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields
303	MixMask: Revisiting Masking Strategy for Siamese ConvNets
304	Interpretable Representation Learning from Videos using Nonlinear Priors
305	PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization
307	Discovering an Image-Adaptive Coordinate System for Photography Processing
308	Effective Message Hiding with Order-Preserving Mechanisms
317	EIANet: A Novel Domain Adaptation Approach to Maximize Class Distinction with Neural Collapse Principles
318	Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
319	Annotation by Clicks: A Point-Supervised Contrastive Variance Method for Medical Semantic Segmentation
323	Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition
328	DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning
329	Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training
330	Towards Better Zero-Shot Anomaly Detection under Distribution Shift with CLIP
335	SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
339	FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection
342	Unsupervised Domain Adaptation for Tubular Structure Segmentation Across Different Anatomical Sources
346	Backdoor Defense through Self-Supervised and Generative Learning
352	DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation
358	Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning
361	Seg-HGNN: Unsupervised and Light-Weight Image Segmentation with Hyperbolic Graph Neural Networks
362	Into the Fog: Evaluating Robustness of Multiple Object Tracking
365	Cascade Masked Generative Distillation for Dense Prediction Tasks
369	Benchmarking and Optimizing Federated Learning with Hardware-related Metrics
374	Text-Guided Mixup Towards Long-Tailed Image Categorization
375	A Novel Divide and Merge Approach for Improved Classification of Functional Data
384	Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)
388	ACIL: Active Class Incremental Learning for Image Classification
391	PatchRot: Self-Supervised Training of Vision Transformers by Rotation Prediction
392	Label Smoothing++: Enhanced Label Regularization for Training Neural Networks
401	Decoupling Forgery Semantics for Generalizable Deepfake Detection
406	When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
414	NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning
416	Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance
417	Kernel Representation for Dynamic Networks
420	Layout Free Scene Graph to Image Generation
421	Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow
424	RETRO: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning
425	GLCM-Adapter: Global-Local Content Matching for Few-shot CLIP Adaptation
426	Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition
427	Lightweight Human Pose Estimation with Enhanced Knowledge Review
432	Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
433	Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning
437	Difflare: Removing Image Lens Flare with Latent Diffusion Models
440	Explaining Multi-modal Large Language Models by Analyzing their Vision Perception
448	Learning to Project for Cross-Task Knowledge Distillation
452	Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
457	LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps
472	SAM-EG: Segment Anything Model with Edge Guidance framework for efficient Polyp Segmentation
480	Disparity Estimation Using a Quad-pixel Sensor
482	Unsupervised Hashing Network with Hyper Quantization Tree
486	DAVINCI: A Single-Stage Architecture for Constrained CAD Sketch Inference
492	Multimodal base distributions in conditional flow matching generative models
493	Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition
499	MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
500	Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences
505	FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging
508	Semantic Image Synthesis of Anime Characters Based on Conditional Generative Adversarial Networks
510	ML-2SN: A Hybrid Two-Stream System for Sitting Posture Detection
517	Interpretable Long-term Action Quality Assessment
524	A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction
528	SOFI: Multi-Scale Deformable Transformer for Camera Calibration with Enhanced Line Queries
532	Input-dependent Input-Prompts for Adapting Frozen Vision Transformers
533	TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training
534	Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning
537	Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework
545	Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
546	Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs
557	Hybrid-CSR: Coupling Explicit and Implicit Reconstruction of Cortical Surface
563	As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP
566	SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models
568	Beyond Static and Dynamic Quantization - Hybrid Quantization of Vision Transformers
572	Multi-Scope Representation Learning for Causal Relation Discovery with new Challenging Datasets
577	AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field
579	Neural Collapse Inspired Contrastive Continual Learning
584	ATLANTIS: A Framework for Automated Targeted Language-guided Augmentation Training for Robust Image Search
595	A Prototype Unit for Image De-raining using Time-Lapse Data
597	FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model
599	VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection
601	Training-Free Zero-Shot Semantic Segmentation with LLM Refinement
606	VEMIC: View-aware Entropy model for Multi-view Image Compression
609	Guidance-based Diffusion Models for Improving Photoacoustic Image Quality
611	STPose: 6D object pose estimation network based on sparse attention and cross-layer connection
615	Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
619	Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection
622	The Attempt on Combining Three Talents by KD with Enhanced Boundary in Co-salient Object Detection
627	GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt
630	CPDR: Towards Highly-Efficient Salient Object Detection via Crossed Post-decoder Refinement
637	3D Point Cloud Network Pruning: When Some Weights Do not Matter
642	Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation
648	3D Blur Kernel on Gaussian Splatting
650	Drawing Insights: Sequential Representation Learning in Comics
657	G3FA: Geometry-guided GAN for Face Animation
659	GN-FR: Generalizable Neural Radiance Fields for Flare Removal
663	Unsupervised Point Cloud Registration with Self-Distillation
667	ICAF-4: An Integrated Framework of Category-level Articulated Object Perception and Manipulation for Embodied Intelligence
670	Leveraging Inductive Bias in ViT for Medical Image Diagnosis
678	Content and Style Aware Audio-Driven Facial Animation
680	May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels
681	On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
685	Boundary Contrastive Learning for Label-Efficient Medical Image Segmentation
686	TransHuPR: Cross-View Fusion Transformer for Human Pose Estimation Using mmWave Radar
689	AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning
692	Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation
695	Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
697	Inverse Rendering of Outdoor Scenes under Time-variant Illumination
707	QUD: Unsupervised Knowledge Distillation for Deep Face Recognition
721	Sign Stitching: A Novel Approach to Sign Language Production
723	ControlEdit: A MultiModal Local Clothing Image Editing Method
727	Optimising Diffusion Models for Histopathology Image Synthesis
729	Reconstructing Spheres by Fitting Planes
731	AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance
736	Rectifying Shortcut Learning through Cellular Differentiation in Deep Learning Neurons
737	Pseudo Labelling for Enhanced Masked Auto Encoders
738	CosFairNet: A Parameter-Space based Approach for Bias Free Learning
740	Frequency Decomposition to Tap the Potential of Single Domain for Generalization
745	Task-Related Feature Enhancement Network for Neuronal Morphology Classification
746	Adapting MIMO video restoration networks to low latency constraints
753	Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning
754	Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
755	PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
762	Open-World Semi-Supervised Learning under Compound Distribution Shifts
763	Horospherical Learning with Smart Prototypes
769	Flexible Graph Convolutional Network for 3D Human Pose Estimation
775	SAE: Single Architecture Ensemble Neural Networks
779	Outlier detection by ensembling uncertainty with negative objectness
787	MSA²Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation
790	FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
797	Calibration of 2D LiDAR sensors using cylindrical target
828	Multi-Scale Semantic Enrichment and Dual Angular Margin Contrast for Few-Shot Class Incremental Learning
833	Anomaly Detection Based on Semi-Formula Driven Pre-training Dataset to Represent Subtle Difference and Anomaly Score
853	Budget-aware Dynamic Spatially Adaptive Inference
854	CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection
857	Enhancing Radiology Report Generation: The Impact of Locally Grounded Vision and Language Training
859	Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes
863	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning
865	APTPose: Anatomy-aware Pre-Training for 3D Human Pose Estimation
866	A Deep Belief Network Approach to Scalable Compression of Light Field Data for Auto-Stereoscopic Displays
878	Learning conditionally untangled latent spaces using Fixed Point Iteration
882	A Multimodal Network on Handwritten Chinese Character Error Correction
885	Efficient Data Source Relevance Quantification for Multi-Source Neural Networks
887	Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
895	Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs
897	topK dice loss for medical image segmentation
900	Direct-Sum Approach to Integrate Losses Via Classifier Subspace
902	Knowledge Distillation with Global Filters for Efficient Human Pose Estimation
911	A simple Color Correction Matrix for RAW Reconstruction
913	Examining the Threat Landscape: Foundation Models and Model Theft
922	UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters
927	GazeHELL: Gaze Estimation with Hybrid Encoders and Localised Losses with weighing
929	TrakAthlete4D: Multi-View On-Field Player Position Tracking in Sports
932	Spatiotemporal Vision Transformer for Weakly Supervised Dense Prediction of Dynamic Brain Maps
933	SceneSAM: Integrating 2D Labels for Weakly Supervised 3D Scene Understanding
936	PV-SLAM: Panoptic Visual SLAM with Loop Closure and Online Bundle Adjustment
939	Deep Learning for GPS-Denied SAR Image Focusing and Vehicle Trajectory Estimation
945	Gaussian Splatting in Mirrors: Reflection-aware Rendering via Virtual Camera Optimization
947	Layer-wise Learning of CNNs by Self-tuning Learning Rate and Early Stopping at Each Layer
949	On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
954	Beyond Face Matching: A Facial Traits based Privacy Score for Synthetic Face Datasets
957	Putting the Segment Anything Model to the Test with 3D Knee MRI - A Comparison with State-of-the-Art Performance
959	SR+Codec: a Benchmark of Super-Resolution for Video Compression Bitrate Reduction
967	CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation
977	Improving Multimodal Learning with Multi-Loss Gradient Modulation
986	Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning
987	Guided Attention for Interpretable Motion Captioning
991	iHAST: Integrating Hybrid Attention for Super-Resolution in Spatial Transcriptomics
998	MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies
1013	Open-Vocabulary Temporal Action Localization using Multimodal Guidance
1020	Recovering SLAM Tracking Lost by Trifocal Pose Estimation using GPU-HC++