Below is the list of accepted papers for BMVC 2024. Congratulations! You will receive an email with further information and the next steps soon!

If your paper is not listed, it has been rejected. We understand how disappointing a rejection can be; we’ve all been there. We hope the reviewers' feedback, which you will receive by email, will provide valuable insights for revising the work, and that you will consider resubmitting it in the future.

This year, BMVC received 1020 submissions, of which 264 were accepted. Each paper had 3 reviews, including a meta-review. All papers were discussed among the reviewers and the assigned Area Chairs (ACs). Meta-reviews were verified by our Programme Chairs (PCs). All this was done while preserving author anonymity and avoiding domain conflicts.

Number Title
9 Federated Learning for Face Recognition via Intra-subject Self-supervised Learning
12 CLIP Adaptation by Intra-Modal Overlap Reduction
14 Efficiency-preserving Scene-adaptive Object Detection
15 Sequential Amodal Segmentation via Cumulative Occlusion Learning
16 Region-based Entropy Separation for One-shot Test-Time Adaptation
18 MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation
19 Few-shot Multispectral Segmentation with Representations Generated by Reinforcement Learning
22 HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images
23 Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation
25 AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation
26 Improving Depth Gradient Continuity in Transformers: A Comparative Study on Monocular Depth Estimation with CNN
28 SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
31 COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation
32 Can CLIP help CLIP in learning 3D?
33 Self-Supervised Real-World Denoising by Jointly Learning Visible and Invisible Noise
34 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation
37 DRAFT: Direct Radiance Fields Editing with Composable Operations
38 Linear Calibration Approach to Knowledge-free Group Robust Classification
39 HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction
41 Local Implicit Wavelet Transformer for Arbitrary-Scale Super-Resolution
42 Spatial-Temporal NAS for Fast Surgical Segmentation
43 Learning to Segment Publicly Accessible Green Spaces with Visual and Semantic Data
45 D³Nav: Data-Driven Driving Agents for Autonomous Vehicles in Unstructured Traffic
46 FFR-UNet: Feature Filter-Refinement UNet for Medical Image Segmentation
47 Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances
53 NCA-Morph: Medical Image Registration with Neural Cellular Automata
54 InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning
60 Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer
64 Multi-Modal Information Bottleneck Attribution with Cross-Attention Guidance
66 Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models
70 Advancing Anomaly Detection: The IDW dataset and MC algorithm
74 ControlDreamer: Stylized 3D Generation with Multi-View ControlNet
76 SagaGAN: Style Applied using Gram matrix Attribution based on StarGAN v2
77 PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images
85 Textual Attention RPN for Open-Vocabulary Object Detection
100 Painterly Image Harmonization via Bi-Transformation with Dynamic Kernels
101 Interactive Image Segmentation with Temporal Information Augmented
102 Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes
103 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation
104 MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression
108 MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds
111 Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
113 Text Removal In E-Commerce Images: A Comparison Of Inpainting Methods
114 Key-point Guided Deformable Image Manipulation Using Diffusion Model
115 Multi-modal Crowd Counting via Modal Emulation
116 Enhancing Adversarial Robustness and Combating Uncertainty Bias in Transductive Zero-Shot Learning: A Framework of Pseudo-Bidirectional Alignment
133 MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM
135 Acoustic-based 3D human pose estimation robust to human position
136 PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows
137 InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
140 Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
142 Recovering Global Data Distribution Locally in Federated Learning
145 Privacy-preserving datasets by capturing feature distributions with Conditional VAEs
147 MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion
150 AISE: Adaptive Input Sampling for Explanation of Black-box Models
152 Retinex-Inspired Cooperative Game Through Multi-Level Feature Fusion for Robust, Universal Image Restoration
164 Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds
165 Learning Object Placement via Convolution Scoring Attention
166 Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection
168 Topology-preserving Adversarial Training for Alleviating Natural Accuracy Degradation
180 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation
183 Hierarchical Prompt Learning for Scene Graph Generation
184 Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
185 Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion
188 A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging
199 A Revisit to the Decoder for Camouflaged Object Detection
200 Towards Generative Class Prompt Learning for Fine-grained Visual Recognition
201 Infrared and Visible Image Fusion Using Multi-level Adaptive Fractional Differential
203 S³-Match: Common-View Aligned Image Matching via Self-Supervised Keypoint Selection
205 From Black-box to Label-only: a Plug-and-Play Attack Network for Model Inversion
207 Feature Splatting for Better Novel View Synthesis with Low Overlap
210 BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation
211 Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
212 InPer: Whole-Process Domain Generalization via Intervention and Perturbation
213 Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis
215 AttEntropy: On the Generalization Ability of Supervised Semantic Segmentation Transformers to New Objects in New Domains
216 Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
217 GeoFormer: A Multi-Polygon Segmentation Transformer
218 RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
223 AUPIMO: Redefining Anomaly Localization Benchmarks with High Speed and Low Tolerance
227 Cost-Sensitive Learning for Long-Tail Temporal Action Segmentation
228 Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction
240 SAM Helps SSL: Mask-guided Attention Bias for Self-supervised Learning
245 Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network
249 Transferable Learned Image Compression-Resistant Adversarial Perturbations
250 Deep Unfolding Network with Spatial-spectral Perception Enhanced for Pan-sharpening
256 IncreLM: Incremental 3D Line Mapping
257 Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery
262 Toward Highly Efficient Semantic-Guided Machine Vision for Low-Light Object Detection
263 Improving Object Detection via Local-global Contrastive Learning
267 Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds
287 A Super-pixel-based Approach to the Stable Interpretation of Neural Networks
288 PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition
290 Are Sparse Neural Networks Better Hard Sample Learners?
295 MxT: Mamba x Transformer for Image Inpainting
297 Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures
299 RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields
303 MixMask: Revisiting Masking Strategy for Siamese ConvNets
304 Interpretable Representation Learning from Videos using Nonlinear Priors
305 PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization
307 Discovering an Image-Adaptive Coordinate System for Photography Processing
308 Effective Message Hiding with Order-Preserving Mechanisms
317 EIANet: A Novel Domain Adaptation Approach to Maximize Class Distinction with Neural Collapse Principles
318 Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
319 Annotation by Clicks: A Point-Supervised Contrastive Variance Method for Medical Semantic Segmentation
323 Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition
328 DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning
329 Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training
330 Towards Better Zero-Shot Anomaly Detection under Distribution Shift with CLIP
335 SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
339 FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection
342 Unsupervised Domain Adaptation for Tubular Structure Segmentation Across Different Anatomical Sources
346 Backdoor Defense through Self-Supervised and Generative Learning
352 DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation
358 Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning
361 Seg-HGNN: Unsupervised and Light-Weight Image Segmentation with Hyperbolic Graph Neural Networks
362 Into the Fog: Evaluating Robustness of Multiple Object Tracking
365 Cascade Masked Generative Distillation for Dense Prediction Tasks
369 Benchmarking and Optimizing Federated Learning with Hardware-related Metrics
374 Text-Guided Mixup Towards Long-Tailed Image Categorization
375 A Novel Divide and Merge Approach for Improved Classification of Functional Data
384 Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)
388 ACIL: Active Class Incremental Learning for Image Classification
391 PatchRot: Self-Supervised Training of Vision Transformers by Rotation Prediction
392 Label Smoothing++: Enhanced Label Regularization for Training Neural Networks
401 Decoupling Forgery Semantics for Generalizable Deepfake Detection
406 When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
414 NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning
416 Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance
417 Kernel Representation for Dynamic Networks
420 Layout Free Scene Graph to Image Generation
421 Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow
424 RETRO: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning
425 GLCM-Adapter: Global-Local Content Matching for Few-shot CLIP Adaptation
426 Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition
427 Lightweight Human Pose Estimation with Enhanced Knowledge Review
432 Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
433 Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning
437 Difflare: Removing Image Lens Flare with Latent Diffusion Models
440 Explaining Multi-modal Large Language Models by Analyzing their Vision Perception
448 Learning to Project for Cross-Task Knowledge Distillation
452 Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
457 LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps
472 SAM-EG: Segment Anything Model with Edge Guidance framework for efficient Polyp Segmentation
480 Disparity Estimation Using a Quad-pixel Sensor
482 Unsupervised Hashing Network with Hyper Quantization Tree
486 DAVINCI: A Single-Stage Architecture for Constrained CAD Sketch Inference
492 Multimodal base distributions in conditional flow matching generative models
493 Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition
499 MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
500 Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences
505 FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging
508 Semantic Image Synthesis of Anime Characters Based on Conditional Generative Adversarial Networks
510 ML-2SN: A Hybrid Two-Stream System for Sitting Posture Detection
517 Interpretable Long-term Action Quality Assessment
524 A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction
528 SOFI: Multi-Scale Deformable Transformer for Camera Calibration with Enhanced Line Queries
532 Input-dependent Input-Prompts for Adapting Frozen Vision Transformers
533 TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training
534 Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning
537 Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework
545 Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
546 Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs
557 Hybrid-CSR: Coupling Explicit and Implicit Reconstruction of Cortical Surface
563 As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP
566 SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models
568 Beyond Static and Dynamic Quantization - Hybrid Quantization of Vision Transformers
572 Multi-Scope Representation Learning for Causal Relation Discovery with new Challenging Datasets
577 AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field
579 Neural Collapse Inspired Contrastive Continual Learning
584 ATLANTIS: A Framework for Automated Targeted Language-guided Augmentation Training for Robust Image Search
595 A Prototype Unit for Image De-raining using Time-Lapse Data
597 FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model
599 VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection
601 Training-Free Zero-Shot Semantic Segmentation with LLM Refinement
606 VEMIC: View-aware Entropy model for Multi-view Image Compression
609 Guidance-base Diffusion Models for Improving Photoacoustic Image Quality
611 STPose: 6D object pose estimation network based on sparse attention and cross-layer connection
615 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
619 Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection
622 The Attempt on Combining Three Talents by KD with Enhanced Boundary in Co-salient Object Detection
627 GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt
630 CPDR: Towards Highly-Efficient Salient Object Detection via Crossed Post-decoder Refinement
637 3D Point Cloud Network Pruning: When Some Weights Do not Matter
642 Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation
648 3D Blur Kernel on Gaussian Splatting
650 Drawing Insights: Sequential Representation Learning in Comics
657 G3FA: Geometry-guided GAN for Face Animation
659 GN-FR: Generalizable Neural Radiance Fields for Flare Removal
663 Unsupervised Point Cloud Registration with Self-Distillation
667 ICAF-4: An Integrated Framework of Category-level Articulated Object Perception and Manipulation for Embodied Intelligence
670 Leveraging Inductive Bias in ViT for Medical Image Diagnosis
678 Content and Style Aware Audio-Driven Facial Animation
680 May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels
681 On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
685 Boundary Contrastive Learning for Label-Efficient Medical Image Segmentation
686 TransHuPR: Cross-View Fusion Transformer for Human Pose Estimation Using mmWave Radar
689 AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning
692 Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation
695 Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
697 Inverse Rendering of Outdoor Scenes with under Time-variant Illumination
707 UKD: Unsupervised Knowledge Distillation for Face Recognition
721 Sign Stitching: A Novel Approach to Sign Language Production
723 ControlEdit: A MultiModal Local Clothing Image Editing Method
727 Optimising Diffusion Models for Histopathology Image Synthesis
729 Reconstructing Spheres by Fitting Planes
731 AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance
736 Rectifying Shortcut Learning through Cellular Differentiation in Deep Learning Neurons
737 Pseudo Labelling for Enhanced Masked Auto Encoders
738 CosFairNet: A Parameter-Space based Approach for Bias Free Learning
740 Frequency Decomposition to Tap the Potential of Single Domain for Generalization
745 Task-Related Feature Enhancement Network for Neuronal Morphology Classification
746 Adapting MIMO video restoration networks to low latency constraints
753 Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning
754 Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
755 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
762 Open-World Semi-Supervised Learning under Compound Distribution Shifts
763 Horospherical Learning with Smart Prototypes
769 Flexible Graph Convolutional Network for 3D Human Pose Estimation
775 SAE: Single Architecture Ensemble Neural Networks
779 Outlier detection by ensembling uncertainty with negative objectness
787 MSA²Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation
790 FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
797 Calibration of 2D LiDAR sensors using cylindrical target
828 Multi-Scale Semantic Enrichment and Dual Angular Margin Contrast for Few-Shot Class Incremental Learning
833 Anomaly Detection Based on Semi-Formula Driven Pre-training Dataset to Represent Subtle Difference and Anomaly Score
853 Budget-aware Dynamic Spatially Adaptive Inference
854 CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection
857 Enhancing Radiology Report Generation: The Impact of Locally Grounded Vision and Language Training
859 Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes
863 CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning
865 APTPose: Anatomy-aware Pre-Training for 3D Human Pose Estimation
866 A Deep Belief Network Approach to Scalable Compression of Light Field Data for Auto-Stereoscopic Displays
878 Learning conditionally untangled latent spaces using Fixed Point Iteration
882 A Multimodal Network on Handwritten Chinese Character Error Correction
885 Efficient Data Source Relevance Quantification for Multi-Source Neural Networks
887 Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
895 Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs
897 topK dice loss for medical image segmentation
900 Direct-Sum Approach to Integrate Losses Via Classifier Subspace
902 Knowledge Distillation with Global Filters for Efficient Human Pose Estimation
911 A simple Color Correction Matrix for RAW Reconstruction
913 Examining the Threat Landscape: Foundation Models and Model Theft
922 UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters
927 GazeHELL: Gaze Estimation with Hybrid Encoders and Localised Losses with weighing
929 TrakAthlete4D: Multi-View On-Field Player Position Tracking in Sports
932 Spatiotemporal Vision Transformer for Weakly Supervised Dense Prediction of Dynamic Brain Maps
933 SceneSAM: Integrating 2D Labels for Weakly Supervised 3D Scene Understanding
936 PV-SLAM: Panoptic Visual SLAM with Loop Closure and Online Bundle Adjustment
939 Deep Learning for GPS-Denied SAR Image Focusing and Vehicle Trajectory Estimation
945 Gaussian Splatting in Mirrors: Reflection-aware Rendering via Virtual Camera Optimization
947 Layer-wise Learning of CNNs by Self-tuning Learning Rate and Early Stopping at Each Layer
949 On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
954 Beyond Face Matching: A Facial Traits based Privacy Score for Synthetic Face Datasets
957 Putting the Segment Anything Model to the Test with 3D Knee MRI - A Comparison with State-of-the-Art Performance
959 SR+Codec: a Benchmark of Super-Resolution for Video Compression Bitrate Reduction
967 CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation
977 Improving Multimodal Learning with Multi-Loss Gradient Modulation
986 Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning
987 Guided Attention for Interpretable Motion Captioning
991 iHAST: Integrating Hybrid Attention for Super-Resolution in Spatial Transcriptomics
998 MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies
1013 Open-Vocabulary Temporal Action Localization using Multimodal Guidance
1020 Recovering SLAM Tracking Lost by Trifocal Pose Estimation using GPU-HC++