Below is the list of accepted papers for BMVC 2024. Congratulations! You will receive an email with further information and the next steps soon!
If your paper is not listed, it has been rejected. We understand how disappointing it can be to have a paper rejected—we’ve all been there. We hope the feedback from the reviews (when you receive the email) will provide valuable insights for revising the work and that you will consider resubmitting it in the future.
This year, BMVC received 1020 submissions, of which 264 papers were accepted. Each paper had 3 reviews, including a meta-review. All papers were discussed among the reviewers and the assigned Area Chairs (ACs). Meta-reviews were verified by our Programme Chairs (PCs). All this was done while preserving author anonymity and avoiding domain conflicts.
ID | Title |
---|---|
9 | Federated Learning for Face Recognition via Intra-subject Self-supervised Learning |
12 | CLIP Adaptation by Intra-Modal Overlap Reduction |
14 | Efficiency-preserving Scene-adaptive Object Detection |
15 | Sequential Amodal Segmentation via Cumulative Occlusion Learning |
16 | Region-based Entropy Separation for One-shot Test-Time Adaptation |
18 | MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation |
19 | Few-shot Multispectral Segmentation with Representations Generated by Reinforcement Learning |
22 | HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images |
23 | Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation |
25 | AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation |
26 | Improving Depth Gradient Continuity in Transformers: A Comparative Study on Monocular Depth Estimation with CNN |
28 | SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters |
31 | COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation |
32 | Can CLIP help CLIP in learning 3D? |
33 | Self-Supervised Real-World Denoising by Jointly Learning Visible and Invisible Noise |
34 | TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation |
37 | DRAFT: Direct Radiance Fields Editing with Composable Operations |
38 | Linear Calibration Approach to Knowledge-free Group Robust Classification |
39 | HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction |
41 | Local Implicit Wavelet Transformer for Arbitrary-Scale Super-Resolution |
42 | Spatial-Temporal NAS for Fast Surgical Segmentation |
43 | Learning to Segment Publicly Accessible Green Spaces with Visual and Semantic Data |
45 | D³Nav: Data-Driven Driving Agents for Autonomous Vehicles in Unstructured Traffic |
46 | FFR-UNet: Feature Filter-Refinement UNet for Medical Image Segmentation |
47 | Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances |
53 | NCA-Morph: Medical Image Registration with Neural Cellular Automata |
54 | InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning |
60 | Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer |
64 | Multi-Modal Information Bottleneck Attribution with Cross-Attention Guidance |
66 | Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models |
70 | Advancing Anomaly Detection: The IDW dataset and MC algorithm |
74 | ControlDreamer: Stylized 3D Generation with Multi-View ControlNet |
76 | SagaGAN: Style Applied using Gram matrix Attribution based on StarGAN v2 |
77 | PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images |
85 | Textual Attention RPN for Open-Vocabulary Object Detection |
100 | Painterly Image Harmonization via Bi-Transformation with Dynamic Kernels |
101 | Interactive Image Segmentation with Temporal Information Augmented |
102 | Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes |
103 | Prompting Diffusion Representations for Cross-Domain Semantic Segmentation |
104 | MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression |
108 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds |
111 | Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients |
113 | Text Removal In E-Commerce Images: A Comparison Of Inpainting Methods |
114 | Key-point Guided Deformable Image Manipulation Using Diffusion Model |
115 | Multi-modal Crowd Counting via Modal Emulation |
116 | Enhancing Adversarial Robustness and Combating Uncertainty Bias in Transductive Zero-Shot Learning: A Framework of Pseudo-Bidirectional Alignment |
133 | MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM |
135 | Acoustic-based 3D human pose estimation robust to human position |
136 | PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows |
137 | InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth |
140 | Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space |
142 | Recovering Global Data Distribution Locally in Federated Learning |
145 | Privacy-preserving datasets by capturing feature distributions with Conditional VAEs |
147 | MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion |
150 | AISE: Adaptive Input Sampling for Explanation of Black-box Models |
152 | Retinex-Inspired Cooperative Game Through Multi-Level Feature Fusion for Robust, Universal Image Restoration |
164 | Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds |
165 | Learning Object Placement via Convolution Scoring Attention |
166 | Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection |
168 | Topology-preserving Adversarial Training for Alleviating Natural Accuracy Degradation |
180 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation |
183 | Hierarchical Prompt Learning for Scene Graph Generation |
184 | Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization |
185 | Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion |
188 | A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging |
199 | A Revisit to the Decoder for Camouflaged Object Detection |
200 | Towards Generative Class Prompt Learning for Fine-grained Visual Recognition |
201 | Infrared and Visible Image Fusion Using Multi-level Adaptive Fractional Differential |
203 | S³-Match: Common-View Aligned Image Matching via Self-Supervised Keypoint Selection |
205 | From Black-box to Label-only: a Plug-and-Play Attack Network for Model Inversion |
207 | Feature Splatting for Better Novel View Synthesis with Low Overlap |
210 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation |
211 | Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss |
212 | InPer: Whole-Process Domain Generalization via Intervention and Perturbation |
213 | Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis |
215 | AttEntropy: On the Generalization Ability of Supervised Semantic Segmentation Transformers to New Objects in New Domains |
216 | Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning |
217 | GeoFormer: A Multi-Polygon Segmentation Transformer |
218 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance |
223 | AUPIMO: Redefining Anomaly Localization Benchmarks with High Speed and Low Tolerance |
227 | Cost-Sensitive Learning for Long-Tail Temporal Action Segmentation |
228 | Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction |
240 | SAM Helps SSL: Mask-guided Attention Bias for Self-supervised Learning |
245 | Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network |
249 | Transferable Learned Image Compression-Resistant Adversarial Perturbations |
250 | Deep Unfolding Network with Spatial-spectral Perception Enhanced for Pan-sharpening |
256 | IncreLM: Incremental 3D Line Mapping |
257 | Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery |
262 | Toward Highly Efficient Semantic-Guided Machine Vision for Low-Light Object Detection |
263 | Improving Object Detection via Local-global Contrastive Learning |
267 | Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds |
287 | A Super-pixel-based Approach to the Stable Interpretation of Neural Networks |
288 | PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition |
290 | Are Sparse Neural Networks Better Hard Sample Learners? |
295 | MxT: Mamba x Transformer for Image Inpainting |
297 | Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures |
299 | RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields |
303 | MixMask: Revisiting Masking Strategy for Siamese ConvNets |
304 | Interpretable Representation Learning from Videos using Nonlinear Priors |
305 | PEEKABOO: Hiding Parts of an Image for Unsupervised Object Localization |
307 | Discovering an Image-Adaptive Coordinate System for Photography Processing |
308 | Effective Message Hiding with Order-Preserving Mechanisms |
317 | EIANet: A Novel Domain Adaptation Approach to Maximize Class Distinction with Neural Collapse Principles |
318 | Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection |
319 | Annotation by Clicks: A Point-Supervised Contrastive Variance Method for Medical Semantic Segmentation |
323 | Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition |
328 | DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning |
329 | Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training |
330 | Towards Better Zero-Shot Anomaly Detection under Distribution Shift with CLIP |
335 | SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning |
339 | FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection |
342 | Unsupervised Domain Adaptation for Tubular Structure Segmentation Across Different Anatomical Sources |
346 | Backdoor Defense through Self-Supervised and Generative Learning |
352 | DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation |
358 | Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning |
361 | Seg-HGNN: Unsupervised and Light-Weight Image Segmentation with Hyperbolic Graph Neural Networks |
362 | Into the Fog: Evaluating Robustness of Multiple Object Tracking |
365 | Cascade Masked Generative Distillation for Dense Prediction Tasks |
369 | Benchmarking and Optimizing Federated Learning with Hardware-related Metrics |
374 | Text-Guided Mixup Towards Long-Tailed Image Categorization |
375 | A Novel Divide and Merge Approach for Improved Classification of Functional Data |
384 | Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) |
388 | ACIL: Active Class Incremental Learning for Image Classification |
391 | PatchRot: Self-Supervised Training of Vision Transformers by Rotation Prediction |
392 | Label Smoothing++: Enhanced Label Regularization for Training Neural Networks |
401 | Decoupling Forgery Semantics for Generalizable Deepfake Detection |
406 | When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection |
414 | NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning |
416 | Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance |
417 | Kernel Representation for Dynamic Networks |
420 | Layout Free Scene Graph to Image Generation |
421 | Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow |
424 | RETRO: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning |
425 | GLCM-Adapter: Global-Local Content Matching for Few-shot CLIP Adaptation |
426 | Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition |
427 | Lightweight Human Pose Estimation with Enhanced Knowledge Review |
432 | Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution |
433 | Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning |
437 | Difflare: Removing Image Lens Flare with Latent Diffusion Models |
440 | Explaining Multi-modal Large Language Models by Analyzing their Vision Perception |
448 | Learning to Project for Cross-Task Knowledge Distillation |
452 | Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty |
457 | LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps |
472 | SAM-EG: Segment Anything Model with Edge Guidance framework for efficient Polyp Segmentation |
480 | Disparity Estimation Using a Quad-pixel Sensor |
482 | Unsupervised Hashing Network with Hyper Quantization Tree |
486 | DAVINCI: A Single-Stage Architecture for Constrained CAD Sketch Inference |
492 | Multimodal base distributions in conditional flow matching generative models |
493 | Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition |
499 | MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders |
500 | Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences |
505 | FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging |
508 | Semantic Image Synthesis of Anime Characters Based on Conditional Generative Adversarial Networks |
510 | ML-2SN: A Hybrid Two-Stream System for Sitting Posture Detection |
517 | Interpretable Long-term Action Quality Assessment |
524 | A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction |
528 | SOFI: Multi-Scale Deformable Transformer for Camera Calibration with Enhanced Line Queries |
532 | Input-dependent Input-Prompts for Adapting Frozen Vision Transformers |
533 | TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training |
534 | Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning |
537 | Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework |
545 | Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection |
546 | Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs |
557 | Hybrid-CSR: Coupling Explicit and Implicit Reconstruction of Cortical Surface |
563 | As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP |
566 | SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models |
568 | Beyond Static and Dynamic Quantization - Hybrid Quantization of Vision Transformers |
572 | Multi-Scope Representation Learning for Causal Relation Discovery with new Challenging Datasets |
577 | AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field |
579 | Neural Collapse Inspired Contrastive Continual Learning |
584 | ATLANTIS: A Framework for Automated Targeted Language-guided Augmentation Training for Robust Image Search |
595 | A Prototype Unit for Image De-raining using Time-Lapse Data |
597 | FADE: Few-shot/zero-shot Anomaly Detection Engine using Large Vision-Language Model |
599 | VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection |
601 | Training-Free Zero-Shot Semantic Segmentation with LLM Refinement |
606 | VEMIC: View-aware Entropy model for Multi-view Image Compression |
609 | Guidance-base Diffusion Models for Improving Photoacoustic Image Quality |
611 | STPose: 6D object pose estimation network based on sparse attention and cross-layer connection |
615 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation |
619 | Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection |
622 | The Attempt on Combining Three Talents by KD with Enhanced Boundary in Co-salient Object Detection |
627 | GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt |
630 | CPDR: Towards Highly-Efficient Salient Object Detection via Crossed Post-decoder Refinement |
637 | 3D Point Cloud Network Pruning: When Some Weights Do not Matter |
642 | Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation |
648 | 3D Blur Kernel on Gaussian Splatting |
650 | Drawing Insights: Sequential Representation Learning in Comics |
657 | G3FA: Geometry-guided GAN for Face Animation |
659 | GN-FR: Generalizable Neural Radiance Fields for Flare Removal |
663 | Unsupervised Point Cloud Registration with Self-Distillation |
667 | ICAF-4: An Integrated Framework of Category-level Articulated Object Perception and Manipulation for Embodied Intelligence |
670 | Leveraging Inductive Bias in ViT for Medical Image Diagnosis |
678 | Content and Style Aware Audio-Driven Facial Animation |
680 | May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels |
681 | On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models |
685 | Boundary Contrastive Learning for Label-Efficient Medical Image Segmentation |
686 | TransHuPR: Cross-View Fusion Transformer for Human Pose Estimation Using mmWave Radar |
689 | AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning |
692 | Spatio-Temporal Transformer with Rotary Position Embedding and Bone Priors for 3D Human Pose Estimation |
695 | Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies |
697 | Inverse Rendering of Outdoor Scenes with under Time-variant Illumination |
707 | QUD: Unsupervised Knowledge Distillation for Deep Face Recognition |
721 | Sign Stitching: A Novel Approach to Sign Language Production |
723 | ControlEdit: A MultiModal Local Clothing Image Editing Method |
727 | Optimising Diffusion Models for Histopathology Image Synthesis |
729 | Reconstructing Spheres by Fitting Planes |
731 | AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance |
736 | Rectifying Shortcut Learning through Cellular Differentiation in Deep Learning Neurons |
737 | Pseudo Labelling for Enhanced Masked Auto Encoders |
738 | CosFairNet: A Parameter-Space based Approach for Bias Free Learning |
740 | Frequency Decomposition to Tap the Potential of Single Domain for Generalization |
745 | Task-Related Feature Enhancement Network for Neuronal Morphology Classification |
746 | Adapting MIMO video restoration networks to low latency constraints |
753 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning |
754 | Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization |
755 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition |
762 | Open-World Semi-Supervised Learning under Compound Distribution Shifts |
763 | Horospherical Learning with Smart Prototypes |
769 | Flexible Graph Convolutional Network for 3D Human Pose Estimation |
775 | SAE: Single Architecture Ensemble Neural Networks |
779 | Outlier detection by ensembling uncertainty with negative objectness |
787 | MSA²Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation |
790 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space |
797 | Calibration of 2D LiDAR sensors using cylindrical target |
828 | Multi-Scale Semantic Enrichment and Dual Angular Margin Contrast for Few-Shot Class Incremental Learning |
833 | Anomaly Detection Based on Semi-Formula Driven Pre-training Dataset to Represent Subtle Difference and Anomaly Score |
853 | Budget-aware Dynamic Spatially Adaptive Inference |
854 | CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection |
857 | Enhancing Radiology Report Generation: The Impact of Locally Grounded Vision and Language Training |
859 | Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes |
863 | CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning |
865 | APTPose: Anatomy-aware Pre-Training for 3D Human Pose Estimation |
866 | A Deep Belief Network Approach to Scalable Compression of Light Field Data for Auto-Stereoscopic Displays |
878 | Learning conditionally untangled latent spaces using Fixed Point Iteration |
882 | A Multimodal Network on Handwritten Chinese Character Error Correction |
885 | Efficient Data Source Relevance Quantification for Multi-Source Neural Networks |
887 | Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models |
895 | Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs |
897 | topK dice loss for medical image segmentation |
900 | Direct-Sum Approach to Integrate Losses Via Classifier Subspace |
902 | Knowledge Distillation with Global Filters for Efficient Human Pose Estimation |
911 | A simple Color Correction Matrix for RAW Reconstruction |
913 | Examining the Threat Landscape: Foundation Models and Model Theft |
922 | UnSeGArmaNet: Unsupervised Image Segmentation using Graph Neural Networks with Convolutional ARMA Filters |
927 | GazeHELL: Gaze Estimation with Hybrid Encoders and Localised Losses with weighing |
929 | TrakAthlete4D: Multi-View On-Field Player Position Tracking in Sports |
932 | Spatiotemporal Vision Transformer for Weakly Supervised Dense Prediction of Dynamic Brain Maps |
933 | SceneSAM: Integrating 2D Labels for Weakly Supervised 3D Scene Understanding |
936 | PV-SLAM: Panoptic Visual SLAM with Loop Closure and Online Bundle Adjustment |
939 | Deep Learning for GPS-Denied SAR Image Focusing and Vehicle Trajectory Estimation |
945 | Gaussian Splatting in Mirrors: Reflection-aware Rendering via Virtual Camera Optimization |
947 | Layer-wise Learning of CNNs by Self-tuning Learning Rate and Early Stopping at Each Layer |
949 | On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods |
954 | Beyond Face Matching: A Facial Traits based Privacy Score for Synthetic Face Datasets |
957 | Putting the Segment Anything Model to the Test with 3D Knee MRI - A Comparison with State-of-the-Art Performance |
959 | SR+Codec: a Benchmark of Super-Resolution for Video Compression Bitrate Reduction |
967 | CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation |
977 | Improving Multimodal Learning with Multi-Loss Gradient Modulation |
986 | Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning |
987 | Guided Attention for Interpretable Motion Captioning |
991 | iHAST: Integrating Hybrid Attention for Super-Resolution in Spatial Transcriptomics |
998 | MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies |
1013 | Open-Vocabulary Temporal Action Localization using Multimodal Guidance |
1020 | Recovering SLAM Tracking Lost by Trifocal Pose Estimation using GPU-HC++ |