Schedule
Poster 5
Chair: TBC
10:00 - 11:45
28 SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
31 COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation
32 Can CLIP help CLIP in learning 3D?
70 Advancing Anomaly Detection: The IDW dataset and MC algorithm
104 MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression
111 Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients
115 Multi-modal Crowd Counting via Modal Emulation
135 Acoustic-based 3D human pose estimation robust to human position
137 InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth
216 Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
218 RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
228 Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction
257 Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery
308 Effective Message Hiding with Order-Preserving Mechanisms
329 Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training
362 Into the Fog: Evaluating Robustness of Multiple Object Tracking
369 Benchmarking and Optimizing Federated Learning with Hardware-related Metrics
432 Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution
482 Unsupervised Hashing Network with Hyper Quantization Tree
534 Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning
546 Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs
563 As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP
566 SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models
579 Neural Collapse Inspired Contrastive Continual Learning
611 STPose: 6D object pose estimation network based on sparse attention and cross-layer connection
619 Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection
627 GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt
659 GN-FR: Generalizable Neural Radiance Fields for Flare Removal
678 Content and Style Aware Audio-Driven Facial Animation
680 May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels
681 On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models
689 AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning
695 Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies
721 Sign Stitching: A Novel Approach to Sign Language Production
731 AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance
753 Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning
882 A Multimodal Network on Handwritten Chinese Character Error Correction
885 Efficient Data Source Relevance Quantification for Multi-Source Neural Networks
887 Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
103 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation
200 Towards Generative Class Prompt Learning for Fine-grained Visual Recognition
406 When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
615 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
754 Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Oral 4
Chair: TBC
11:45 103 Prompting Diffusion Representations for Cross-Domain Semantic Segmentation
12:00 200 Towards Generative Class Prompt Learning for Fine-grained Visual Recognition
12:15 406 When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection
12:30 615 Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
12:45 754 Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Oral 5
Chair: TBC
14:00 14 Efficiency-preserving Scene-adaptive Object Detection
14:15 114 Key-point Guided Deformable Image Manipulation Using Diffusion Model
14:30 416 Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance
14:45 517 Interpretable Long-term Action Quality Assessment
15:00 545 Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
Poster 6
Chair: TBC
15:15 - 17:00
14 Efficiency-preserving Scene-adaptive Object Detection
114 Key-point Guided Deformable Image Manipulation Using Diffusion Model
416 Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance
517 Interpretable Long-term Action Quality Assessment
545 Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
23 Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation
34 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation
47 Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances
74 ControlDreamer: Stylized 3D Generation with Multi-View ControlNet
102 Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes
108 MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds
140 Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
147 MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion
180 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation
184 Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
211 Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss
245 Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network
288 PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition
307 Discovering an Image-Adaptive Coordinate System for Photography Processing
318 Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
323 Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition
335 SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
352 DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation
384 Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)
388 ACIL: Active Class Incremental Learning for Image Classification
414 NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning
421 Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow
426 Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition
433 Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning
448 Learning to Project for Cross-Task Knowledge Distillation
493 Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition
499 MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
505 FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging
524 A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction
537 Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework
599 VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection
657 G3FA: Geometry-guided GAN for Face Animation
723 ControlEdit: A MultiModal Local Clothing Image Editing Method
727 Optimising Diffusion Models for Histopathology Image Synthesis
746 Adapting MIMO video restoration networks to low latency constraints
755 PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
790 FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
900 Direct-Sum Approach to Integrate Losses Via Classifier Subspace
911 A simple Color Correction Matrix for RAW Reconstruction