Poster 5 (Chair: TBC)

| Time | ID | Title |
|---|---|---|
| 10:00 - 11:45 | 28 | SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters |
| | 31 | COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation |
| | 32 | Can CLIP help CLIP in learning 3D? |
| | 70 | Advancing Anomaly Detection: The IDW dataset and MC algorithm |
| | 104 | MMPrune4U: Regularizing Multimodal Feature Distortion in Weight Pruning for Deep Neural Network Compression |
| | 111 | Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients |
| | 115 | Multi-modal Crowd Counting via Modal Emulation |
| | 135 | Acoustic-based 3D human pose estimation robust to human position |
| | 137 | InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth |
| | 216 | Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning |
| | 218 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance |
| | 228 | Learning Scene-Goal-Aware Motion Representation for Trajectory Prediction |
| | 257 | Motion Tracking with Rotated Bounding Boxes on Overhead Fisheye Imagery |
| | 308 | Effective Message Hiding with Order-Preserving Mechanisms |
| | 329 | Uni-Mlip: Unified Self-Supervision for Medical Vision Language Pre-training |
| | 362 | Into the Fog: Evaluating Robustness of Multiple Object Tracking |
| | 369 | Benchmarking and Optimizing Federated Learning with Hardware-related Metrics |
| | 432 | Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution |
| | 482 | Unsupervised Hashing Network with Hyper Quantization Tree |
| | 534 | Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning |
| | 546 | Balancing Calibration and Performance: Stochastic Depth in Segmentation BNNs |
| | 563 | As Firm As Their Foundations: Creating Transferable Adversarial Examples Across Downstream Tasks with CLIP |
| | 566 | SuperLoRA: Parameter-Efficient Unified Adaptation of Large Foundation Models |
| | 579 | Neural Collapse Inspired Contrastive Continual Learning |
| | 611 | STPose: 6D object pose estimation network based on sparse attention and cross-layer connection |
| | 619 | Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection |
| | 627 | GLPI: A Global Layered Prompt Integration approach for Explicit Visual Prompt |
| | 659 | GN-FR: Generalizable Neural Radiance Fields for Flare Removal |
| | 678 | Content and Style Aware Audio-Driven Facial Animation |
| | 680 | May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels |
| | 681 | On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models |
| | 689 | AggSS: An Aggregated Self-Supervised Approach for Class Incremental Learning |
| | 695 | Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies |
| | 721 | Sign Stitching: A Novel Approach to Sign Language Production |
| | 731 | AutoDOM: Automated Dimension Overlay for Enhanced Measurement-Guidance |
| | 753 | Box for Mask and Mask for Box: weak losses for multi-task partially supervised learning |
| | 882 | A Multimodal Network on Handwritten Chinese Character Error Correction |
| | 885 | Efficient Data Source Relevance Quantification for Multi-Source Neural Networks |
| | 887 | Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models |
| | 103 | Prompting Diffusion Representations for Cross-Domain Semantic Segmentation |
| | 200 | Towards Generative Class Prompt Learning for Fine-grained Visual Recognition |
| | 406 | When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection |
| | 615 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation |
| | 754 | Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization |

Oral 4 (Chair: TBC)

| Time | ID | Title |
|---|---|---|
| 11:45 | 103 | Prompting Diffusion Representations for Cross-Domain Semantic Segmentation |
| 12:00 | 200 | Towards Generative Class Prompt Learning for Fine-grained Visual Recognition |
| 12:15 | 406 | When Text and Images Don't Mix: Bias-Correcting Language-Image Similarity Scores for Anomaly Detection |
| 12:30 | 615 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation |
| 12:45 | 754 | Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization |

Oral 5 (Chair: TBC)

| Time | ID | Title |
|---|---|---|
| 14:00 | 14 | Efficiency-preserving Scene-adaptive Object Detection |
| 14:15 | 114 | Key-point Guided Deformable Image Manipulation Using Diffusion Model |
| 14:30 | 416 | Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance |
| 14:45 | 517 | Interpretable Long-term Action Quality Assessment |
| 15:00 | 545 | Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection |

Poster 6 (Chair: TBC)

| Time | ID | Title |
|---|---|---|
| 15:15 - 17:00 | 14 | Efficiency-preserving Scene-adaptive Object Detection |
| | 114 | Key-point Guided Deformable Image Manipulation Using Diffusion Model |
| | 416 | Taming the Tail: Leveraging Asymmetric Loss and Padé Approximation to Overcome Long-Tailed Class Imbalance |
| | 517 | Interpretable Long-term Action Quality Assessment |
| | 545 | Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection |
| | 23 | Alignment-aware Patch-level Routing for Dynamic Video Frame Interpolation |
| | 34 | TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation |
| | 47 | Group Activity Recognition via Spatio-Temporal Reasoning of Key Instances |
| | 74 | ControlDreamer: Stylized 3D Generation with Multi-View ControlNet |
| | 102 | Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes |
| | 108 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds |
| | 140 | Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space |
| | 147 | MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion |
| | 180 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation |
| | 184 | Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization |
| | 211 | Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss |
| | 245 | Enhancing 3D Hand Pose Estimation via Dense Ordinal Regression Network |
| | 288 | PawFACS: Leveraging Semi-Supervised Learning for Pet Facial Action Recognition |
| | 307 | Discovering an Image-Adaptive Coordinate System for Photography Processing |
| | 318 | Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection |
| | 323 | Complete the Feature Space: Diffusion-Based Fictional ID Generation for Face Recognition |
| | 335 | SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning |
| | 352 | DiffusedWrinkles: A Diffusion-Based Model for Data-Driven Garment Animation |
| | 384 | Few-Shot Classification of Interactive Activities of Daily Living (InteractADL) |
| | 388 | ACIL: Active Class Incremental Learning for Image Classification |
| | 414 | NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning |
| | 421 | Rethinking Domain Adaptive Optic Disc and Cup Segmentation in Fundus Image through Dynamic Diffusion Flow |
| | 426 | Unified Compositional Query Machine with Multimodal Consistency for Video-based Human Activity Recognition |
| | 433 | Separated and Independent Contrastive Learning on Labeled and Unlabeled Samples: Boosting Performance on Long-tail Semi-supervised Learning |
| | 448 | Learning to Project for Cross-Task Knowledge Distillation |
| | 493 | Spike-SLR: An Energy-efficient Parallel Spiking Transformer for Event-based Sign Language Recognition |
| | 499 | MotionMAE: Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders |
| | 505 | FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging |
| | 524 | A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction |
| | 537 | Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework |
| | 599 | VLAVAD: Vision-Language Models Assisted Unsupervised Video Anomaly Detection |
| | 657 | G3FA: Geometry-guided GAN for Face Animation |
| | 723 | ControlEdit: A MultiModal Local Clothing Image Editing Method |
| | 727 | Optimising Diffusion Models for Histopathology Image Synthesis |
| | 746 | Adapting MIMO video restoration networks to low latency constraints |
| | 755 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition |
| | 790 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space |
| | 900 | Direct-Sum Approach to Integrate Losses Via Classifier Subspace |
| | 911 | A simple Color Correction Matrix for RAW Reconstruction |