Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection


Ching-Yi Lai (National Tsing Hua University), Chiou-ting Hsu (National Tsing Hua University), Chih-Chung Hsu (National Yang Ming Chiao Tung University), Chia-Wen Lin (National Tsing Hua University)
The 35th British Machine Vision Conference

Abstract

In deepfake detection, the diverse compression methods employed by social media platforms pose significant challenges because they produce varying compression rates. These variations hinder the generalization of deepfake detectors across compression rates, a setting termed the cross-compression-rate (CCR) scenario. While existing models demonstrate robustness in cross-dataset evaluation, they often overlook the CCR scenario, which is crucial for broader applicability in real-world applications. Therefore, we introduce a novel Contrastive Physio-inspired Multi-modalities with Language guidance (CPML) framework for robust CCR deepfake detection. Our approach co-maps remote photoplethysmography (rPPG) signals and facial landmark dynamics into a common latent feature space and then aligns these features with the language semantics of a set of class prompts (e.g., for the real and fake classes). Specifically, we propose the Cross-Quality Similarity Learning (CQSL) strategy to learn the similarities in the rPPG signals under variations in visual quality. Moreover, we utilize a pre-trained vision-language model as our text encoder and propose Cross-Modality Consistency Learning (CMCL) to align the multi-modal features pairwise with the textual features of the corresponding class prompts. Our extensive experiments demonstrate that the proposed framework achieves superior performance on both seen and unseen manipulation types and datasets, and provide a benchmark for CCR scenarios.
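
The idea of aligning multi-modal features with class-prompt text features can be illustrated with a minimal sketch. This is not the authors' CMCL implementation: it assumes a generic CLIP-style objective (cross-entropy over temperature-scaled cosine similarities), and the function name `prompt_alignment_loss`, the temperature of 0.07, the feature dimension, and the random stand-in embeddings are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each vector to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prompt_alignment_loss(features, prompt_embeddings, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between sample features and
    class-prompt embeddings (an assumed, generic CLIP-style objective)."""
    f = l2_normalize(features)            # (N, D) fused multi-modal features
    t = l2_normalize(prompt_embeddings)   # (C, D) one text feature per class prompt
    logits = f @ t.T / temperature        # (N, C) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Random stand-ins for the text features of a "real" and a "fake" class prompt.
rng = np.random.default_rng(0)
prompts = rng.normal(size=(2, 16))
labels = np.array([0, 1, 1, 0])  # 0 = real, 1 = fake (hypothetical encoding)

# Features close to their own class prompt versus close to the wrong prompt.
aligned = prompts[labels] + 0.01 * rng.normal(size=(4, 16))
misaligned = prompts[1 - labels] + 0.01 * rng.normal(size=(4, 16))

loss_aligned = prompt_alignment_loss(aligned, prompts, labels)
loss_misaligned = prompt_alignment_loss(misaligned, prompts, labels)
```

In this toy setup the loss is lower when each feature lies near the prompt embedding of its own class, which is the behavior an alignment objective of this kind is meant to encourage.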

Citation

@inproceedings{Lai_2024_BMVC,
author    = {Ching-Yi Lai and Chiou-ting Hsu and Chih-Chung Hsu and Chia-Wen Lin},
title     = {Prompt-guided Multi-modal contrastive learning for Cross-compression-rate Deepfake Detection},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0619.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
