Beyond Face Matching: A Facial Traits based Privacy Score for Synthetic Face Datasets


Robero Leyva (University of Warwick), Praveen Selvaraj (University College London, University of London), Andrew Elliott (Alan Turing Institute), Dr Gregory Epiphaniou (University of Warwick), Carsten Maple (University of Warwick)
The 35th British Machine Vision Conference

Abstract

Synthetic data is increasingly important for training machine learning models, especially in fields where real data is scarce or sensitive. This is particularly true for facial data, given growing privacy concerns and the need for rapid development of face recognition systems. However, synthetic facial data is often derived from existing datasets, which raises privacy issues because synthesizers may inadvertently expose real training data. This paper addresses that risk. We develop a model that provides a probabilistic score indicating how likely it is that a synthetic face incorporates elements from the training dataset. We focus on facial traits (eyes, nose, mouth, and their fusion) and model training set membership as a probability. This approach allows us to assess whether a synthesizer captures training set characteristics too closely. In addition to generating whole synthetic faces, we explore the generative models' latent space by creating variations in specific facial traits, to assess more thoroughly whether the synthesizer relies too heavily on facial features from the training set. This gives a deeper understanding of the synthesizer's tendency to reproduce learned characteristics. Our findings demonstrate that we can establish boundaries for determining the full or partial presence of a sample in the training set, depending on specific facial traits. We also found that combining multiple facial traits in our model improves accuracy. The resulting privacy score indicates how much a synthetic dataset contains identifiable features from its training data, effectively measuring its level of compromise. In summary, our results show that by analyzing individual facial features, we can assess how well a synthetic face dataset preserves privacy relative to the real dataset used to train its synthesizer.
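
The idea of turning per-trait evidence into a membership probability and then fusing traits into a dataset-level score can be illustrated with a small sketch. This is not the authors' model: the abstract does not specify it. The logistic mapping, the averaging fusion rule, the `midpoint` and `steepness` parameters, and all function names below are assumptions chosen only for illustration, and the input similarities are presumed to come from some separate trait-matching step.

```python
import math
from statistics import mean

# Traits named in the abstract; everything else here is a hypothetical illustration.
TRAITS = ("eyes", "nose", "mouth")


def membership_probability(similarity, midpoint=0.5, steepness=10.0):
    """Map a trait similarity in [0, 1] to a probability via a logistic curve.

    The logistic form and its parameters are assumptions, not the paper's model.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (similarity - midpoint)))


def fuse_traits(trait_similarities):
    """Fuse per-trait probabilities by simple averaging (an assumed fusion rule)."""
    return mean(membership_probability(trait_similarities[t]) for t in TRAITS)


def dataset_privacy_score(faces):
    """Average the fused probability over all synthetic faces.

    Higher values suggest the synthetic set sits closer to the training data.
    """
    return mean(fuse_traits(face) for face in faces)


if __name__ == "__main__":
    # Made-up similarities for two synthetic faces, for demonstration only.
    synthetic_faces = [
        {"eyes": 0.9, "nose": 0.8, "mouth": 0.85},
        {"eyes": 0.2, "nose": 0.3, "mouth": 0.1},
    ]
    print(round(dataset_privacy_score(synthetic_faces), 3))
```

Averaging is only one possible fusion rule; a learned combination of traits would fit the same interface by replacing `fuse_traits`.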

Citation

@inproceedings{Leyva_2024_BMVC,
author    = {Robero Leyva and Praveen Selvaraj and Andrew Elliott and Dr Gregory Epiphaniou and Carsten Maple},
title     = {Beyond Face Matching: A Facial Traits based Privacy Score for Synthetic Face Datasets},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0954.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
