Privacy-preserving datasets by capturing feature distributions with Conditional VAEs


Francesco Di Salvo (University of Bamberg), David Tafler (University of Bamberg), Sebastian Doerrich (University of Bamberg), Christian Ledig (University of Bamberg)
The 35th British Machine Vision Conference

Abstract

Large and well-annotated datasets are essential for advancing deep learning applications, however often costly or impossible to obtain by a single entity. In many areas, including the medical domain, approaches relying on data sharing have become critical to address those challenges. While effective in increasing dataset size and diversity, data sharing raises significant privacy concerns. Commonly employed anonymization methods based on the $k$-anonymity paradigm often fail to preserve data diversity, affecting model robustness. This work introduces a novel approach using Conditional Variational Autoencoders ($\texttt{CVAE}$s) trained on feature vectors extracted from large pre-trained vision foundation models. Foundation models effectively detect and represent complex patterns across diverse domains, allowing the $\texttt{CVAE}$ to faithfully capture the embedding space of a given data distribution to generate (sample) a diverse, privacy-respecting, and potentially unbounded set of synthetic feature vectors. Our method notably outperforms traditional approaches in both medical and natural image domains, exhibiting greater dataset diversity and higher robustness against perturbations while preserving sample privacy. These results underscore the potential of generative models to significantly impact deep learning applications in data-scarce and privacy-sensitive environments. The source code is available at github.com/francescodisalvo05/cvae-anonymization .

Citation

@inproceedings{Salvo_2024_BMVC,
author    = {Francesco Di Salvo and David Tafler and Sebastian Doerrich and Christian Ledig},
title     = {Privacy-preserving datasets by capturing feature distributions with Conditional VAEs},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0145.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).

Imprint | Data Protection