Learning conditionally untangled latent spaces using Fixed Point Iteration


Victor Enescu (LIP6), Hichem Sahbi (Sorbonne University)
The 35th British Machine Vision Conference

Abstract

Normalizing flows (NFs) are powerful generative models that map arbitrarily complex (ambient) distributions to simple (latent) ones such as the monomodal Gaussian. Despite their ability to model and sample from highly nonlinear manifolds, NFs are less effective at assigning labels to the generated data. This stems from the limited expressivity of monomodal Gaussians and from the difficulty of learning multimodal distributions in latent spaces. In this paper, we devise a multimodal NF-based approach suitable for both image generation and classification. The distinctive feature of our method is that it designs multimodal Gaussian distributions as part of NF training, using an objective function that combines a likelihood term with a Kullback-Leibler divergence (KLD) criterion. The parameters of the trained Gaussians (namely their means and covariance matrices) are obtained as an interpretable fixed-point solution of this objective function. In addition, the proposed method avoids the burdensome and sensitive tuning of learning rates required by gradient descent. Extensive experiments on several datasets, including CIFAR10, CIFAR100 and ImageNet, show that our method performs competitively against different baselines as well as related work.
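
To make concrete why a multimodal Gaussian latent space supports both labeling and generation, here is a minimal NumPy sketch. It is not the authors' algorithm: the abstract does not spell out the fixed-point update or the likelihood-plus-KLD objective, so the sketch swaps in plain per-class maximum-likelihood estimates of the means and covariances. The function names, the `eps` regularizer, and the assumption that latent codes `z` have already been produced by some trained flow are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_class_gaussians(z, y, num_classes, eps=1e-3):
    """Estimate one Gaussian per class from latent codes.

    z: (n, d) latent codes, assumed to come from an already trained flow.
    y: (n,) integer labels in [0, num_classes).
    eps: small diagonal term (an assumption) keeping covariances invertible.
    """
    d = z.shape[1]
    means, covs = [], []
    for c in range(num_classes):
        zc = z[y == c]
        mu = zc.mean(axis=0)
        centered = zc - mu
        cov = centered.T @ centered / len(zc) + eps * np.eye(d)
        means.append(mu)
        covs.append(cov)
    return np.stack(means), np.stack(covs)


def gaussian_log_density(z, mu, cov):
    """Log-density of each row of z under N(mu, cov)."""
    d = z.shape[1]
    diff = z - mu
    _, logdet = np.linalg.slogdet(cov)
    # Solve cov @ x = diff^T instead of forming the explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    mahalanobis = np.sum(diff * solved, axis=1)
    return -0.5 * (mahalanobis + logdet + d * np.log(2.0 * np.pi))


def classify_latents(z, means, covs):
    """Assign each latent code to the class mode with the highest log-density."""
    scores = np.stack(
        [gaussian_log_density(z, m, c) for m, c in zip(means, covs)], axis=1
    )
    return scores.argmax(axis=1)


def sample_latents(rng, means, covs, label, n):
    """Draw n latent codes from the mode of the requested class."""
    return rng.multivariate_normal(means[label], covs[label], size=n)
```

In a full pipeline, codes drawn by `sample_latents` would be passed through the inverse of the flow to obtain images of the requested class, while `classify_latents` would be applied to the forward-mapped codes of test images. This sketch assumes equal class priors; unequal priors would add a log-prior term to each score.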

Citation

@inproceedings{Enescu_2024_BMVC,
author    = {Victor Enescu and Hichem Sahbi},
title     = {Learning conditionally untangled latent spaces using Fixed Point Iteration},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0878.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
