Adapting MIMO video restoration networks to low latency constraints


Valéry Dewil (Ecole Normale Superieure), Zhe Zheng (Ecole Normale Superieure), Arnaud Barral (Ecole Normale Superieure), Lara Raad (Universidad de la Republica), Nao Nicolas (Thales Group), Ioannis Cassagne (Thales Group), Jean-michel Morel (City University of Hong Kong), Gabriele Facciolo (Ecole Normale Superieure Paris-Saclay), Bruno Galerne (Universite d'Orleans), Pablo Arias (Universitat Pompeu Fabra)
The 35th British Machine Vision Conference

Abstract

MIMO (multiple input, multiple output) approaches are a recent trend in neural network architectures for video restoration problems, where each network evaluation produces multiple output frames. The video is split into non-overlapping stacks of frames that are processed independently, resulting in a very appealing trade-off between output quality and computational cost. In this work we focus on the low-latency setting by limiting the number of available future frames. We find that MIMO architectures suffer from problems that have received little attention so far, namely (1) the performance drops significantly due to the reduced temporal receptive field, particularly for frames at the boundaries of the stack, and (2) there are strong temporal discontinuities at stack transitions, which induce a step-wise motion artifact. We propose two simple solutions to alleviate these problems: recurrence across MIMO stacks to boost the output quality by implicitly increasing the temporal receptive field, and overlapping of the output stacks to smooth the temporal discontinuity at stack transitions. These modifications can be applied to any MIMO architecture. We test them on three state-of-the-art video denoising networks with different computational costs. The proposed contributions result in a new state of the art for low-latency networks, both in terms of reconstruction error and temporal consistency. As an additional contribution, we introduce a new benchmark consisting of drone footage that highlights temporal consistency issues that are not apparent in the standard benchmarks.
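To make the two modifications concrete, here is a minimal sketch, not the authors' implementation. It assumes that stacks overlap by `overlap` frames and that overlapping outputs are cross-faded with linear weights and then normalized; the abstract does not specify the paper's exact blending rule. Recurrence is modeled, hypothetically, by passing the previous stack's output to the next network call as `prev`. The names `stack_starts`, `blend_weights`, `process_video` and the `net(stack, prev)` signature are illustrative assumptions. Setting `overlap=0` gives the plain non-overlapping MIMO scheme.

```python
import numpy as np


def stack_starts(num_frames, stack_size, overlap):
    """Start index of each stack; consecutive stacks share `overlap` frames."""
    assert num_frames >= stack_size > overlap >= 0
    step = stack_size - overlap
    starts = list(range(0, num_frames - stack_size + 1, step))
    if starts[-1] + stack_size < num_frames:
        # Shift one final stack back so the last frames are also covered.
        starts.append(num_frames - stack_size)
    return starts


def blend_weights(stack_size, overlap):
    """Assumed weights: linear ramps over the first/last `overlap` frames, 1 inside."""
    i = np.arange(stack_size)
    ramp_up = (i + 1) / (overlap + 1)
    ramp_down = (stack_size - i) / (overlap + 1)
    return np.minimum(1.0, np.minimum(ramp_up, ramp_down))


def process_video(frames, net, stack_size=6, overlap=2):
    """Run a MIMO network stack by stack and cross-fade overlapping outputs.

    frames: array of shape (T, ...). net(stack, prev) returns a restored
    stack of the same shape; `prev` is the previous stack's output (None
    for the first stack), a hypothetical stand-in for recurrence.
    """
    num_frames = frames.shape[0]
    bcast = (-1,) + (1,) * (frames.ndim - 1)  # broadcast per-frame weights
    w = blend_weights(stack_size, overlap)
    out = np.zeros(frames.shape, dtype=np.float64)
    wsum = np.zeros(num_frames)
    prev = None
    for s in stack_starts(num_frames, stack_size, overlap):
        pred = net(frames[s:s + stack_size], prev)
        out[s:s + stack_size] += w.reshape(bcast) * pred
        wsum[s:s + stack_size] += w
        prev = pred  # recurrence: feed this output into the next stack
    # Normalizing by the summed weights makes the blend a weighted average.
    return out / wsum.reshape(bcast)
```

With an identity stand-in such as `process_video(video, lambda stack, prev: stack)`, the output equals the input, which checks that the weights form a proper average. Inside an overlap region the two stacks' weights sum to one, so the transition from one stack to the next is a gradual cross-fade rather than a hard switch. Note that in this sketch overlapping increases the number of network evaluations by roughly a factor of `stack_size / (stack_size - overlap)`.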

Citation

@inproceedings{Dewil_2024_BMVC,
author    = {Valéry Dewil and Zhe Zheng and Arnaud Barral and Lara Raad and Nao Nicolas and Ioannis Cassagne and Jean-michel Morel and Gabriele Facciolo and Bruno Galerne and Pablo Arias},
title     = {Adapting MIMO video restoration networks to low latency constraints},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0746.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
