Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)


Zane Durante (Stanford University), Robathan Harries (Stanford University), Edward Vendrow (Massachusetts Institute of Technology), Zelun Luo (Stanford University), Yuta Kyuragi (Panasonic R&D Company of America), Kazuki Kozuka (Panasonic Corporation), Li Fei-Fei (Stanford University), Ehsan Adeli (Stanford University)
The 35th British Machine Vision Conference

Abstract

Understanding Activities of Daily Living (ADLs) is a crucial step for different applications including assistive robots, smart homes, and healthcare. However, to date, few benchmarks and methods have focused on complex ADLs, especially those involving multi-person interactions in home environments. In this paper, we propose a new dataset and benchmark, InteractADL, for understanding complex ADLs that involve interaction between humans (and objects). Furthermore, complex ADLs occurring in home environments comprise a challenging long-tailed distribution due to the rarity of multi-person interactions, and pose fine-grained visual recognition tasks due to the presence of semantically and visually similar classes. To address these issues, we propose a novel method for fine-grained few-shot video classification called Name Tuning that enables greater semantic separability by learning optimal class name vectors. We show that Name Tuning can be combined with existing prompt tuning strategies to learn the entire input text (rather than only learning the prompt or class names) and demonstrate improved performance for few-shot classification on InteractADL and 4 other fine-grained visual classification benchmarks. For transparency and reproducibility, we release our code here: https://github.com/zanedurante/vlm_benchmark.
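
The abstract describes Name Tuning only at a high level (learning class name vectors from a few labelled examples), so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a toy stand-in for a frozen text encoder (mean pooling over token vectors), dot-product logits against precomputed video features, and plain gradient descent on cross-entropy; all function names, dimensions, and data values are hypothetical. Only the class name vectors are updated, while the prompt tokens stay fixed.

```python
import math

def encode_text(prompt_tokens, name_tokens):
    """Toy stand-in for a frozen text encoder (assumption): mean-pool all token vectors."""
    tokens = prompt_tokens + name_tokens
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]

def class_probs(video_feat, prompt_tokens, class_names):
    """Softmax over dot products between a video feature and each class text embedding."""
    logits = []
    for name_tokens in class_names:
        text = encode_text(prompt_tokens, name_tokens)
        logits.append(sum(v * t for v, t in zip(video_feat, text)))
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_loss(data, prompt_tokens, class_names):
    """Average cross-entropy over the few-shot examples."""
    losses = [-math.log(class_probs(v, prompt_tokens, class_names)[y]) for v, y in data]
    return sum(losses) / len(losses)

def predict(video_feat, prompt_tokens, class_names):
    probs = class_probs(video_feat, prompt_tokens, class_names)
    return probs.index(max(probs))

def name_tuning_step(data, prompt_tokens, class_names, lr):
    """One gradient step on the class name vectors only; the prompt is never modified."""
    n_tokens = len(prompt_tokens) + len(class_names[0])
    dim = len(prompt_tokens[0])
    # Gradient of the mean loss with respect to each class's text embedding.
    grads = [[0.0] * dim for _ in class_names]
    for video_feat, label in data:
        probs = class_probs(video_feat, prompt_tokens, class_names)
        for c, p in enumerate(probs):
            coeff = p - (1.0 if c == label else 0.0)
            for d in range(dim):
                grads[c][d] += coeff * video_feat[d] / len(data)
    # Mean pooling passes 1/n_tokens of that gradient to every name token.
    for c, name_tokens in enumerate(class_names):
        for tok in name_tokens:
            for d in range(dim):
                tok[d] -= lr * grads[c][d] / n_tokens

# Fixed (hypothetical) prompt tokens shared by all classes.
prompt = [[0.1, 0.1, 0.1, 0.1], [0.0, 0.2, 0.0, 0.2]]
# Two classes whose name vectors start out identical, mimicking
# semantically similar class names in a fine-grained task.
names = [[[0.5, 0.5, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
# Made-up few-shot set: (video feature, class label).
data = [
    ([1.0, 0.2, 0.0, 0.1], 0),
    ([0.9, 0.1, 0.1, 0.0], 0),
    ([0.2, 1.0, 0.1, 0.0], 1),
    ([0.1, 0.9, 0.0, 0.1], 1),
]

loss_before = mean_loss(data, prompt, names)
for _ in range(100):
    name_tuning_step(data, prompt, names, lr=1.0)
loss_after = mean_loss(data, prompt, names)
predictions = [predict(v, prompt, names) for v, _ in data]

print("loss before:", loss_before)
print("loss after:", loss_after)
print("predictions:", predictions)
```

In this toy setup the two classes are indistinguishable before tuning because their name vectors are identical; the updates pull the name vectors apart so the text embeddings become separable. Under the same assumptions, applying the analogous update to the prompt tokens as well would correspond to the combined setting the abstract mentions, in which the entire input text is learned.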

Citation

@inproceedings{Durante_2024_BMVC,
author    = {Zane Durante and Robathan Harries and Edward Vendrow and Zelun Luo and Yuta Kyuragi and Kazuki Kozuka and Li Fei-Fei and Ehsan Adeli},
title     = {Few-Shot Classification of Interactive Activities of Daily Living (InteractADL)},
booktitle = {35th British Machine Vision Conference 2024, {BMVC} 2024, Glasgow, UK, November 25-28, 2024},
publisher = {BMVA},
year      = {2024},
url       = {https://papers.bmvc2024.org/0384.pdf}
}


Copyright © 2024 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
