Laura Sevilla
University of Edinburgh
Frontiers of Video Understanding
Abstract: Video understanding is a fundamental skill of intelligent systems. From autonomous robots to virtual assistants, understanding the world in motion is necessary to move through and interact with it. The last few years have seen amazing improvements in video understanding research. Still, there is a remarkable gap between the almost uncanny performance of models in other modalities, such as language and still images, and the performance of video models. In this talk I will discuss what I believe are the current barriers for video, including efficiency, a tricky relationship with language, and finding the right tasks. For each of these topics I will discuss my recent work as well as what I believe are interesting directions that I hope can inspire the community.
Bio: Laura Sevilla is a Reader (Associate Professor) at the University of Edinburgh, where she founded and leads the Video Understanding Lab. Her work over the years has advanced the state of the art in many aspects of video understanding, including optical flow, object tracking, video object segmentation, low-shot action classification, action detection, affordance estimation, video captioning, and more. She received a Google Faculty Award in 2020 and a Google Research Scholar Award in 2022, as well as other funding from companies such as Meta. She has served as Program Chair for BMVC 2021 and as Area Chair for ECCV, ICCV, CVPR, and AAAI. She has also done outreach work through the Computer Vision for Global Challenges series, and has been elected an ELLIS Scholar.
Mubarak Shah
University of Central Florida
Privacy Preservation and Bias Mitigation in Human Action Recognition
Abstract: Advances in action recognition have enabled a wide range of real-world applications, e.g., elderly monitoring systems, autonomous vehicles, and sports analysis. As these techniques are deployed in the real world, two important issues have emerged: privacy and bias. Most video understanding applications involve extensive computation, for which a user needs to share video data with a cloud computing server; in doing so, the user also ends up sharing private visual information such as gender, skin color, clothing, and background objects. There is therefore a pressing need for privacy-preserving action recognition. Beyond privacy protection, bias in video understanding can lead to unfair and incorrect decision making. Action recognition models may predict specific actions based on gender stereotypes, such as associating a perceived female subject with hands near her face with applying makeup or brushing hair, even with nothing in hand, or they may suffer from background bias (i.e., inferring actions from background cues) and foreground bias (i.e., relying on subject appearance). In this talk, I will present our recent work on Privacy Preservation and Bias Mitigation in human action recognition.
Bio: Dr. Mubarak Shah, the UCF Trustee Chair Professor, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). Dr. Shah is a fellow of ACM, IEEE, AAAS, NAI, IAPR, AAIA, and SPIE. He has published extensively on topics related to human activity and action recognition, visual tracking, geo-localization, visual crowd analysis, object detection and categorization, shape from shading, and more. He has served as an ACM and IEEE Distinguished Visitor Program speaker. He is a recipient of the 2022 PAMI Mark Everingham Prize for pioneering human action recognition datasets; the 2019 ACM SIGMM Technical Achievement Award; the 2020 ACM SIGMM Test of Time Honorable Mention Award for his paper “Visual attention detection in video sequences using spatiotemporal cues”; the 2020 International Conference on Pattern Recognition (ICPR) Best Scientific Paper Award; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation; 2nd place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the Best Paper Award at the ACM Multimedia Conference in 2005 and 2010. At UCF he has received the Pegasus Professor Award; the University Distinguished Research Award; the Faculty Excellence in Mentoring Doctoral Students and Faculty Excellence in Mentoring Postdoctoral Scholars awards; the Scholarship of Teaching and Learning Award; the Teaching Incentive Program Award; and the Research Incentive Award.
Margarita Chli
University of Cyprus and ETH Zurich
Vision-based robotic perception: are we there yet?
Abstract: As vision plays a key role in how we interpret a situation, developing vision-based perception for robots promises to be a big step towards robotic intelligence, with a tremendous impact on automating robot navigation. This talk will discuss our recent progress in this area at the Vision for Robotics Lab of the University of Cyprus and ETH Zurich (http://www.v4rl.com), and some of the biggest challenges we face.
Bio: Margarita Chli is a Professor of Robotic Vision and the director of the Vision for Robotics Lab at the University of Cyprus and ETH Zurich. Her work has contributed to the first vision-based autonomous flight of a small drone and the first demonstration of collaborative monocular SLAM for a small swarm of drones. Margarita has given invited keynotes at the World Economic Forum in Davos, TEDx, and ICRA, and she was featured in Robohub's 2016 list of "25 women in Robotics you need to know about". In 2023 she was awarded an ERC Consolidator Grant, one of the most prestigious grants in Europe for blue-sky research, to grow her team at the University of Cyprus and research advanced robotic perception.
Federico Tombari
Google
The 3D Revolution: Neural Representations and Diffusion Models to Understand and Synthesise the 3D World
Abstract: 3D Computer Vision has recently witnessed a surge of interest from the ML and CV research communities, due to the progress that recently introduced concepts, such as neural representations, foundation models, and diffusion models, have enabled on many traditional 3D Computer Vision tasks. In this talk, we will focus in particular on the capability to understand and synthesise 3D scenes and objects, which is a key component of applications in Augmented/Mixed Reality and Robotics. We will look at three tasks in 3D Computer Vision that are fundamental components of these applications while being highly influenced by the aforementioned concepts: novel view synthesis, 3D semantic segmentation, and 3D asset generation. For each of these tasks, we will first examine some important practical limitations of current approaches. We will then walk through solutions that my team and I recently explored and designed to overcome these limitations, including robust novel view synthesis, open-set 3D scene segmentation, and realistic 3D asset generation.
Bio: Federico Tombari is a Senior Staff Research Scientist and Manager at Google, where he leads an applied research team in computer vision and machine learning across North America and Europe. He is also a Lecturer (Privatdozent) at the Technical University of Munich (TUM). He has 250+ peer-reviewed publications in CV/ML and their applications to robotics, autonomous driving, healthcare, and augmented reality. He received his PhD from the University of Bologna and his Venia Legendi (Habilitation) from TUM. In 2018-19 he was co-founder and managing director of a startup on 3D perception for AR and robotics, later acquired by Google. He regularly serves as Area Chair and Associate Editor for international conferences and journals (IJRR, RA-L, IROS20/21/22, ICRA20/22, 3DV19/20/21/22/24, ECCV22/24, CVPR23/24, and NeurIPS23, among others). He is the recipient of two Google Faculty Research Awards, one Amazon Research Award, and five Outstanding Reviewer Awards (3x CVPR, ICCV21, NeurIPS21), among others. He has been a research partner of private and academic institutions including Google, Toyota, BMW, Audi, Amazon, Stanford University, ETH Zurich, and MIT.
Salauddin Sohag
Technovative Solutions LTD (TVS)
Industrial Keynote
Abstract: In this keynote, Sohag will chart the company’s journey, recounting major innovations, achievements, and the challenges it has navigated to reach its current position. Sohag will also connect Technovative Solutions' advancements to emerging industry trends, illustrating how the company is strategically positioned to address current and future market demands through cutting-edge solutions and agile methodologies.
Bio: Salauddin Sohag is an experienced product leader with a passion for innovation and a proven track record of success. As the Head of Product at Technovative Solutions, he leads a talented team in bringing cutting-edge products to market, from conception and development to launch and beyond. With over 10 years of experience in product development, Sohag is an expert in Agile methodologies, including SAFe and Design Thinking. He leverages his deep understanding of these principles, along with Lean methodologies, to drive impactful business outcomes and deliver exceptional customer experiences. Sohag's career is marked by significant achievements, including leading e-commerce integration solutions at Vertex Inc. and transforming customer journeys with innovative mobile solutions at Lloyds Banking Group. He is also a strong advocate for continuous improvement and cross-functional collaboration, driving the company’s mission to innovate and excel in the realm of product management.
Md. Mostafa Kamal Sarker
Technovative Solutions LTD (TVS)
Doctoral Consortium Speaker
Bio: Dr Sarker is the Lead AI Research Scientist at Technovative Solutions LTD (TVS) and a Visiting Fellow at the University of Oxford. He is an expert in artificial intelligence, computer vision, and deep learning. His research has significantly impacted clinical AI, biomedical image analysis, and digital healthcare, as evidenced by his 40+ peer-reviewed publications. At BMVC 2024, he will share his insights, guiding aspiring researchers on the transition from academia to industry and discussing the exciting opportunities this path offers.