Svetlana Lazebnik
Talk: Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
I will present my group's recent work on Robots Imitating Generated Videos (RIGVid), a system that enables robots to perform complex manipulation tasks—such as pouring, wiping, and mixing—purely by imitating AI-generated videos, without requiring any physical demonstrations or robot-specific training. Given a language command and an initial scene image, a video diffusion model generates candidate demonstration videos, and a vision-language model (VLM) automatically filters out generations that do not follow the command. A 6D pose tracker then extracts object trajectories from an accepted video, and these trajectories are retargeted to the robot in an embodiment-agnostic fashion. Through extensive real-world evaluations, we show that filtered generated videos are as effective as real demonstrations, and that performance improves with generation quality. Our findings suggest that videos produced by a state-of-the-art off-the-shelf model can serve as an effective source of supervision for robotic manipulation.
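To make the four-stage pipeline in the abstract concrete, here is a minimal Python sketch of the control flow. All names in it (Pose6D, generate_candidate_videos, vlm_follows_command, track_6d_pose, retarget) are hypothetical placeholders for illustration, not the actual RIGVid implementation or API.

    # Hypothetical sketch of the RIGVid pipeline described in the abstract.
    # Every function below is a placeholder stub, not the real system.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Pose6D:
        position: tuple     # (x, y, z) in meters
        orientation: tuple  # quaternion (w, x, y, z)

    def generate_candidate_videos(command: str, scene_image, n: int = 5) -> List[object]:
        """Placeholder: sample n videos from a video diffusion model,
        conditioned on the language command and the initial scene image."""
        raise NotImplementedError

    def vlm_follows_command(video, command: str) -> bool:
        """Placeholder: ask a vision-language model whether the generated
        video actually carries out the command."""
        raise NotImplementedError

    def track_6d_pose(video) -> List[Pose6D]:
        """Placeholder: run a 6D pose tracker on the manipulated object,
        returning one pose per frame."""
        raise NotImplementedError

    def retarget(trajectory: List[Pose6D]):
        """Placeholder: map the object trajectory to robot end-effector
        targets, independent of the demonstrator's embodiment."""
        raise NotImplementedError

    def rigvid(command: str, scene_image):
        # 1. Generate candidate demonstrations with a video diffusion model.
        candidates = generate_candidate_videos(command, scene_image)
        # 2. Keep only videos the VLM judges to follow the command.
        accepted = [v for v in candidates if vlm_follows_command(v, command)]
        if not accepted:
            raise RuntimeError("no generated video followed the command")
        # 3. Extract the object's 6D trajectory from an accepted video.
        trajectory = track_6d_pose(accepted[0])
        # 4. Retarget the trajectory to the robot, embodiment-agnostically.
        return retarget(trajectory)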
BIO:
Svetlana Lazebnik received her Ph.D. in Computer Science from the University of Illinois in 2006. After serving as an assistant professor at the University of North Carolina at Chapel Hill from 2007 to 2011, she returned to the faculty of the University of Illinois, where she is currently a Full Professor in the Siebel School of Computing and Data Science. Her notable honors include the NSF CAREER Award (2008), a Microsoft Research Faculty Fellowship (2009), a Sloan Research Fellowship (2013), and election as an IEEE Fellow (2021). Her CVPR 2006 paper on Spatial Pyramid Matching received the 2016 Longuet-Higgins Prize, awarded for a paper with significant impact on computer vision. She served as Program Chair for ECCV 2012, ICCV 2019, and CVPR 2023, and she currently serves as an Editor-in-Chief of the International Journal of Computer Vision. Her main research themes include scene understanding, joint modeling of images and language, and applications of image generation.