2/3/2025 Bruce Adams
CS PhD student Yunze Man has received a 2025-2026 NVIDIA Graduate Fellowship. His research aims to push embodied AI toward more open and versatile real-world applications by enhancing LLMs’ abilities to perceive and interpret visual inputs.
“A key challenge in embodied reasoning or robotics is generalizability — how quickly an AI system can adapt to new, unseen environments without extensive fine-tuning. This is where foundation models truly shine, thanks to their robust generalization capabilities. My vision is to expand and advance multimodal foundation models so they can conduct world modeling, guide decision-making, and interact more flexibly with both humans and their surroundings.” This is how CS PhD student Yunze Man describes the research that garnered him a 2025-2026 NVIDIA Graduate Fellowship.
Man continues, “One of my recent projects, Lexicon3D, studies new perspectives on leveraging visual foundation models and highlights the need for more flexible encoding strategies in future vision-language models and scene understanding tasks. Several projects in our lab continue in this direction, aiming to push embodied AI toward more open and versatile real-world applications.”
Man is advised by CS professors Liangyan Gui and Yuxiong Wang. The NVIDIA fellowship states that “awardees will participate in a summer internship preceding the fellowship year. Their work puts them at the forefront of accelerated computing — tackling projects in autonomous systems, computer architecture, computer graphics, deep learning, programming systems, robotics, and security.” Ten PhD students will receive fellowships for the 2025-2026 academic year, which provide up to $60,000 per student.
Man notes that “a lot of the recent breakthroughs in AI are driven by the remarkable progress in large language models (sometimes people refer to it as the ‘GPT moment’). However, even as language-based AI continues to advance, many real-world tasks still hinge on accurate visual perception and understanding. Challenges like 3D scene comprehension, long video analysis, and robotic interactions remain extremely difficult, even for the most capable models today.
“That’s why my work emphasizes a ‘vision-centric’ perspective, focusing on enhancing machine learning models’ abilities to perceive and interpret visual inputs. By integrating and aligning these visual insights with language and other modalities, we move closer to truly versatile intelligent systems capable of tackling complex, real-world tasks with a richer, more holistic understanding. I aim to develop open-world systems that can (a) reliably detect and interpret objects in dynamic indoor and outdoor scenes and (b) construct an internal world model capable of predicting and generating the outcomes of its actions. This approach lays a strong foundation for more advanced tasks in embodied understanding and interactive AI.”
Man has been an intern on the NVIDIA Research team since May 2024. “My internship at NVIDIA has given me a chance to experience how industry tackles research challenges,” he says. “Getting to work side by side with many talented and passionate researchers showed me a whole new perspective on tackling problems, and I have learned a lot from them. One major lesson I took away (and I’m still learning) is how to balance technical novelty with real-world impact. This experience demonstrates the distinct goals and mindsets that drive research in industry compared to academia, and it influences how I view my own work going forward, both in and out of academia.”
Looking to the future, he says, “I will continue with my research along the direction of vision-centric models for multimodal AI agents. The field’s rapid evolution makes it very difficult to predict the exact future projects we will be doing. However, I think some of my future efforts will be in unlocking machine learning models’ new capabilities for handling long video sequences and long-trajectory planning. Ultimately, I hope these research endeavors will not only push the frontiers of academia but also create tangible benefits for industry and society.”
Grainger Engineering Affiliations
Liangyan Gui is an Illinois Grainger Engineering professor of computer science.
Yuxiong Wang is an Illinois Grainger Engineering professor of computer science.