New AI model expands possibilities of virtual try-ons

10/24/2025 Jeni Bushman

Illinois Grainger researchers Yuxiong Wang, an assistant professor of computer science, and Junkun Chen, a CS Ph.D. student, have harnessed generative AI to advance virtual try-on technology for e-commerce, enabling longer, higher-resolution and more customizable try-on videos that bring digital shopping closer to real-life experiences.


Despite the convenience of online shopping, one hurdle persists: the gamble of buying a garment sight unseen. This problem has given rise to the field of virtual try-ons, in which shoppers can see what a piece of clothing might look like on their own bodies.

A new AI model from The Grainger College of Engineering at the University of Illinois Urbana-Champaign seeks to transform this virtual try-on process. Researchers from the lab of Yuxiong Wang, an assistant professor of computer science in the Siebel School of Computing and Data Science, have used generative AI to create arbitrarily long, high-resolution virtual try-on videos. Their work, carried out in collaboration with e-commerce startup SpreeAI, introduces user-controlled capabilities designed to advance the field of virtual try-ons.

Images of people trying on outfits.

“We are harnessing the latest advances in generative AI, a technology that is already transforming image and video creation, to reimagine what is possible in e-commerce,” Wang said. “Our goal is to push the boundaries even further and explore how this technology can truly empower the future of virtual try-ons.”

Yuxiong Wang

Existing try-on models often rely on carefully calibrated images, struggle to handle varied input conditions, and tend to produce low-resolution, low-quality results. To tackle these challenges, the Illinois researchers created Dress&Dance, a video diffusion framework that generates short, high-quality video clips of users modeling self-selected items of clothing.

Built on a novel attention-based design for multi-modal conditioning, Dress&Dance seamlessly combines information from text, images and motion to produce more realistic and flexible video results. With one photo each of the user and their desired garment, along with a short video of the user’s desired motion, the AI model generates a five-second video of the user dancing or posing in their chosen outfit.
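To make the multi-modal conditioning idea concrete, the sketch below shows one generic way such conditioning is often built: video latent tokens cross-attend to a concatenated set of text, garment-image, and motion embeddings. This is an illustrative PyTorch sketch under assumed dimensions; the module names, encoder sizes, and shapes are hypothetical and do not describe the actual Dress&Dance architecture.

```python
# Illustrative sketch only: a minimal cross-attention block that conditions
# video features on fused text, garment-image, and motion embeddings.
# All dimensions and module choices are assumptions, not the Dress&Dance design.
import torch
import torch.nn as nn

class MultiModalConditioning(nn.Module):
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        # Project each modality's embedding into a shared conditioning space.
        self.text_proj = nn.Linear(768, dim)      # e.g. text-encoder output
        self.garment_proj = nn.Linear(1024, dim)  # e.g. image-encoder output
        self.motion_proj = nn.Linear(256, dim)    # e.g. per-frame pose features
        # Video latent tokens attend to the concatenated conditioning tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens, text_emb, garment_emb, motion_emb):
        # video_tokens: (B, N, dim) flattened spatio-temporal latent tokens
        cond = torch.cat([
            self.text_proj(text_emb),        # (B, T_text, dim)
            self.garment_proj(garment_emb),  # (B, T_img, dim)
            self.motion_proj(motion_emb),    # (B, T_frames, dim)
        ], dim=1)
        attended, _ = self.cross_attn(query=video_tokens, key=cond, value=cond)
        return self.norm(video_tokens + attended)  # residual update

# Toy usage with random tensors of the assumed shapes.
block = MultiModalConditioning()
video = torch.randn(1, 1024, 512)
out = block(video, torch.randn(1, 16, 768), torch.randn(1, 4, 1024), torch.randn(1, 120, 256))
print(out.shape)  # torch.Size([1, 1024, 512])
```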

Unlike previous initiatives that require a static photo with strict specifications, the Illinois researchers’ model can use photos of clothing taken in real-life environments: hanging on a rack, folded on a bed, or even worn by another person. The resulting videos achieve the highest resolution and frame rate among existing methods, establishing a new benchmark for visual quality and realism.

Building on their initial success, Wang’s lab advanced their approach by generating longer try-on videos that maintained high resolution. The result was an upgraded model called Virtual Fitting Room.

Images of people trying on outfits.

“Five seconds is not enough to capture the full 360-degree details of a garment,” Wang said. “Our latest model is designed to generate longer, more complete videos that bring the virtual try-on experience closer to reality.”

By combining an autoregressive approach with a global reference video used as an anchor, the researchers generated each segment step by step, building on information from previous frames to create a seamless final video. Each new segment was aligned with the anchor video to ensure a consistent and realistic appearance. Virtual Fitting Room is the first to produce arbitrarily long virtual try-on videos with user-controlled capabilities, such as the ability to layer multiple items of clothing.
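The sketch below illustrates the general segment-by-segment idea described above: each new segment conditions on the final frames of the previous segment for temporal continuity and on a global anchor video for consistent appearance. Here, `generate_segment` is a hypothetical placeholder for one run of a video generation model, not the Virtual Fitting Room API.

```python
# Conceptual sketch of autoregressive long-video generation with a global anchor.
# `generate_segment` is a hypothetical stand-in for one diffusion-model call.
def generate_long_video(user_photo, garment_photo, anchor_video,
                        generate_segment, num_segments, overlap_frames=8):
    """Generate a long try-on video segment by segment (illustrative only)."""
    video_frames = []
    context = None  # frames carried over from the previous segment
    for _ in range(num_segments):
        # Each segment is conditioned on (a) the user and garment inputs,
        # (b) the last frames of the previous segment for continuity, and
        # (c) the global anchor video to keep appearance consistent throughout.
        segment = generate_segment(
            user_photo=user_photo,
            garment_photo=garment_photo,
            anchor=anchor_video,
            prev_frames=context,
        )
        # Drop the overlapping frames so segments stitch together seamlessly.
        new_frames = segment if context is None else segment[overlap_frames:]
        video_frames.extend(new_frames)
        context = segment[-overlap_frames:]
    return video_frames
```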

Junkun Chen

The new and improved model serves three key audiences: consumers who want to try on clothes without visiting a physical store; retailers who want to reduce their rate of returns; and fellow researchers looking to expand the existing body of work on virtual try-on modeling.

Going forward, the Illinois researchers are working to speed up the model to better match how quickly users expect results.

“Generating long videos segment by segment can be slow, and most users would like to instantly see how they will look in the desired garment,” said Junkun Chen, a CS Ph.D. student and lead author of the papers. “They do not want to wait seconds or minutes, so improving the model’s efficiency has been central to integrating it into virtual try-on apps.”

The paper, “Virtual fitting room: Generating arbitrarily long videos of virtual try-on from a single image,” will be published at the Annual Conference on Neural Information Processing Systems (NeurIPS), a leading venue for AI research. It was previously featured as an invited talk at the virtual try-on workshop of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 

This work is supported by a gift from SpreeAI.


Illinois Grainger Engineering Affiliations  

Yuxiong Wang is an Illinois Grainger Engineering assistant professor of computer science in the Siebel School of Computing and Data Science.



This story was published October 24, 2025.