7/2/2025 Bruce Adams
CS professor Yongjoo Park is a recipient of an NSF CAREER grant for the Kishu: Checkpointing Data Science with Nonintrusive State Manager project. Kishu makes it easier to undo operations in existing data science notebook frameworks and to protect data science systems.
Contemporary data science systems are fragile. Data scientists use dozens of libraries at a time. A single bug in any of them can destroy hours or days of computation. This is well known, but no principled mechanisms have been created to address it. Put simply, restoring past states in existing data science frameworks is difficult, if not nearly impossible.
On June 6, 2025, NSF awarded a CAREER grant to computer science (CS) professor Yongjoo Park from The Grainger College of Engineering Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. This five-year grant supports Park’s Kishu: Checkpointing Data Science with Nonintrusive State Manager project.
As Park notes, the award arrived at a fortuitous time: “I was thrilled when a program officer from the NSF first reached out to let me know that my project was under positive consideration. I vividly remember sharing the news with my wife, holding up my phone with the email still open, feeling genuinely elated. Interestingly, all the major grants I’ve received have coincided with the arrival of a new baby. Several years ago, I was finalizing a proposal in a Carle hospital room in Urbana while our newborn alternated between sleeping and crying. That proposal resulted in an award. Now, this CAREER award comes just as we’re preparing to welcome our second child this September.”
Park notes that his PhD students, Billy Li and Supawit Chockchowwat, contributed much to the foundations of the Kishu project. The Kishu team presented the project at the developer conference PyCon US 2025 in Pittsburgh, PA, on May 16, 2025. PyCon US attracts a unique audience of Python users and community members, ranging from beginners just learning the language to leading developers in the field, community organizers, and contributors who guide the development of the language itself. Park says that “the talk and demo were well-received at PyCon. Some main themes discussed during the Q&A session were that
(1) Kishu's fast undos enabled the 'quick and dirty prototyping' functionality that many notebook users were looking for, and
(2) KishuBoard's impressive-looking git-like commit graph for object states helped users visualize their journey in a data science workflow to better plan out next steps.”
“I expectedly received interest from attendees who use notebooks in their daily work (data scientists, quant developers, and researchers working in natural sciences), who were keen to try Kishu out themselves,” he notes, adding, “I received feedback from CS educators in both high schools and universities, who were thinking about the possibility of adopting Kishu into classrooms to ease the learning curve of data science libraries for students. As a pleasant surprise, a member of the Python core developer team working on checkpointing in Python expressed interest in the more technical side of Kishu, such as our handling of time-traveling correctness, and mentioned our work on their Bilibili channel (a Chinese social media platform).”
Park's research team met a few conference attendees who were working on notebook systems of their own. Possible collaborations with them are in the works.
Checkpointing data science systems has taken a while to conceive and implement. Park describes the challenge:
“When I look at the history of database systems and data science research, they have mostly evolved as two separate fields, each focusing on different types of applications. Database systems have traditionally emphasized reliability and scalability, leveraging SQL and features like checkpointing. In contrast, data science frameworks have focused on supporting a diverse set of analytical tasks, including regression, visualization, and, more recently, machine learning. While there have been efforts to integrate data science operations into database systems, these attempts have not been particularly successful. Conceptually, this project takes the opposite approach: it aims to bring the reliability and scalability of database systems into data science frameworks by redesigning the underlying data management layer. The success of this project will require deep technical innovations with a deep understanding of computer systems.”
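For readers unfamiliar with the idea, the checkpoint-and-restore concept Park describes can be illustrated with a toy Python sketch. This is not Kishu's API or implementation: the `ToyCheckpointer` class, its `commit` and `checkout` methods, and the `session` dictionary are all hypothetical names invented for illustration, and the sketch naively deep-copies the entire state on every commit, which a real system would likely avoid.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Checkpoint:
    """One saved snapshot of the session state plus a link to its parent."""
    state: Dict[str, Any]
    parent: Optional[int]


class ToyCheckpointer:
    """Hypothetical helper (not Kishu): stores full deep copies of a namespace."""

    def __init__(self) -> None:
        self.checkpoints: List[Checkpoint] = []
        self.head: Optional[int] = None  # id of the checkpoint we are "on"

    def commit(self, namespace: Dict[str, Any]) -> int:
        """Save a snapshot whose parent is the current head; return its id."""
        snapshot = copy.deepcopy(namespace)
        self.checkpoints.append(Checkpoint(snapshot, self.head))
        self.head = len(self.checkpoints) - 1
        return self.head

    def checkout(self, checkpoint_id: int, namespace: Dict[str, Any]) -> None:
        """Overwrite the live namespace in place with a saved snapshot."""
        saved = copy.deepcopy(self.checkpoints[checkpoint_id].state)
        namespace.clear()
        namespace.update(saved)
        self.head = checkpoint_id


# A stand-in for the variables living in a notebook session.
session = {"prices": [10.0, 12.5, 11.0]}
ckpt = ToyCheckpointer()
first = ckpt.commit(session)

# Simulate a buggy step that destroys data and leaves stray variables behind.
session["prices"].clear()
session["scratch"] = "temp"
second = ckpt.commit(session)

# "Undo" by restoring the earlier checkpoint.
ckpt.checkout(first, session)
```

After the final `checkout`, `session` again holds only the original `prices` list. Because each checkpoint records its parent, committing again from this point would start a new branch, loosely analogous to the git-like commit graph mentioned in the PyCon discussion.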
Grainger Engineering Affiliations
Yongjoo Park is an Illinois Grainger Engineering professor of computer science.