CS students received the Jay Lepreau Best Paper Award

8/7/2024 Bruce Adams

Illinois CS students and faculty, with their collaborators, received the Jay Lepreau Best Paper Award at the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI ‘24). OSDI, the premier conference on computer systems research, selects three best papers each year after a double-blind review. The paper presents a framework called Anvil for building formally verified cluster management controllers.

Written by Bruce Adams

Illinois CS students, faculty, and collaborators received the Jay Lepreau Best Paper Award at OSDI 2024.
Illinois CS students, faculty, and collaborators received the Jay Lepreau Best Paper Award at OSDI 2024.

Illinois CS students and faculty, together with their collaborators, received the Jay Lepreau Best Paper Award at the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI ‘24). OSDI, the premier conference on computer systems research, selects three best papers each year after a double-blind review.

The paper, titled “Anvil: Verifying Liveness of Cluster Management Controllers,” presents a framework called Anvil for building formally verified cluster management controllers. It is the first tool to apply formal verification to the problem of controller correctness, with a general specification called Eventually Stable Reconciliation (ESR), written as a concise temporal logic liveness property. With Anvil, developers can write controller implementations in Rust and verify that the controllers correctly implement ESR. Anvil also supports the verification of safety properties. The team used Anvil to verify three Kubernetes controllers, which can readily be deployed in Kubernetes platforms and are comparable in terms of features and performance to widely used unverified controllers.

The project is led by Xudong Sun, a fifth-year PhD student at the Grainger College of Engineering Siebel School of Computing and Data Science, working with Wenjie Ma (a visiting undergraduate student at Illinois, Jiawei Tyler Gu (CS PhD student), and Zicheng Ma (an undergraduate student from ZJUI participating in CS Summer Research Program), and CS professor Tianyin Xu, in collaboration with Tej Chajed from the University of Wisconsin-Madison, Jon Howell, Andrea Lattuada, Oded Padon and Adriana Szekeres from VMware Research, and Lalith Suresh from Feldera.

CS PhD student Sun Xudong presents at OSDI 24.
CS PhD student Sun Xudong presents at OSDI 24.

Sun presented the paper in July at OSDI in Santa Clara, CA, and the work was very well received. Mooly Sagiv, the Chair of Software Systems at Tel Aviv University and a Fellow of the ACM, called the paper “A major achievement for formal verification.” Dominik Tornow, the author of “Think Distributed Systems,” wrote, “Brilliant work and fantastic paper.” The team has been invited to seminars and workshops at Google, Microsoft, and VMware.

“It is very challenging to build provably correct software systems that are practical and readily deployable in real-world cloud/datacenter infrastructures.” says Xu, “The research is very interdisciplinary across computer systems and formal methods.”

Sun attributes the success to the “dream team” he has been working with and learning from. The team represents people from industrial research and academia, with experts in cluster management, distributed computing theory and practice, programming languages, formal verification theory and practice, testing, and systems reliability.

Suresh, CEO and Co-founder of Feldera, and a core member of the project, said, “Practical systems verification is a lofty goal, and you need a highly interdisciplinary team like this one to push the frontier.”

The paper is the result of Sun’s continuous research on testing and verification of cloud infrastructure systems over the past four years. Before Anvil, he had developed the successful testing tools Sieve and Acto, which found hundreds of serious bugs in critical infrastructure code that led to system outages, data loss, and security vulnerabilities. The recent outages caused by bugs in a CrowdStrike security update affected people worldwide and are as vivid an example as needed to demonstrate how minor errors in coding can have serious consequences.

“We are very successful in finding important bugs by testing, but I hope to go one step further – proving that there are no more bugs in these critical systems.” Sun states, “Our vision is to build provably correct cloud infrastructures through formal verification.”

Anvil is only the first step towards Sun’s ambitious research vision. The team will continue to push the boundaries towards probably correct and practical cloud/data center infrastructures.

Sun is entering his final year as a PhD student at Illinois. He plans to find a faculty job and continue this exciting research. As a Mavis Future Faculty Fellow, he is well prepared.

Professor Tianyin Xu
Tianyin Xu

“Xudong is exceptional and is among the best junior researchers I have worked with in my career,” says Xu. “He always targets deep, fundamental problems, thinks deeply and critically, and knows how to work with and learn from his colleagues and team members. I have no doubt that he will be a fantastic professor in the near future.”

Lalith Suresh, CEO/Co-founder at Feldera, tweeted that “The star of the show was Xudong Sun, the lead PhD student on the project, who had to work at the intersection of ALL those areas above! He is absolutely *fearless*, one of the key marker traits of a successful systems researcher. Xudong will be on the academic and industrial research job market this year -- if you're on a hiring committee, you'd be foolish not to hire him.”


Share this story

This story was published August 7, 2024.