5/21/2024 Bruce Adams
Illinois Grainger Engineering Computer Science professor Tianyin Xu and his research team are developing automatic end-to-end testing for large-scale cloud systems.
Introducing Acto (Automatic, Continuous Testing for (Kubernetes/OpenShift) Operators), an initiative from the Grainger College of Engineering IBM-IL Discovery Accelerator Institute (IIDAI). The project could revolutionize cloud system testing with its innovative use of automatic end-to-end testing techniques. Principal Investigator CDS professor Tianyin Xu succinctly described Acto as a "push button": a fully automatic end-to-end testing tool designed to test across large-scale industrial systems.
The research team consists of Grainger Engineering graduate students Jiawei Tyler Gu, Xudong Sun, Wentao Zhang, and Yuxuan Jiang; Chen Wang and Mandana Vaziri from IBM Research; and Owolabi Legunsen from Cornell. Their conference paper, “Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management,” was published in SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles in October 2023.
Xu described the core problem his research team is considering: “If you think about the internet, you probably are very concerned about, like, you know, that critical infrastructure, because if they break, everything breaks. The key in the research is to think about what correctness is and fault tolerance because they are not supposed to fail, and they should always be correct. If your application fails, it is just one application fail. But if it's a system, it's a disaster.”
Xu says the process of testing these systems must be automated: “You want to instruct the system ‘Don't do that’ again, and again, and again. Asking human developers to write code like that is time-consuming and hard to complete.”
Xu continued, “The research that I've done essentially answers the question of how we ensure, but not guarantee, the correctness of the software. Because if they have bugs, you know, ‘you're not reliable, and you can fail.’ And even when you are in an error state, systems are going to recover and be able to tolerate those errors. So, we did research, trying to think about those cloud-like infrastructure systems.” He says this is where IBM came in: “They, of course, are very interested in this type of research because they want to make sure of the correctness and the fault tolerance of the system they are building.”
Xu and his grad students received a one-year, $292.6K IBM grant from the IIDAI Undergraduate Research Experience (URE) that began on January 1, 2024. Three National Science Foundation (NSF) grants also support the research. The work has been integrated into a CS grad-level course on modern computing technologies. Xu says it would be an excellent way to share successfully implemented work and collaboration with graduate students, industry, IIDAI, and Broadening Participation in Computing (BPC) efforts with undergrads.
He noted, “What's interesting about this project is that the students are more ambitious now. I can write a research paper, and then I can open another one. I can ask, ‘Why do we use the community controllers?’ and they find a lot of problems. I can have very serious problems unrelated to availability, like serviceability and event recovery, and report all the bugs, and then I have developers that can actively fix them. So, they get a lot of integrated feedback, plus it's a collaboration with IBM. The collaborators from IBM have always been very supportive. They are very encouraging to where I can say, ‘Hey, we should write a research paper.’ The paper was published at a prestigious and selective Symposium on Operating Systems Principles.”
The $292.6K IIDAI grant is not for URE only. It carries with it the goal of open-sourcing and continuing to develop the research project into a valuable tool for the cloud-native research and engineering community. David Cahill, co-director of IIDAI, observed that IBM is interested in Acto as it is “fundamentally open source, and then they build on top of that their products and consulting services.” He continued that “a pretty large chunk of the budget of the Institute goes to paying the stipends and travel costs of students from here to go work at one of the IBM research division’s sites.”
Xu observed, “The students want to do bigger things. They want to turn this research into a product, which means it's well maintained, it has a path to practice, and it's surely going to be used by many people, not only the researchers or the students who work on the data themselves. So, they're central.”