8/29/2023 Michael O'Boyle
Illinois CS Professor Sasa Misailovic wants to apply natural language processing to professional software development, allowing developers to pinpoint cases where software may exhibit unexpected or undesirable behavior.
Natural language processing is the field of computer science underpinning the recent boom in chatbot technology, allowing human language to be processed by computers and computational results to be rendered in forms understandable by humans. While its potential uses in nearly all areas of society are frequently discussed, Illinois Computer Science Professor Sasa Misailovic wants to apply it to professional software development.
In particular, he is interested in the development of test code, allowing developers to pinpoint cases where software may exhibit unexpected or undesirable behavior. With his collaborators at the University of Texas at Austin, Misailovic is developing natural language models that can process instructions from developers and return ready-to-use test code. They call their approach “NLP4Test.”
“We’re aiming to improve the practice of software development and testing,” Misailovic said. “We’re looking for how natural language processing can replace time-consuming manual software testing practice and free developers to focus on other tasks.”
The project is funded by a $1.2 million award from the National Science Foundation’s Software and Hardware Foundations program, to be distributed over four years. Misailovic is a co-principal investigator.
Software developers evaluate their work by imagining ways their code may misbehave or return incorrect answers and writing test cases which create these conditions, taking time away from the actual development. Developers may also find that their test cases fail even though the code works as intended. These so-called “flaky” tests can take developers on wild goose chases that consume even more time and cast doubt on valid work.
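To make the idea of a flaky test concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the NLP4Test project: the function names (`estimate_pi`, `passes`, `failure_count`), the sample size, and the tolerances are all hypothetical choices. The code under test is a randomized estimator that works as intended, yet a test with an overly tight tolerance fails for some random seeds and passes for others.

```python
import math
import random


def estimate_pi(num_samples, rng):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the quarter circle, times four."""
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


def passes(tolerance, seed, num_samples=1000):
    """Return True if the estimate for this seed is within `tolerance` of pi.
    This plays the role of a single test run."""
    rng = random.Random(seed)
    return abs(estimate_pi(num_samples, rng) - math.pi) <= tolerance


def failure_count(tolerance, num_runs=200):
    """Count how many of `num_runs` differently seeded runs fail the check."""
    return sum(1 for seed in range(num_runs) if not passes(tolerance, seed))


# Hypothetical tolerances: 0.05 is about one standard deviation of the
# estimator at 1000 samples, so it fails often; 0.5 is far looser.
tight_failures = failure_count(tolerance=0.05)
loose_failures = failure_count(tolerance=0.5)
print(f"tight tolerance: {tight_failures} of 200 runs failed")
print(f"loose tolerance: {loose_failures} of 200 runs failed")
```

In this sketch the estimator is correct in both cases; only the assertion's tolerance differs. Re-running a test under many seeds, as `failure_count` does, is one simple way to use execution data to tell flakiness apart from a real bug.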
The researchers are investigating the use of natural language processing to turn human language prompts or comments into test code. Misailovic is interested in developing models that generate tests exposing bugs in machine learning software and that modify test code to avoid “flakiness.” He will build on work that he started with former graduate student Saikat Dutta, who will start as an assistant professor of computer science at Cornell University next year.
“For instance, the data generated as the program executes can be a powerful supplement to natural language processing because flakiness is not something that can be predicted with certainty just from the program code,” Misailovic said. “If additional execution data were incorporated into NLP4Test, it would allow developers to ask questions like ‘Here’s a piece of code that I wrote and a test that failed. Is the failure due to a bug or flakiness?’ and ‘What is the best way to start debugging this issue?’”
By studying these problems, the researchers expect to produce a series of techniques that generate test code to improve the reliability of machine learning software, along with a better understanding of how to assess flakiness.
Misailovic’s collaborators at UT Austin, computer science professor Milos Gligoric and linguistics professors Jessy Li and Kyle Mahowald, have previously studied the application of natural language processing to software engineering. They are now incorporating recent developments in natural language processing into all stages of software testing.