Keeping chatbots free of biases

7/17/2024 Bruce Adams

Gagandeep Singh

Millions of people worldwide use large language models in the form of chatbots on websites and search engines, which makes the safety and trustworthiness of those chatbots as crucial as their accuracy and reliability. CS professor Gagandeep Singh says that “large language models (LLMs) are increasingly used to generate content on the web and social media. However, LLM output can be biased, which can easily spread misinformation and widen social gaps between various demographic groups.” 

Singh leads the FOrmally Certified Automation and Learning (FOCAL) Lab, which focuses on constructing intelligent computing systems with formal guarantees about their behavior and safety. Isha Chaudhary is an Illinois PhD student who has worked at FOCAL Lab for two years and contributed to a team that released the first framework for certifying bias in the output of LLMs in May 2024.

Singh notes that Chaudhary “was the only PhD student working on this project. As our work is the first of its kind, we spent a significant amount of time characterizing the problem mathematically, along with researchers from Amazon. Once that was complete, Isha led the development of the certification framework, writing algorithms, code, and paper with inputs from me and Amazon researchers.” The project was funded by a grant from the Amazon-Illinois Center on Artificial Intelligence (AI) for Interactive Conversational Experiences (AICE).

Isha Chaudhary, PhD student

Chaudhary says that working with the AICE Center “provided feedback about our intermediate progress” via symposiums and the contributions of co-investigators Qian Hu, Morteza Ziyadi, and Rahul Gupta from Amazon. Manoj Kumar from Pyron rounded out the team. She notes that the paper is “in the preprint, but of course, we will be presenting it in different venues. We'll also have a presentation at the AICE Center symposium, which happens in September.” The team’s website includes the paper and code. 

The project team noted in their abstract that “Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it cannot scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework, QuaCer-B (Quantitative Certification of Bias), that provides formal guarantees on obtaining unbiased responses from target LLMs under large sets of prompts. ...In particular, we certify the LLMs for gender and racial bias with distributions developed from samples from the BOLD and Decoding Trust datasets, respectively.” 
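
The abstract’s notion of a quantitative certificate can be pictured as repeatedly sampling prompts from a distribution, querying the target LLM, and bounding the probability of a biased response with high confidence. The sketch below is only an illustration of that idea, not the QuaCer-B implementation: `sample_prompt`, `query_llm`, and `is_biased` are hypothetical placeholders, and the Clopper-Pearson interval is one standard way to turn samples into a high-confidence bound.

```python
# Illustrative sketch of quantitative bias certification by sampling.
# NOT the QuaCer-B implementation: sample_prompt, query_llm, and is_biased
# are hypothetical placeholders for a prompt distribution, a target LLM,
# and a bias evaluator.
from scipy.stats import beta

def certify_bias(sample_prompt, query_llm, is_biased, n=1000, confidence=0.95):
    """Estimate and upper-bound the probability of a biased response."""
    biased = 0
    for _ in range(n):
        prompts = sample_prompt()        # same prompt varied over sensitive-attribute values
        responses = query_llm(prompts)   # one response per attribute value
        if is_biased(responses):         # bias evaluator over the set of responses
            biased += 1
    # One-sided Clopper-Pearson (exact binomial) upper bound at the given confidence.
    upper = 1.0 if biased == n else beta.ppf(confidence, biased + 1, n - biased)
    return biased / n, upper
```

A developer could read the returned upper bound as saying that, with 95% confidence, at most that fraction of prompts drawn from the distribution yields a biased response, which is the kind of guarantee conventional benchmarking cannot provide.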

Singh observes that “the AI industry currently relies on evaluating the safety and trustworthiness of its models by testing them on a small set of benchmark inputs. However, safe generation on benchmark inputs does not guarantee that the LLM-generated content will be ethical when handling diverse unseen scenarios in the real world.” 

The paper describes how the QuaCer-B framework evaluates and certifies LLMs for racial biases and gender biases, with African Americans and females as the respective protected groups. It can be applied to other forms of discriminatory content. Chaudhary observes, “There is no particular restriction in the framework to be applied to other kinds of bias. One obvious point of expansion is to include more instances of sensitive attributes. For instance, we could introduce more demographic groups in racial bias, and the framework is flexible enough to allow this kind of introduction. All one needs to do is have a bias evaluator function that can take the LLM's responses for the different sensitive attributes. It specifies biases in the given set of responses. If supplied with that function, then QuaCer-B is perfectly extendable to multiple sensitive attributes.” 
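
Chaudhary’s description of a pluggable bias evaluator might look like the following sketch. The interface and the sentiment-gap heuristic are assumptions made for illustration; the paper and code define the evaluators QuaCer-B actually uses.

```python
# Illustrative sketch of a pluggable bias evaluator, as described above.
# The attribute values (dict keys) and the sentiment-gap heuristic are
# hypothetical; QuaCer-B's real evaluators are defined in the paper and code.
from typing import Callable, Dict

# A bias evaluator maps {sensitive-attribute value -> LLM response} to True
# when the set of responses is judged biased across those attribute values.
BiasEvaluator = Callable[[Dict[str, str]], bool]

def sentiment_gap_evaluator(responses: Dict[str, str],
                            sentiment: Callable[[str], float],
                            max_gap: float = 0.3) -> bool:
    """Flag bias when sentiment differs too much across attribute values,
    e.g. responses keyed by gender or by racial group."""
    scores = [sentiment(text) for text in responses.values()]
    return max(scores) - min(scores) > max_gap
```

Because the certification loop only needs such a function, covering additional attributes would amount to changing which attribute values appear in the response dictionary and supplying an appropriate evaluator (for example, via `functools.partial(sentiment_gap_evaluator, sentiment=some_scorer)`).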

“We have shown some examples which we thought were more prominent examples of bias in LLM. It is perfectly extendable to other kinds of biases,” she states, pointing to nationality, religion, or disability as examples. Singh says that “QuaCer-B generates high-confidence bounds on the probability of obtaining biased responses from LLMs in unseen scenarios” and “enables LLM developers to make informed decisions about the suitability of their models for real-world deployment, and also identify causes of failures to improve the model.” 

On June 26, 2024, Microsoft Azure CTO Mark Russinovich revealed in a blog post that a newly discovered jailbreaking technique can “subvert all or most responsible AI (RAI) guardrails built into” generative AI models, so that a model can no longer distinguish malicious or unsanctioned requests from any others. Such attacks underscore the relevance of FOCAL Lab’s QuaCer-B framework: the importance of guaranteeing the reliability, safety, and trustworthiness of chatbots, and of the LLMs behind them, will only grow as more of these attacks come to light. 


