CS PhD student Chi Han diagnoses language models

September 11, 2024

CS professor Heng Ji and team’s LM-steer paper, jointly supported by DARPA INCAS, SemaFor, MIPs, and ITM, won the Outstanding Paper Award at the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) in Bangkok, Thailand, this past August. The work was led by PhD student Chi Han, who also won an Outstanding Paper Award at the NAACL 2024 Annual Conference for "LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models."

Written by Bruce Adams

PhD student Chi Han (Photo Credit: Chi Han)

Illinois Grainger Engineering Siebel School of Computing and Data Science PhD student Chi Han compares his interest in large language models to modern medicine: “The motivation for me to dive into this topic is to be able to understand and cure the disease of language models, just like how we do with humans.” He continues the metaphor, saying, “One advantage of modern medicine compared with traditional medicine hundreds of years ago originates from its systematic understanding of the human body, the functions of different organs, and the ability to locate the source of diseases in order to target them more accurately.” Han says, “Instead of helping language models in a way similar to traditional medicines, I aim to provide the community with a more scientific perspective: what are the functions of different LLM components? When might they go wrong? How can we accurately cure or improve them without sacrificing other abilities?”

Describing potential uses of LM-steer, Han says that LLMs are “trained to approximate an averaged personality or tone using data from almost all the humans. To model the diversity in human tones, I found out that word embeddings of language models are responsible for relating words with personalities. By linearly transforming the word embedding space, language models are able to steer their tones flexibly and efficiently while highlighting the words that are most evident to this personality, a tool which is even transferable between models.”
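
To make that idea concrete, here is a minimal sketch of what the steering operator Han describes could look like in code: a small learned linear transformation applied to the model’s output word embeddings before computing vocabulary logits. The model choice (GPT-2), the low-rank parameterization, and the scaling factor eps are illustrative assumptions, not the paper’s exact setup.

```python
# Minimal sketch (not the authors' code): steer generation by shifting the
# model's output word embeddings with a small learned linear transformation.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class SteeredLMHead(nn.Module):
    """Compute vocabulary logits from steered embeddings E + eps * (E @ W)."""
    def __init__(self, embed_weight: torch.Tensor, rank: int = 8, eps: float = 1e-3):
        super().__init__()
        d = embed_weight.shape[1]
        self.register_buffer("E", embed_weight)              # frozen output embeddings, shape (V, d)
        self.U = nn.Parameter(torch.zeros(d, rank))          # low-rank steering factors; starting at
        self.V = nn.Parameter(torch.randn(rank, d) * 0.01)   # zero keeps the base model unchanged
        self.eps = eps

    def forward(self, hidden):                        # hidden: (..., d) from the transformer
        W = self.U @ self.V                           # (d, d) steering matrix
        steered_E = self.E + self.eps * (self.E @ W)  # shift every word embedding along W
        return hidden @ steered_E.t()                 # (..., V) vocabulary logits

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.lm_head = SteeredLMHead(model.lm_head.weight.data.clone())

# Only U and V would be trained, on text labeled with the target attribute;
# at decoding time, scaling eps up, down, or negating it dials the tone.
out = model.generate(**tok("The new policy is", return_tensors="pt"), max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```

Because the steering matrix starts at zero, the steered head initially reproduces the base model exactly; only the small added matrix is trained, which is what makes this kind of conditioning lightweight, and different learned steering directions could in principle be scaled or combined at generation time.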

Professor Heng Ji (Photo Credit: University of Illinois)

Ji adds, “Content producers urgently need a way to efficiently estimate the potential impact of their messages, aiding with prevention of unexpected negative outcomes, and attain their communication goals for risk management, such as the avoidance of undesirable backlash. Sun Tzu’s Art of War theory tells us if you know the enemy and know yourself, you need not fear the result of a hundred battles. We developed a novel message ‘wargaming’ framework that consists of two key components from these two outstanding papers by Chi Han. First, we can steer the sender’s stance/sentiment/value during message rewriting using LM-Steer. It is theoretically grounded, lightweight, and simple for generative language model conditioning. You can even combine multiple dimensions for message rewriting. Second, previous large language models typically train on short text segments, but their performance suffers drastically on longer inputs. Chi developed LM-Infinite, a simple and effective method to allow LLMs pre-trained with 2K or 4K-long segments to generalize to up to 200M length inputs.”

The abstract for the LM-Infinite paper notes, “Today's large language models (LLMs) typically train on short text segments (e.g., <4K tokens) due to the quadratic complexity of their Transformer architectures. As a result, their performance suffers drastically on inputs longer than those encountered during training, substantially limiting their applications in real-world tasks involving long contexts such as encoding scientific articles, code repositories, or long dialogues.”

As Professor Ji describes the LM-Infinite paper, “When we apply the model to longer contents than training time, they will start to lose focus and coherence and generate nonsense.” What Han proposed, she explains, was “a very simple solution, where he keeps the first few tokens in the context because those are usually important tokens like titles or starting words. He also keeps a few thousand tokens in the local context because that part is important for a meaningful generation. By doing that, the language model is able to understand and generate almost infinite lengths of text.”
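
As an illustration of that recipe (not the authors' implementation), the sketch below builds the kind of attention mask Ji describes: each token attends to a handful of leading tokens plus a sliding window of recent tokens, so the attention cost stays bounded no matter how long the text grows. The window sizes n_global and n_local are placeholder values.

```python
# Illustrative sketch of the "keep the start, keep the recent context" pattern.
import torch

def lambda_shaped_mask(seq_len: int, n_global: int = 4, n_local: int = 2048) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where a query may attend to a key."""
    q = torch.arange(seq_len).unsqueeze(1)    # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)    # key positions, row vector
    causal = k <= q                           # never look ahead
    keep_global = k < n_global                # always keep the first few tokens (titles, starting words)
    keep_local = (q - k) < n_local            # keep a window of the most recent tokens
    return causal & (keep_global | keep_local)

# Tiny example: with 10 tokens, each row keeps the first 2 plus the 4 most recent positions.
print(lambda_shaped_mask(seq_len=10, n_global=2, n_local=4).int())
```

Because the number of attended positions per token is capped, compute no longer grows quadratically with the full history, and in a full implementation only the kept tokens’ keys and values would need to be cached, which is what lets the model keep reading far past its training-time window.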

Han says, “LM-Infinite is trying to deal with the limitations of the context window (or the length window) of language models.” Most previous LLMs, he observes, “only allow you to converse up to a limited length, and after that, you will have to refresh the model to start a new conversation session. This is unimaginable for humans. You cannot imagine talking with friends for an hour and the friend saying, ‘Sorry, but my mind is full of memories. Forgive me that I need to forget everything before we talk from the start again.’”

Ji elaborates, “If I want the model to understand an event in Harry Potter, volume three, maybe that event is related to a previous event in volume one. Many models cannot do that backward and forward reasoning because that information is already forgotten after that long content.” LM-Infinite, she says, “can dynamically make the connection and assign weights to key parts. We humans might go back to look for related pieces instead of reading the whole book from the start again. LM-Infinite allows the model to find the crucial points and highlights while still being able to absorb long contexts of information without having to stop to restart. This has great potential, especially in the scientific domain. One of my other projects, NSF AI Institute for Molecule Synthesis (MMLI), is using AI for science to discover new drugs and materials. I'm excited because the purpose of reading a paper is not memorizing tokens from left to right. When we read a paper, we look at the terminology for clarification. We go back to the previous section or look at the results and intentionally and actively pick the most important information to fill in the blanks and generate new ideas and hypotheses. I think this work mimics the human reading mechanism.”

Speaking of his experience at the University of Illinois Urbana-Champaign, Han says, “The thing that I particularly like here is the academic atmosphere. It allows me to explore different directions and select freely. When I first came in 2021, I was lost about what research to do. It was in the middle of the pandemic, and the area was already very crowded. I tried out a few different directions at first, so I published slowly. There was some peer pressure because many students are excellent, smart, and productive, but professor Ji did not pressure me to get more papers out. Instead, I was allowed to explore at my own pace, finding out my true interests in understanding the science of language models.”

Describing Han’s contribution to her research team, Professor Ji concludes, “I’m extremely lucky and grateful to be able to work with amazing students like Chi Han.”



This story was published September 11, 2024.