Revolutionary AI tool poised to redefine the drug research and materials science landscape

7/17/2025 Bruce Adams

Computer Science (CS) professor Heng Ji has been invited to be a keynote speaker at upcoming top AI conferences about her team’s work on AI for Scientific Discovery and Science-inspired AI, mainly under the Molecule Maker Lab Institute and the Center for Advanced Bioenergy and Bioproducts Innovation. She will describe the development of mCLM, a Modular Chemical Language Model that is friendly to automatable block-based chemistry.


The scientists who synthesize drugs have a lot of raw material to choose from.

“It turns out that if you look at about 35 atoms, which is about the size of a drug,” says Martin Burke, May and Ving Lee Professor for Chemical Innovation and professor of chemistry in the College of Liberal Arts & Sciences at the University of Illinois Urbana-Champaign, “there are theoretically 10 to the 60th power possible small molecules.”

“The goal is not to make all of them. The goal is to team up and figure out how to make the right ones. Better, faster, stronger.”

There are approximately 970 million small molecules that are drug-like. For example, 89 tyrosine kinase inhibitors are used in targeted therapies for cancer and other conditions in healthcare systems worldwide, and those inhibitors can be broken down into 184 chemical building blocks. Designing and synthesizing small molecules with drug-like properties remains a slow and expensive process.

How do human scientists do this today? They play with Legos.

—Heng Ji, computer science professor in the Siebel School of Computing and Data Science at The Grainger College of Engineering, University of Illinois Urbana-Champaign, and an Amazon Scholar, from her keynote speech "Science-Inspired AI"

Currently, researchers consider all possible arrangements of atoms and create a unique assembly process for each targeted molecule, drawing on a menu of thousands of different reactions, each run under thousands of possible conditions and using millions of possible starting materials. This is a slow, expensive, and high-risk endeavor.

Development costs average around $1.3 billion per drug, meaning that only economically advanced countries can afford to invest in research. Nearly half of all approved drugs are discovered in the United States. This not only excludes potential breakthroughs originating from the developing world but also inhibits drug research that could prevent diseases prevalent there.

Large Language Models (LLMs) can understand chemical knowledge and accurately generate sequential representations. However, they are limited in proposing novel molecules and carry the risk of hallucinations.

 

It is chemistry that machines can do: iteratively assembling small molecules from prefabricated building blocks using simple chemistry that is readily automated.
—Heng Ji

Heng Ji has been invited to be a keynote speaker at the upcoming top AI conferences IJCAI 2025 (Montreal, August 16–22), Semantics 2025 (Vienna, Austria, September 3–5), and EMNLP 2025 (Suzhou, China, November 5–9) about her team’s work on AI for Scientific Discovery and Science-inspired AI, mainly under the Molecule Maker Lab Institute and the Center for Advanced Bioenergy and Bioproducts Innovation.

Burke has been a partner on the team, bringing his expertise in chemistry. Burke says, “Instead of having the tokens be the words, we figured out how to make the tokens building blocks for making molecules on robots. We created a language that allows the computer to speak chemistry in a way that can be physically translated into the real world with the push of a button on our robot.”

 

It's a first of its kind in human history, and I think we have done something that is going to change the world. It's a phenomenal breakthrough that we have got with mCLM.
—Martin Burke

Britannica defines code-switching as the “process of shifting from one linguistic code (a language or dialect) to another, depending on the social context or conversational setting.”

A block-based approach to small-molecule synthesis has recently emerged. It is chemistry that machines can do: iteratively assembling small molecules from prefabricated building blocks using simple chemistry that is readily automated. In this approach, structural fragments and functional groups serve as “chemical words” or “tokens.” As with written language, these words can be broken down and reassembled, perform different functions depending on context, and are diverse, with different structures capable of performing the same function. The linguistic parallels suggest a role for LLMs trained to place atoms, and then molecules, in a modular sequence for robots to assemble.
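To make the “chemical words” analogy concrete, here is a minimal sketch of how building blocks might be treated as tokens, the way a language model treats words. The block names and the tokenization scheme are hypothetical illustrations only, not the actual mCLM vocabulary or implementation.

```python
# Illustrative sketch only: the block names and tokenization scheme below are
# hypothetical and are not the actual mCLM vocabulary or implementation.

# A tiny "vocabulary" of prefabricated building blocks, treated as chemical "words".
BLOCK_VOCAB = {
    "<bos>": 0,
    "<eos>": 1,
    "[BLOCK:aryl_halide]": 2,
    "[BLOCK:boronate_linker]": 3,
    "[BLOCK:pyridine]": 4,
    "[BLOCK:amide_cap]": 5,
}

def tokenize_blocks(block_sequence):
    """Map an ordered list of building blocks to integer token IDs,
    the same way a language model maps words to IDs."""
    tokens = ["<bos>"] + list(block_sequence) + ["<eos>"]
    return [BLOCK_VOCAB[t] for t in tokens]

# A candidate molecule described as an ordered assembly of blocks --
# a sequence a block-based synthesis robot could, in principle, execute.
candidate = [
    "[BLOCK:aryl_halide]",
    "[BLOCK:boronate_linker]",
    "[BLOCK:pyridine]",
    "[BLOCK:amide_cap]",
]
print(tokenize_blocks(candidate))  # [0, 2, 3, 4, 5, 1]
```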

CS professors Hao Peng, Jiawei Han, and Ge Liu and chemical and biomolecular engineering professor Ying Diao contributed their expertise in representation and complex reasoning chains to continue training the models. The aim is to teach computers to speak two complementary languages: one that represents molecular subgraph structures indicative of specific functions, and another that describes those functions in natural language. The result, they say, is a function- and synthesis-aware modular chemical language model (mCLM).
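As a rough illustration of these two complementary languages, the sketch below pairs a natural-language function description with a modular chemical sequence in a single token stream. The special tokens and the formatting are assumptions made for illustration, not the published mCLM input format.

```python
# Hypothetical sketch: a joint training example that pairs a natural-language
# function description with a modular chemical "answer" in one token stream.
# The special tokens and layout are assumptions, not the published mCLM format.

def build_joint_example(function_text, block_sequence):
    """Interleave a function description (natural language) with modular
    chemical tokens so a single language model can learn both languages."""
    text_tokens = function_text.lower().split()
    return (["<function>"] + text_tokens + ["</function>"]
            + ["<molecule>"] + list(block_sequence) + ["</molecule>"])

example = build_joint_example(
    "inhibit a tyrosine kinase with improved solubility",
    ["[BLOCK:aryl_halide]", "[BLOCK:boronate_linker]", "[BLOCK:pyridine]"],
)
print(example)
```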

Left to right: computer science professors Hao Peng, Jiawei Han and Ge Liu and chemical and biomolecular engineering professor Ying Diao.

Ji, Peng, Han and Diao are collaborating with Burke, whose research group is pioneering the synthesis of small molecules. In contrast to the traditional, complex, and customized approach, Burke has demonstrated that his simple and increasingly general platform for synthesizing small molecules is well-suited for automation and integration with artificial intelligence, thus opening a path toward democratized molecular innovation. The team came together under the auspices of the Molecule Maker Lab Institute (MMLI).

Burke summarizes his pitch to the NSF-funded MMLI:

“Look, we've got one of the world's best chemistry departments and we've got one of the world's best computer science departments. And most of us don't even know each other. Give us a chance to work together, and we'll make friends, speak each other’s language, and try to do interface science that'll hopefully be very impactful. Holy mackerel, this has been a success.” The project is five years on from that initial pitch.

Burke and Ji aided each other by exchanging information in their respective fields. Burke says, “I think she's taught me a lot more than I've taught her. I love the chance to work with brilliant game-changing people. This chance to work with her is a phenomenally appreciated opportunity. I've learned so much from her. I knew nothing about any of this stuff when we started. I feel like she's helped me see under the hood to understand how large language models work.”

Photo Credit: University of Illinois Urbana-Champaign / Fred Zwicky
Martin Burke, May and Ving Lee Professor for Chemical Innovation, and Professor of Chemistry

Likewise, Ji says, “Marty has been the most exceptional collaborator I’ve had the privilege to work with throughout my career. From the very beginning, we discovered a strong alignment in our values around doing science—with integrity, curiosity, and a deep commitment to impact. He is remarkably creative, unfailingly patient, intellectually rigorous, and open to new perspectives. His pursuit of excellence is truly inspiring. Collaborating with Marty has been transformative for me—I went from having little background in chemistry to feeling confident in developing chemistry-inspired AI, thanks to his mentorship and encouragement. His support and partnership have been a major reason I’ve chosen to remain at the University of Illinois.”

Using a vivid metaphor, Burke explains that: “Drugs are unicorns. You look at that forest that we just described, the 10 to the 60th power number of molecules, there's the forest. There are like unicorns running around. And how do you find the unicorns? High output brute force, make as many things as possible. That's not the way to do it, and the world has learned that lesson. That's how people did it for a very long time. It doesn't work very well.

“We're on the cusp of a revolution in drug discovery as we speak, where AI, in concert with automated modular chemistry, has the chance to finally transform the way we find tomorrow's medicines, where we take the complexity of a human person and are able to intentionally create precise medicines even for an individual patient, on demand, with all the data and AI guidance that we need. I think we'll look back 1000 years from now and recognize this five to ten-year period as the inflection point. And I think the mCLM that has been created is going to be one of the key catalysts that make this possible.”

Future work on this first attempt to jointly model natural-language sequences with a modular chemical language and to perform chemical reasoning includes incorporating richer information into mCLM: 3D structures of molecules and physical constraints; other modalities such as protein and nucleic acid sequences and cell lines; knowledge from physical simulation tools; protein interaction dynamics; chemical and reaction knowledge bases; and scientific charts. It also includes investigating long reasoning chains across a wider variety of functions, extending to materials discovery and the synthesis of molecules, and conducting physical testing.

The next step beyond that is for the team to enable the mCLM model to enhance itself and co-evolve with human scientists.

Burke summarizes, “One of the amazing things that the mCLM is allowing us to now start to dream about, and again, Heng is leading on this, is to create what you would call an AI scientist that is really a collaborator. The goal in no way, shape, or form is to replace humans’ imagination and creativity. It's to intentionally create a fantastic collaborator. Stuff comes along, it's got hype, and it goes away. This is not that. I've been doing this for 20 years, and with the things I've seen already, I lie in bed at night realizing I need to change my career goals, because the places we can reach are far beyond what I was even dreaming about five years ago. There are certain things that humans still do really, really well. But there are certain things that AI does way better than we can, and I think learning how to synergize is one of the themes of the Molecule Maker Lab Institute.”

Burke concludes,

We have three of these AI institutes in Illinois. To my knowledge, we're the only university in the country that has three of them. Illinois is a place where there are no barriers between disciplines. They just don't exist. We're grateful for that opportunity. We don't take it for granted, and we're trying to run as fast as we can while the lights are on to get as much done as we can.


Grainger Engineering Affiliations

Ying Diao is an Illinois Grainger Engineering professor of chemical & biomolecular engineering. 

Jiawei Han is an Illinois Grainger Engineering professor of computer science. Jiawei Han holds the Michael Aiken Chair. 

Heng Ji is an Illinois Grainger Engineering professor of computer science and is affiliated with electrical and computer engineering and the Coordinated Science Laboratory. She is the founding director of two AI centers at UIUC – the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE) and the CapitalOne-Illinois Center on AI Safety and Knowledge Systems (ASKS).

Ge Liu is an Illinois Grainger Engineering professor of computer science. 

Hao Peng is an Illinois Grainger Engineering professor of computer science.



This story was published July 17, 2025.