4/22/2025
Written by Bruce Adams
Data centers running Large Language Models (LLMs) consume substantial energy, because serving billions of users requires enormous computing resources. At the same time, their execution is not secure: attackers sharing the cloud with LLMs can steal secret information.
CS professor Josep Torrellas and his students and postdocs in the Siebel School of Computing and Data Science, part of The Grainger College of Engineering at the University of Illinois Urbana-Champaign, together with academic and industry collaborators, have produced several papers that will directly impact cloud computing enterprises and users. These papers have received acclaim at conferences, and one industry partner has already applied their proposals.
The paper "DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency," from Torrellas’ team, led by CS PhD student Jovan Stojkovic together with researchers from Microsoft Azure Research, received the Best Paper Award at the 2025 IEEE International Symposium on High-Performance Computer Architecture (HPCA) on March 5 in Las Vegas, NV.
DynamoLLM is the first energy-management framework for LLM inference environments. It automatically and dynamically reconfigures an inference computing cluster to minimize the energy of LLM serving while meeting the performance service-level objectives (SLOs) of the services.
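At its core, the framework's decision can be pictured as a small optimization: among the cluster configurations available (GPU frequencies, instance counts, parallelism degrees), pick the one that draws the least power while still meeting the latency SLO. The sketch below illustrates only that selection step, with hypothetical configuration names and numbers; it is not DynamoLLM's actual policy or data.

```python
# Illustrative sketch (not DynamoLLM's actual algorithm): pick the
# lowest-power cluster configuration that still meets a latency SLO.
# All configurations and numbers below are hypothetical.
from dataclasses import dataclass

@dataclass
class Config:
    name: str               # e.g., a GPU frequency + model-parallelism setting
    p99_latency_ms: float   # estimated tail latency at the current request rate
    power_kw: float         # estimated cluster power draw

def pick_config(configs, slo_ms):
    """Return the lowest-power configuration whose tail latency meets the SLO."""
    feasible = [c for c in configs if c.p99_latency_ms <= slo_ms]
    if not feasible:
        # No configuration meets the SLO: fall back to the fastest one.
        return min(configs, key=lambda c: c.p99_latency_ms)
    return min(feasible, key=lambda c: c.power_kw)

if __name__ == "__main__":
    configs = [
        Config("low-freq, TP=2",  p99_latency_ms=450.0, power_kw=3.1),
        Config("mid-freq, TP=4",  p99_latency_ms=280.0, power_kw=4.0),
        Config("high-freq, TP=8", p99_latency_ms=150.0, power_kw=6.5),
    ]
    # With a 300 ms SLO the mid configuration wins; a looser 500 ms SLO
    # would let the scheduler drop to the cheaper low-frequency setup.
    print(pick_config(configs, slo_ms=300.0).name)
```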
As Stojkovic explains, “LLMs are running very power-hungry GPUs and have very large software stacks. They use lots of energy, produce lots of carbon emissions, and have a bad impact on our environment. The work describes the proper way to design these inference clusters to minimize energy consumption while ensuring that both the quality of the results and the performance of the systems are at acceptable levels.”
Stojkovic (who has accepted an Assistant Professor position in the Department of Computer Science at the University of Texas at Austin) interned at Microsoft Azure Research during the summer of 2024. He said, “We started by characterizing whatever was open source. Then, we ported these ideas to Microsoft’s own production-level data. This work and its extensions are currently being implemented in production at Microsoft. They will soon be available in Azure and will serve billions of user requests so that we can get good performance while minimizing the cost for data centers.”
The paper "CXLfork: Fast Remote Fork over CXL Fabrics," from Torrellas’ team, led by CS Postdoctoral Researcher Chloe Alverti, CS PhD students Burak Ocalan and Shashwat Jaiswal, CS professor Tianyin Xu and PhD student Stratos Psomadakis from the National Technical University of Athens, received the Best Paper Award at the 2025 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) on April 1 in Rotterdam, Netherlands.
Torrellas says, “CXL is a new type of memory system that allows multiple nodes in a computing cluster to share memory. CXL is being pushed forward by Intel and other companies. This technology enables very fast communication between the different nodes.” “In this work,” he continues, “we come up with a new use for CXL. It enables starting new processes in the different nodes very quickly, which speeds up many cloud programs. We also hope this work will enable many new uses in cloud operations, as it is now cheaper to start processes on different nodes.”
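Because CXL (Compute Express Link) lets nodes attach to a shared memory pool, a new process can be handed a reference to existing state rather than receiving a copy of it over the network. The toy sketch below is only a single-machine analogy of that idea, using Python shared memory; it is not CXLfork's mechanism, and the 64 MB of "process state" is hypothetical.

```python
# Toy single-machine analogy (not CXLfork itself): starting a new worker by
# copying its state versus letting it attach to memory that is already shared.
import pickle
import time
from multiprocessing import shared_memory

state = bytes(64 * 1024 * 1024)  # 64 MB of hypothetical process state

# Option 1: serialize and copy the state for the new process (the slow path).
t0 = time.perf_counter()
blob = pickle.dumps(state)
copied = pickle.loads(blob)
t1 = time.perf_counter()

# Option 2: place the state in a shared segment once (cost amortized across
# forks); each new process merely attaches to it by name, much as nodes
# could attach to a CXL-backed memory pool.
shm = shared_memory.SharedMemory(create=True, size=len(state))
shm.buf[:len(state)] = state
t2 = time.perf_counter()
attached = shared_memory.SharedMemory(name=shm.name)  # near-instant attach
t3 = time.perf_counter()

print(f"copy:   {t1 - t0:.4f} s")
print(f"attach: {t3 - t2:.4f} s")

attached.close()
shm.close()
shm.unlink()
```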
The paper "Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS" from Torrellas’ team, led by CS student Neil Zhao (CS PhD 2024, who will be an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin in Fall 2025) and collaborating professors Adam Morrison and Christopher W. Fletcher, was selected as an IEEE Micro 2025 Top Picks in Computer Architecture Conferences. IEEE notes that “This issue collects some of last year’s most significant research papers in computer architecture based on novelty and potential for long-term impact.”
Torrellas says, “In this paper, we show that clouds are not safe, as an attacker program can exfiltrate information from nearly any other cloud program.” As he puts it, two of the largest cloud service providers, Google and AWS, “said it is not possible to attack programs in the cloud simply because there is too much noise in the cloud, and you don't know where your jobs are running.” AWS even had a white paper claiming this.
This work shows how to reverse-engineer the algorithm that Google uses to place jobs in the cloud. The authors then devised techniques to co-locate attacker jobs on the same server as a victim program and exfiltrate secret information. The work is making a real-world impact: Google and AWS adjusted their cloud services because, as Torrellas says, “we showed them that this is possible. Google has filed a critical bug, and AWS has changed their white paper. They are making changes.”
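The providers' argument rested on placement looking random from the attacker's side. A back-of-the-envelope model of that intuition, with hypothetical numbers, is sketched below: under uniform-random placement, landing next to a victim takes a very large number of attempts. The paper's point is that the real placement algorithm is predictable enough to be steered, defeating this intuition.

```python
# Back-of-the-envelope model of the "co-location is too noisy" intuition
# (hypothetical numbers, not data from the paper). If each attacker instance
# landed on one of `servers` machines uniformly at random, the chance that
# at least one shares a server with the victim would be:
#     P = 1 - (1 - 1/servers) ** instances
def colocation_probability(servers: int, instances: int) -> float:
    return 1.0 - (1.0 - 1.0 / servers) ** instances

if __name__ == "__main__":
    for n in (10, 100, 1000, 5000):
        p = colocation_probability(servers=10_000, instances=n)
        print(f"{n:5d} instances -> {p:.1%} chance of co-location")
    # Under random placement the attack looks impractical; the paper shows
    # the real placement algorithm is predictable enough to beat these odds.
```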
Finally, the paper "Untangle: A Principled Framework to Design Low-Leakage, High-Performance Dynamic Partitioning Schemes," by CS student Neil Zhao, Adam Morrison, Christopher W. Fletcher, and Josep Torrellas was named a Top Pick in Hardware and Embedded Security in 2024.
This work addresses the security problem described above. When two programs run on the same cloud server and share the last-level cache, the attacker program can exfiltrate information from the victim. For example, if the victim uses a large amount of cache space, the attacker can infer what the victim is doing. The state-of-the-art approach to this problem is to statically partition the cache, so that each program gets a fixed fraction of the cache space. That is safe, but it has low performance, because a program may need much more space during certain parts of its execution.
The alternative is to dynamically change the size of the cache partition assigned to each program as it executes. Unfortunately, such changes leak information to the other programs sharing the cache. In this work, the authors designed new dynamic partitioning hardware that minimizes this leakage. As a result, programs in the cloud can run both fast and safely.
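As a toy illustration of the trade-off, the sketch below models a 16-way cache shared by a victim and an attacker. With static partitioning the attacker's share never changes, so it learns nothing; with a naive demand-driven dynamic policy, the attacker can read the victim's activity directly off the size of its own partition. The numbers and policies are hypothetical, and Untangle's actual hardware design is more sophisticated; this only shows why naive resizing leaks.

```python
# Toy model (not Untangle's design): why naive dynamic cache partitioning
# leaks. A 16-way cache is split between a victim and an attacker.
TOTAL_WAYS = 16

def static_partition(victim_demand):
    # Each program always gets half the ways, regardless of behavior.
    return TOTAL_WAYS // 2

def naive_dynamic_partition(victim_demand):
    # Give the victim what it asks for (capped); the attacker gets the rest.
    victim_ways = min(victim_demand, TOTAL_WAYS - 1)
    return TOTAL_WAYS - victim_ways

# The victim's cache demand over time depends on its secret input
# (hypothetical trace): demand spikes during secret-dependent phases.
victim_demand_trace = [2, 2, 12, 12, 12, 3, 2, 10, 2]

for policy in (static_partition, naive_dynamic_partition):
    observed = [policy(d) for d in victim_demand_trace]
    print(policy.__name__, "-> attacker sees partition sizes:", observed)
# static_partition yields a flat trace (no leakage); the dynamic policy's
# trace mirrors the victim's demand, revealing which phase it is in.
```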
Grainger Engineering Affiliations
Josep Torrellas is an Illinois Grainger Engineering professor of computer science and the director of the SRC/DARPA JUMP 2.0 ACE Center for Evolvable Computing. He holds the Saburo Muroga Professorship in Computer Science.