Varun Chandrasekaran

Talk: Enhancing Safety in LLMs and other Foundation Models

Foundation models are increasingly deployed in high-stakes environments, yet ensuring their safety remains a pressing challenge. This talk explores recent advances in understanding and mitigating their risks, drawing on several key studies. We will examine (1) new frameworks for evaluating and aligning model behavior with human intent, (2) the security and reliability of watermarking techniques in foundation models, including their role in provenance tracking and their vulnerabilities to adversarial removal and evasion, and (3) novel approaches for detecting and mitigating high-risk model outputs before deployment. Synthesizing these findings, we will discuss the broader implications for foundation model security, the trade-offs between robustness and control, and future directions for improving AI safety at scale.

Workshop Home Page

Return to the TAU-UIUC Workshop Home Page