Rishub Tamirisa

Rishub Tamirisa

I recently graduated from the University of Illinois at Urbana-Champaign. I’m currently working on research toward improving both AI capabilities and alignment.

My prior research has focused on developing adversarially robust safeguards for LLMs, while ensuring benign capabilities can improve safely and reliably. My work has been accepted at ICLR, CVPR, and ICML; also featured in Wired.

Previously, I was a founding engineer at Mindy Group, a Sequoia-backed AI startup. I also co-led Lapis Labs, a student-led research group, and worked at NASA.


News



Feb 11th, 2025.We release our work on Emergent Value Systems in AIs.
Jan 22nd, 2025.Our work on Tamper-Resistant LLM Safeguards is accepted at ICLR 2025.
Aug 2nd, 2024.Our work on Tamper-Resistant Safeguards is featured in Wired. article.
May 20th, 2024.I'll be joining the Center for AI Safety as a Research Engineer Intern.
May 1st, 2024.WMDP is accepted at ICML 2024.
March 5th, 2024.WMDP is featured in TIME. article.
March 4th, 2024.Our work on Robust Unlearning is accepted at SeT LLM @ ICLR 2024.
February 26th, 2024.FedSelect is accepted at CVPR 2024.
February 13th, 2024.Announcing our $6M seed round and the launch of Mindy. blog post.
October 12th, 2023.I gave a talk internally at Google Research. post.
June 19th, 2023.FedSelect is accepted at FL @ ICML 2023.

Research



Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Mantas Mazeika, Xuwang Yin, Rishub Tamirisa, Jaehyuk Lim, Bruce W. Lee, Richard Ren, Long Phan, Norman Mu, Adam Khoja, Oliver Zhang, Dan Hendrycks
Preprint | arxiv | site | code | thread
Image caption

Tamper-Resistant Safeguards for Open-Weight LLMs

Rishub Tamirisa*, Bhrugu Bharathi*, Long Phan, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
ICLR 2025 | arxiv | Wired | code | site
Image caption

Toward Robust Unlearning for LLMs

Rishub Tamirisa*, Bhrugu Bharathi*, Andy Zhou, Bo Li, Mantas Mazeika
Secure and Trustworthy LLMs @ ICLR 2024 | paper
Image caption

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian
CVPR 2024 | arxiv | code
Image caption

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Nathaniel Li*, Alexander Pan*, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
ICML 2024 | arxiv | Time | blog | site
Image caption

FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning

Rishub Tamirisa, John Won, Chengjun Lu, Ron Arel, Andy Zhou
Federated Learning @ ICML 2023 | arxiv
Image caption