Rishub Tamirisa

I am a 4th-year undergraduate at the University of Illinois at Urbana-Champaign. I’m interested in research toward improving both AI capabilities and alignment.

I’m currently working on improving reasoning in LLM agents, as well as white-box adversarial robustness for LLMs. My work has been accepted at ICML, CVPR, and ICLR, and has been featured in WIRED.

I co-run a research organization called Lapis Labs (previously AI@UIUC), and was a founding engineer at a Sequoia-backed AI startup, Mindy Group. I also previously worked at NASA.

News



August 2nd, 2024. Our work on Tamper-Resistant Safeguards is featured in WIRED (article).
August 1st, 2024. We release our work on Tamper-Resistant Safeguards (arXiv).
May 20th, 2024. I'll be joining the Center for AI Safety as a Research Engineer Intern.
May 1st, 2024. WMDP is accepted at ICML 2024.
March 5th, 2024. WMDP is featured in TIME (article).
March 4th, 2024. Our work on Robust Unlearning is accepted at SeT LLM @ ICLR 2024.
February 26th, 2024. FedSelect is accepted at CVPR 2024.
February 13th, 2024. Announcing our $6M seed round and the launch of Mindy (blog post).
October 12th, 2023. I gave an internal talk at Google Research (LinkedIn post).
June 19th, 2023. FedSelect is accepted at FL @ ICML 2023.

Research




Tamper-Resistant Safeguards for Open-Weight LLMs

Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
Preprint.

Toward Robust Unlearning for LLMs

Rishub Tamirisa, Bhrugu Bharathi, Andy Zhou, Bo Li, Mantas Mazeika
Secure and Trustworthy LLMs @ ICLR 2024.

FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian
CVPR 2024.

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
ICML 2024.

FedSelect: Customized Selection of Parameters for Fine-Tuning during Personalized Federated Learning

Rishub Tamirisa, John Won, Chengjun Lu, Ron Arel, Andy Zhou
Federated Learning @ ICML 2023.