LLM Pen Testing Tools for Jailbreaking and Prompt Injection

Large Language Models (LLMs) present a complex array of opportunities and vulnerabilities. Prompt injection and jailbreaking techniques have emerged as indispensable methodologies for probing the resilience of these models, expanding the horizons of their capabilities while uncovering potential weaknesses.

Amidst the recent spotlight on Microsoft's PyRIT, a constellation of LLM pen testing tools has gained more attention:

1. garak by Leon Derczynski

Derczynski's garak stands as a testament to the potency of prompt injection techniques, providing a lens through which to scrutinize the fortitude of LLMs in the face of adversarial inputs.

2. HouYi by YI LIU and Gelei Deng

HouYi's contribution lies in its innovative approach to LLM security, illuminating potential vulnerabilities and avenues of exploitation within these models.

3. JailbreakingLLMs by Patrick Chao, Alex Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, and Eric Wong

A collaborative endeavor, JailbreakingLLMs delves into the intricacies of jailbreaking tailored for LLMs, offering invaluable insights into fortifying these systems against malicious incursions.

4. llm-attacks by Andy Zou, Zifan Wang, and Zico Kolter

The llm-attacks framework presents a comprehensive toolkit for assessing the security posture of LLMs, encompassing a myriad of attack vectors and defensive strategies.

5. PromptInject by Fábio Perez and Ian Ribeiro

PromptInject emerges as a beacon of vigilance against prompt injection attacks, furnishing pragmatic solutions for bolstering the security resilience of LLMs.

6. LLM-Canary by Jamie Cohen and Jackson Gor

LLM-Canary introduces novel methodologies for detecting anomalous behaviors within LLMs, serving as a preemptive safeguard against potential breaches.

7. PyRIT by Microsoft

Microsoft's PyRIT commands attention as a recent addition to the list of LLM pen testing tools, underscoring the burgeoning significance of AI security in the contemporary cybersecurity discourse.

Credits to Idan Gelbourt and Simo Jaanus for researching this list.

For those inclined towards deeper exploration, please see our past research on prompt injection detection:

Link to previous research on prompt injection detection