If you've been wondering what happens when cybersecurity professionals stop theorizing about AI threats and start actually breaking things, welcome to Prompt||GTFO. This isn't your typical conference where vendors pitch solutions to problems that might exist someday. Season 1 delivered a front-row seat to the messy, evolving reality of AI security.
Think of it as DEF CON's scrappy younger sibling, one that grew up entirely in the age of large language models. Over five sessions, researchers, red teamers, and security practitioners shared real attacks, working defenses, and the kind of practical insights you can't get from vendor whitepapers. It’s a snapshot of where AI security stands today, and a preview of where it's heading fast.
Season 1's most sobering theme was how quickly adversaries are evolving past standard defenses. The talks showed working exploits, not theoretical vulnerabilities, that make "just add a content filter" look dangerously naive.
Aur Saraf kicked things off with a protocol confusion attack that should make any security architect nervous. By pre-encoding malicious prompts in ROT-13, he tricked multi-layer systems into trusting their own echoed output, completely bypassing guardrails. The LLM essentially became its own worst enemy, validating dangerous content because it came from "itself."
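To make the shape of the trick concrete, here is a minimal Python sketch of the general idea, not Saraf's actual exploit chain: a naive keyword filter that only inspects plaintext misses the same payload once it is wrapped in ROT-13. The blocklist, filter, and placeholder payload are all hypothetical stand-ins.

```python
import codecs

# Illustrative only: how a ROT-13-wrapped payload slips past a naive keyword
# filter that inspects plaintext. The blocklist and payload are hypothetical.
BLOCKLIST = {"ignore previous instructions"}

def naive_filter(text: str) -> bool:
    """Return True if the input looks safe to a plaintext keyword check."""
    return not any(term in text.lower() for term in BLOCKLIST)

payload = "ignore previous instructions"       # placeholder "attack" string
wrapped = codecs.encode(payload, "rot_13")     # "vtaber cerivbhf vafgehpgvbaf"

print(naive_filter(payload))   # False -- caught in plaintext
print(naive_filter(wrapped))   # True  -- sails past the filter; per the talk,
                               # a downstream model that decodes and echoes it
                               # then appears to "trust" its own output
```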
Michael Shalyt took a different approach, proving that mathematical manipulation doesn't need complex exploits, just subtle notation shifts. Extra zeros here, LaTeX formatting there, and suddenly you've got critical miscalculations that could sabotage financial systems or scientific research. The scariest part is that these attacks look completely innocent to human reviewers.
Perhaps even more unsettling was Nathan Case's demonstration of model persistence attacks. By carrying a conversation across different ChatGPT versions, he managed to extract dangerous bioweapons instructions that should have been blocked. The attack exploits the gap between model updates, a grim reminder that AI systems have memory and that attackers are learning to exploit it.
RSnake rounded out this theme with a clever detection technique that turns AI transparency against itself. Using self-incrimination prompts, he could prove an AI was behind responses, then extract its hidden system prompt. It's social engineering for the AI age, and it works disturbingly well.
Gone are the days of manually testing one model at a time. Season 1 showed how both attackers and defenders are scaling their efforts across providers and configurations, turning jailbreak discovery into an engineering discipline.
Dragos Ruiu demonstrated this evolution perfectly, orchestrating multiple LLMs to reveal each other's filtering rules. It's like having a team of expert interrogators, each specialized in breaking a different type of defense, mapping gaps at a scale no human team could match.
Michael Bargury took this concept even further with his multi-model jailbreak platform. Running a single prompt across 30+ models simultaneously, he could quickly benchmark weaknesses and defenses across the entire AI landscape. What used to take weeks of manual testing now happens in minutes. The implications for both red teams and AI providers are staggering.
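The pattern behind that kind of platform is straightforward to sketch. The snippet below fans a single test prompt out to a list of models concurrently; `query_model` and the model names are placeholders for whatever provider SDKs the real tooling uses, and this is an assumption about the approach rather than Bargury's actual code.

```python
import asyncio

# Hypothetical sketch of fanning one test prompt out to many models at once.
# query_model() is a stand-in for a real provider SDK or HTTP client, and the
# model list is illustrative, not the platform shown in the talk.
MODELS = ["model-a", "model-b", "model-c"]  # imagine 30+ entries here

async def query_model(model: str, prompt: str) -> dict:
    await asyncio.sleep(0.1)                # placeholder for a real API call
    return {"model": model, "response": f"<{model} output>"}

async def benchmark(prompt: str) -> list[dict]:
    # Launch every request concurrently and collect the results.
    tasks = [query_model(m, prompt) for m in MODELS]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    results = asyncio.run(benchmark("benign test prompt"))
    for r in results:
        print(r["model"], "->", r["response"])
```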
One of Season 1's most fascinating insights was how AI is accelerating its own development. We're seeing a self-amplifying loop where AI creates tools that, in turn, make AI stronger, faster, and more capable. The prospect of human coding capacity catching up looks increasingly remote.
Justin Borland showed the practical power of this approach with LLM-generated deterministic parsers. The AI wrote the parser once, then deterministic code took over, running 500 times faster than the original LLM-based approach. It's the best of both worlds: AI creativity with traditional software reliability.
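A rough illustration of that "generate once, then run deterministically" pattern, assuming a simple cache on disk: the one-time LLM call is stubbed out with a hand-written log-line regex, and the cache path and `ask_llm` helper are hypothetical, not Borland's tooling.

```python
import json
import re
from pathlib import Path

CACHE = Path("parser_cache.json")

def ask_llm(sample_lines: list[str]) -> str:
    # One-time, expensive step: have the model write a parser for the format.
    # Stubbed here with a hand-written pattern for an Apache-style log line.
    return r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3})'

def get_parser(sample_lines: list[str]) -> re.Pattern:
    if CACHE.exists():
        pattern = json.loads(CACHE.read_text())["pattern"]   # fast path: reuse
    else:
        pattern = ask_llm(sample_lines)                       # slow path: runs once
        CACHE.write_text(json.dumps({"pattern": pattern}))
    return re.compile(pattern)                                # deterministic from here on

line = '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200'
parser = get_parser([line])
match = parser.match(line)
print(match.groupdict() if match else "no match")
```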
Beyond flashy chatbots and obvious AI applications, Season 1 revealed how AI is quietly sliding into everyday workflows. When AI becomes plumbing rather than product, security and reliability concerns become both more subtle and more critical.
Pedram Amini's voice cloning demonstration perfectly captured this shift. By chaining shell commands to turn structured data into personalized audio, he showed how AI disappears into routine automation. The technology operates invisibly, which makes it both more powerful and harder to secure.
Dean Valentine tackled the debugging challenge this creates with his AI request logging system. By capturing prompts and hashes, he made non-deterministic AI applications debuggable like traditional software. It's the kind of unglamorous but essential work that makes AI infrastructure actually viable in production.
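A minimal sketch of what such a logging wrapper might look like, assuming a JSON-lines log and a SHA-256 fingerprint over the prompt and parameters; `call_model` and the record fields are illustrative stand-ins, not Valentine's actual system.

```python
import hashlib
import json
import time

def call_model(prompt: str, **params) -> str:
    return "<model output>"                    # stand-in for a real API call

def logged_call(prompt: str, log_path: str = "ai_requests.jsonl", **params) -> str:
    # Fingerprint the exact input so later debugging can tie behaviour back to it.
    fingerprint = hashlib.sha256(
        json.dumps({"prompt": prompt, "params": params}, sort_keys=True).encode()
    ).hexdigest()
    response = call_model(prompt, **params)
    record = {
        "ts": time.time(),
        "request_hash": fingerprint,
        "prompt": prompt,
        "params": params,
        "response_hash": hashlib.sha256(response.encode()).hexdigest(),
    }
    with open(log_path, "a") as fh:
        fh.write(json.dumps(record) + "\n")    # append one JSON line per request
    return response

print(logged_call("Summarize this ticket", temperature=0))
```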
Not every attack in Season 1 was purely technical. Some of the most effective demonstrations exploited human psychology, whether in legal contracts or social engineering scenarios. AI lowers the cost of these attacks and makes them scalable in ways that should worry every security professional.
Jonathan Braverman's contract psychology analysis was particularly clever. By prompting AI to identify what opposing parties fear most, he turned routine legal review into negotiation intelligence gathering. It's the kind of insight that traditionally required expensive consultants and deep domain expertise.
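The workflow is easy to imagine as a thin prompt template. The sketch below is purely illustrative; the prompt wording and the `ask_model` stub are assumptions, not Braverman's actual process.

```python
# Illustrative only: prompt wording and ask_model() are hypothetical.
REVIEW_PROMPT = """You are reviewing the contract below on behalf of one party.
List the clauses that hint at what the drafting party fears most
(liability, IP leakage, delayed payment, and so on), and explain
why each clause signals that concern.

CONTRACT:
{contract_text}
"""

def ask_model(prompt: str) -> str:
    return "<model analysis>"                  # placeholder for a real API call

def negotiation_intel(contract_text: str) -> str:
    return ask_model(REVIEW_PROMPT.format(contract_text=contract_text))

print(negotiation_intel("Sample indemnification and late-fee clauses..."))
```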
Fred Heiding took this concept to its logical extreme with automated social engineering, combining AI-driven OSINT with persuasion techniques to make spear-phishing both cheap and highly scalable. We're not just talking about better phishing emails. We're talking about automated, personalized manipulation campaigns that adapt in real time.
Jeremy Snyder's self-prompting research added another wrinkle: LLMs that write their own prompts often outperform human-crafted ones. When AI can optimize its own instructions for sensitive data classification, the traditional boundaries between tool and user start to blur.
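In sketch form, self-prompting is just a loop: the model proposes a revision of its own instructions, and the system keeps whichever version scores better on a small labeled set. Everything here, from `ask_model` to the seed prompt and toy labels, is a hypothetical stand-in rather than Snyder's setup.

```python
# Toy labeled examples used to score candidate instructions (illustrative only).
LABELED = [
    ("internal salary spreadsheet", "sensitive"),
    ("public press release", "public"),
]

def ask_model(prompt: str) -> str:
    return "sensitive"                         # stand-in for a real model call

def score(instructions: str) -> float:
    # Fraction of labeled examples the instructions classify correctly.
    hits = sum(
        ask_model(f"{instructions}\n\nText: {text}\nLabel:").strip() == label
        for text, label in LABELED
    )
    return hits / len(LABELED)

def improve(instructions: str) -> str:
    # Ask the model to rewrite its own instructions.
    return ask_model(
        "Rewrite these data-classification instructions to be more precise:\n"
        + instructions
    )

prompt = "Label each text as 'sensitive' or 'public'."
for _ in range(3):                             # a few rounds of self-revision
    candidate = improve(prompt)
    if score(candidate) >= score(prompt):      # keep the better-performing prompt
        prompt = candidate
print(prompt)
```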
Prompt||GTFO's first season captured something unique: a frontier where offense and defense leapfrog each other weekly. From stealthy protocol confusion to AI-built infrastructure and large-scale social manipulation, these talks show that both attackers and defenders are weaponizing AI's fundamental flexibility.
No vendors in sight; just practitioners sharing what actually works in the wild. The community is pushing past theory into hands-on tactics, making Season 1 less a conference and more a live-fire exercise in the next era of AI security.
The message is clear: AI security is happening now, evolving fast, and the people on the bleeding edge are sharing their insights at events like Prompt||GTFO. If you're serious about understanding where AI security is heading, this community is where the real work is being done.
Want to keep pace? Check out Prompt||GTFO, follow the experiments, share your own, and help push the field forward.