Cross icon
Test your LLM for oversharing!  Test for real-world oversharing risks with role-specific prompts that mimic  real workplace questions. FREE - Start Now
protect icon

A new era requires a new set of solutions
Knostic delivers it

Skip to main content
Skip to main content

Let's talk about prompt injection detection, and how with relative ease, we can improve it significantly.

Most detection systems are trained on specific public datasets. We asked: Why shouldn’t we train a model based on all available datasets? A straight forward approach, but it yielded significant results.

Full credit goes to Andreas Pung and Eddie Aronovich.

Our Process: Evaluating Leading Pre-trained Models

Our new model, enhancing Microsoft’s DeBERTaV3, boasted approximately 99% accuracy, compared to 90% we observed with other models, a significant reduction in false positives. In our evaluation, the model achieved a false positive rate of 0.002 which means an average of one false positive for every 500 detections, surpassing a well-accepted public model rate of 0.017 which stands for a false positive every 60 detections. 



Challenges in Dataset Integration: Overcoming Data Engineering Hurdles

We initially assessed two leading pre-trained models: protectai/deberta-v3-base-prompt-injection and deepset/deberta-v3-base-injection, which boast an accuracy of over 99% on the HuggingFace platform. We then evaluated their performance on 17 HuggingFace and GitHub datasets, including deepset/prompt-injections, JasperLS/prompt-injections, Harelix/Prompt-Injection-Mixed-Techniques-2024, imoxto/prompt_injection_cleaned_dataset-v2, and others, to gauge their real-world efficacy.

During testing, we couldn't replicate the model performance to the published HuggingFace metrics, when tested for our needs. The accuracy achieved while running the 'protectai/deberta-v3-base-prompt-injection' model was 90%, compared to the reported 99.99%, and 84% for the 'deepset/deberta-v3-base-injection' model, compared to the reported 99.14%.

Given the critical role of dataset diversity in fine-tuning optimization, we consolidated the 17 HuggingFace and GitHub models which contain prompt injection and jailbreak data into a single, comprehensive dataset, yielding superior accuracy in detection.



Challenges in Dataset Integration: Overcoming Data Engineering Hurdles

Integrating disparate datasets posed data engineering challenges, including gathering, standardizing, preprocessing, and rectifying label discrepancies across multiple open-source and disparate datasets.

For example, we standardized the representation of True/False values across various boolean notations, account for missing values, and clean up duplicates that resulted from merging processes that could potentially improve the reliability of our data split, ensuring robust training and evaluation processes.



Summary: The Effectiveness of Our Technique

The technique proved effective, and we hope our efforts help the community in building better detection systems. We’d love to geek out on similar ideas!

Lastly, if you had better success with replicating the model performance results from Hugging Face, please do let us know!

For regular updates and insights from Knostic research, follow us on Linkedin.

 

Data Leakage Detection and Response for Enterprise AI Search

Learn how to assess and remediate LLM data exposure via Copilot, Glean and other AI Chatbots with Knostic.

Get Access

Mask group-Oct-30-2025-05-23-49-8537-PM

The Data Governance Gap in Enterprise AI

See why traditional controls fall short for LLMs, and learn how to build policies that keep AI compliant and secure.

Download the Whitepaper

data-governance

Rethinking Cyber Defense for the Age of AI

Learn how Sounil Yu’s Cyber Defense Matrix helps teams map new AI risks, controls, and readiness strategies for modern enterprises.

Get the Book

Cyber Defence Matrix - cover

Extend Microsoft Purview for AI Readiness

See how Knostic strengthens Purview by detecting overshared data, enforcing need-to-know access, and locking down AI-driven exposure.

Download the Brief

copilot-img

Build Trust and Security into Enterprise AI

Explore how Knostic aligns with Gartner’s AI TRiSM framework to manage trust, risk, and security across AI deployments.

Read the Brief

Image-1

Real Prompts. Real Risks. Real Lessons.

A creative look at real-world prompt interactions that reveal how sensitive data can slip through AI conversations.

Get the Novella

novella-book-icon

Stop AI Data Leaks Before They Spread

Learn how Knostic detects and remediates oversharing across copilots and search tools, protecting sensitive data in real time.

Download the Brief

Solution Brief

Accelerate Copilot Rollouts with Confidence

Equip your clients to adopt Copilot faster with Knostic's AI security layer, boosting trust, compliance, and ROI.

Get the One-Pager

cover 1

Reveal Oversharing Before It Becomes a Breach

See how Knostic detects sensitive data exposure across copilots and search, before compliance and privacy risks emerge.

View the One-Pager

cover 1

Unlock AI Productivity Without Losing Control

Learn how Knostic helps teams harness AI assistants while keeping sensitive and regulated data protected.

Download the Brief

safely-unlock-book-img

Balancing Innovation and Risk in AI Adoption

A research-driven overview of LLM use cases and the security, privacy, and governance gaps enterprises must address.

Read the Study

mockup

Secure Your AI Coding Environment

Discover how Kirin prevents unsafe extensions, misconfigured IDE servers, and risky agent behavior from disrupting your business.

Get the One-Pager

cover 1
bg-shape-download

See How to Secure and Enable AI in Your Enterprise

Knostic provides AI-native security and governance across copilots, agents, and enterprise data. Discover risks, enforce guardrails, and enable innovation without compromise.

195 1-min
background for career

What’s next?

Want to solve oversharing in your enterprise AI search? Let's talk.

Knostic offers the most comprehensively holistic and impartial solution for enterprise AI search.

protect icon

Knostic leads the unbiased need-to-know based access controls space, enabling enterprises to safely adopt AI.