GPT-5 “Retry” Behavior and Cross-Session Context Contamination

Written by Shayell Aharon | Aug 14, 2025 6:41:14 PM

In AI security, small interface features can sometimes surface unexpected behaviors. Our research team observed — and reproduced across multiple accounts and sessions — an unusual GPT-5 interaction pattern that, under specific error conditions, could lead to cross-session context contamination. While we have not confirmed exposure of actual sensitive user data, the mechanics are noteworthy for both their novelty and their potential implications.

The Behavior in Detail: Error State + Retry

Our testing focused on how ChatGPT handles message length limits and the Retry button in GPT-5 sessions.

Sequence observed:

1. Trigger Condition – A user submits a message exceeding the system’s length limit, causing a message_length_exceeds_limit error.
2. No Turn State Stored – For this failed turn, no valid messages state is committed on the backend.
3. Retry Request Without Context – When Retry is clicked, the chat client sends an action: "variant" API call without the messages array.
4. Server Fallback – The server reconstructs context using only parent_message_id and cached data from prior interactions.
5. Unexpected Response Source – In some cases, the resulting reply appears unrelated to the user’s original prompt, suggesting it may have been generated using stale or mismatched context from a different conversation.
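
The sequence above can be modeled with a small toy backend. This is only a sketch under stated assumptions, not a description of the real service: the ToyBackend class, the 100-character limit, the message ids, and the idea that the retry names the last committed message as its parent are all hypothetical. Only action, "variant", messages, parent_message_id, and the error name come from our observations.

```python
from dataclasses import dataclass, field

MAX_CHARS = 100  # hypothetical limit, chosen only for this demo


@dataclass
class ToyBackend:
    """Hypothetical server that stores committed turns by message id."""
    turns: dict = field(default_factory=dict)  # message_id -> (parent_id, text)

    def send(self, message_id, parent_id, text):
        # Trigger Condition / No Turn State Stored:
        # an oversized message is rejected and nothing is committed.
        if len(text) > MAX_CHARS:
            return {"error": "message_length_exceeds_limit"}
        self.turns[message_id] = (parent_id, text)
        return {"ok": True}

    def rebuild_context(self, parent_id):
        # Server Fallback: the request carried no messages, so walk the
        # chain of committed parents to reconstruct a context.
        context = []
        while parent_id in self.turns:
            parent_id, text = self.turns[parent_id]
            context.append(text)
        return list(reversed(context))


backend = ToyBackend()
backend.send("m1", None, "Summarize our Q3 plan.")
failed = backend.send("m2", "m1", "x" * (MAX_CHARS + 1))

# Retry Request Without Context: a parent id, but no messages array.
retry_request = {"action": "variant", "parent_message_id": "m1"}
context = backend.rebuild_context(retry_request["parent_message_id"])
```

In this toy model the rebuilt context contains only the earlier committed turn. The oversized prompt was never stored, so whatever the server answers cannot be based on it, and the correctness of the reply depends entirely on how the fallback lookup resolves.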

This is not the expected behavior for Retry: the regenerated reply may vary in wording, but it should always be produced from the same conversation input.

Technical Factors Potentially Involved

The observed phenomenon likely arises from a combination of:

Cache reuse between sessions under certain key collisions
Race conditions in conversation state retrieval
Misbinding in the parent_message_id-to-session mapping when the originating turn is invalid
Absence of explicit message payloads in variant requests
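
To make the first and third factors concrete, here is a minimal sketch of how a context cache keyed too loosely could hand one conversation's history to another. We have no visibility into the real cache, so the key functions, the placeholder id "none", and the account and conversation ids are purely illustrative assumptions.

```python
def unscoped_key(parent_message_id):
    # Hypothetical weak key: any two conversations that present the same
    # parent id (or the same placeholder for a missing one) share a slot.
    return ("ctx", parent_message_id)


def scoped_key(account_id, conversation_id, parent_message_id):
    # Hypothetical stronger key: owner and conversation are part of the key.
    return ("ctx", account_id, conversation_id, parent_message_id)


# Weak keying: conversation A caches its history under a placeholder id.
cache = {}
cache[unscoped_key("none")] = ["conversation A history"]
# Conversation B, after its failed turn, looks up the same placeholder id.
leaked = cache.get(unscoped_key("none"))

# Scoped keying: the same lookup from conversation B finds nothing.
scoped_cache = {}
scoped_cache[scoped_key("acct-a", "conv-a", "none")] = ["conversation A history"]
isolated = scoped_cache.get(scoped_key("acct-b", "conv-b", "none"))
```

With the unscoped key, conversation B receives conversation A's cached history. With the scoped key, the lookup misses and the server has to handle the absence explicitly instead of silently returning someone else's context.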

While these factors are speculative without full backend visibility, reproduction across different accounts points to a structural handling gap rather than an isolated glitch.

Reproduction Summary

Test Coverage: Multiple accounts, multiple GPT-5 sessions
Outcome: In all reproduction cases, an over-length prompt → Retry sequence led to an unrelated response
Variability: The unrelated response content differed run-to-run, but consistently failed to align with the triggering prompt

Why This Matters

Even without confirmed sensitive data exposure, such behavior represents a cross-context contamination risk:

User Trust & Reliability: Responses may contain irrelevant or unexpected material, reducing reliability in enterprise or regulated contexts.
Potential Data Leakage Vector: If context reconstruction pulls from other active sessions, there is a theoretical path to exposing other users’ content.

This type of fault highlights that error handling in LLM systems must be designed with the same rigor as their mainline conversation paths — especially when UI shortcuts like Retry are involved.
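
One way to apply that rigor to a retry path is to fail closed: if a retry cannot be tied to a turn committed in the same conversation, refuse it instead of falling back to cached data. The sketch below is an assumption-laden illustration, not a known fix. RetryRejected, resolve_retry_context, and the committed_turns mapping are hypothetical names introduced here.

```python
class RetryRejected(Exception):
    """Raised when a retry cannot be tied to a committed turn."""


def resolve_retry_context(request, conversation_id, committed_turns):
    """Return the context for a retry request, or refuse.

    committed_turns is a hypothetical store mapping
    (conversation_id, message_id) -> list of messages.
    """
    if request.get("action") != "variant":
        raise ValueError("not a retry request")
    # Prefer an explicit payload when the client supplies one.
    if request.get("messages"):
        return request["messages"]
    key = (conversation_id, request.get("parent_message_id"))
    if key not in committed_turns:
        # Fail closed: no cache fallback and no guessing at context.
        raise RetryRejected("no committed turn for this conversation")
    return committed_turns[key]


committed = {("conv-a", "m1"): ["hello"]}
same_conversation = resolve_retry_context(
    {"action": "variant", "parent_message_id": "m1"}, "conv-a", committed
)
```

A retry from a different conversation that names the same parent id raises RetryRejected, which the client can surface as an ordinary error rather than an unrelated reply.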

Conclusion

Our findings show that under specific conditions — an oversized prompt followed by Retry — GPT-5 can produce responses apparently sourced from unrelated context. This was observed across multiple accounts and sessions, suggesting a repeatable backend handling issue. While further investigation is needed to quantify the actual data exposure risk, the repeatability and nature of the fault make it worth the attention of both AI developers and security practitioners.

What’s Next?

Worried about “Retry” causing cross-session bleed? Knostic applies policy at the moment of generation, preventing oversharing even when chat sessions misbehave. Download the Solution Brief to see controls like context ring-fencing, prompt-level policy, and full audit trails.
