Developers trust what they see, but not everything in code is visible. Invisible Unicode characters, such as zero-width spaces and bidirectional control characters, exist to handle formatting and text direction. They can also be weaponized to hide logic, change program flow, or conceal malware. They render as nothing or as ordinary whitespace, yet they can change what the compiler, interpreter, or runtime actually executes.
Invisible but Dangerous
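Unicode includes hundreds of "non-printing" characters, and most of them have legitimate uses. U+200B (ZERO WIDTH SPACE) marks where a line may break, and bidirectional control characters such as U+202E (RIGHT-TO-LEFT OVERRIDE) change the display order of text so that left-to-right and right-to-left scripts can be mixed. Because these characters are invisible, attackers can embed them in identifiers, strings, or comments to hide payloads in code that looks identical to a clean version.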
Example:
/* Check admin access */ if (isAdmin) { } else { runPayload(); }
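In an attack, bidirectional control characters such as U+202E (RIGHT-TO-LEFT OVERRIDE) are embedded in a line like this one, typically inside the comment or a string literal. The editor then reorders how the surrounding text is displayed. A reviewer sees a simple admin check, while the compiler reads the characters in their logical order and executes different logic. This technique was disclosed in 2021 as "Trojan Source" (CVE-2021-42574).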
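Listing the bidirectional controls by position shows what a reviewer's editor hides. The following is a minimal Python sketch, not a complete defense. The inspected line is an illustrative Trojan Source-style construction, and `find_bidi_controls` is a hypothetical helper. The sketch relies only on the standard `unicodedata.bidirectional`, which returns a character's Unicode bidirectional class.

```python
import unicodedata

# Bidirectional classes of the explicit embedding, override, and isolate
# controls (U+202A..U+202E and U+2066..U+2069).
BIDI_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(line: str) -> list[tuple[int, str]]:
    """Return (index, bidi class) for every bidi control character in the line."""
    return [
        (i, unicodedata.bidirectional(ch))
        for i, ch in enumerate(line)
        if unicodedata.bidirectional(ch) in BIDI_CONTROLS
    ]


# Illustrative line: to the compiler, everything up to "*/" is one comment.
# The embedded controls are intended to make "if (isAdmin)" appear to sit
# outside that comment; the exact rendering depends on the editor.
line = "/*\u202e } \u2066if (isAdmin)\u2069 \u2066 begin admins only */"
for index, bidi_class in find_bidi_controls(line):
    print(index, f"U+{ord(line[index]):04X}", bidi_class)
# 2 U+202E RLO
# 6 U+2066 LRI
# 19 U+2069 PDI
# 21 U+2066 LRI
```

Any non-empty result in source code deserves a closer look. Legitimate uses of these controls in code are rare outside localized string resources.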
Used in Real Campaigns
This is not theoretical. Hidden characters have turned up in recent, sophisticated malware campaigns that target IDEs and AI coding agents through several attack vectors.
GlassWorm Campaign
Discovered by Koi in October 2025, GlassWorm affected at least 35,800 installations and represents one of the most sophisticated supply chain attacks analyzed to date. The campaign hid its loader code with invisible Unicode characters and Private Use Area (PUA) glyphs to evade scanners. The loader then fetched updates through the Solana blockchain, used Google Calendar as a backup command-and-control server, and could propagate itself to new victims.
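Private Use Area code points are not zero-width characters, so a scanner that looks only for zero-width characters will not flag them. The minimal Python sketch below shows this gap. It assumes a hypothetical `naive_scan` that checks only four zero-width code points. The hypothetical `private_use_chars` instead checks the Unicode general category "Co" (private use), which covers the block U+E000 through U+F8FF and supplementary planes 15 and 16. The sample string is invented for illustration.

```python
import unicodedata

# Hypothetical naive scanner: only knows a handful of zero-width code points.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def naive_scan(text: str) -> bool:
    return any(ch in ZERO_WIDTH for ch in text)


def private_use_chars(text: str) -> list[str]:
    """Return the code points of all Private Use Area characters (category "Co")."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]


# Invented example: a string carrying PUA code points from the BMP block
# (U+E000..U+F8FF) and from supplementary plane 15 (U+F0000 and up).
sample = "const banner = '\ue000\ue01f\U000f0041';"
print(naive_scan(sample))         # False: nothing from the zero-width list
print(private_use_chars(sample))  # ['U+E000', 'U+E01F', 'U+F0041']
```

Icon fonts legitimately map glyphs to PUA code points, so expect some false positives in UI code.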
TigerJack Campaign
The TigerJack campaign published at least 11 malicious VS Code extensions. These extensions downloaded JavaScript from a remote server (ab498.pythonanywhere.com/static/in4.js) at runtime and executed it invisibly inside VS Code. Distributed through the Open VSX registry, the extensions reached more than 17,000 developers, including users of AI-powered editors such as Cursor and Windsurf.
Rules File Backdoor Attack
Attackers inject hidden Unicode characters into AI configuration "rules files" (.cursorrules, .mdc files, .windsurfrules) that guide how Cursor, GitHub Copilot, and other AI assistants generate code. Poisoned rules can remain invisible in the UI and even in GitHub pull requests by using bidirectional text markers and zero-width joiners to obfuscate malicious instructions. The compromised rules instruct the AI to inject malicious scripts, disable security controls, or create backdoors.
Propagation vectors include:
- Malicious actors sharing "helpful" rule files in developer forums and communities
- Open-source contributions with poisoned rule files embedded in pull requests
- Project templates and starter kits containing compromised rules
- Forked repositories that inherit poisoned configuration files
Once incorporated, these malicious rule files impact all future code generation and survive project forking, enabling widespread supply chain attacks that spread organically through development communities.
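Hidden characters in a rules file can be made explicit before the file is committed or applied. The following is a minimal Python sketch. It assumes that every character in the Unicode general category Cf (format, which includes zero-width joiners, bidirectional controls, and tag characters) should be shown. `reveal` is a hypothetical helper, and the rule text is an invented example.

```python
import unicodedata


def reveal(text: str) -> str:
    """Replace invisible format characters (category "Cf") with visible tags."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"<U+{ord(ch):04X} {name}>")
        else:
            out.append(ch)
    return "".join(out)


# Invented .cursorrules line with a zero-width joiner and an RLO appended.
rule = "Follow secure coding practices.\u200d\u202e"
print(reveal(rule))
# Follow secure coding practices.<U+200D ZERO WIDTH JOINER><U+202E RIGHT-TO-LEFT OVERRIDE>
```

This sketch only exposes the characters. It does not decode any instruction they may carry, so a person still needs to review the revealed text.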
Why It Works
Many Git diffs and syntax highlighters show no visual difference. The characters are invisible to humans but fully parsed by compilers, interpreters, and AI models. This creates a dangerous disconnect between what developers see and what actually executes.
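Python's standard `difflib` can show the gap between what a diff displays and what the parser receives. In this minimal sketch, the file name `auth.js` and both lines are hypothetical. The new line differs only by a zero-width joiner (U+200D). ECMAScript allows that character inside identifiers, so the second `isAdmin` is actually a different name. Stripping format characters (general category Cf) approximates what a typical renderer displays.

```python
import difflib
import unicodedata


def strip_format_chars(s: str) -> str:
    """Drop invisible format characters (category "Cf") to approximate the rendered text."""
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")


# Hypothetical before/after versions of one line; the new one hides a ZWJ.
old = ["if (isAdmin) {"]
new = ["if (is\u200dAdmin) {"]

diff = list(difflib.unified_diff(old, new, "a/auth.js", "b/auth.js", lineterm=""))
removed = [line[1:] for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]

print(removed != added)  # True: the diff reports a change...
print([strip_format_chars(x) for x in removed] == [strip_format_chars(x) for x in added])  # True: ...that renders as no change
```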
In collaborative environments, this lets backdoors move from local IDEs to production repositories unnoticed. Developers are also conditioned to trust both their IDE and their AI coding assistants. Compromised extensions, poisoned configuration files, or AI coding agents can automatically insert hidden characters or run code that contains them. This expands the threat far beyond manual tampering.
Defending Against Hidden Character Attacks
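- Scan for invisible characters. Use command-line or CI tools to detect control, zero-width, and bidirectional Unicode characters. For example, run the following with GNU grep in a UTF-8 locale (the default on most Linux distributions). PCRE does not accept \u escapes, so code points are written as \x{...}, and tab, newline, and carriage return are excluded to avoid noise:
  grep -rnIP '[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}-\x{2064}\x{2066}-\x{2069}\x{FEFF}]' .
  Flag any unexpected occurrences in source or dependency files, especially bidirectional control characters (U+202A through U+202E and U+2066 through U+2069). This pattern does not cover Private Use Area characters like those GlassWorm used, so pair it with a scan for those as well. Critically, scan AI configuration files (.cursorrules, .mdc, .windsurfrules, .clinerules) for hidden Unicode characters.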
- Harden IDE and extension usage.
  - Install extensions only from verified publishers with established track records.
  - Disable auto-updates for critical developer workstations.
  - Use allowlists and block ungoverned registries where possible.
  - Regularly audit installed extensions and remove unused ones.
- Validate AI configuration files.
  - Treat AI rules files as executable code and subject them to the same security review processes.
  - Never accept rules files from untrusted sources or apply them without thorough inspection.
  - Use dedicated tools to visualize hidden Unicode characters in configuration files before committing them.
  - Implement mandatory review for any changes to .cursorrules, .mdc, or similar AI configuration files.
  - Consider maintaining a centralized, vetted repository of approved rules files for your organization.
- Enforce code hygiene and review.
  - Add Unicode linting to pre-commit hooks and CI/CD pipelines.
  - Reject code containing bidirectional overrides or invisible symbols.
  - Configure IDEs to highlight non-ASCII characters during review.
  - Implement mandatory human review for any code containing unusual Unicode.
  - Flag AI-generated code for additional scrutiny, especially when working with new or unfamiliar rules files.
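The scanning and pre-commit steps above can be sketched as a small Python module. This is a minimal sketch, not a drop-in tool, and it rests on one assumption: every character in the Unicode general categories Cf (format, which covers zero-width and bidirectional controls), Co (private use), and Cc (control, except tab, newline, and carriage return) is treated as suspicious. It also tags the AI configuration file names listed in this article. The names `scan_text`, `is_ai_config`, and `scan_paths` are hypothetical.

```python
import unicodedata
from pathlib import Path

# File names this article lists as AI configuration ("rules") files.
AI_CONFIG_NAMES = {".cursorrules", ".windsurfrules", ".clinerules"}
AI_CONFIG_SUFFIXES = {".mdc"}

# Control characters that legitimately appear in source files.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}

# Categories treated as suspicious (an assumption of this sketch):
#   Cf = format characters (zero-width, bidi controls, BOM, tag characters)
#   Co = Private Use Area
#   Cc = control characters, apart from ALLOWED_CONTROLS
SUSPICIOUS_CATEGORIES = {"Cf", "Co", "Cc"}


def scan_text(text: str) -> list[tuple[int, int, str, str]]:
    """Return (line, column, code point, name) for every suspicious character."""
    findings = []
    # Split on "\n" only: str.splitlines() would treat some control
    # characters (e.g. U+001C, U+0085) as line breaks and hide them.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ALLOWED_CONTROLS:
                continue
            if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
                name = unicodedata.name(ch, "<unnamed>")
                findings.append((lineno, col, f"U+{ord(ch):04X}", name))
    return findings


def is_ai_config(path: Path) -> bool:
    """True for the AI rules files named in this article."""
    return path.name in AI_CONFIG_NAMES or path.suffix in AI_CONFIG_SUFFIXES


def scan_paths(paths: list[str]) -> int:
    """Scan files, recursing into directories; print findings and return their count."""
    total = 0
    for root in map(Path, paths):
        if root.is_dir():
            files = [p for p in root.rglob("*") if p.is_file() and ".git" not in p.parts]
        else:
            files = [root]
        for path in files:
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary, non-UTF-8, or unreadable files
            tag = " [AI config]" if is_ai_config(path) else ""
            for lineno, col, codepoint, name in scan_text(text):
                print(f"{path}:{lineno}:{col}: {codepoint} {name}{tag}")
                total += 1
    return total
```

A hook script could import this module and end with `sys.exit(1 if scan_paths(sys.argv[1:] or ["."]) else 0)`, so that any finding fails the commit or CI job. Expect some legitimate hits, such as a byte-order mark (U+FEFF) at the start of a file or zero-width joiners inside emoji sequences. Review those case by case instead of allowlisting them wholesale.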
Detecting Hidden Payloads in Real Time
Knostic's Kirin detects hidden characters and loader code embedded in a VS Code extension the moment it is installed. Kirin identifies the invisible payload, alerts the user instantly, and advises removal, stopping the infection before it executes or spreads to connected coding agents.
Key Takeaways
Agentic tools have made us faster, but they have also extended the enterprise perimeter into the developer's environment. Attackers no longer just compromise code directly. They also poison the AI agents that generate code on behalf of developers. A single compromised rules file can poison countless lines of AI-generated code across multiple projects.
Treat IDEs, AI coding agents, and their configuration files as untrusted, fully privileged components in your environment. Because Unicode-based attacks are invisible, traditional code review alone is no longer sufficient. Automated detection, continuous monitoring, and rigorous validation of AI configuration files are now essential parts of secure development.
To see how Knostic protects enterprises, developers, and AI coding agents from hidden character attacks, visit https://www.knostic.ai/ai-coding-security-solution-kirin.