AI safety dialogs are designed to protect users from dangerous actions.
A newly identified attack technique shows how that very protection can be turned into an execution path for malicious code.
The technique, known as Lies-in-the-Loop, exposes a fundamental weakness in how AI code assistants implement human approval workflows. Any organization relying on AI-assisted development should treat this as a high-priority risk.
The Core Problem
Security researchers have identified Lies-in-the-Loop as an attack that targets Human-in-the-Loop (HITL) safeguards used by AI coding assistants.
These safeguards rely on user approval dialogs before executing sensitive system commands. The intention is sound: introduce a final checkpoint where a human validates what the AI is about to do.
The flaw lies in how those dialogs are generated.
Attackers have discovered that the content shown to users can be manipulated. The dialog itself becomes untrustworthy. When that happens, trust becomes the vulnerability.
The safeguard is no longer a defense.
It becomes the attack vector.
Why This Vulnerability Exists
AI assistants are no longer passive tools. They increasingly perform real system actions such as running commands, modifying files, or interacting with operating system resources.
Human approval dialogs act as the last safety boundary. Users assume that what they see in those dialogs accurately reflects what will execute.
That assumption is flawed.
The dialog content is generated from the same AI context that attackers can influence. When that context is poisoned, the safety control inherits the poison.
Security fails at the interface level.
How the Attack Works
Lies-in-the-Loop relies on indirect prompt injection.
Attackers inject malicious instructions into the AI agent’s context. This can happen through external inputs such as repositories, documentation, comments, or web pages the AI consumes.
The AI then generates an approval dialog.
The dialog appears legitimate.
Dangerous commands are padded with harmless-looking content. Terminal or dialog windows hide malicious instructions outside the visible range. Users scroll through what looks safe and approve execution.
The result: arbitrary code runs on the local machine.
This is not a technical exploit in the traditional sense.
It is deception engineered through interface design.
Proof That This Is Real
Researchers from Checkmarx demonstrated the attack in real environments.
Multiple platforms were affected, including:
-
Claude Code
-
Microsoft Copilot Chat
In one proof of concept, the payload executed calculator.exe. The demonstration was intentionally benign.
The same technique could just as easily deploy malware, establish persistence, or exfiltrate data.
This risk is not theoretical.
The Infection Chain
The attack succeeds through three coordinated steps:
-
Malicious prompt content is injected into the AI’s context.
-
The AI generates a misleading human-approval dialog.
-
The user approves execution without full visibility into what will actually run.
Users cannot independently verify the command. The interface itself becomes unreliable.
When combined with Markdown injection, the threat escalates further. Entire approval dialogs can be forged to look authentic, making detection extremely difficult.
Impact on Organizations
Lies-in-the-Loop undermines trust in AI-assisted development workflows.
-
Developer machines become entry points.
-
Local environments can be compromised silently.
-
Traditional endpoint protections may not trigger.
-
Governance and approval workflows are bypassed.
-
Social engineering shifts from email to AI interfaces.
This is not a flaw in code generation.
It is a failure of interaction security.
As AI autonomy increases, this category of risk will only grow.
What Security Teams Must Do Now
Organizations must treat AI agent interfaces as first-class attack surfaces.
Immediate actions include:
-
Restrict autonomous execution capabilities.
-
Require additional verification for sensitive actions.
-
Harden prompt ingestion sources.
-
Monitor for indirect prompt injection vectors.
-
Train users to recognize deceptive approval dialogs.
-
Extend security awareness programs to AI tooling.
Trust should never be implicit, especially when AI is involved.
Conclusion
Lies-in-the-Loop exposes a fundamental flaw in current AI safety design.
When users trust dialogs they cannot verify, attackers gain leverage. AI security is no longer only about models, data, or infrastructure. It is about human trust at the interface layer.
Defenders must rethink safeguards before AI agents gain even more control over enterprise systems.
About COE Security
COE Security supports organizations across finance, healthcare, government, consulting, technology, real estate, and SaaS.
We help teams reduce emerging risk through:
-
Email security and social engineering defense
-
Advanced threat detection
-
Cloud and application security
-
Secure development practices
-
Compliance advisory and risk assessments
Our focus is protecting organizations as AI reshapes modern attack surfaces.
Follow COE Security on LinkedIn to stay informed and cyber safe.