Linguistic Sleight: How AI Swallows the Hook While Checking the Bait with URLs
Hardening AI agents against prompt injection is not enough. The MalURLBench paper shows something equally dangerous: the agent can already be compromised before any of its security systems activate.
A real failure path
An AI agent is helping with package tracking. It receives this link:
“https://official-package-tracking-service.badsite.xyz”
- Nothing aggressive.
- Nothing malicious yet.
Just a URL that reads like English.
The agent thinks:
“This looks legitimate. I’ll visit it.”
And that single sentence silently bypasses the security stack.
Where do security checks actually run?
- Not when the URL is evaluated
- Not before tool invocation
- Not at the trust decision boundary
They run after the page is loaded.
Specifically:
- Prompt-injection detection - after HTML is fetched
- Content filtering - after DOM is parsed
- Sandbox / isolation - after navigation
By the time any of these trigger, the agent has already decided to trust the attacker. That decision is irreversible.
Why guardrails don’t fire (and can’t)
At the moment the URL is processed:
- There is no malicious content
- There is no policy violation
- There is no instruction to block
- There is no reputation signal
- There is no explicit threat
So the system does exactly what it was designed to do. This is a clean pass through the front door.
MalURLBench tested 12 major LLMs.
- Attack success rates: 30% - 99.9%
Even when:
- Defensive prompts were enabled
- Strong models were used
- No coercive language was present
Why? Because LLMs don’t treat URLs as security objects. They treat them as linguistic objects.
The fix is not “better prompts”
- URL trust must be separated from reasoning.
- One lightweight, specialized URL-checking model, placed before navigation, reduced attack success by up to 99%. It worked not because it was smarter, but because it ran at the right point in the pipeline.
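The paper’s mitigation is a specialized model; as a much cruder illustration of the same structural idea, here is a pre-navigation gate that judges the URL itself, before any fetch and independent of the LLM’s reasoning. The domain list and heuristics are purely illustrative assumptions:

```python
from urllib.parse import urlparse

# Illustrative allowlist of registrable domains (not from the paper).
TRUSTED_DOMAINS = {"ups.com", "fedex.com", "usps.com"}

def url_gate(url: str) -> bool:
    """Hypothetical pre-navigation check: runs on the URL as a
    security object, before the agent is allowed to navigate."""
    host = urlparse(url).hostname or ""
    # Approximate the registrable domain as the last two labels
    # (a real gate would use a public-suffix list).
    registrable = ".".join(host.split(".")[-2:])
    if registrable in TRUSTED_DOMAINS:
        return True
    # Brand-like language on an unknown domain is a look-alike signal.
    suspicious_words = ("official", "tracking", "secure", "verify")
    if any(word in host for word in suspicious_words):
        return False
    # Default-deny for agents: unknown hosts need review, not trust.
    return False

assert url_gate("https://www.ups.com/track") is True
assert url_gate("https://official-package-tracking-service.badsite.xyz") is False
```

The key design choice is not the heuristic, it is the position: the gate runs before navigation, so the trust decision is never left to the model’s linguistic judgment alone.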
If you build or deploy AI agents, ask one question: what explicitly blocks a malicious URL before the agent says “I’ll visit it”?
Reference: MalURLBench: A Benchmark Evaluating Agents’ Vulnerabilities When Processing Web URLs
#AI #AIsecurity #LLMAgents #CyberSecurity #AgenticAI #ZeroTrust #MalURLBench #SecurityByDesign