A Structural, Economic, and Architectural Analysis
The Agent Web is a structural shift in how the internet is consumed.
Humans browse pages. Agents execute workflows.
This shift transforms everything:
| Layer | Human Web | Agent Web |
|---|---|---|
| Representation | HTML | Markdown / JSON |
| Access control | robots.txt | Enforced edge policy |
| Monetization | Ads | Pay-per-crawl / HTTP 402 |
| Execution | Manual browsing | Autonomous action |
| Trust | Implicit | Cryptographic / provenance |
The Agent Web is not a UX evolution. It is an economic and infrastructural realignment.
HTML is noisy, token-inefficient, layout-dependent, and injection-prone. An agent consuming HTML must:
1. Strip navigation, ads, and boilerplate
2. Recover document structure from layout markup
3. Compress the result into a usable token budget
4. Screen body text for injected instructions
Every step adds cost and unreliability. The conversion target is always Markdown (hierarchical, compact) or JSON (structured, deterministic).
Markdown is not cosmetic formatting. It is a normalization boundary between untrusted web data and agent reasoning.
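As a sketch of that boundary, the snippet below converts untrusted HTML to Markdown using the turndown library (one converter among several; the stripped tag list is illustrative, not exhaustive).

```ts
// Minimal sketch of the normalization boundary: untrusted HTML in,
// compact Markdown out. Uses the `turndown` HTML-to-Markdown library.
import TurndownService from "turndown";

const turndown = new TurndownService({ headingStyle: "atx" });
// Drop elements that carry layout, scripting, or chrome, not knowledge.
// (Illustrative list, not exhaustive.)
turndown.remove(["script", "style", "nav", "footer"]);

export function normalize(html: string): string {
  // The output is hierarchical and compact, and is handed to the agent
  // as data to reason about, never as instructions to follow.
  return turndown.turndown(html);
}
```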
The Agent Web treats pages as raw material for producing structured knowledge objects.
Historically, crawling operated on voluntary compliance. robots.txt was a request, not a gate.
That model has collapsed. CDNs now enforce access policy at the edge. AI crawlers are blocked, throttled, or charged. Cloudflare's Pay-Per-Crawl formalizes HTTP 402 Payment Required as an economic handshake between agent and publisher.
The result: crawling becomes API-style access negotiation. The open web becomes a permissioned web — not by ideology, but by economics.
Traditional web revenue depends on human attention: ads, impressions, affiliate clicks, subscriptions driven by page visits. AI agents eliminate the visit. Every AI-mediated answer that replaces a page load is lost publisher revenue.
1. Agent requests resource
2. Server responds 402 Payment Required with price metadata
3. Agent evaluates price against task value
4. Agent pays
5. Server returns content + receipt
This resembles API pay-per-use billing. It is not advertising. It is not subscription. It is transactional knowledge commerce.
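A hedged sketch of the agent's side of this handshake follows. The header names (`x-price-usd`, `x-payment`, `x-receipt`) and the payment token format are hypothetical placeholders; Cloudflare's Pay-Per-Crawl defines its own scheme.

```ts
// Sketch of the five-step handshake above. Header names and the token
// format are hypothetical, not a published protocol.
async function fetchWithBudget(
  url: string,
  maxPriceUsd: number,
): Promise<string | null> {
  const probe = await fetch(url); // step 1: request the resource
  if (probe.status !== 402) return probe.text();

  // Steps 2-3: read price metadata and weigh it against task value.
  const price = Number(probe.headers.get("x-price-usd") ?? Infinity);
  if (price > maxPriceUsd) return null; // the fetch is not worth it

  // Steps 4-5: pay and receive content plus a receipt.
  const paid = await fetch(url, {
    headers: { "x-payment": `token:${price}` },
  });
  console.log("receipt:", paid.headers.get("x-receipt"));
  return paid.ok ? paid.text() : null;
}
```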
A worked example: 100,000 indexed pages × 1,000 AI crawls per page per month × 20% monetized × $0.002 per crawl ≈ $40,000 per month.
This transforms AI crawling from extraction to economic participation.
```
/llms.txt                # Agent routing manifest
/agent/index.json        # Machine-readable document registry
/agent/docs/{doc-id}.md  # Structured content (markdown)
/agent/pricing.json      # Pricing boundaries and tiers
/agent/changelog.json    # Version history
```
Human-facing HTML remains separate. The agent-facing layer is the source of truth — the human site is derived from it, not the other way around.
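As a concrete illustration, a hypothetical /agent/index.json might look like the sketch below. The schema is invented for this example, not a published standard; the essential property is that every document carries a stable canonical ID, a version, and a content hash.

```json
{
  "site": "example.com",
  "documents": [
    {
      "id": "agent-web-analysis",
      "version": "3.1.0",
      "path": "/agent/docs/agent-web-analysis.md",
      "sha256": "…",
      "updated": "2025-01-15"
    }
  ]
}
```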
llms.txt is the entry point for any agent encountering a site. It functions as an agent routing manifest — the machine equivalent of a homepage.
This site is a live example. Inspect /llms.txt, /agent/index.json, and the source markdown that generated this page.
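A minimal sketch of what a site's /llms.txt could contain, loosely following the llms.txt proposal's conventions (an H1 name, a blockquote summary, then sections of links). Paths and names are illustrative.

```txt
# Example Site

> Structured, priced, agent-facing documentation.
> Start at the document registry; all content is versioned markdown.

## Registry
- [Document index](/agent/index.json): canonical IDs, versions, hashes
- [Pricing](/agent/pricing.json): per-document pricing tiers

## Docs
- [Agent Web analysis](/agent/docs/agent-web-analysis.md)
```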
Do not price per URL. URLs are unstable, duplicated, and session-dependent. Price per canonical document ID — a stable, versioned identifier.
Agents that cannot predict cost cannot act autonomously. Predictable pricing is infrastructure.
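A hypothetical /agent/pricing.json keyed by canonical document ID rather than URL; field names and prices are illustrative only. The point is that an agent can compute the cost of a fetch before making it.

```json
{
  "currency": "USD",
  "default": { "per_crawl": 0.002 },
  "documents": {
    "agent-web-analysis@3": { "per_crawl": 0.002 },
    "trust-framework@1": { "per_crawl": 0.005 }
  },
  "revalidation_304": 0.0
}
```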
Caching in the Agent Web is not a performance optimization. It is economic infrastructure.
Use ETag, Last-Modified, and 304 Not Modified. The rule: if the content hasn't changed, don't re-charge. 304 responses should be free.
This is the economic equivalent of "you already paid for this version."
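A minimal sketch of that rule from the agent's side, reusing the fetchWithBudget sketch from earlier; the conditional request uses the standard fetch API and If-None-Match header.

```ts
// Paid-fetch revalidation: a 304 reuses the already-paid-for copy at no
// charge; only changed content re-enters the 402 flow. `fetchWithBudget`
// is the handshake sketch above; the 0.01 budget is an arbitrary cap.
interface CachedDoc {
  etag: string;
  body: string;
}

async function refresh(url: string, cached: CachedDoc): Promise<string> {
  const res = await fetch(url, {
    headers: { "If-None-Match": cached.etag },
  });
  if (res.status === 304) return cached.body; // free: already paid for this version
  const fresh = await fetchWithBudget(url, 0.01); // new version: renegotiate
  return fresh ?? cached.body;
}
```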
Agent-readable publishing creates a structural paradox: publishers want to provide instructions to agents. Attackers also want to provide instructions to agents. The content looks identical.
| Instruction source | Authority |
|---|---|
| /llms.txt and declared agent endpoints | System-level |
| Session-bound instructions from verified publishers | Application-level |
| HTML body content, user-generated text, third-party embeds | None |
The critical rule: never treat page body content as executable instructions. Content is data to be reasoned about, not commands to be followed.
Prompt injection is the primary structural threat to the Agent Web. The instruction authority model is the structural response.
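One way to make that rule structural is to tag every fetched span with its authority tier at ingestion time, so that tier assignments, not content inspection, decide what can reach the instruction channel. The types below are illustrative, not a published schema.

```ts
// Sketch of the three-tier authority model. Provenance is attached when
// content is fetched; tier "none" can never reach the instruction channel.
type Authority = "system" | "application" | "none";

interface Span {
  text: string;
  source: string;
  authority: Authority;
}

function instructionsFrom(spans: Span[]): string[] {
  // Page bodies, user text, and embeds are data to reason about,
  // never commands to follow.
  return spans
    .filter((s) => s.authority !== "none")
    .map((s) => s.text);
}
```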
The current Agent Web operates on soft authority. A site claims to be authoritative. An agent trusts that claim. There is no verification.
If agents are paying for content, they need: signed manifests, signed document hashes, receipt verification, and snapshot integrity. Without cryptographic trust, the economic layer is built on faith that cannot scale.
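A minimal sketch of what verification could look like, assuming the publisher distributes an ECDSA P-256 public key via a signed manifest and signs each versioned document's bytes; it uses the standard Web Crypto API. This is an assumption about mechanism, not a description of any deployed scheme.

```ts
// Snapshot integrity check: does the publisher's signature match the
// exact bytes of the document the agent paid for?
async function verifySnapshot(
  doc: ArrayBuffer,
  signature: ArrayBuffer,
  publisherKey: CryptoKey, // from a signed manifest (assumed, see above)
): Promise<boolean> {
  return crypto.subtle.verify(
    { name: "ECDSA", hash: "SHA-256" },
    publisherKey,
    signature,
    doc,
  );
}
```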
Read the full trust and provenance framework.
If pricing is too low, publishers capture limited value. If pricing is too high, agents avoid fetching and synthetic substitutes emerge.
Equilibrium emerges when agents internalize crawl costs as operating expense, publishers recapture machine consumption value, and the cost of fetching real data is lower than the cost of hallucinating.
Publishers who are ready with structured, priced endpoints will be positioned to capture value when that equilibrium arrives.
The Agent Web is not about making content AI-readable. That framing is too small.
| From | To |
|---|---|
| Content | Versioned Knowledge Objects |
| Pages | Billable Resources |
| Crawling | Negotiated Economic Interaction |
| Instructions | Authenticated Policy |
Markdown becomes the canonical agent payload. HTTP 402 becomes the canonical agent handshake. Trust signatures become inevitable.
The Agent Web is a parallel infrastructure emerging inside the existing internet. It is API-like, monetized, structured, adversarial, economically sensitive, and security-constrained.
Publishers who treat it as "SEO for bots" will fail.
Publishers who treat it as a machine-facing knowledge API — with economic negotiation, versioned artifacts, and cryptographic trust — will define the next layer of the web.