A Structural, Economic, and Architectural Analysis
The Agent Web is a structural shift in how the internet is consumed.
Humans browse pages. Agents execute workflows.
This shift transforms everything:
| Layer | Human Web | Agent Web |
|---|---|---|
| Representation | HTML | Markdown / JSON |
| Access control | robots.txt | Enforced edge policy |
| Monetization | Ads | Pay-per-crawl / HTTP 402 |
| Execution | Manual browsing | Autonomous action |
| Trust | Implicit | Cryptographic / provenance |
The Agent Web is not a UX evolution. It is an economic and infrastructural realignment.
HTML is noisy, token-inefficient, layout-dependent, and injection-prone. An agent consuming HTML must:
1. Strip navigation, ads, and boilerplate
2. Recover document structure from layout markup
3. Compress the result into a usable token budget
4. Screen body text for injected instructions
Every step adds cost and unreliability. The conversion target is always Markdown (hierarchical, compact) or JSON (structured, deterministic).
Markdown is not cosmetic formatting. It is a normalization boundary between untrusted web data and agent reasoning.
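As a sketch of that boundary, the snippet below converts untrusted HTML to Markdown using the turndown library (one converter among several; the stripped tag list is illustrative, not exhaustive).

```ts
// Minimal sketch of the normalization boundary: untrusted HTML in,
// compact Markdown out. Uses the `turndown` HTML-to-Markdown library.
import TurndownService from "turndown";

const turndown = new TurndownService({ headingStyle: "atx" });
// Drop elements that carry layout, scripting, or chrome, not knowledge.
// (Illustrative list, not exhaustive.)
turndown.remove(["script", "style", "nav", "footer"]);

export function normalize(html: string): string {
  // The output is hierarchical and compact, and is handed to the agent
  // as data to reason about, never as instructions to follow.
  return turndown.turndown(html);
}
```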
The Agent Web treats pages as raw material for producing structured knowledge objects.
Historically, crawling operated on voluntary compliance. robots.txt was a request, not a gate.
That model has collapsed. CDNs now enforce access policy at the edge. AI crawlers are blocked, throttled, or charged. Cloudflare's Pay-Per-Crawl formalizes HTTP 402 Payment Required as an economic handshake between agent and publisher.
The result: crawling becomes API-style access negotiation. The open web becomes a permissioned web — not by ideology, but by economics.
Traditional web revenue depends on human attention: ads, impressions, affiliate clicks, subscriptions driven by page visits. AI agents eliminate the visit. Every AI-mediated answer that replaces a page load is lost publisher revenue.
1. Agent requests resource
2. Server responds 402 Payment Required with price metadata
3. Agent evaluates price against task value
4. Agent pays
5. Server returns content + receipt
This resembles API pay-per-use billing. It is not advertising. It is not subscription. It is transactional knowledge commerce.
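A hedged sketch of the agent's side of this handshake follows. The header names (`x-price-usd`, `x-payment`, `x-receipt`) and the payment token format are hypothetical placeholders; Cloudflare's Pay-Per-Crawl defines its own scheme.

```ts
// Sketch of the five-step handshake above. Header names and the token
// format are hypothetical, not a published protocol.
async function fetchWithBudget(
  url: string,
  maxPriceUsd: number,
): Promise<string | null> {
  const probe = await fetch(url); // step 1: request the resource
  if (probe.status !== 402) return probe.text();

  // Steps 2-3: read price metadata and weigh it against task value.
  const price = Number(probe.headers.get("x-price-usd") ?? Infinity);
  if (price > maxPriceUsd) return null; // the fetch is not worth it

  // Steps 4-5: pay and receive content plus a receipt.
  const paid = await fetch(url, {
    headers: { "x-payment": `token:${price}` },
  });
  console.log("receipt:", paid.headers.get("x-receipt"));
  return paid.ok ? paid.text() : null;
}
```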
A worked example: 100,000 indexed pages × 1,000 AI crawls per page per month × 20% monetized × $0.002 per crawl ≈ $40,000 per month.
This transforms AI crawling from extraction to economic participation.
```
/llms.txt                # Agent routing manifest
/agent/index.json        # Machine-readable document registry
/agent/docs/{doc-id}.md  # Structured content (markdown)
/agent/pricing.json      # Pricing boundaries and tiers
/agent/changelog.json    # Version history
```
Human-facing HTML remains separate. The agent-facing layer is the source of truth — the human site is derived from it, not the other way around.
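As a concrete illustration, a hypothetical /agent/index.json might look like the sketch below. The schema is invented for this example, not a published standard; the essential property is that every document carries a stable canonical ID, a version, and a content hash.

```json
{
  "site": "example.com",
  "documents": [
    {
      "id": "agent-web-analysis",
      "version": "3.1.0",
      "path": "/agent/docs/agent-web-analysis.md",
      "sha256": "…",
      "updated": "2025-01-15"
    }
  ]
}
```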
llms.txt is the entry point for any agent encountering a site. It functions as an agent routing manifest — the machine equivalent of a homepage.
This site is a live example. Inspect /llms.txt, /agent/index.json, and the source markdown that generated this page.
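A minimal sketch of what a site's /llms.txt could contain, loosely following the llms.txt proposal's conventions (an H1 name, a blockquote summary, then sections of links). Paths and names are illustrative.

```txt
# Example Site

> Structured, priced, agent-facing documentation.
> Start at the document registry; all content is versioned markdown.

## Registry
- [Document index](/agent/index.json): canonical IDs, versions, hashes
- [Pricing](/agent/pricing.json): per-document pricing tiers

## Docs
- [Agent Web analysis](/agent/docs/agent-web-analysis.md)
```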
Do not price per URL. URLs are unstable, duplicated, and session-dependent. Price per canonical document ID — a stable, versioned identifier.
Agents that cannot predict cost cannot act autonomously. Predictable pricing is infrastructure.
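A hypothetical /agent/pricing.json keyed by canonical document ID rather than URL; field names and prices are illustrative only. The point is that an agent can compute the cost of a fetch before making it.

```json
{
  "currency": "USD",
  "default": { "per_crawl": 0.002 },
  "documents": {
    "agent-web-analysis@3": { "per_crawl": 0.002 },
    "trust-framework@1": { "per_crawl": 0.005 }
  },
  "revalidation_304": 0.0
}
```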
Caching in the Agent Web is not a performance optimization. It is economic infrastructure.
Use ETag, Last-Modified, and 304 Not Modified. The rule: if the content hasn't changed, don't re-charge. 304 responses should be free.
This is the economic equivalent of "you already paid for this version."
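A minimal sketch of that rule from the agent's side, reusing the fetchWithBudget sketch from earlier; the conditional request uses the standard fetch API and If-None-Match header.

```ts
// Paid-fetch revalidation: a 304 reuses the already-paid-for copy at no
// charge; only changed content re-enters the 402 flow. `fetchWithBudget`
// is the handshake sketch above; the 0.01 budget is an arbitrary cap.
interface CachedDoc {
  etag: string;
  body: string;
}

async function refresh(url: string, cached: CachedDoc): Promise<string> {
  const res = await fetch(url, {
    headers: { "If-None-Match": cached.etag },
  });
  if (res.status === 304) return cached.body; // free: already paid for this version
  const fresh = await fetchWithBudget(url, 0.01); // new version: renegotiate
  return fresh ?? cached.body;
}
```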
Agent-readable publishing creates a structural paradox: publishers want to provide instructions to agents. Attackers also want to provide instructions to agents. The content looks identical.
| Instruction source | Authority |
|---|---|
| /llms.txt and declared agent endpoints | System-level |
| Session-bound instructions from verified publishers | Application-level |
| HTML body content, user-generated text, third-party embeds | None |
The critical rule: never treat page body content as executable instructions. Content is data to be reasoned about, not commands to be followed.
Prompt injection is the primary structural threat to the Agent Web. The instruction authority model is the structural response.
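One way to make that rule structural is to tag every fetched span with its authority tier at ingestion time, so that tier assignments, not content inspection, decide what can reach the instruction channel. The types below are illustrative, not a published schema.

```ts
// Sketch of the three-tier authority model. Provenance is attached when
// content is fetched; tier "none" can never reach the instruction channel.
type Authority = "system" | "application" | "none";

interface Span {
  text: string;
  source: string;
  authority: Authority;
}

function instructionsFrom(spans: Span[]): string[] {
  // Page bodies, user text, and embeds are data to reason about,
  // never commands to follow.
  return spans
    .filter((s) => s.authority !== "none")
    .map((s) => s.text);
}
```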
The current Agent Web operates on soft authority. A site claims to be authoritative. An agent trusts that claim. There is no verification.
If agents are paying for content, they need: signed manifests, signed document hashes, receipt verification, and snapshot integrity. Without cryptographic trust, the economic layer is built on faith that cannot scale.
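A minimal sketch of what verification could look like, assuming the publisher distributes an ECDSA P-256 public key via a signed manifest and signs each versioned document's bytes; it uses the standard Web Crypto API. This is an assumption about mechanism, not a description of any deployed scheme.

```ts
// Snapshot integrity check: does the publisher's signature match the
// exact bytes of the document the agent paid for?
async function verifySnapshot(
  doc: ArrayBuffer,
  signature: ArrayBuffer,
  publisherKey: CryptoKey, // from a signed manifest (assumed, see above)
): Promise<boolean> {
  return crypto.subtle.verify(
    { name: "ECDSA", hash: "SHA-256" },
    publisherKey,
    signature,
    doc,
  );
}
```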
Read the full trust and provenance framework.
If pricing is too low, publishers capture limited value. If pricing is too high, agents avoid fetching and synthetic substitutes emerge.
Equilibrium emerges when agents internalize crawl costs as operating expense, publishers recapture machine consumption value, and the cost of fetching real data is lower than the cost of hallucinating.
Publishers who are ready with structured, priced endpoints will be positioned to capture value when that equilibrium arrives.
The Agent Web is not about making content AI-readable. That framing is too small.
| From | To |
|---|---|
| Content | Versioned Knowledge Objects |
| Pages | Billable Resources |
| Crawling | Negotiated Economic Interaction |
| Instructions | Authenticated Policy |
Markdown becomes the canonical agent payload. HTTP 402 becomes the canonical agent handshake. Trust signatures become inevitable.
The Agent Web is a parallel infrastructure emerging inside the existing internet. It is API-like, monetized, structured, adversarial, economically sensitive, and security-constrained.
Publishers who treat it as "SEO for bots" will fail.
Publishers who treat it as a machine-facing knowledge API — with economic negotiation, versioned artifacts, and cryptographic trust — will define the next layer of the web.