Make Your Docs Agent-Ready: MDX to Markdown

TL;DR: The 7x Efficiency Gain

We rebuilt our documentation pipeline to treat AI agents as a first-class audience. By compiling MDX directly into clean, fully resolved Markdown, rather than serving heavy HTML or unresolved source code, we give agents and LLMs a payload about 7x smaller (roughly 31 KB instead of 222 KB) with none of the markup noise that degrades accuracy.

Test it yourself:
Request our docs with the Markdown accept header to see the agent-optimized view:

curl -s -H "Accept: text/markdown" https://img.ly/docs/cesdk/js/settings-970c98/

The Documentation Mismatch: Humans vs. Agents

If you build developer tools today, you have two distinct audiences: human engineers and the AI agents they use to write code. Recently, we realized we were treating them exactly the same.

Our docs pipeline is built on MDX and optimized for the browser. But when an LLM tries to ingest a page, it has to wade through navigation chrome, layout wrappers, and raw JSX. We didn't just need a "clean text" mode; we needed a fundamentally different architecture.

Humans vs. Agents: A Comparative Overview

Feature	Humans (Browser)	Agents (Context Window)
Primary Format	HTML + CSS + JS	Clean Markdown
Navigation	Visual UI (Sidebar/Tabs)	Explicit links + hierarchy
Context	Implicit (Site layout)	Explicit (Frontmatter + headers)
Constraints	Performance / Core Web Vitals	Token / Context Window budgets
Payload Size	~222 KB	~31 KB (7x smaller)

The Problem: Why HTML and Raw MDX Fail Agents

1. HTML is expensive "Chrome"

A documentation site is an application. Even static sites ship navigation chrome, scripts, and styling hooks. For AI, this is "token noise." You are paying for bytes that provide zero value to the LLM and forcing the agent to reconstruct a hierarchy that you already had at authoring time.

2. Raw MDX is unresolved source code

Serving raw MDX files doesn't solve the problem either. MDX is for maintainers—it is full of imports and unresolved dependencies. Our documentation often pulls code from external, tested repositories:

## Initialize the Engine
<CodeBlock file="examples/getting-started/src/index.ts" lines="12-24" />

In the browser, this renders beautifully. In raw MDX, it’s a pointer to a file the agent cannot see. The actual code isn't there.

The Solution: Treat Markdown as a Compilation Target

We stopped thinking about this as "exporting text" and started treating it as what it really is: a second build target.

Old Pipeline: MDX (Source) → HTML (Browser View)
New Pipeline: MDX (Source) → HTML (Browser View) and, in parallel, MDX (Source) → Markdown (Agent View)

Why we didn't just "convert" HTML

We tried rendering to HTML and then running a converter. It failed at code blocks. Syntax highlighting adds spans and wrappers around tokens; reversing that into clean, trustworthy code is nearly impossible.

<pre><code class="language-js">
  <span class="token keyword">const</span>
  <span class="token variable">engine</span>
  ...
</code></pre>

By working at the AST (Abstract Syntax Tree) level before rendering, we avoid this entirely.

Implementation: Transforming MDX at the AST Level

Rather than trying to reconstruct meaning from rendered markup, we preserve it directly. We built a remark plugin that transforms MDX at the MDAST level.

import { remark } from "remark";
import remarkMdx from "remark-mdx";
import remarkStringify from "remark-stringify";
import { remarkTransformForExport } from "./remarkTransformForExport";

const processor = remark()
  .use(remarkMdx) // Parse MDX/JSX nodes
  .use(remarkTransformForExport, {
    baseUrl: "https://img.ly/docs/cesdk/",
    paths: resolvedPathMap,
  })
  .use(remarkStringify); // Back to markdown

const file = await processor.process(mdxContent); // resolves to a VFile, not a string
const markdown = String(file);
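
The `baseUrl` and `paths` options suggest that the plugin rewrites links so they work outside the site. The article does not show that logic, so the following is only a minimal sketch under assumptions: `resolveLink` is a hypothetical helper, and `paths` is assumed to map source-relative MDX paths to published paths.

```typescript
// Assumed shape: source-relative MDX path -> published path under baseUrl.
type PathMap = Record<string, string>;

// Turn a link found in MDX into an absolute URL an agent can follow.
function resolveLink(href: string, baseUrl: string, paths: PathMap): string {
  // Already absolute (https:, mailto:, ...): leave untouched.
  if (/^[a-z][a-z0-9+.-]*:/i.test(href)) return href;
  // Pure fragment links stay relative to the current page.
  if (href.startsWith("#")) return href;

  // Split off a trailing #fragment so the map lookup sees only the path.
  const hashIndex = href.indexOf("#");
  const path = hashIndex === -1 ? href : href.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : href.slice(hashIndex);

  // Prefer the build-time map; fall back to the raw path.
  const target = paths[path] ?? path;
  return new URL(target, baseUrl).toString() + fragment;
}
```

Relying on the standard `URL` constructor keeps edge cases such as `../` segments and root-relative paths consistent with how a browser would resolve them.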

The Component Contract

Every MDX component must define how it exports to the agent view. We colocate the transform directly with the component: Aside.astro (Human UI) ↔ Aside.toMarkdown.ts (Agent logic).

Example: Aside Component → Blockquote

Input: <Aside title="Pro Tip">Use the basePath...</Aside>
Output: > **Pro Tip:** Use the basePath...
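
To make the contract concrete, here is a rough sketch of a transform registry. It is not the actual `Aside.toMarkdown.ts`: the node types are simplified stand-ins for MDAST, `registry`, `transform`, and `stringify` are hypothetical names, and the tiny serializer only stands in for what remark-stringify does in the real pipeline. It also assumes the component's children arrive as block nodes such as paragraphs.

```typescript
// Simplified stand-ins for MDAST nodes. Real MDX JSX nodes store attributes
// as an array of attribute nodes; a plain record keeps the sketch short.
type MdNode =
  | { type: "text"; value: string }
  | { type: "strong" | "paragraph" | "blockquote" | "root"; children: MdNode[] }
  | { type: "mdxJsxFlowElement"; name: string; attributes: Record<string, string>; children: MdNode[] };

type JsxNode = Extract<MdNode, { type: "mdxJsxFlowElement" }>;
type ToMarkdown = (node: JsxNode, children: MdNode[]) => MdNode[];

// Hypothetical Aside transform: title plus content become a blockquote.
const asideToMarkdown: ToMarkdown = (node, children) => {
  const title = node.attributes.title;
  if (!title) return [{ type: "blockquote", children }];
  const label: MdNode[] = [
    { type: "strong", children: [{ type: "text", value: title + ":" }] },
    { type: "text", value: " " },
  ];
  const [first, ...rest] = children;
  // Prepend the label to the first paragraph so title and body share a line.
  const body: MdNode[] =
    first && first.type === "paragraph"
      ? [{ type: "paragraph", children: [...label, ...first.children] }, ...rest]
      : [{ type: "paragraph", children: label }, ...children];
  return [{ type: "blockquote", children: body }];
};

// One entry per component that carries meaning.
const registry: Record<string, ToMarkdown> = { Aside: asideToMarkdown };

function transform(node: MdNode): MdNode[] {
  if (node.type === "text") return [node];
  const children: MdNode[] = [];
  for (const child of node.children) children.push(...transform(child));
  if (node.type !== "mdxJsxFlowElement") return [{ ...node, children }];
  const toMarkdown = registry[node.name];
  // No registered transform: treat the component as layout and unwrap it.
  return toMarkdown ? toMarkdown(node, children) : children;
}

// Toy serializer, only here so the result can be inspected as text.
function stringify(node: MdNode): string {
  if (node.type === "text") return node.value;
  const inner = node.children.map(stringify);
  switch (node.type) {
    case "strong": return `**${inner.join("")}**`;
    case "paragraph": return inner.join("");
    case "blockquote":
      // Prefix every line, including blank separators between blocks.
      return inner.join("\n\n").split("\n").map((line) => (line ? `> ${line}` : ">")).join("\n");
    default: return inner.join("\n\n"); // root: blocks separated by blank lines
  }
}
```

Unwrapping unknown components by default is a design choice in this sketch; a stricter build could instead fail on any component without a registered transform.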

Example: Resolving Code References

This is the most critical transform. Instead of a file pointer, the agent gets the actual, inlined code block fetched during the build process.
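
A minimal sketch of how such a reference might be resolved at build time. `resolveCodeBlock` and the `SourceFiles` map are hypothetical, and the sketch assumes the `lines` attribute is 1-based and inclusive, which the article does not state.

```typescript
// Hypothetical build-time view of the examples repo: path -> file contents.
type SourceFiles = Map<string, string>;

const FENCE = "`".repeat(3);

// Resolve <CodeBlock file="..." lines="12-24" /> into an inlined fenced block.
function resolveCodeBlock(file: string, lines: string, sources: SourceFiles): string {
  const source = sources.get(file);
  // Fail the build rather than ship a dangling pointer to agents.
  if (source === undefined) throw new Error(`CodeBlock: missing file ${file}`);

  const match = /^(\d+)-(\d+)$/.exec(lines);
  if (!match) throw new Error(`CodeBlock: invalid lines "${lines}"`);
  const start = Number(match[1]);
  const end = Number(match[2]);

  const all = source.split("\n");
  if (start < 1 || end < start || end > all.length) {
    throw new Error(`CodeBlock: lines ${lines} out of range for ${file}`);
  }

  // Assumption: line numbers are 1-based and inclusive.
  const snippet = all.slice(start - 1, end).join("\n");

  // Use the file extension as the fence language, e.g. "index.ts" -> "ts".
  const base = file.slice(file.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const lang = dot === -1 ? "" : base.slice(dot + 1);

  return `${FENCE}${lang}\n${snippet}\n${FENCE}`;
}
```

Throwing on a missing file or an out-of-range span means a stale reference breaks the build instead of silently producing an empty code block.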

Restoring Navigation Context

When you strip the sidebar and footer, you lose context. We reintroduce "navigational chrome" as text-native primitives at the top and bottom of every Markdown file:

---
title: "Working with Filters"
platform: "react"
url: "https://docs.example.com/react/guides/filters/"
---

> You’re reading the React docs. For the full corpus, see llms-full.txt.

**Path:** [Home](https://docs.example.com/) > [Guides](https://docs.example.com/guides/)

---

[Self-contained Page Content]

---

## Continue Reading
- [Color Adjustments](https://docs.example.com/react/guides/color-adjustments/)
- [API Reference](https://docs.example.com/api/)
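
The wrapper above could be generated with something like the following sketch. `PageMeta`, `Crumb`, and `wrapPage` are hypothetical names, and the platform banner line is left out for brevity.

```typescript
interface Crumb {
  title: string;
  url: string;
}

interface PageMeta {
  title: string;
  platform: string;
  url: string;
  breadcrumbs: Crumb[];
  related: Crumb[];
}

const link = (crumb: Crumb): string => `[${crumb.title}](${crumb.url})`;

// Wrap compiled page content in text-native navigation.
function wrapPage(meta: PageMeta, body: string): string {
  // JSON.stringify yields double-quoted, escaped strings, which are valid YAML scalars.
  const frontmatter = [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `platform: ${JSON.stringify(meta.platform)}`,
    `url: ${JSON.stringify(meta.url)}`,
    "---",
  ].join("\n");

  const sections: string[] = [frontmatter];
  if (meta.breadcrumbs.length > 0) {
    sections.push("**Path:** " + meta.breadcrumbs.map(link).join(" > "), "---");
  }
  sections.push(body.trim());
  if (meta.related.length > 0) {
    const items = meta.related.map((crumb) => `- ${link(crumb)}`);
    sections.push("---", ["## Continue Reading", ...items].join("\n"));
  }

  // Blank lines between sections keep each "---" a thematic break; directly
  // under a text line it would turn that line into a setext heading.
  return sections.join("\n\n") + "\n";
}
```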

Lessons for Scaling Agentic DX

If you maintain docs at scale, "optimizing for bots" is no longer optional—it is the new standard for Developer Experience (DX).

Don't serve HTML and hope: Reverse-engineering structure is hard for AI. Give it the structure directly.
Every component needs an identity: If a component carries meaning, it needs a Markdown equivalent. If it’s just layout, unwrap it.
Resolve everything: Assume zero ambient context. Links must be absolute, and code must be inlined.
Content Negotiation: Use the Accept: text/markdown header. It is becoming the industry standard for tools like Claude Code and OpenCode.
The Single-File Corpus: In addition to per-page files, generate a llms-full.txt that concatenates everything. Some agents prefer one large fetch over a crawl.
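
For the content negotiation point, a server or edge function has to decide which representation to return. The sketch below is one possible approach, not the implementation behind our docs: `negotiate` is a hypothetical helper, and it deliberately ignores wildcards such as `*/*` so that browsers and a bare `curl` keep receiving HTML.

```typescript
type Representation = "markdown" | "html";

// Pick a representation from the value of an HTTP Accept header.
function negotiate(accept: string | null): Representation {
  if (!accept) return "html";

  let markdownQ = 0;
  let htmlQ = 0;
  for (const part of accept.split(",")) {
    const [rawType, ...params] = part.trim().split(";");
    const type = rawType.trim().toLowerCase();

    // A missing q parameter means full weight (1); a malformed one counts as 0.
    const qParam = params.map((p) => p.trim()).find((p) => p.toLowerCase().startsWith("q="));
    const parsed = qParam ? Number(qParam.slice(2)) : 1;
    const weight = Number.isNaN(parsed) ? 0 : parsed;

    if (type === "text/markdown") markdownQ = Math.max(markdownQ, weight);
    if (type === "text/html") htmlQ = Math.max(htmlQ, weight);
  }

  // Serve Markdown only when it is explicitly listed and not outranked by HTML.
  return markdownQ > 0 && markdownQ >= htmlQ ? "markdown" : "html";
}
```

Whatever logic you use, responses that differ by `Accept` should also send `Vary: Accept`, so caches and CDNs store the HTML and Markdown variants separately.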

By acknowledging that AI agents are a primary consumer of our documentation, we’ve made our SDK significantly easier to integrate. The future of docs isn't just "readable". It's "storable and traversable."

Note: We built this for CE.SDK. The implementation uses Astro and Vercel, but the approach is framework-agnostic.
