Crawlability

What Crawlability measures

Weight: 40% of the AI Readiness Score

Crawlability answers the question: can AI crawlers see your content?

CrawlReady fetches your page twice — once as a standard browser (JavaScript executed) and once as a bot that does not execute JavaScript. The Crawlability score is based entirely on what the bot-view fetch returns. If content only exists in the rendered DOM, it is invisible to crawlers like GPTBot and ClaudeBot.

The score has four checks.

C1 — Content Visibility

Compares the amount of readable text in the bot-view response to the amount in the fully-rendered page.

| Bot text / Rendered text |
|---|
| ≥ 90% |
| 70–89% |
| 50–69% |
| 20–49% |
| < 20% |

If the bot fetch returns a non-200 status, or if the rendered page contains fewer than 50 characters of text, this check scores 0.
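As a rough illustration of the comparison described above, the following Python sketch computes the bot-to-rendered text ratio. It assumes that "amount of readable text" means character count after whitespace normalization, and that the bodies of script and style elements do not count as readable text. The names `readable_text` and `visibility_ratio` are hypothetical and are not part of CrawlReady.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies (an assumption)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def readable_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def visibility_ratio(bot_status, bot_html, rendered_html):
    """Return bot text / rendered text, or None when the check would score 0.

    Mirrors the two zero conditions in the text: a non-200 bot response,
    or a rendered page with fewer than 50 characters of text.
    """
    rendered = readable_text(rendered_html)
    if bot_status != 200 or len(rendered) < 50:
        return None
    return len(readable_text(bot_html)) / len(rendered)
```

A page whose content arrives only through a client-side fetch yields a ratio of 0 under these assumptions, because the bot-view HTML contains an empty container and a script.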

To improve: Implement server-side rendering (SSR) or static site generation (SSG) so the initial HTML response contains your content. Moving content out of client-side data fetches is the highest-leverage change you can make to this check.

<!-- Bot cannot see this -->
<div id="content"></div>
<script>fetch('/api/content').then(r => r.json()).then(d => render(d))</script>

<!-- Bot can see this -->
<div id="content">
  <h1>Your actual headline</h1>
  <p>Your actual content in the HTML response.</p>
</div>

C2 — Structural Clarity

Checks that the bot-view HTML has clear, well-organized structure that AI systems can parse.

| Signal |
|---|
| Exactly one <h1> |
| Heading hierarchy present (<h1> + at least one <h2>, no skipped levels) |
| At least 3 <p> elements with more than 20 characters of text |
| At least one <ul>, <ol>, or <table> |
| <meta name="description"> with non-empty content |
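The five signals above can be approximated with Python's standard `html.parser`. This is a minimal sketch, not CrawlReady's implementation: it assumes "no skipped levels" means no heading is more than one level deeper than the heading before it, and that paragraph length is measured after trimming whitespace. `StructureScanner` and `structural_signals` are hypothetical names.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class StructureScanner(HTMLParser):
    """Records headings, paragraph lengths, lists/tables, and the meta description."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []      # heading levels in document order
        self.paragraph_lengths = []   # trimmed text length of each <p>
        self.has_list_or_table = False
        self.has_description = False
        self._p_text = None           # text buffer while inside a <p>

    def finish_paragraph(self):
        if self._p_text is not None:
            self.paragraph_lengths.append(len("".join(self._p_text).strip()))
            self._p_text = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.heading_levels.append(HEADINGS[tag])
        elif tag == "p":
            self.finish_paragraph()   # a new <p> implicitly closes an open one
            self._p_text = []
        elif tag in ("ul", "ol", "table"):
            self.has_list_or_table = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").strip()
            if name == "description" and content:
                self.has_description = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.finish_paragraph()

    def handle_data(self, data):
        if self._p_text is not None:
            self._p_text.append(data)


def structural_signals(html):
    """Return a dict mapping each C2 signal to True or False."""
    scanner = StructureScanner()
    scanner.feed(html)
    scanner.close()
    scanner.finish_paragraph()        # flush a trailing unclosed <p>
    levels = scanner.heading_levels
    no_skips = all(b - a <= 1 for a, b in zip(levels, levels[1:]))
    return {
        "single_h1": levels.count(1) == 1,
        "heading_hierarchy": 1 in levels and 2 in levels and no_skips,
        "three_paragraphs": sum(n > 20 for n in scanner.paragraph_lengths) >= 3,
        "list_or_table": scanner.has_list_or_table,
        "meta_description": scanner.has_description,
    }
```

Running `structural_signals` against the raw HTML response (not the rendered DOM) shows which signals a page is missing before a scan.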

To improve:

<head>
  <meta name="description" content="A concise description of this page (under 160 chars).">
</head>
<body>
  <h1>One clear page title</h1>
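  <h2>First section</h2>
  <p>Substantive paragraph content here.</p>
  <p>A second paragraph with more than 20 characters.</p>
  <h2>Second section</h2>
  <p>A third paragraph introducing the list below.</p>
  <ul>
    <li>List item</li>
  </ul>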
</body>

C3 — Noise Ratio

Measures the ratio of readable text tokens to total HTML tokens. A high noise ratio (driven by inline scripts, inline styles, and heavy markup) means crawlers do more work to extract less signal.

The noise ratio is: 1 - (content tokens / total HTML tokens).

| Noise ratio |
|---|
| < 60% |
| 60–74% |
| 75–89% |
| ≥ 90% |
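The formula above can be tried locally with a short Python sketch. The tokenizer CrawlReady uses is not specified here, so this example substitutes a simple regular expression that counts each word and each punctuation mark as one token. It also assumes that script and style bodies are excluded from content tokens and that an empty response counts as pure noise. `count_tokens` and `noise_ratio` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Stand-in tokenizer (an assumption): one token per word or punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


class ContentText(HTMLParser):
    """Collects text nodes outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def count_tokens(text):
    return len(TOKEN.findall(text))


def noise_ratio(html):
    """Return 1 - (content tokens / total HTML tokens) as a fraction."""
    total = count_tokens(html)
    if total == 0:
        return 1.0  # assumption: an empty response carries no content
    parser = ContentText()
    parser.feed(html)
    parser.close()
    content = count_tokens(" ".join(parser.parts))
    return 1 - content / total
```

Because the tokenizer is a stand-in, the absolute values will differ from a real scan, but the direction holds: adding inline attributes or scripts to the same text raises the ratio.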

To improve:

Move <script> blocks to external .js files referenced with src.
Move large <style> blocks to external .css files.
Remove unused CSS classes and unused data attributes from the HTML.
Avoid base64-encoding large blobs inline in the HTML.

C4 — Schema.org Presence

Checks whether the bot-view HTML contains Schema.org JSON-LD structured data.

| Signal |
|---|
| Any JSON-LD <script type="application/ld+json"> present |
| JSON-LD has a valid @type property |
| @type is a rich content type (Product, FAQPage, HowTo, SoftwareApplication, Organization, Article, WebPage) |
| Multiple schemas with @type, or one schema with more than 5 properties |
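The signals above can be checked against a page's raw HTML with the Python standard library. This sketch makes several assumptions that may not match CrawlReady's behavior: a "valid" @type is a non-empty string, every top-level key (including @context and @type) counts as a property, and blocks that fail to parse as JSON are ignored. `JsonLdCollector` and `schema_signals` are hypothetical names.

```python
import json
from html.parser import HTMLParser

# Rich content types listed in the signal table.
RICH_TYPES = {
    "Product", "FAQPage", "HowTo", "SoftwareApplication",
    "Organization", "Article", "WebPage",
}


class JsonLdCollector(HTMLParser):
    """Collects the raw body of each <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def schema_signals(html):
    """Return a dict mapping each C4 signal to True or False."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    schemas = []
    for raw in collector.blocks:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: unparseable blocks earn no further signals
        items = parsed if isinstance(parsed, list) else [parsed]
        schemas.extend(item for item in items if isinstance(item, dict))
    typed = [
        s for s in schemas
        if isinstance(s.get("@type"), str) and s["@type"].strip()
    ]
    return {
        "jsonld_present": bool(collector.blocks),
        "valid_type": bool(typed),
        "rich_type": any(s["@type"] in RICH_TYPES for s in typed),
        "multiple_or_detailed": len(typed) > 1 or any(len(s) > 5 for s in typed),
    }
```

Under these assumptions, a single SoftwareApplication schema with six top-level keys satisfies all four signals.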

To improve:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Your Product",
  "description": "What it does.",
  "applicationCategory": "BusinessApplication",
  "offers": {
    "@type": "Offer",
    "price": "29",
    "priceCurrency": "USD"
  }
}
</script>

Use schema.org to find the right @type for your page. The Organization, Product, Article, and WebPage types cover most cases.
