Optimizing Discovery

Tom

Signet is more than just the rollup. We’ve created tools and documentation that help developers work with Signet, but they’re only valuable when developers can discover them. Plenty of good products have failed simply because nobody knew they existed.

I think about discovery in terms of pushing and pulling information:

  • Businesses push information through marketing, social media, and outreach
  • Users pull information by visiting our documentation directly
  • Robots pull our content with crawlers, then push it through search engines and LLMs

The Research

We need to understand how humans and robots discover content.

What search engines want

Search engines want fast sites. Google’s crawl budget documentation is explicit:

“If the site responds quickly for a while, the limit goes up… If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less.”

Bing determines crawl frequency based on content freshness. Yandex’s 2023 source code leak revealed that technical performance factors like server errors and page speed directly impact ranking.

Google also only considers the mobile version of a site for rankings and uses Core Web Vitals to break ties between similar competitors.

What LLMs want

Unlike search engines, LLM crawlers don’t execute JavaScript. Structured data has been found to reduce hallucinations, and DOM semantics have been shown to significantly improve LLM extraction accuracy.

That means that semantic HTML, Schema.org markup, and proper element hierarchy help LLMs understand and cite content.
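
For example, a minimal Schema.org block for a documentation page might look like this (the type, fields, and date here are illustrative):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Optimizing Discovery",
  "author": { "@type": "Person", "name": "Tom" },
  "dateModified": "2025-01-01"
}
</script>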

The overlap

Robots want plain text documents linked together in a predictable, consistent structure, delivered as fast as possible. Humans want the same, plus visuals, interactivity, and tone.

For Humans       For Robots            For Both
Visuals          Crawl directives      Speed
Interactivity    Canonical URLs        Mobile responsive
Navigation UI    Structured metadata   Semantic markup
Tone & voice     Meta descriptions     Fresh content

That gives us some clear priorities:

  1. Speed - Faster pages get crawled more, rank better, convert better
  2. Semantic structure - Proper HTML and structured data improve both search ranking and LLM accuracy
  3. No JavaScript dependency - LLM crawlers can’t execute JS, so core content must be in HTML

How fast is fast enough?

While faster is always better, there’s a critical breakpoint called IW10.

TCP connections can’t send unlimited data immediately. To prevent network congestion, servers use TCP slow start, beginning with a small congestion window that grows with each acknowledged packet. RFC 6928 sets the initial window at 10 segments of 1,460 bytes each: 14,600 bytes at most before the server must wait for an acknowledgment.

That means that if we keep the total initial payload under 14.6KB, a page loads in a single network round-trip. In the simplified model below, each additional 14.6KB costs another round-trip (in practice the window grows as packets are acknowledged, so treat this as a worst case).

TCP Initial Window (14.6KB limit)

First round-trip capacity:
┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐
│ 1460 │ 1460 │ 1460 │ 1460 │ 1460 │ 1460 │ 1460 │ 1460 │ 1460 │ 1460 │ bytes
└──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘
= 14,600 bytes (14.6KB)

Exceed this: Server waits for ACK before sending more

Impact of Multiple Round-Trips

13.5KB page:
░░░░░▐████████▌ 67ms ✓
├──── 67ms ────┤

30KB page:
░░░░░░▐████████▌░░░░░░▐████████▌░░░░░░▐████████▌ 201ms ✓
├──── 67ms ────┤├──── 67ms ────┤├──── 67ms ────┤
RTT 1 RTT 2 RTT 3

50KB page:
░░░░░░▐████████▌░░░░░░▐████████▌░░░░░░▐████████▌░░░░░░▐████████▌ 268ms ✓
├──── 67ms ────┤├──── 67ms ────┤├──── 67ms ────┤├──── 67ms ────┤
RTT 1 RTT 2 RTT 3 RTT 4

Each RTT is about 67ms. On mobile networks, 100ms+ RTT delays are common.
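
The diagrams use a deliberately simple model: every 14.6KB of payload costs one full round-trip, ignoring congestion-window growth. A quick sketch of that arithmetic, which reproduces the numbers above:

// Simplified worst-case model: every 14.6KB costs one round-trip.
// Real TCP grows the window after each RTT, so this is an upper bound.
const IW10_BYTES = 10 * 1460 // RFC 6928 initial window

function estimateLoadMs(pageBytes, rttMs = 67) {
  const roundTrips = Math.ceil(pageBytes / IW10_BYTES)
  return roundTrips * rttMs
}

estimateLoadMs(13_500) // 67ms  (1 round-trip)
estimateLoadMs(30_000) // 201ms (3 round-trips)
estimateLoadMs(50_000) // 268ms (4 round-trips)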

The basics

I often think about Dan Abramov’s “You Might Not Need Redux”. He discusses how frameworks (even his own) are a tradeoff that introduces constraints. “If you trade something off, make sure you get something in return.”

There are many really good reasons to use a framework, but for our purposes, we didn’t feel the return was there. React costs ~46KB gzipped, more than three times the IW10 budget, making it a non-starter.

With that in mind, we can tackle the basic requirements from the table above: semantic markup, structured metadata, canonical URLs, crawl directives, and a mobile-responsive layout.

Implementing these basics landed us at a bundle size of about 28KB per page. Two round-trips.

Finding optimizations

Improve minification and compression (28KB → 21KB)

There were some easy tradeoffs to make:

  • Strip Schema.org JSON-LD to essential fields
  • Inline only critical CSS and JS, defer everything else
  • Improve build-time minification
  • Swap CloudFront’s on-the-fly compression for build-time pre-compression (sketched below)
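
Pre-compression can be a short build step. Here’s a sketch using Node’s built-in zlib (the dist/ directory and the file-extension filter are assumptions about our build output; recursive readdir needs Node 20+):

// Pre-compress build output with Brotli so CloudFront serves
// static .br files instead of compressing on the fly.
import { readdirSync, readFileSync, writeFileSync } from "node:fs"
import { join } from "node:path"
import { brotliCompressSync, constants } from "node:zlib"

const DIST = "dist" // assumed build output directory

for (const entry of readdirSync(DIST, { recursive: true })) {
  const path = join(DIST, String(entry))
  if (!/\.(html|css|js|svg|json|txt)$/.test(path)) continue

  const source = readFileSync(path)
  const compressed = brotliCompressSync(source, {
    params: {
      // Build time is free; use max quality instead of a runtime level
      [constants.BROTLI_PARAM_QUALITY]: constants.BROTLI_MAX_QUALITY,
      [constants.BROTLI_PARAM_SIZE_HINT]: source.length,
    },
  })
  writeFileSync(`${path}.br`, compressed)
}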

Further optimizations came with stronger tradeoffs. For example, we could defer CSS and JS loading entirely, but that harms our Core Web Vitals by introducing layout shifts and creating a flash of unstyled content (FOUC).

Service Worker streaming injection (21KB → 13.5KB)

Browsers only need styles to be present at render time, not necessarily in the initial HTML payload itself.

Service Workers (not to be confused with Web Workers) can intercept network requests and use TransformStream to modify responses in transit rather than waiting for them to fully arrive. That means we can inject CSS and JS during HTML streaming:

// Intercept HTML and inject CSS during transfer.
// componentCSS is assumed to be inlined into the worker at build time.
async function handleNavigation(request) {
  const cache = await caches.open("pages-v1")
  const response = await cache.match(request)
  if (!response) return fetch(request)

  return new Response(
    response.body
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(
        new TransformStream({
          transform(chunk, controller) {
            // Inject styles just before </head>. This assumes the tag
            // never straddles a chunk boundary; buffer if yours can.
            if (chunk.includes("</head>")) {
              chunk = chunk.replace(
                "</head>",
                `<style>${componentCSS}</style></head>`
              )
            }
            controller.enqueue(chunk)
          },
        })
      )
      .pipeThrough(new TextEncoderStream()),
    { headers: response.headers }
  )
}

// Wire the handler up to navigation requests
self.addEventListener("fetch", (event) => {
  if (event.request.mode === "navigate") {
    event.respondWith(handleNavigation(event.request))
  }
})

This change let us aggressively prune our inlined code. The HTML itself was only 10.5KB. The critical CSS and JS we previously inlined are injected during transfer by the Service Worker so that the browser has a fully-styled document by the time it needs it.

This solution wasn’t a panacea. First-time visitors still experience a slight FOUC (typically ~50ms) while the Service Worker installs in the background. After installation, though, all subsequent visits are instant and fully styled.
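
Registering the worker is a single call from the page. A typical pattern (the /sw.js path is an assumption) defers registration until after load so installation never competes with first-paint resources:

// Register the Service Worker without blocking first paint
if ("serviceWorker" in navigator) {
  window.addEventListener("load", () => {
    navigator.serviceWorker.register("/sw.js")
  })
}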

Instant navigation

We can also improve browsing speed for humans by aggressively caching and prefetching content:

  1. bfcache provides instant restoration when users hit back/forward buttons. It preserves the full page state: JavaScript execution context, scroll position, form inputs.

  2. Speculation Rules API enables declarative prefetching:

     <script type="speculationrules">
       {
         "prefetch": [
           {
             "source": "document",
             "where": { "href_matches": "/docs/*" },
             "eagerness": "moderate"
           }
         ]
       }
     </script>
  3. Custom hover prefetching fetches and caches pages when users hover over links. If the pointer leaves the link before a short timeout elapses, the prefetch is cancelled, avoiding wasteful requests. A minimal sketch follows this list.
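
Here’s one way to implement the hover logic (the 100ms delay and the in-memory Set are illustrative; the Service Worker caches whatever the fetch returns):

const prefetched = new Set()

document.addEventListener("mouseover", (event) => {
  const link = event.target.closest("a[href]")
  if (!link || prefetched.has(link.href)) return

  // Only fire if the pointer is still on the link when the timeout elapses
  const timer = setTimeout(() => {
    prefetched.add(link.href)
    fetch(link.href).catch(() => prefetched.delete(link.href))
  }, 100)

  link.addEventListener("mouseout", () => clearTimeout(timer), { once: true })
})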

Combined with Service Worker injection and standard background prefetching, navigation typically resolves from memory in ~5-10ms.

The complete system flow

Complete Navigation and Caching Flow


┌─────────────────────────────────────────────────────────────────┐
│ CACHE POPULATION (How pages get into cache)                     │
├─────────────────────────────────────────────────────────────────┤
│ • Speculation Rules: /docs/* prefetch                           │
│ • Custom prefetch: All links after hover                        │
│ • Background prefetching for high-priority pages                │
└─────────────────────────────────────────────────────────────────┘


Navigation Event
│
├──┬─► Back/Forward Button
│  └─► bfcache: Instant full page restore ✓
│
└──┬─► Regular Click
   │
   ├─► Tier 0: Memory Map (JS) ───────────────► ~0ms
   │     25 pages, session-only, 60% hit rate
   │     └─► HIT: DOM swap ✓ │ MISS ↓
   │
   ├─► Tier 1: Service Worker Cache ──────────► ~10ms
   │     200+ pages, persistent, 90% hit rate
   │     └─► HIT: CSS Streaming Injection
   │         ┌────────────────────────────────────────┐
   │         │ 1. Fetch HTML (10.5KB, no CSS or JS)   │
   │         │ 2. Pipe through TextDecoderStream      │
   │         │ 3. TransformStream:                    │
   │         │     • Find closing </head> tag         │
   │         │     • Inject CSS and JS                │
   │         │ 4. Pipe through TextEncoderStream      │
   │         │ 5. Browser receives fully styled doc   │
   │         └────────────────────────────────────────┘
   │     Return (10ms, 0ms FOUC) ✓ │ MISS ↓
   │
   ├─► Tier 2: Browser HTTP Cache ────────────► ~10ms
   │     ~100MB, Cache-Control headers, 95% hit rate
   │     └─► HIT ✓ │ MISS ↓
   │
   ├─► Tier 3: CloudFront Edge ───────────────► ~12ms
   │     Unlimited storage, 24hr TTL, 99.9% hit rate
   │     └─► HIT: Brotli level 5 compressed ✓ │ MISS ↓
   │
   └─► Tier 4: S3 Origin ─────────────────────► ~118ms
         Permanent storage, 0.1% hit rate

Effective latency: First visit ~12ms | After 3 pages ~2ms average
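
The Tier 0 “DOM swap” is the one step the earlier code doesn’t show. A minimal sketch (the Map, the function name, and the miss fallback are illustrative):

// Session-only memory cache: href -> full HTML string
const memoryCache = new Map()

function softNavigate(href) {
  const html = memoryCache.get(href)
  if (!html) {
    location.href = href // miss: fall through to the Service Worker tiers
    return
  }
  // Parse off-document, then swap the title and body in place.
  // Scripts parsed this way don’t execute; re-run any init code by hand.
  const next = new DOMParser().parseFromString(html, "text/html")
  document.title = next.title
  document.body.replaceWith(next.body)
  history.pushState({}, "", href)
}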

Low-cost experiments: llms.txt

No major LLM provider officially supports llms.txt yet, but there are signals that we may be headed toward adoption of the standard. Anthropic maintains comprehensive files for their documentation. The implementation cost is trivial, so it makes sense to include one. We also added a “Copy for LLMs” button on documentation pages for direct markdown access.
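
The format itself is plain markdown: an H1 title, a blockquote summary, and H2 sections of links. A minimal sketch (the URLs and descriptions are illustrative):

  # Signet

  > Tools and documentation for building on Signet.

  ## Docs

  - [Quickstart](https://docs.example.com/quickstart.md): deploy a first app
  - [API Reference](https://docs.example.com/api.md): endpoints and types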
