Complete all explanation pages

- why-discovery: Core rationale and evolution
- robots-explained: robots.txt mechanics and best practices
- llms-explained: AI assistant guidance and context
- humans-explained: Human-readable credits and culture
- security-explained: RFC 9116 responsible disclosure
- canary-explained: Warrant canaries and transparency
- webfinger-explained: RFC 7033 federated discovery
- seo: Discovery files impact on search optimization
- ai-integration: Strategy for AI-first discovery
- architecture: Internal design and extensibility

All pages follow Diátaxis explanation style: understanding-oriented,
provide context, explain design decisions, discuss alternatives.
Ryan Malloy 2025-11-08 23:33:54 -07:00
parent 74cffc2842
commit 0191d08d14
2 changed files with 698 additions and 40 deletions

---
title: AI Assistant Integration Strategy
description: How AI assistants use discovery files and how to optimize for them
---
The relationship between websites and AI assistants is fundamentally different from their relationship with traditional search engines. Understanding this difference is key to optimizing your site for AI-mediated discovery.
## Beyond Indexing: AI Understanding

Search engines **index** your site - they catalog what exists and where. AI assistants **understand** your site - they build mental models of what you do, why it matters, and how to help users interact with you.

This shift from retrieval to comprehension requires different discovery mechanisms.
### Traditional Search Flow

1. User searches for keywords
2. Engine returns ranked list of pages
3. User clicks and reads
4. User decides if content answers their question

### AI Assistant Flow

1. User asks conversational question
2. AI synthesizes answer from multiple sources
3. AI provides direct response with citations
4. User may or may not visit original sources
In the AI flow, your site might be the source without getting the click. Discovery files help ensure you're at least properly represented and attributed.
## The llms.txt Strategy
llms.txt is your primary tool for AI optimization. Think of it as **briefing an employee** who'll be answering questions about your company.
### What to Emphasize
**Core value proposition**: Not just what you do, but why you exist
```
We're not just another e-commerce platform - we're specifically
focused on sustainable products with carbon footprint tracking.
```
This context helps AI assistants understand when to recommend you versus competitors.
**Key differentiators**: What makes you unique
```
Unlike other platforms, we:
- Calculate carbon footprint for every purchase
- Offset shipping emissions by default
- Partner directly with sustainable manufacturers
```
This guides AI to highlight your strengths.
**Common questions**: What users typically ask
```
When users ask about sustainability, explain our carbon tracking.
When users ask about pricing, mention our price-match guarantee.
When users ask about shipping, highlight our carbon-offset program.
```
This provides explicit guidance for common scenarios.
### What to Avoid
**Overpromising**: AI will fact-check against your actual site
**Marketing fluff**: Be informative, not promotional
**Exhaustive detail**: Link to comprehensive docs instead
**Outdated info**: Keep current or use dynamic generation
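Putting these guidelines together, a minimal llms.txt might look like the following sketch (site details invented for illustration, loosely following the llmstxt.org layout):
```
# Example Store

> Sustainable e-commerce platform with carbon footprint tracking
> on every purchase.

## Key Differentiators

- Carbon footprint calculated for every order
- Shipping emissions offset by default
- Direct partnerships with sustainable manufacturers

## Guidance for Assistants

- Sustainability questions: explain our carbon tracking
- Pricing questions: mention our price-match guarantee
```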
## Coordinating Discovery Files
AI assistants use multiple discovery mechanisms together:
### robots.txt → llms.txt Flow
1. AI bot checks robots.txt for permission
2. Finds reference to llms.txt
3. Reads llms.txt for context
4. Crawls site with that context in mind
Ensure your robots.txt explicitly allows AI bots:
```
User-agent: GPTBot
User-agent: Claude-Web
User-agent: Anthropic-AI
Allow: /
```
### llms.txt → humans.txt Connection
humans.txt provides tech stack info that helps AI answer developer questions:
User: "Can I integrate this with React?"
AI: *checks humans.txt, sees React in tech stack*
AI: "Yes, it's built with React and designed for React integration."
The files complement each other.
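For reference, the tech-stack portion of a humans.txt in the conventional humanstxt.org format might look like this (entries invented for illustration):
```
/* SITE */
Standards: HTML5, CSS3
Components: React, Astro
Software: TypeScript, Vite
```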
### sitemap.xml → AI Content Discovery
Sitemaps help AI find comprehensive content:
```xml
<url>
  <loc>https://example.com/docs/api</loc>
  <priority>0.9</priority>
</url>
```
High-priority pages in your sitemap signal importance to AI crawlers.
## Dynamic Content Generation
Static llms.txt works for stable information. Dynamic generation handles changing contexts:
### API Endpoint Discovery
```typescript
llms: {
  apiEndpoints: async () => {
    const spec = await loadOpenAPISpec();
    // In an OpenAPI document, `paths` is an object keyed by URL,
    // so flatten it into one entry per path and method.
    return Object.entries(spec.paths).flatMap(([url, operations]) =>
      Object.entries(operations).map(([method, op]) => ({
        path: url,
        method: method.toUpperCase(),
        description: op.summary
      }))
    );
  }
}
```
This keeps AI's understanding of your API current without manual updates.
### Feature Flags and Capabilities
```typescript
llms: {
  instructions: () => {
    const features = getEnabledFeatures();
    return `
Current features:
${features.map(f => `- ${f.name}: ${f.description}`).join('\n')}

Note: Feature availability may change. Check /api/features for current status.
`;
  }
}
```
AI assistants know what's currently available versus planned or deprecated.
## Measuring AI Representation
Unlike traditional SEO, the impact of AI representation is harder to quantify directly:
### Qualitative Monitoring
**Ask AI assistants about your site**: Periodically query Claude, ChatGPT, and others about your product. Do they:
- Describe you accurately?
- Highlight key features?
- Use correct terminology?
- Provide appropriate warnings/caveats?
**Monitor AI-generated content**: Watch for your site being referenced in:
- AI-assisted blog posts
- Generated code examples
- Tutorial content
- Comparison tables
**Track citation patterns**: When AI cites your site, is it:
- For the right reasons?
- In appropriate contexts?
- With accurate information?
- Linking to relevant pages?
### Quantitative Signals
**Referrer analysis**: Some AI tools send referrer headers showing they're AI-mediated traffic
**API usage patterns**: AI-assisted developers may show different integration patterns than manual developers
**Support question types**: AI-informed users ask more sophisticated questions
**Time-on-site**: AI-briefed visitors may be more targeted, spending less time but converting better
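As a starting point for referrer analysis, a helper like this sketch can tag requests from known AI crawlers (the user-agent substrings are examples; check each vendor's documentation for current values):
```typescript
// Hypothetical helper: identify requests from known AI crawler user agents.
const AI_AGENTS = ['GPTBot', 'Claude-Web', 'Anthropic-AI', 'PerplexityBot'];

export function detectAIAgent(userAgent: string): string | undefined {
  return AI_AGENTS.find((agent) => userAgent.includes(agent));
}

// Usage: feed the result into your analytics pipeline, e.g.
// const agent = detectAIAgent(request.headers.get('user-agent') ?? '');
```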
## Brand Voice Consistency
AI assistants can adapt tone to match your brand if you provide guidance:
```
## Brand Voice
- Professional but approachable
- Technical accuracy over marketing speak
- Always mention privacy and security first
- Use "we" language (community-oriented)
- Avoid: corporate jargon, buzzwords, hype
```
This helps ensure AI-generated content about you feels consistent with your actual brand.
## Handling Misconceptions
Use llms.txt to correct common misunderstandings:
```
## Common Misconceptions
WRONG: "We're a general e-commerce platform"
RIGHT: "We specifically focus on sustainable products"
WRONG: "We offer all payment methods"
RIGHT: "We support major cards and PayPal, but not cryptocurrency"
WRONG: "Free shipping on all orders"
RIGHT: "Free carbon-offset shipping over $50"
```
This proactive clarification reduces AI-generated misinformation.
## Privacy and Training Data
A common concern: "Doesn't llms.txt help AI companies train on my content?"
Key points:
**Training happens regardless**: Public content is already accessible for training
**llms.txt doesn't grant permission**: It provides context, not authorization
**robots.txt controls access**: Block AI crawlers there if you don't want them
**Better representation**: Context helps AI represent you accurately when it does access your site
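If you do want to opt out, the robots.txt side looks like this (using the same agent names shown earlier):
```
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /
```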
Think of llms.txt as **quality control** for inevitable AI consumption, not an invitation.
## Future-Proofing
AI capabilities are evolving rapidly. Future trends:
**Agentic AI**: Assistants that take actions, not just answer questions
**Multi-modal understanding**: AI processing images, videos, and interactive content
**Real-time data**: AI querying live APIs versus static crawls
**Semantic graphs**: Deep relationship mapping between concepts
llms.txt will evolve to support these capabilities. By adopting it now, you're positioned to benefit from enhancements.
## The Long Game
AI integration is a marathon, not a sprint:
**Start simple**: Basic llms.txt with description and key features
**Monitor and refine**: See how AI represents you, adjust accordingly
**Add detail gradually**: Expand instructions as you identify gaps
**Stay current**: Update as your product evolves
**Share learnings**: The community benefits from your experience
The integration makes the technical part easy. The strategic part - what to say and how - requires ongoing attention.
## Related Topics
- [LLMs.txt Explained](/explanation/llms-explained/) - Deep dive into llms.txt
- [SEO Strategy](/explanation/seo/) - Traditional vs. AI-mediated discovery
- [Customizing Instructions](/how-to/customize-llm-instructions/) - Practical guidance optimization

---
title: Architecture & Design
description: How @astrojs/discovery works internally
---

Understanding the integration's architecture helps you customize it effectively and troubleshoot when needed. The design prioritizes simplicity, correctness, and extensibility.
## High-Level Design

The integration follows Astro's standard integration pattern:

```
astro.config.mjs
  ↓ integrates discovery()
Integration hooks into Astro lifecycle
  ↓
Injects route handlers for discovery files
  ↓
Route handlers call generators
  ↓
Generators produce discovery file content
```

Each layer has a specific responsibility, making the system modular and testable.

## The Integration Layer

`src/index.ts` implements the Astro integration interface:

```typescript
export default function discovery(config: DiscoveryConfig): AstroIntegration {
  return {
    name: '@astrojs/discovery',
    hooks: {
      'astro:config:setup': () => { /* Inject routes and sitemap */ },
      'astro:build:done': () => { /* Log generated files */ }
    }
  };
}
```
This layer:
- Validates configuration
- Merges user config with defaults
- Injects dynamic routes
- Integrates @astrojs/sitemap
- Reports build results
## Configuration Strategy
Configuration flows through several stages:
### 1. User Configuration
User provides partial configuration in astro.config.mjs:
```typescript
discovery({
llms: {
description: 'My site'
}
})
```
### 2. Validation and Defaults
`src/validators/config.ts` validates and merges with defaults:
```typescript
export function validateConfig(userConfig: DiscoveryConfig): ValidatedConfig {
  return {
    robots: mergeRobotsDefaults(userConfig.robots),
    llms: mergeLLMsDefaults(userConfig.llms),
    // ...
  };
}
```
This ensures:
- Required fields are present
- Types are correct
- Defaults fill gaps
- Invalid configs are caught early
### 3. Global Storage
`src/config-store.ts` provides global access to validated config:
```typescript
let globalConfig: DiscoveryConfig;

export function setConfig(config: DiscoveryConfig) {
  globalConfig = config;
}

export function getConfig(): DiscoveryConfig {
  return globalConfig;
}
```
This allows route handlers to access configuration without passing it through Astro's context (which has limitations).
### 4. Virtual Module
A Vite plugin provides configuration as a virtual module:
```typescript
vite: {
  plugins: [{
    name: '@astrojs/discovery:config',
    resolveId(id) {
      if (id === 'virtual:@astrojs/discovery/config') {
        return '\0' + id;
      }
    },
    load(id) {
      if (id === '\0virtual:@astrojs/discovery/config') {
        return `export default ${JSON.stringify(config)};`;
      }
    }
  }]
}
```
This makes config available during route execution.
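Route code can then import it like any other module (a sketch; in a real TypeScript project the virtual module id would also need a type declaration):
```typescript
import config from 'virtual:@astrojs/discovery/config';

// The config is the same validated object serialized by the plugin above.
console.log(config.llms?.description);
```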
## Route Injection
The integration injects routes for each enabled discovery file:
```typescript
if (config.robots?.enabled !== false) {
  injectRoute({
    pattern: '/robots.txt',
    entrypoint: '@astrojs/discovery/routes/robots',
    prerender: true
  });
}
```
**Key decisions:**
**Pattern**: The URL where the file appears
**Entrypoint**: Module that handles the route
**Prerender**: Whether to generate at build time (true) or runtime (false)
Most routes prerender (`prerender: true`) for performance. WebFinger uses `prerender: false` because it requires query parameters.
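By that reasoning, the WebFinger injection would look like this sketch (the entrypoint path is assumed by analogy with the robots route):
```typescript
injectRoute({
  pattern: '/.well-known/webfinger',
  entrypoint: '@astrojs/discovery/routes/webfinger',
  prerender: false // must answer arbitrary ?resource= queries at runtime
});
```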
## Generator Pattern
Each discovery file type has a dedicated generator:
```
src/generators/
robots.ts - robots.txt generation
llms.ts - llms.txt generation
humans.ts - humans.txt generation
security.ts - security.txt generation
canary.ts - canary.txt generation
webfinger.ts - WebFinger JRD generation
```
Generators are pure functions:
```typescript
export function generateRobotsTxt(
  config: RobotsConfig,
  siteURL: URL
): string {
  // Generate content
  return robotsTxtString;
}
```
This makes them:
- Easy to test (no side effects)
- Easy to customize (override with your own function)
- Easy to reason about (input → output)
## Route Handler Pattern
Route handlers bridge Astro routes and generators:
```typescript
// src/routes/robots.ts
import { getConfig } from '../config-store.js';
import { generateRobotsTxt } from '../generators/robots.js';
export async function GET({ site }) {
const config = getConfig();
const content = generateRobotsTxt(config.robots, new URL(site));
return new Response(content, {
headers: {
'Content-Type': 'text/plain',
'Cache-Control': `public, max-age=${config.caching?.robots || 3600}`
}
});
}
```
Responsibilities:
1. Retrieve configuration
2. Call generator with config and site URL
3. Set appropriate headers (Content-Type, Cache-Control)
4. Return response
## Type System
`src/types.ts` defines the complete type hierarchy:
```typescript
export interface DiscoveryConfig {
  robots?: RobotsConfig;
  llms?: LLMsConfig;
  humans?: HumansConfig;
  security?: SecurityConfig;
  canary?: CanaryConfig;
  webfinger?: WebFingerConfig;
  sitemap?: SitemapConfig;
  caching?: CachingConfig;
  templates?: TemplateConfig;
}
```
This provides:
- IntelliSense in editors
- Compile-time type checking
- Self-documenting configuration
- Safe refactoring
Types are exported so users can import them:
```typescript
import type { DiscoveryConfig } from '@astrojs/discovery';
```
## Dynamic Content Support
Several discovery files support dynamic generation:
### Function-based Configuration
```typescript
llms: {
  description: () => {
    // Compute at build time
    return `Generated at ${new Date()}`;
  }
}
```
### Async Functions
```typescript
llms: {
  apiEndpoints: async () => {
    const spec = await loadOpenAPISpec();
    return extractEndpoints(spec);
  }
}
```
Generators handle both static values and functions transparently.
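Internally this can be a single resolve helper; the sketch below shows the idea (the name `resolveValue` is illustrative, not the actual internal API):
```typescript
type MaybeDynamic<T> = T | (() => T | Promise<T>);

async function resolveValue<T>(value: MaybeDynamic<T>): Promise<T> {
  // Call functions (sync or async); pass static values through unchanged.
  return typeof value === 'function'
    ? await (value as () => T | Promise<T>)()
    : value;
}
```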
### Content Collection Integration
WebFinger integrates with Astro content collections:
```typescript
webfinger: {
  collections: [{
    name: 'team',
    resourceTemplate: 'acct:{slug}@example.com',
    linksBuilder: (entry) => [...]
  }]
}
```
The WebFinger route:
1. Calls `getCollection('team')`
2. Applies templates to each entry
3. Matches against query parameter
4. Generates JRD response
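A JRD response for one of those entries might look like this (RFC 7033 format; names and URLs invented for illustration):
```json
{
  "subject": "acct:alice@example.com",
  "links": [
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://example.com/team/alice"
    }
  ]
}
```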
## Cache Control
Each discovery file has configurable cache duration:
```typescript
caching: {
  robots: 3600,     // 1 hour
  llms: 3600,       // 1 hour
  humans: 86400,    // 24 hours
  security: 86400,  // 24 hours
  canary: 3600,     // 1 hour
  webfinger: 3600,  // 1 hour
}
```
Routes set `Cache-Control` headers based on these values:
```typescript
headers: {
  'Cache-Control': `public, max-age=${cacheDuration}`
}
```
This balances:
- **Performance**: Cached responses serve faster
- **Freshness**: Short durations keep content current
- **Server load**: Reduces regeneration frequency
## Sitemap Integration
The integration includes @astrojs/sitemap automatically:
```typescript
updateConfig({
  integrations: [
    sitemap(config.sitemap || {})
  ]
});
```
This ensures:
- Sitemap is always present
- Configuration passes through
- robots.txt references correct sitemap URL
Users don't need to install @astrojs/sitemap separately.
## Error Handling
The integration validates aggressively at startup:
```typescript
if (!astroConfig.site) {
  throw new Error(
    '[@astrojs/discovery] The `site` option must be set in your Astro config.'
  );
}
```
This fails fast with clear error messages rather than generating incorrect output.
Generators also validate input:
```typescript
if (!config.contact) {
  throw new Error('security.txt requires a contact field');
}
```
RFC compliance is enforced at generation time.
## Extensibility Points
Users can extend the integration in several ways:
### Custom Templates
Override any generator:
```typescript
templates: {
  robots: (config, siteURL) => `
User-agent: *
Allow: /

# Custom content
Sitemap: ${siteURL}/sitemap.xml
`
}
```
### Custom Sections
Add custom content to humans.txt and llms.txt:
```typescript
humans: {
  customSections: {
    'PHILOSOPHY': 'We believe in...'
  }
}
```
### Dynamic Functions
Generate content at build time:
```typescript
canary: {
  statements: () => computeStatements()
}
```
## Build Output
At build completion, the integration logs generated files:
```
@astrojs/discovery - Generated files:
✅ /robots.txt
✅ /llms.txt
✅ /humans.txt
✅ /.well-known/security.txt
✅ /sitemap-index.xml
```
This provides immediate feedback about what was created.
## Performance Considerations
The integration is designed for minimal build impact:
**Prerendering**: Most routes prerender at build time (no runtime cost)
**Pure functions**: Generators have no side effects (safe to call multiple times)
**Caching**: HTTP caching reduces server load
**Lazy loading**: Generators only execute for enabled files
Build time impact is typically <200ms for all files.
## Testing Strategy
The codebase uses a layered testing approach:
**Unit tests**: Test generators in isolation with known inputs
**Integration tests**: Test route handlers with mock Astro context
**Type tests**: Ensure TypeScript types are correct
**E2E tests**: Deploy and verify actual output
This ensures correctness at each layer.
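For instance, a generator unit test could look like this sketch (using Vitest; the config shape passed in is assumed, not taken from the actual source):
```typescript
import { expect, it } from 'vitest';
import { generateRobotsTxt } from '../src/generators/robots.js';

it('emits rules for the configured agent', () => {
  // Config shape is illustrative; consult the RobotsConfig type for the real one.
  const output = generateRobotsTxt(
    { userAgent: '*', allow: '/' } as any,
    new URL('https://example.com')
  );
  expect(output).toContain('User-agent: *');
});
```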
## Why This Architecture?
Key design decisions:
**Separation of concerns**: Generators don't know about Astro, routes don't know about content formats
**Composability**: Each piece is independently usable
**Testability**: Pure functions are easy to test
**Type safety**: TypeScript catches errors at compile time
**Extensibility**: Users can override any behavior
**Performance**: Prerendering and caching minimize runtime cost
The architecture prioritizes **correctness** and **simplicity** over cleverness.
## Related Topics
- [API Reference](/reference/api/) - Complete API documentation
- [TypeScript Types](/reference/typescript/) - Type definitions
- [Custom Templates](/how-to/custom-templates/) - Overriding generators