diff --git a/docs/src/content/docs/explanation/ai-integration.md b/docs/src/content/docs/explanation/ai-integration.md index 57edcdb..8178aa4 100644 --- a/docs/src/content/docs/explanation/ai-integration.md +++ b/docs/src/content/docs/explanation/ai-integration.md @@ -1,31 +1,264 @@ --- -title: AI Assistant Integration -description: How AI assistants use discovery files +title: AI Assistant Integration Strategy +description: How AI assistants use discovery files and how to optimize for them --- -Learn how AI assistants discover and use information from your site. +The relationship between websites and AI assistants is fundamentally different from traditional search engines. Understanding this difference is key to optimizing your site for AI-mediated discovery. -:::note[Work in Progress] -This page is currently being developed. Check back soon for complete documentation. -::: +## Beyond Indexing: AI Understanding -## Coming Soon +Search engines **index** your site - they catalog what exists and where. AI assistants **understand** your site - they build mental models of what you do, why it matters, and how to help users interact with you. -This section will include: -- Detailed explanations -- Code examples -- Best practices -- Common patterns -- Troubleshooting tips +This shift from retrieval to comprehension requires different discovery mechanisms. -## Related Pages +### Traditional Search Flow -- [Configuration Reference](/reference/configuration/) -- [API Reference](/reference/api/) -- [Examples](/examples/ecommerce/) +1. User searches for keywords +2. Engine returns ranked list of pages +3. User clicks and reads +4. User decides if content answers their question -## Need Help? +### AI Assistant Flow -- Check our [FAQ](/community/faq/) -- Visit [Troubleshooting](/community/troubleshooting/) -- Open an issue on [GitHub](https://github.com/withastro/astro-discovery/issues) +1. User asks conversational question +2. AI synthesizes answer from multiple sources +3. 
AI provides direct response with citations +4. User may or may not visit original sources + +In the AI flow, your site might be the source without getting the click. Discovery files help ensure you're at least properly represented and attributed. + +## The llms.txt Strategy + +llms.txt is your primary tool for AI optimization. Think of it as **briefing an employee** who'll be answering questions about your company. + +### What to Emphasize + +**Core value proposition**: Not just what you do, but why you exist + +``` +We're not just another e-commerce platform - we're specifically +focused on sustainable products with carbon footprint tracking. +``` + +This context helps AI assistants understand when to recommend you versus competitors. + +**Key differentiators**: What makes you unique + +``` +Unlike other platforms, we: +- Calculate carbon footprint for every purchase +- Offset shipping emissions by default +- Partner directly with sustainable manufacturers +``` + +This guides AI to highlight your strengths. + +**Common questions**: What users typically ask + +``` +When users ask about sustainability, explain our carbon tracking. +When users ask about pricing, mention our price-match guarantee. +When users ask about shipping, highlight our carbon-offset program. +``` + +This provides explicit guidance for common scenarios. + +### What to Avoid + +**Overpromising**: AI will fact-check against your actual site +**Marketing fluff**: Be informative, not promotional +**Exhaustive detail**: Link to comprehensive docs instead +**Outdated info**: Keep current or use dynamic generation + +## Coordinating Discovery Files + +AI assistants use multiple discovery mechanisms together: + +### robots.txt → llms.txt Flow + +1. AI bot checks robots.txt for permission +2. Finds reference to llms.txt +3. Reads llms.txt for context +4. 
Crawls site with that context in mind + +Ensure your robots.txt explicitly allows AI bots: + +``` +User-agent: GPTBot +User-agent: Claude-Web +User-agent: Anthropic-AI +Allow: / +``` + +### llms.txt → humans.txt Connection + +humans.txt provides tech stack info that helps AI answer developer questions: + +User: "Can I integrate this with React?" +AI: *checks humans.txt, sees React in tech stack* +AI: "Yes, it's built with React and designed for React integration." + +The files complement each other. + +### sitemap.xml → AI Content Discovery + +Sitemaps help AI find comprehensive content: + +```xml +<url> + <loc>https://example.com/docs/api</loc> + <priority>0.9</priority> +</url> +``` + +High-priority pages in your sitemap signal importance to AI crawlers. + +## Dynamic Content Generation + +Static llms.txt works for stable information. Dynamic generation handles changing contexts: + +### API Endpoint Discovery + +```typescript +llms: { + apiEndpoints: async () => { + const spec = await loadOpenAPISpec(); + return spec.paths.map(path => ({ + path: path.url, + method: path.method, + description: path.summary + })); + } +} +``` + +This keeps AI's understanding of your API current without manual updates. + +### Feature Flags and Capabilities + +```typescript +llms: { + instructions: () => { + const features = getEnabledFeatures(); + return ` +Current features: +${features.map(f => `- ${f.name}: ${f.description}`).join('\n')} + +Note: Feature availability may change. Check /api/features for current status. + `; + } +} +``` + +AI assistants know what's currently available versus planned or deprecated. + +## Measuring AI Representation + +Unlike traditional SEO, AI impact is harder to quantify directly: + +### Qualitative Monitoring + +**Ask AI assistants about your site**: Periodically query Claude, ChatGPT, and others about your product. Do they: +- Describe you accurately? +- Highlight key features? +- Use correct terminology? +- Provide appropriate warnings/caveats? 
+ +**Monitor AI-generated content**: Watch for your site being referenced in: +- AI-assisted blog posts +- Generated code examples +- Tutorial content +- Comparison tables + +**Track citation patterns**: When AI cites your site, is it: +- For the right reasons? +- In appropriate contexts? +- With accurate information? +- Linking to relevant pages? + +### Quantitative Signals + +**Referrer analysis**: Some AI tools send referrer headers showing they're AI-mediated traffic + +**API usage patterns**: AI-assisted developers may show different integration patterns than manual developers + +**Support question types**: AI-informed users ask more sophisticated questions + +**Time-on-site**: AI-briefed visitors may be more targeted, spending less time but converting better + +## Brand Voice Consistency + +AI assistants can adapt tone to match your brand if you provide guidance: + +``` +## Brand Voice + +- Professional but approachable +- Technical accuracy over marketing speak +- Always mention privacy and security first +- Use "we" language (community-oriented) +- Avoid: corporate jargon, buzzwords, hype +``` + +This helps ensure AI-generated content about you feels consistent with your actual brand. + +## Handling Misconceptions + +Use llms.txt to correct common misunderstandings: + +``` +## Common Misconceptions + +WRONG: "We're a general e-commerce platform" +RIGHT: "We specifically focus on sustainable products" + +WRONG: "We offer all payment methods" +RIGHT: "We support major cards and PayPal, but not cryptocurrency" + +WRONG: "Free shipping on all orders" +RIGHT: "Free carbon-offset shipping over $50" +``` + +This proactive clarification reduces AI-generated misinformation. + +## Privacy and Training Data + +A common concern: "Doesn't llms.txt help AI companies train on my content?" 
+ +Key points: + +**Training happens regardless**: Public content is already accessible for training +**llms.txt doesn't grant permission**: It provides context, not authorization +**robots.txt controls access**: Block AI crawlers there if you don't want them +**Better representation**: Context helps AI represent you accurately when it does access your site + +Think of llms.txt as **quality control** for inevitable AI consumption, not invitation. + +## Future-Proofing + +AI capabilities are evolving rapidly. Future trends: + +**Agentic AI**: Assistants that take actions, not just answer questions +**Multi-modal understanding**: AI processing images, videos, and interactive content +**Real-time data**: AI querying live APIs versus static crawls +**Semantic graphs**: Deep relationship mapping between concepts + +llms.txt will evolve to support these capabilities. By adopting it now, you're positioned to benefit from enhancements. + +## The Long Game + +AI integration is a marathon, not a sprint: + +**Start simple**: Basic llms.txt with description and key features +**Monitor and refine**: See how AI represents you, adjust accordingly +**Add detail gradually**: Expand instructions as you identify gaps +**Stay current**: Update as your product evolves +**Share learnings**: The community benefits from your experience + +The integration makes the technical part easy. The strategic part - what to say and how - requires ongoing attention. + +## Related Topics + +- [LLMs.txt Explained](/explanation/llms-explained/) - Deep dive into llms.txt +- [SEO Strategy](/explanation/seo/) - Traditional vs. 
AI-mediated discovery +- [Customizing Instructions](/how-to/customize-llm-instructions/) - Practical guidance optimization diff --git a/docs/src/content/docs/explanation/architecture.md b/docs/src/content/docs/explanation/architecture.md index bb98864..5593d45 100644 --- a/docs/src/content/docs/explanation/architecture.md +++ b/docs/src/content/docs/explanation/architecture.md @@ -3,29 +3,454 @@ title: Architecture & Design description: How @astrojs/discovery works internally --- -Technical explanation of the integration architecture and design decisions. +Understanding the integration's architecture helps you customize it effectively and troubleshoot when needed. The design prioritizes simplicity, correctness, and extensibility. -:::note[Work in Progress] -This page is currently being developed. Check back soon for complete documentation. -::: +## High-Level Design -## Coming Soon +The integration follows Astro's standard integration pattern: -This section will include: -- Detailed explanations -- Code examples -- Best practices -- Common patterns -- Troubleshooting tips +``` +astro.config.mjs + ↓ integrates discovery() + ↓ +Integration hooks into Astro lifecycle + ↓ +Injects route handlers for discovery files + ↓ +Route handlers call generators + ↓ +Generators produce discovery file content +``` -## Related Pages +Each layer has a specific responsibility, making the system modular and testable. -- [Configuration Reference](/reference/configuration/) -- [API Reference](/reference/api/) -- [Examples](/examples/ecommerce/) +## The Integration Layer -## Need Help? 
+`src/index.ts` implements the Astro integration interface: -- Check our [FAQ](/community/faq/) -- Visit [Troubleshooting](/community/troubleshooting/) -- Open an issue on [GitHub](https://github.com/withastro/astro-discovery/issues) +```typescript +import type { AstroIntegration } from 'astro'; +import type { DiscoveryConfig } from './types.js'; + +export default function discovery(config: DiscoveryConfig): AstroIntegration { + return { + name: '@astrojs/discovery', + hooks: { + 'astro:config:setup': ({ injectRoute, updateConfig }) => { + // Inject routes and sitemap + }, + 'astro:build:done': ({ logger }) => { + // Log generated files + } + } + }; +} +``` + +This layer: + +- Validates configuration +- Merges user config with defaults +- Injects dynamic routes +- Integrates @astrojs/sitemap +- Reports build results + +## Configuration Strategy + +Configuration flows through several stages: + +### 1. User Configuration + +Users provide a partial configuration in astro.config.mjs: + +```typescript +discovery({ + llms: { + description: 'My site' + } +}) +``` + +### 2. Validation and Defaults + +`src/validators/config.ts` validates and merges with defaults: + +```typescript +export function validateConfig(userConfig: DiscoveryConfig): ValidatedConfig { + return { + robots: mergeRobotsDefaults(userConfig.robots), + llms: mergeLLMsDefaults(userConfig.llms), + // ... + } +} +``` + +This ensures: +- Required fields are present +- Types are correct +- Defaults fill gaps +- Invalid configs are caught early + +### 3. Global Storage + +`src/config-store.ts` provides global access to validated config: + +```typescript +let globalConfig: DiscoveryConfig; + +export function setConfig(config: DiscoveryConfig) { + globalConfig = config; +} + +export function getConfig(): DiscoveryConfig { + return globalConfig; +} +``` + +This allows route handlers to access configuration without passing it through Astro's context (which has limitations). + +### 4. 
Virtual Module + +A Vite plugin provides configuration as a virtual module: + +```typescript +vite: { + plugins: [{ + name: '@astrojs/discovery:config', + resolveId(id) { + if (id === 'virtual:@astrojs/discovery/config') { + return '\0' + id; + } + }, + load(id) { + if (id === '\0virtual:@astrojs/discovery/config') { + return `export default ${JSON.stringify(config)};`; + } + } + }] +} +``` + +This makes config available during route execution. + +## Route Injection + +The integration injects routes for each enabled discovery file: + +```typescript +if (config.robots?.enabled !== false) { + injectRoute({ + pattern: '/robots.txt', + entrypoint: '@astrojs/discovery/routes/robots', + prerender: true + }); +} +``` + +**Key decisions:** + +**Pattern**: The URL where the file appears +**Entrypoint**: Module that handles the route +**Prerender**: Whether to generate at build time (true) or runtime (false) + +Most routes prerender (`prerender: true`) for performance. WebFinger uses `prerender: false` because it requires query parameters. 
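
The injection rule described above can be sketched as a small planning function that turns a config object into the list of routes to inject. This is a minimal sketch under stated assumptions, not the integration's actual source: `FileToggle`, `RouteSpec`, `FileKey`, `ROUTES`, and `planRoutes` are hypothetical names, every entrypoint other than `@astrojs/discovery/routes/robots` is assumed to follow the same naming scheme, and every file type is assumed to use the same `enabled !== false` rule shown for robots.txt.

```typescript
// Hypothetical shapes for illustration; not the integration's real types.
interface FileToggle {
  enabled?: boolean;
}

interface RouteSpec {
  pattern: string;    // URL where the file appears
  entrypoint: string; // module that handles the route
  prerender: boolean; // true = build time, false = runtime
}

type FileKey = 'robots' | 'llms' | 'humans' | 'security' | 'webfinger';

// Entrypoints other than robots are ASSUMED to mirror the robots naming.
// WebFinger is the only runtime route because it reads query parameters.
const ROUTES: Record<FileKey, RouteSpec> = {
  robots: { pattern: '/robots.txt', entrypoint: '@astrojs/discovery/routes/robots', prerender: true },
  llms: { pattern: '/llms.txt', entrypoint: '@astrojs/discovery/routes/llms', prerender: true },
  humans: { pattern: '/humans.txt', entrypoint: '@astrojs/discovery/routes/humans', prerender: true },
  security: { pattern: '/.well-known/security.txt', entrypoint: '@astrojs/discovery/routes/security', prerender: true },
  webfinger: { pattern: '/.well-known/webfinger', entrypoint: '@astrojs/discovery/routes/webfinger', prerender: false },
};

// A file is included unless it is explicitly disabled, so an empty config
// yields every route (assumption: the same default applies to all files).
function planRoutes(config: Partial<Record<FileKey, FileToggle>>): RouteSpec[] {
  return (Object.keys(ROUTES) as FileKey[])
    .filter((key) => config[key]?.enabled !== false)
    .map((key) => ROUTES[key]);
}

const planned = planRoutes({ humans: { enabled: false } });
console.log(planned.map((route) => `${route.pattern} prerender=${route.prerender}`));
```

Inside the `astro:config:setup` hook, each planned route could then be passed to `injectRoute`. Keeping the plan as plain data makes the enable/disable logic testable without running an Astro build.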
+ +## Generator Pattern + +Each discovery file type has a dedicated generator: + +``` +src/generators/ + robots.ts - robots.txt generation + llms.ts - llms.txt generation + humans.ts - humans.txt generation + security.ts - security.txt generation + canary.ts - canary.txt generation + webfinger.ts - WebFinger JRD generation +``` + +Generators are pure functions: + +```typescript +export function generateRobotsTxt( + config: RobotsConfig, + siteURL: URL +): string { + // Generate content + return robotsTxtString; +} +``` + +This makes them: +- Easy to test (no side effects) +- Easy to customize (override with your own function) +- Easy to reason about (input → output) + +## Route Handler Pattern + +Route handlers bridge Astro routes and generators: + +```typescript +// src/routes/robots.ts +import { getConfig } from '../config-store.js'; +import { generateRobotsTxt } from '../generators/robots.js'; + +export async function GET({ site }) { + const config = getConfig(); + const content = generateRobotsTxt(config.robots, new URL(site)); + + return new Response(content, { + headers: { + 'Content-Type': 'text/plain', + 'Cache-Control': `public, max-age=${config.caching?.robots || 3600}` + } + }); +} +``` + +Responsibilities: + +1. Retrieve configuration +2. Call generator with config and site URL +3. Set appropriate headers (Content-Type, Cache-Control) +4. 
Return response + +## Type System + +`src/types.ts` defines the complete type hierarchy: + +```typescript +export interface DiscoveryConfig { + robots?: RobotsConfig; + llms?: LLMsConfig; + humans?: HumansConfig; + security?: SecurityConfig; + canary?: CanaryConfig; + webfinger?: WebFingerConfig; + sitemap?: SitemapConfig; + caching?: CachingConfig; + templates?: TemplateConfig; +} +``` + +This provides: +- IntelliSense in editors +- Compile-time type checking +- Self-documenting configuration +- Safe refactoring + +Types are exported so users can import them: + +```typescript +import type { DiscoveryConfig } from '@astrojs/discovery'; +``` + +## Dynamic Content Support + +Several discovery files support dynamic generation: + +### Function-based Configuration + +```typescript +llms: { + description: () => { + // Compute at build time + return `Generated at ${new Date()}`; + } +} +``` + +### Async Functions + +```typescript +llms: { + apiEndpoints: async () => { + const spec = await loadOpenAPISpec(); + return extractEndpoints(spec); + } +} +``` + +Generators handle both static values and functions transparently. + +### Content Collection Integration + +WebFinger integrates with Astro content collections: + +```typescript +webfinger: { + collections: [{ + name: 'team', + resourceTemplate: 'acct:{slug}@example.com', + linksBuilder: (entry) => [...] + }] +} +``` + +The WebFinger route: +1. Calls `getCollection('team')` +2. Applies templates to each entry +3. Matches against query parameter +4. 
Generates JRD response + +## Cache Control + +Each discovery file has configurable cache duration: + +```typescript +caching: { + robots: 3600, // 1 hour + llms: 3600, // 1 hour + humans: 86400, // 24 hours + security: 86400, // 24 hours + canary: 3600, // 1 hour + webfinger: 3600, // 1 hour +} +``` + +Routes set `Cache-Control` headers based on these values: + +```typescript +headers: { + 'Cache-Control': `public, max-age=${cacheDuration}` +} +``` + +This balances: +- **Performance**: Cached responses serve faster +- **Freshness**: Short durations keep content current +- **Server load**: Reduces regeneration frequency + +## Sitemap Integration + +The integration includes @astrojs/sitemap automatically: + +```typescript +updateConfig({ + integrations: [ + sitemap(config.sitemap || {}) + ] +}); +``` + +This ensures: +- Sitemap is always present +- Configuration passes through +- robots.txt references correct sitemap URL + +Users don't need to install @astrojs/sitemap separately. + +## Error Handling + +The integration validates aggressively at startup: + +```typescript +if (!astroConfig.site) { + throw new Error( + '[@astrojs/discovery] The `site` option must be set in your Astro config.' + ); +} +``` + +This fails fast with clear error messages rather than generating incorrect output. + +Generators also validate input: + +```typescript +if (!config.contact) { + throw new Error('security.txt requires a contact field'); +} +``` + +RFC compliance is enforced at generation time. + +## Extensibility Points + +Users can extend the integration in several ways: + +### Custom Templates + +Override any generator: + +```typescript +templates: { + robots: (config, siteURL) => ` + User-agent: * + Allow: / + + # Custom content + Sitemap: ${siteURL}/sitemap.xml + ` +} +``` + +### Custom Sections + +Add custom content to humans.txt and llms.txt: + +```typescript +humans: { + customSections: { + 'PHILOSOPHY': 'We believe in...' 
+ } +} +``` + +### Dynamic Functions + +Generate content at build time: + +```typescript +canary: { + statements: () => computeStatements() +} +``` + +## Build Output + +At build completion, the integration logs generated files: + +``` +✨ @astrojs/discovery - Generated files: + ✅ /robots.txt + ✅ /llms.txt + ✅ /humans.txt + ✅ /.well-known/security.txt + ✅ /sitemap-index.xml +``` + +This provides immediate feedback about what was created. + +## Performance Considerations + +The integration is designed for minimal build impact: + +**Prerendering**: Most routes prerender at build time (no runtime cost) +**Pure functions**: Generators have no side effects (safe to call multiple times) +**Caching**: HTTP caching reduces server load +**Lazy loading**: Generators only execute for enabled files + +Build time impact is typically <200ms for all files. + +## Testing Strategy + +The codebase uses a layered testing approach: + +**Unit tests**: Test generators in isolation with known inputs +**Integration tests**: Test route handlers with mock Astro context +**Type tests**: Ensure TypeScript types are correct +**E2E tests**: Deploy and verify actual output + +This ensures correctness at each layer. + +## Why This Architecture? + +Key design decisions: + +**Separation of concerns**: Generators don't know about Astro, routes don't know about content formats +**Composability**: Each piece is independently usable +**Testability**: Pure functions are easy to test +**Type safety**: TypeScript catches errors at compile time +**Extensibility**: Users can override any behavior +**Performance**: Prerendering and caching minimize runtime cost + +The architecture prioritizes **correctness** and **simplicity** over cleverness. + +## Related Topics + +- [API Reference](/reference/api/) - Complete API documentation +- [TypeScript Types](/reference/typescript/) - Type definitions +- [Custom Templates](/how-to/custom-templates/) - Overriding generators