# @astrojs/discovery

> Comprehensive discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, security.txt, canary.txt, and sitemap generation

## Overview

This integration automatically generates the standard discovery files for your Astro site, making it easily discoverable by search engines, LLMs, and humans, while providing security contact information and transparency mechanisms.

## Features

- 🤖 **robots.txt** - Dynamic generation with LLM bot support
- 🧠 **llms.txt** - AI assistant discovery and instructions
- 👥 **humans.txt** - Human-readable credits and tech stack
- 🔒 **security.txt** - RFC 9116 compliant security contact info
- 🐦 **canary.txt** - Warrant canary for transparency
- 🗺️ **sitemap.xml** - Automatic sitemap generation
- ⚡ **Dynamic URLs** - Adapts to your `site` config
- 🎯 **Smart Caching** - Optimized cache headers
- 🔧 **Fully Customizable** - Override any section

## Installation

```bash
npx astro add @astrojs/discovery
```

Or manually:

```bash
npm install @astrojs/discovery
```

## Quick Start

### Basic Setup

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery()
  ]
});
```

That's it!
This will generate:

- `/robots.txt`
- `/llms.txt`
- `/humans.txt`
- `/sitemap-index.xml`

To enable security.txt and canary.txt, add their configurations:

```typescript
export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      security: {
        contact: 'security@example.com',
      },
      canary: {
        organization: 'Example Corp',
        contact: 'canary@example.com',
      }
    })
  ]
});
```

This adds:

- `/.well-known/security.txt`
- `/.well-known/canary.txt`

### With Configuration

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      // Robots.txt configuration
      robots: {
        crawlDelay: 2,
        additionalAgents: [
          {
            userAgent: 'CustomBot',
            allow: ['/api'],
            disallow: ['/admin']
          }
        ]
      },

      // LLMs.txt configuration
      llms: {
        description: 'Your site description for AI assistants',
        apiEndpoints: [
          { path: '/api/chat', description: 'Chat endpoint' },
          { path: '/api/search', description: 'Search API' }
        ],
        instructions: `
          When helping users with our site:
          1. Check documentation first
          2. Use provided API endpoints
          3. Follow brand guidelines
        `
      },

      // Humans.txt configuration
      humans: {
        team: [
          {
            name: 'Jane Doe',
            role: 'Creator & Developer',
            contact: 'jane@example.com',
            location: 'San Francisco, CA'
          }
        ],
        thanks: [
          'The Astro team',
          'Open source community'
        ],
        site: {
          lastUpdate: 'auto', // or specific date
          language: 'English',
          doctype: 'HTML5',
          ide: 'VS Code',
          techStack: ['Astro', 'TypeScript', 'React']
        },
        story: 'Your project story...',
        funFacts: [
          'Built with love',
          'Coffee-powered development'
        ]
      },

      // Sitemap configuration
      sitemap: {
        // Passed through to @astrojs/sitemap
        filter: (page) => !page.includes('/admin'),
        changefreq: 'weekly',
        priority: 0.7
      }
    })
  ]
});
```

## API Reference

### `discovery(options?)`

#### Options

##### `robots`

Configuration for robots.txt generation.
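
To make the options concrete, here is a minimal sketch of how a robots configuration could be turned into file content. It is an illustration only, not the integration's actual implementation: `renderRobots`, `RobotsSketchConfig`, and `AgentRule` are hypothetical names, and the exact ordering of directives in the real output may differ.

```typescript
// Hypothetical sketch: these names are illustrative, not part of the package API.
interface AgentRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

interface RobotsSketchConfig {
  crawlDelay?: number;
  llmBots?: { enabled?: boolean; agents?: string[] };
  additionalAgents?: AgentRule[];
  customRules?: string;
}

function renderRobots(config: RobotsSketchConfig, siteURL: string): string {
  // Default group: allow everything, optionally with a crawl delay.
  const lines: string[] = ['User-agent: *', 'Allow: /'];
  if (config.crawlDelay !== undefined) {
    lines.push(`Crawl-delay: ${config.crawlDelay}`);
  }

  // Sitemap URL is derived from the site URL (trailing slash removed first).
  const base = siteURL.replace(/\/$/, '');
  lines.push('', `Sitemap: ${base}/sitemap-index.xml`);

  // One shared group pointing the listed LLM bots at /llms.txt.
  const llmAgents = config.llmBots?.agents ?? [];
  if (config.llmBots?.enabled !== false && llmAgents.length > 0) {
    lines.push('');
    for (const agent of llmAgents) lines.push(`User-agent: ${agent}`);
    lines.push('Allow: /llms.txt');
  }

  // One group per additional agent.
  for (const rule of config.additionalAgents ?? []) {
    lines.push('', `User-agent: ${rule.userAgent}`);
    for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`);
    for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`);
  }

  // Raw rules are appended verbatim at the end.
  if (config.customRules) lines.push('', config.customRules.trim());
  return lines.join('\n') + '\n';
}

const example = renderRobots(
  {
    crawlDelay: 2,
    llmBots: { agents: ['GPTBot'] },
    additionalAgents: [{ userAgent: 'CustomBot', allow: ['/api'], disallow: ['/admin'] }],
  },
  'https://example.com',
);
console.log(example);
```

The sketch keeps each `User-agent` group separated by a blank line and places `Crawl-delay` inside the wildcard group, since crawlers read that directive per group.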

**Type:**

```typescript
interface RobotsConfig {
  crawlDelay?: number;
  allowAllBots?: boolean;
  llmBots?: {
    enabled?: boolean;
    agents?: string[]; // Custom LLM bot names
  };
  additionalAgents?: Array<{
    userAgent: string;
    allow?: string[];
    disallow?: string[];
  }>;
  customRules?: string; // Raw robots.txt content to append
}
```

**Default:**

```typescript
{
  crawlDelay: 1,
  allowAllBots: true,
  llmBots: {
    enabled: true,
    agents: [
      'Anthropic-AI',
      'Claude-Web',
      'GPTBot',
      'ChatGPT-User',
      'cohere-ai',
      'Google-Extended'
    ]
  }
}
```

**Example:**

```typescript
discovery({
  robots: {
    crawlDelay: 2,
    llmBots: {
      enabled: true,
      agents: ['CustomAIBot', 'AnotherBot']
    },
    additionalAgents: [
      {
        userAgent: 'BadBot',
        disallow: ['/']
      }
    ]
  }
})
```

##### `llms`

Configuration for llms.txt generation.

**Type:**

```typescript
interface LLMsConfig {
  enabled?: boolean;
  description?: string;
  keyFeatures?: string[];
  importantPages?: Array<{
    name: string;
    path: string;
    description?: string;
  }>;
  instructions?: string;
  apiEndpoints?: Array<{
    path: string;
    method?: string;
    description: string;
  }>;
  techStack?: {
    frontend?: string[];
    backend?: string[];
    ai?: string[];
    other?: string[];
  };
  brandVoice?: string[];
  customSections?: Record<string, string>;
}
```

**Example:**

```typescript
discovery({
  llms: {
    description: 'E-commerce platform for sustainable products',
    keyFeatures: [
      'AI-powered product recommendations',
      'Carbon footprint calculator',
      'Subscription management'
    ],
    instructions: `
      When helping users:
      1. Check product availability via API
      2. Suggest sustainable alternatives
      3. Calculate shipping costs
    `,
    apiEndpoints: [
      {
        path: '/api/products',
        method: 'GET',
        description: 'List all products'
      },
      {
        path: '/api/calculate-footprint',
        method: 'POST',
        description: 'Calculate carbon footprint'
      }
    ]
  }
})
```

##### `humans`

Configuration for humans.txt generation.

**Type:**

```typescript
interface HumansConfig {
  enabled?: boolean;
  team?: Array<{
    name: string;
    role?: string;
    contact?: string;
    location?: string;
    twitter?: string;
    github?: string;
  }>;
  thanks?: string[];
  site?: {
    lastUpdate?: string | 'auto';
    language?: string;
    doctype?: string;
    ide?: string;
    techStack?: string[];
    standards?: string[];
    components?: string[];
    software?: string[];
  };
  story?: string;
  funFacts?: string[];
  philosophy?: string[];
  customSections?: Record<string, string>;
}
```

**Example:**

```typescript
discovery({
  humans: {
    team: [
      {
        name: 'Alice Developer',
        role: 'Lead Developer',
        contact: 'alice@example.com',
        location: 'New York',
        github: 'alice-dev'
      }
    ],
    thanks: [
      'Coffee',
      'Stack Overflow community',
      'My rubber duck'
    ],
    story: `
      This project started when we realized that...
    `,
    funFacts: [
      'Written entirely on a mechanical keyboard',
      'Fueled by 347 cups of coffee',
      'Built during a 48-hour hackathon'
    ]
  }
})
```

##### `security`

Configuration for security.txt generation (RFC 9116).

**Type:**

```typescript
interface SecurityConfig {
  enabled?: boolean;
  contact: string | string[];      // Required: security contact (email or URL)
  expires?: string | 'auto';       // Expiration date (default: 1 year)
  encryption?: string | string[];  // PGP key URL(s)
  acknowledgments?: string;        // Hall of fame URL
  preferredLanguages?: string[];   // Preferred languages (e.g., ['en', 'es'])
  canonical?: string;              // Canonical URL
  policy?: string;                 // Security policy URL
  hiring?: string;                 // Security jobs URL
}
```

**Example:**

```typescript
discovery({
  security: {
    contact: 'security@example.com',
    expires: 'auto', // Auto-calculates 1 year from build
    encryption: 'https://example.com/pgp-key.txt',
    acknowledgments: 'https://example.com/security/hall-of-fame',
    preferredLanguages: ['en', 'es'],
    policy: 'https://example.com/security/policy'
  }
})
```

**Notes:**

- Email contacts automatically get a `mailto:` prefix
- `expires: 'auto'` sets expiration to 1 year from generation
- Generates at `/.well-known/security.txt` per RFC 9116
- Canonical URL defaults to the correct `.well-known` location

##### `canary`

Configuration for warrant canary generation.

**Type:**

```typescript
interface CanaryConfig {
  enabled?: boolean;
  organization?: string;         // Organization name
  contact?: string;              // Contact email
  frequency?: 'daily' | 'weekly' | 'monthly' | 'quarterly' | 'yearly';
  expires?: string | 'auto';     // Expiration (auto-calculated from frequency)
  statements?: CanaryStatement[] | (() => CanaryStatement[]);
  additionalStatement?: string;  // Additional context
  verification?: string;         // PGP signature URL
  previousCanary?: string;       // Previous canary URL
  blockchainProof?: {            // Blockchain verification
    network: string;
    address: string;
    txHash?: string;
    timestamp?: string;
  };
  personnelStatement?: boolean;  // Add duress check
}

interface CanaryStatement {
  type: string;
  description: string;
  received: boolean;
}
```

**Example:**

```typescript
discovery({
  canary: {
    organization: 'Example Corp',
    contact: 'canary@example.com',
    frequency: 'monthly', // Auto-expires in 35 days
    statements: [
      { type: 'nsl', description: 'National Security Letters', received: false },
      { type: 'gag', description: 'Gag orders', received: false }
    ],
    additionalStatement: 'We are committed to transparency.',
    verification: 'PGP Signature: https://example.com/canary.txt.asc',
    personnelStatement: true // Adds duress check
  }
})
```

**Frequency-based expiration:**

- `daily`: 2 days
- `weekly`: 10 days
- `monthly`: 35 days
- `quarterly`: 100 days
- `yearly`: 380 days

**Notes:**

- Only non-received statements appear in the output
- Statements can be a function for dynamic generation
- Generates at `/.well-known/canary.txt`
- See [CANARY_SPEC.md](./CANARY_SPEC.md) for the full specification

##### `sitemap`

Configuration passed to `@astrojs/sitemap`.

**Type:**

```typescript
interface SitemapConfig {
  filter?: (page: string) => boolean;
  customPages?: string[];
  i18n?: {
    defaultLocale: string;
    locales: Record<string, string>;
  };
  changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  lastmod?: Date;
  priority?: number;
  serialize?: (item: SitemapItem) => SitemapItem | undefined;
}
```

**Example:**

```typescript
discovery({
  sitemap: {
    filter: (page) => !page.includes('/admin') && !page.includes('/draft'),
    changefreq: 'daily',
    priority: 0.8
  }
})
```

##### `caching`

Configure HTTP cache headers for discovery files.

**Type:**

```typescript
interface CachingConfig {
  robots?: number;   // seconds
  llms?: number;
  humans?: number;
  security?: number;
  canary?: number;
  sitemap?: number;
}
```

**Default:**

```typescript
{
  robots: 3600,    // 1 hour
  llms: 3600,      // 1 hour
  humans: 86400,   // 24 hours
  security: 86400, // 24 hours
  canary: 3600,    // 1 hour (check frequently!)
  sitemap: 3600    // 1 hour
}
```

## Advanced Usage

### Custom Templates

You can provide custom templates for any file:

```typescript
discovery({
  templates: {
    robots: (config, siteURL) => `
User-agent: *
Allow: /

# Custom content
Sitemap: ${siteURL}/sitemap-index.xml
`,
    llms: (config, siteURL) => `
# ${config.description}

Visit ${siteURL} for more information.
`,
    security: (config, siteURL) => `
# Custom security.txt format
Contact: ${config.contact}
Expires: ${config.expires || new Date(Date.now() + 365*24*60*60*1000).toISOString()}
`,
    canary: (config, siteURL) => `
# Custom canary format
Organization: ${config.organization}
Last-Updated: ${new Date().toISOString()}
`
  }
})
```

### Conditional Generation

Disable specific files in certain environments:

```typescript
discovery({
  robots: {
    enabled: import.meta.env.PROD // Only in production
  },
  llms: {
    enabled: true // Always generate
  },
  humans: {
    enabled: import.meta.env.DEV // Only in development
  }
})
```

### Dynamic Content

Use functions for dynamic content:

```typescript
import fs from 'node:fs'; // needed for readFileSync below

discovery({
  llms: {
    description: () => {
      const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf-8'));
      return `${pkg.name} - ${pkg.description}`;
    },
    apiEndpoints: async () => {
      // Load from OpenAPI spec (loadOpenAPISpec stands in for your own loader)
      const spec = await loadOpenAPISpec();
      return spec.paths.map(path => ({
        path: path.url,
        method: path.method,
        description: path.summary
      }));
    }
  }
})
```

## Integration with Other Tools

### With @astrojs/sitemap

The discovery integration automatically includes `@astrojs/sitemap`, so you don't need to install it separately.
Configuration is passed through:

```typescript
discovery({
  sitemap: {
    // All @astrojs/sitemap options work here
    filter: (page) => !page.includes('/secret'),
    changefreq: 'weekly'
  }
})
```

### With Content Collections

Automatically extract information from content collections:

```typescript
discovery({
  llms: {
    importantPages: async () => {
      const docs = await getCollection('docs');
      return docs.map(doc => ({
        name: doc.data.title,
        path: `/docs/${doc.slug}`,
        description: doc.data.description
      }));
    }
  }
})
```

### With Environment Variables

Use environment variables for sensitive information:

```typescript
discovery({
  humans: {
    team: [
      {
        name: 'Developer',
        contact: process.env.PUBLIC_CONTACT_EMAIL
      }
    ]
  }
})
```

## Output

The integration generates the following files:

### `/robots.txt`

```
User-agent: *
Allow: /

# Sitemaps
Sitemap: https://example.com/sitemap-index.xml

# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
Allow: /llms.txt

# Crawl delay
Crawl-delay: 1
```

### `/llms.txt`

```
# Project Name - Description

> Short tagline

## Site Information
- Name: Project Name
- Description: Full description
- URL: https://example.com

## For AI Assistants
Instructions for AI assistants...

## API Endpoints
- GET /api/endpoint - Description
```

### `/humans.txt`

```
/* TEAM */
Name: Developer Name
Role: Position
Contact: email@example.com

/* THANKS */
- Thank you note 1
- Thank you note 2

/* SITE */
Tech stack and details...
```

### `/sitemap-index.xml`

Standard XML sitemap with all your pages.

## Best Practices

### 1. **Set Your Site URL**

Always configure `site` in your Astro config:

```typescript
export default defineConfig({
  site: 'https://example.com', // Required!
  integrations: [discovery()]
});
```

### 2. **Keep humans.txt Updated**

Update your team information and tech stack regularly:

```typescript
discovery({
  humans: {
    site: {
      lastUpdate: 'auto' // Automatically uses current date
    }
  }
})
```

### 3. **Be Specific with LLM Instructions**

Provide clear, actionable instructions for AI assistants:

```typescript
discovery({
  llms: {
    instructions: `
      When helping users:
      1. Always check API documentation first
      2. Use the /api/search endpoint for queries
      3. Format responses in markdown
      4. Include relevant links
    `
  }
})
```

### 4. **Filter Private Pages**

Exclude admin, draft, and private pages:

```typescript
discovery({
  sitemap: {
    filter: (page) => {
      return !page.includes('/admin') &&
             !page.includes('/draft') &&
             !page.includes('/private');
    }
  },
  robots: {
    additionalAgents: [
      {
        userAgent: '*',
        disallow: ['/admin', '/draft', '/private']
      }
    ]
  }
})
```

### 5. **Optimize Cache Headers**

Balance freshness with server load:

```typescript
discovery({
  caching: {
    robots: 3600,  // 1 hour - changes rarely
    llms: 1800,    // 30 min - may update instructions
    humans: 86400, // 24 hours - credits don't change often
    sitemap: 3600  // 1 hour - content changes moderately
  }
})
```

## Troubleshooting

### Files Not Generating

1. **Check your output mode:**

   ```typescript
   export default defineConfig({
     output: 'hybrid', // or 'server'
     // ...
   });
   ```

2. **Verify site URL is set:**

   ```typescript
   export default defineConfig({
     site: 'https://example.com' // Must be set!
   });
   ```

3. **Check for conflicts:** Remove any existing `/public/robots.txt` or similar static files.

### Wrong URLs in Files

Make sure your `site` config matches your production domain:

```typescript
export default defineConfig({
  site: import.meta.env.PROD
    ? 'https://production.com'
    : 'http://localhost:4321'
});
```

### LLM Bots Not Respecting Instructions

- Ensure `/llms.txt` is accessible
- Check that robots.txt allows LLM bots
- Verify the content is properly formatted

### Sitemap Issues

Check the `@astrojs/sitemap` documentation for detailed troubleshooting: https://docs.astro.build/en/guides/integrations-guide/sitemap/

## Migration Guide

### From Manual Files

If you have existing static files in `/public`, remove them:

```bash
rm public/robots.txt
rm public/humans.txt
rm public/sitemap.xml
```

Then configure the integration with your existing content:

```typescript
discovery({
  humans: {
    team: [/* your existing team data */],
    thanks: [/* your existing thanks */]
  }
})
```

### From @astrojs/sitemap

Replace:

```typescript
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  integrations: [sitemap()]
});
```

With:

```typescript
import discovery from '@astrojs/discovery';

export default defineConfig({
  integrations: [
    discovery({
      sitemap: {
        // Your existing sitemap config
      }
    })
  ]
});
```

## Examples

### E-commerce Site

```typescript
discovery({
  robots: {
    crawlDelay: 2,
    additionalAgents: [
      {
        userAgent: 'PriceBot',
        disallow: ['/checkout', '/account']
      }
    ]
  },
  llms: {
    description: 'Online store for sustainable products',
    keyFeatures: [
      'Eco-friendly product catalog',
      'Carbon footprint calculator',
      'Sustainable shipping options'
    ],
    apiEndpoints: [
      { path: '/api/products', description: 'Product catalog' },
      { path: '/api/calculate-carbon', description: 'Carbon calculator' }
    ]
  },
  sitemap: {
    filter: (page) => !page.includes('/checkout') && !page.includes('/account')
  }
})
```

### Documentation Site

```typescript
discovery({
  llms: {
    description: 'Technical documentation for our API',
    instructions: `
      When helping users:
      1. Search documentation before answering
      2. Provide code examples from /examples
      3. Link to relevant API reference pages
      4. Suggest similar solutions from FAQ
    `,
    importantPages: async () => {
      const docs = await getCollection('docs');
      return docs
        .filter(doc => doc.data.featured)
        .map(doc => ({
          name: doc.data.title,
          path: `/docs/${doc.slug}`,
          description: doc.data.description
        }));
    }
  },
  humans: {
    team: [
      {
        name: 'Documentation Team',
        contact: 'docs@example.com'
      }
    ],
    thanks: [
      'Our amazing community contributors',
      'Technical writers worldwide'
    ]
  }
})
```

### Personal Blog

```typescript
discovery({
  llms: {
    description: 'Personal blog about web development',
    brandVoice: [
      'Casual and friendly',
      'Technical but accessible',
      'Focus on practical examples'
    ]
  },
  humans: {
    team: [
      {
        name: 'Jane Blogger',
        role: 'Writer & Developer',
        twitter: '@janeblogger',
        github: 'jane-dev'
      }
    ],
    story: `
      Started this blog to document my journey learning web development.
      Went from tutorial hell to building real projects. Now sharing
      what I've learned to help others on their journey.
    `,
    funFacts: [
      'All posts written in markdown',
      'Powered by coffee and curiosity',
      'Deployed automatically on every commit'
    ]
  }
})
```

## Performance

The integration is designed for minimal performance impact:

- **Build Time**: Adds ~100-200ms to the build process
- **Runtime**: All files are statically generated at build time
- **Caching**: Smart HTTP cache headers reduce server load
- **Bundle Size**: Zero client-side JavaScript

## Contributing

We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md).

## License

MIT

## Related

- [@astrojs/sitemap](https://docs.astro.build/en/guides/integrations-guide/sitemap/)
- [humanstxt.org](https://humanstxt.org/)
- [llms.txt spec](https://llmstxt.org/)
- [robots.txt spec](https://developers.google.com/search/docs/crawling-indexing/robots/intro)

## Credits

Built with inspiration from:

- The Astro community
- humanstxt.org initiative
- The llms.txt proposal
- Web standards organizations

---

**Made with ❤️ by the Astro community**