# @astrojs/discovery

> Comprehensive discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, and sitemap generation

## Overview

This integration automatically generates the standard discovery files for your Astro site, making it easy for search engines, LLMs, and humans to discover.

## Features

- 🤖 **robots.txt** - Dynamic generation with LLM bot support
- 🧠 **llms.txt** - AI assistant discovery and instructions
- 👥 **humans.txt** - Human-readable credits and tech stack
- 🗺️ **sitemap.xml** - Automatic sitemap generation
- ⚡ **Dynamic URLs** - Adapts to your `site` config
- 🎯 **Smart Caching** - Optimized cache headers
- 🔧 **Fully Customizable** - Override any section

## Installation

```bash
npx astro add @astrojs/discovery
```

Or manually:

```bash
npm install @astrojs/discovery
```

## Quick Start

### Basic Setup

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery()
  ]
});
```

That's it! This will generate:

- `/robots.txt`
- `/llms.txt`
- `/humans.txt`
- `/sitemap-index.xml`

### With Configuration

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      // Robots.txt configuration
      robots: {
        crawlDelay: 2,
        additionalAgents: [
          {
            userAgent: 'CustomBot',
            allow: ['/api'],
            disallow: ['/admin']
          }
        ]
      },

      // LLMs.txt configuration
      llms: {
        description: 'Your site description for AI assistants',
        apiEndpoints: [
          { path: '/api/chat', description: 'Chat endpoint' },
          { path: '/api/search', description: 'Search API' }
        ],
        instructions: `
          When helping users with our site:
          1. Check documentation first
          2. Use provided API endpoints
          3. Follow brand guidelines
        `
      },

      // Humans.txt configuration
      humans: {
        team: [
          {
            name: 'Jane Doe',
            role: 'Creator & Developer',
            contact: 'jane@example.com',
            location: 'San Francisco, CA'
          }
        ],
        thanks: [
          'The Astro team',
          'Open source community'
        ],
        site: {
          lastUpdate: 'auto', // or specific date
          language: 'English',
          doctype: 'HTML5',
          ide: 'VS Code',
          techStack: ['Astro', 'TypeScript', 'React']
        },
        story: 'Your project story...',
        funFacts: [
          'Built with love',
          'Coffee-powered development'
        ]
      },

      // Sitemap configuration
      sitemap: {
        // Passed through to @astrojs/sitemap
        filter: (page) => !page.includes('/admin'),
        changefreq: 'weekly',
        priority: 0.7
      }
    })
  ]
});
```

## API Reference

### `discovery(options?)`

#### Options

##### `robots`

Configuration for robots.txt generation.

**Type:**

```typescript
interface RobotsConfig {
  crawlDelay?: number;
  allowAllBots?: boolean;
  llmBots?: {
    enabled?: boolean;
    agents?: string[]; // Custom LLM bot names
  };
  additionalAgents?: Array<{
    userAgent: string;
    allow?: string[];
    disallow?: string[];
  }>;
  customRules?: string; // Raw robots.txt content to append
}
```

**Default:**

```typescript
{
  crawlDelay: 1,
  allowAllBots: true,
  llmBots: {
    enabled: true,
    agents: [
      'Anthropic-AI',
      'Claude-Web',
      'GPTBot',
      'ChatGPT-User',
      'cohere-ai',
      'Google-Extended'
    ]
  }
}
```

**Example:**

```typescript
discovery({
  robots: {
    crawlDelay: 2,
    llmBots: {
      enabled: true,
      agents: ['CustomAIBot', 'AnotherBot']
    },
    additionalAgents: [
      {
        userAgent: 'BadBot',
        disallow: ['/']
      }
    ]
  }
})
```

##### `llms`

Configuration for llms.txt generation.
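
To make the mapping from options to file contents concrete, here is a minimal sketch of how a few `llms` fields might be rendered into text resembling the `/llms.txt` sample in the Output section. The `renderLlmsTxt` function and the `LlmsSketchConfig` shape are hypothetical and only for illustration; they are not part of this integration's API, and the real generator may format its output differently.

```typescript
// Hypothetical subset of the llms options, for illustration only.
interface ApiEndpointSketch {
  path: string;
  method?: string;
  description: string;
}

interface LlmsSketchConfig {
  description?: string;
  instructions?: string;
  apiEndpoints?: ApiEndpointSketch[];
}

// Hypothetical helper: builds llms.txt-style text from the config subset above.
function renderLlmsTxt(config: LlmsSketchConfig, siteURL: string): string {
  const lines: string[] = [];

  // Title line; falls back to the site URL when no description is given.
  lines.push(`# ${config.description ?? siteURL}`, '');
  lines.push('## Site Information', `- URL: ${siteURL}`, '');

  if (config.instructions) {
    // Trim each line so indented template literals render without leading spaces.
    const body = config.instructions
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
    lines.push('## For AI Assistants', ...body, '');
  }

  if (config.apiEndpoints && config.apiEndpoints.length > 0) {
    lines.push('## API Endpoints');
    for (const endpoint of config.apiEndpoints) {
      // Assumption: endpoints without an explicit method are listed as GET.
      lines.push(`- ${endpoint.method ?? 'GET'} ${endpoint.path} - ${endpoint.description}`);
    }
    lines.push('');
  }

  return lines.join('\n');
}

const example = renderLlmsTxt(
  {
    description: 'Your site description for AI assistants',
    instructions: `
      When helping users with our site:
      1. Check documentation first
    `,
    apiEndpoints: [{ path: '/api/search', description: 'Search API' }]
  },
  'https://example.com'
);
console.log(example);
```

Trimming each instruction line is a design choice for this sketch: it lets the config use indented template literals, as the examples in this README do, without leaking that indentation into the generated file.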

**Type:**

```typescript
interface LLMsConfig {
  enabled?: boolean;
  description?: string;
  keyFeatures?: string[];
  importantPages?: Array<{
    name: string;
    path: string;
    description?: string;
  }>;
  instructions?: string;
  apiEndpoints?: Array<{
    path: string;
    method?: string;
    description: string;
  }>;
  techStack?: {
    frontend?: string[];
    backend?: string[];
    ai?: string[];
    other?: string[];
  };
  brandVoice?: string[];
  customSections?: Record<string, string>;
}
```

**Example:**

```typescript
discovery({
  llms: {
    description: 'E-commerce platform for sustainable products',
    keyFeatures: [
      'AI-powered product recommendations',
      'Carbon footprint calculator',
      'Subscription management'
    ],
    instructions: `
      When helping users:
      1. Check product availability via API
      2. Suggest sustainable alternatives
      3. Calculate shipping costs
    `,
    apiEndpoints: [
      {
        path: '/api/products',
        method: 'GET',
        description: 'List all products'
      },
      {
        path: '/api/calculate-footprint',
        method: 'POST',
        description: 'Calculate carbon footprint'
      }
    ]
  }
})
```

##### `humans`

Configuration for humans.txt generation.

**Type:**

```typescript
interface HumansConfig {
  enabled?: boolean;
  team?: Array<{
    name: string;
    role?: string;
    contact?: string;
    location?: string;
    twitter?: string;
    github?: string;
  }>;
  thanks?: string[];
  site?: {
    lastUpdate?: string | 'auto';
    language?: string;
    doctype?: string;
    ide?: string;
    techStack?: string[];
    standards?: string[];
    components?: string[];
    software?: string[];
  };
  story?: string;
  funFacts?: string[];
  philosophy?: string[];
  customSections?: Record<string, string>;
}
```

**Example:**

```typescript
discovery({
  humans: {
    team: [
      {
        name: 'Alice Developer',
        role: 'Lead Developer',
        contact: 'alice@example.com',
        location: 'New York',
        github: 'alice-dev'
      }
    ],
    thanks: [
      'Coffee',
      'Stack Overflow community',
      'My rubber duck'
    ],
    story: `
      This project started when we realized that...
    `,
    funFacts: [
      'Written entirely on a mechanical keyboard',
      'Fueled by 347 cups of coffee',
      'Built during a 48-hour hackathon'
    ]
  }
})
```

##### `sitemap`

Configuration passed to `@astrojs/sitemap`.

**Type:**

```typescript
interface SitemapConfig {
  filter?: (page: string) => boolean;
  customPages?: string[];
  i18n?: {
    defaultLocale: string;
    locales: Record<string, string>;
  };
  changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  lastmod?: Date;
  priority?: number;
  serialize?: (item: SitemapItem) => SitemapItem | undefined;
}
```

**Example:**

```typescript
discovery({
  sitemap: {
    filter: (page) => !page.includes('/admin') && !page.includes('/draft'),
    changefreq: 'daily',
    priority: 0.8
  }
})
```

##### `caching`

Configure HTTP cache headers for discovery files.

**Type:**

```typescript
interface CachingConfig {
  robots?: number; // seconds
  llms?: number;
  humans?: number;
  sitemap?: number;
}
```

**Default:**

```typescript
{
  robots: 3600,  // 1 hour
  llms: 3600,    // 1 hour
  humans: 86400, // 24 hours
  sitemap: 3600  // 1 hour
}
```

## Advanced Usage

### Custom Templates

You can provide custom templates for any file:

```typescript
discovery({
  templates: {
    robots: (config, siteURL) => `
User-agent: *
Allow: /

# Custom content
Sitemap: ${siteURL}/sitemap-index.xml
`,
    llms: (config, siteURL) => `
# ${config.description}

Visit ${siteURL} for more information.
`
  }
})
```

### Conditional Generation

Disable specific files in certain environments:

```typescript
discovery({
  robots: {
    enabled: import.meta.env.PROD // Only in production
  },
  llms: {
    enabled: true // Always generate
  },
  humans: {
    enabled: import.meta.env.DEV // Only in development
  }
})
```

### Dynamic Content

Use functions for dynamic content:

```typescript
import fs from 'node:fs';

discovery({
  llms: {
    description: () => {
      const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf-8'));
      return `${pkg.name} - ${pkg.description}`;
    },
    apiEndpoints: async () => {
      // Load from OpenAPI spec (loadOpenAPISpec is a placeholder for your own loader)
      const spec = await loadOpenAPISpec();
      return spec.paths.map(path => ({
        path: path.url,
        method: path.method,
        description: path.summary
      }));
    }
  }
})
```

## Integration with Other Tools

### With @astrojs/sitemap

The discovery integration automatically includes `@astrojs/sitemap`, so you don't need to install it separately. Configuration is passed through:

```typescript
discovery({
  sitemap: {
    // All @astrojs/sitemap options work here
    filter: (page) => !page.includes('/secret'),
    changefreq: 'weekly'
  }
})
```

### With Content Collections

Automatically extract information from content collections:

```typescript
discovery({
  llms: {
    importantPages: async () => {
      const docs = await getCollection('docs');
      return docs.map(doc => ({
        name: doc.data.title,
        path: `/docs/${doc.slug}`,
        description: doc.data.description
      }));
    }
  }
})
```

### With Environment Variables

Use environment variables to keep contact details out of source control:

```typescript
discovery({
  humans: {
    team: [
      {
        name: 'Developer',
        contact: process.env.PUBLIC_CONTACT_EMAIL
      }
    ]
  }
})
```

## Output

The integration generates the following files:

### `/robots.txt`

```
User-agent: *
Allow: /

# Sitemaps
Sitemap: https://example.com/sitemap-index.xml

# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
Allow: /llms.txt

# Crawl delay
Crawl-delay: 1
```

### `/llms.txt`

```
# Project Name - Description

> Short tagline

## Site Information
- Name: Project Name
- Description: Full description
- URL: https://example.com

## For AI Assistants
Instructions for AI assistants...

## API Endpoints
- GET /api/endpoint - Description
```

### `/humans.txt`

```
/* TEAM */

Name: Developer Name
Role: Position
Contact: email@example.com

/* THANKS */

- Thank you note 1
- Thank you note 2

/* SITE */

Tech stack and details...
```

### `/sitemap-index.xml`

Standard XML sitemap with all your pages.

## Best Practices

### 1. **Set Your Site URL**

Always configure `site` in your Astro config:

```typescript
export default defineConfig({
  site: 'https://example.com', // Required!
  integrations: [discovery()]
});
```

### 2. **Keep humans.txt Updated**

Update your team information and tech stack regularly:

```typescript
discovery({
  humans: {
    site: {
      lastUpdate: 'auto' // Automatically uses current date
    }
  }
})
```

### 3. **Be Specific with LLM Instructions**

Provide clear, actionable instructions for AI assistants:

```typescript
discovery({
  llms: {
    instructions: `
      When helping users:
      1. Always check API documentation first
      2. Use the /api/search endpoint for queries
      3. Format responses in markdown
      4. Include relevant links
    `
  }
})
```

### 4. **Filter Private Pages**

Exclude admin, draft, and private pages:

```typescript
discovery({
  sitemap: {
    filter: (page) => {
      return !page.includes('/admin') &&
             !page.includes('/draft') &&
             !page.includes('/private');
    }
  },
  robots: {
    additionalAgents: [
      {
        userAgent: '*',
        disallow: ['/admin', '/draft', '/private']
      }
    ]
  }
})
```

### 5. **Optimize Cache Headers**

Balance freshness with server load:

```typescript
discovery({
  caching: {
    robots: 3600,  // 1 hour - changes rarely
    llms: 1800,    // 30 min - may update instructions
    humans: 86400, // 24 hours - credits don't change often
    sitemap: 3600  // 1 hour - content changes moderately
  }
})
```

## Troubleshooting

### Files Not Generating

1. **Check your output mode:**

   ```typescript
   export default defineConfig({
     output: 'hybrid', // or 'server'
     // ...
   });
   ```

2. **Verify site URL is set:**

   ```typescript
   export default defineConfig({
     site: 'https://example.com' // Must be set!
   });
   ```

3. **Check for conflicts:** Remove any existing `/public/robots.txt` or similar static files.

### Wrong URLs in Files

Make sure your `site` config matches your production domain:

```typescript
export default defineConfig({
  site: import.meta.env.PROD
    ? 'https://production.com'
    : 'http://localhost:4321'
});
```

### LLM Bots Not Respecting Instructions

- Ensure `/llms.txt` is accessible
- Check that robots.txt allows LLM bots
- Verify the content is properly formatted

### Sitemap Issues

Check the `@astrojs/sitemap` documentation for detailed troubleshooting: https://docs.astro.build/en/guides/integrations-guide/sitemap/

## Migration Guide

### From Manual Files

If you have existing static files in `/public`, remove them:

```bash
rm public/robots.txt
rm public/humans.txt
rm public/sitemap.xml
```

Then configure the integration with your existing content:

```typescript
discovery({
  humans: {
    team: [/* your existing team data */],
    thanks: [/* your existing thanks */]
  }
})
```

### From @astrojs/sitemap

Replace:

```typescript
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  integrations: [sitemap()]
});
```

With:

```typescript
import discovery from '@astrojs/discovery';

export default defineConfig({
  integrations: [
    discovery({
      sitemap: {
        // Your existing sitemap config
      }
    })
  ]
});
```

## Examples

### E-commerce Site

```typescript
discovery({
  robots: {
    crawlDelay: 2,
    additionalAgents: [
      {
        userAgent: 'PriceBot',
        disallow: ['/checkout', '/account']
      }
    ]
  },
  llms: {
    description: 'Online store for sustainable products',
    keyFeatures: [
      'Eco-friendly product catalog',
      'Carbon footprint calculator',
      'Sustainable shipping options'
    ],
    apiEndpoints: [
      { path: '/api/products', description: 'Product catalog' },
      { path: '/api/calculate-carbon', description: 'Carbon calculator' }
    ]
  },
  sitemap: {
    filter: (page) =>
      !page.includes('/checkout') &&
      !page.includes('/account')
  }
})
```

### Documentation Site

```typescript
discovery({
  llms: {
    description: 'Technical documentation for our API',
    instructions: `
      When helping users:
      1. Search documentation before answering
      2. Provide code examples from /examples
      3. Link to relevant API reference pages
      4. Suggest similar solutions from FAQ
    `,
    importantPages: async () => {
      const docs = await getCollection('docs');
      return docs
        .filter(doc => doc.data.featured)
        .map(doc => ({
          name: doc.data.title,
          path: `/docs/${doc.slug}`,
          description: doc.data.description
        }));
    }
  },
  humans: {
    team: [
      {
        name: 'Documentation Team',
        contact: 'docs@example.com'
      }
    ],
    thanks: [
      'Our amazing community contributors',
      'Technical writers worldwide'
    ]
  }
})
```

### Personal Blog

```typescript
discovery({
  llms: {
    description: 'Personal blog about web development',
    brandVoice: [
      'Casual and friendly',
      'Technical but accessible',
      'Focus on practical examples'
    ]
  },
  humans: {
    team: [
      {
        name: 'Jane Blogger',
        role: 'Writer & Developer',
        twitter: '@janeblogger',
        github: 'jane-dev'
      }
    ],
    story: `
      Started this blog to document my journey learning web development.
      Went from tutorial hell to building real projects. Now sharing
      what I've learned to help others on their journey.
    `,
    funFacts: [
      'All posts written in markdown',
      'Powered by coffee and curiosity',
      'Deployed automatically on every commit'
    ]
  }
})
```

## Performance

The integration is designed for minimal performance impact:

- **Build Time**: Adds ~100-200ms to the build process
- **Runtime**: All files are statically generated at build time
- **Caching**: Smart HTTP cache headers reduce server load
- **Bundle Size**: Zero client-side JavaScript

## Contributing

We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md).

## License

MIT

## Related

- [@astrojs/sitemap](https://docs.astro.build/en/guides/integrations-guide/sitemap/)
- [humanstxt.org](https://humanstxt.org/)
- [llms.txt spec](https://llmstxt.org/)
- [robots.txt spec](https://developers.google.com/search/docs/crawling-indexing/robots/intro)

## Credits

Built with inspiration from:

- The Astro community
- humanstxt.org initiative
- The llms.txt proposal
- Web standards organizations

---

**Made with ❤️ by the Astro community**