Add comprehensive problem-oriented how-to guides following Diátaxis framework: - Block specific bots from crawling the site - Customize LLM instructions for AI assistants - Add team members to humans.txt - Filter sitemap pages - Configure cache headers for discovery files - Environment-specific configuration - Integration with Astro content collections - Custom templates for discovery files - ActivityPub/Fediverse integration via WebFinger Each guide provides: - Clear prerequisites - Step-by-step solutions - Multiple approaches/variations - Expected outcomes - Alternative approaches - Common issues and troubleshooting Total: 9 guides, 6,677 words
@astrojs/discovery
Comprehensive discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, security.txt, canary.txt, WebFinger, and sitemap generation
Overview
This integration provides automatic generation of all standard discovery files for your Astro site, making it easily discoverable by search engines, LLMs, humans, and federated services, while providing security contact information and transparency mechanisms.
Features
- 🤖 robots.txt - Dynamic generation with LLM bot support
- 🧠 llms.txt - AI assistant discovery and instructions
- 👥 humans.txt - Human-readable credits and tech stack
- 🔒 security.txt - RFC 9116 compliant security contact info
- 🐦 canary.txt - Warrant canary for transparency
- 🔍 WebFinger - RFC 7033 resource discovery (ActivityPub, OpenID)
- 🗺️ sitemap.xml - Automatic sitemap generation
- ⚡ Dynamic URLs - Adapts to your
siteconfig - 🎯 Smart Caching - Optimized cache headers
- 🔧 Fully Customizable - Override any section
Installation
npx astro add @astrojs/discovery
Or manually:
npm install @astrojs/discovery
Quick Start
Basic Setup
// astro.config.mjs
import { defineConfig } from 'astro';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery()
]
});
That's it! This will generate:
/robots.txt/llms.txt/humans.txt/sitemap-index.xml
To enable security.txt and canary.txt, add their configurations:
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery({
security: {
contact: 'security@example.com',
},
canary: {
organization: 'Example Corp',
contact: 'canary@example.com',
}
})
]
});
This adds:
/.well-known/security.txt/.well-known/canary.txt
With Configuration
// astro.config.mjs
import { defineConfig } from 'astro';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery({
// Robots.txt configuration
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: 'CustomBot',
allow: ['/api'],
disallow: ['/admin']
}
]
},
// LLMs.txt configuration
llms: {
description: 'Your site description for AI assistants',
apiEndpoints: [
{ path: '/api/chat', description: 'Chat endpoint' },
{ path: '/api/search', description: 'Search API' }
],
instructions: `
When helping users with our site:
1. Check documentation first
2. Use provided API endpoints
3. Follow brand guidelines
`
},
// Humans.txt configuration
humans: {
team: [
{
name: 'Jane Doe',
role: 'Creator & Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA'
}
],
thanks: [
'The Astro team',
'Open source community'
],
site: {
lastUpdate: 'auto', // or specific date
language: 'English',
doctype: 'HTML5',
ide: 'VS Code',
techStack: ['Astro', 'TypeScript', 'React']
},
story: 'Your project story...',
funFacts: [
'Built with love',
'Coffee-powered development'
]
},
// Sitemap configuration
sitemap: {
// Passed through to @astrojs/sitemap
filter: (page) => !page.includes('/admin'),
changefreq: 'weekly',
priority: 0.7
}
})
]
});
API Reference
discovery(options?)
Options
robots
Configuration for robots.txt generation.
Type:
interface RobotsConfig {
crawlDelay?: number;
allowAllBots?: boolean;
llmBots?: {
enabled?: boolean;
agents?: string[]; // Custom LLM bot names
};
additionalAgents?: Array<{
userAgent: string;
allow?: string[];
disallow?: string[];
}>;
customRules?: string; // Raw robots.txt content to append
}
Default:
{
crawlDelay: 1,
allowAllBots: true,
llmBots: {
enabled: true,
agents: [
'Anthropic-AI',
'Claude-Web',
'GPTBot',
'ChatGPT-User',
'cohere-ai',
'Google-Extended'
]
}
}
Example:
discovery({
robots: {
crawlDelay: 2,
llmBots: {
enabled: true,
agents: ['CustomAIBot', 'AnotherBot']
},
additionalAgents: [
{
userAgent: 'BadBot',
disallow: ['/']
}
]
}
})
llms
Configuration for llms.txt generation.
Type:
interface LLMsConfig {
enabled?: boolean;
description?: string;
keyFeatures?: string[];
importantPages?: Array<{
name: string;
path: string;
description?: string;
}>;
instructions?: string;
apiEndpoints?: Array<{
path: string;
method?: string;
description: string;
}>;
techStack?: {
frontend?: string[];
backend?: string[];
ai?: string[];
other?: string[];
};
brandVoice?: string[];
customSections?: Record<string, string>;
}
Example:
discovery({
llms: {
description: 'E-commerce platform for sustainable products',
keyFeatures: [
'AI-powered product recommendations',
'Carbon footprint calculator',
'Subscription management'
],
instructions: `
When helping users:
1. Check product availability via API
2. Suggest sustainable alternatives
3. Calculate shipping costs
`,
apiEndpoints: [
{
path: '/api/products',
method: 'GET',
description: 'List all products'
},
{
path: '/api/calculate-footprint',
method: 'POST',
description: 'Calculate carbon footprint'
}
]
}
})
humans
Configuration for humans.txt generation.
Type:
interface HumansConfig {
enabled?: boolean;
team?: Array<{
name: string;
role?: string;
contact?: string;
location?: string;
twitter?: string;
github?: string;
}>;
thanks?: string[];
site?: {
lastUpdate?: string | 'auto';
language?: string;
doctype?: string;
ide?: string;
techStack?: string[];
standards?: string[];
components?: string[];
software?: string[];
};
story?: string;
funFacts?: string[];
philosophy?: string[];
customSections?: Record<string, string>;
}
Example:
discovery({
humans: {
team: [
{
name: 'Alice Developer',
role: 'Lead Developer',
contact: 'alice@example.com',
location: 'New York',
github: 'alice-dev'
}
],
thanks: [
'Coffee',
'Stack Overflow community',
'My rubber duck'
],
story: `
This project started when we realized that...
`,
funFacts: [
'Written entirely on a mechanical keyboard',
'Fueled by 347 cups of coffee',
'Built during a 48-hour hackathon'
]
}
})
security
Configuration for security.txt generation (RFC 9116).
Type:
interface SecurityConfig {
enabled?: boolean;
contact: string | string[]; // Required: security contact (email or URL)
expires?: string | 'auto'; // Expiration date (default: 1 year)
encryption?: string | string[]; // PGP key URL(s)
acknowledgments?: string; // Hall of fame URL
preferredLanguages?: string[]; // Preferred languages (e.g., ['en', 'es'])
canonical?: string; // Canonical URL
policy?: string; // Security policy URL
hiring?: string; // Security jobs URL
}
Example:
discovery({
security: {
contact: 'security@example.com',
expires: 'auto', // Auto-calculates 1 year from build
encryption: 'https://example.com/pgp-key.txt',
acknowledgments: 'https://example.com/security/hall-of-fame',
preferredLanguages: ['en', 'es'],
policy: 'https://example.com/security/policy'
}
})
Notes:
- Email contacts automatically get
mailto:prefix expires: 'auto'sets expiration to 1 year from generation- Generates at
/.well-known/security.txtper RFC 9116 - Canonical URL defaults to correct .well-known location
canary
Configuration for warrant canary generation.
Type:
interface CanaryConfig {
enabled?: boolean;
organization?: string; // Organization name
contact?: string; // Contact email
frequency?: 'daily' | 'weekly' | 'monthly' | 'quarterly' | 'yearly';
expires?: string | 'auto'; // Expiration (auto-calculated from frequency)
statements?: CanaryStatement[] | (() => CanaryStatement[]);
additionalStatement?: string; // Additional context
verification?: string; // PGP signature URL
previousCanary?: string; // Previous canary URL
blockchainProof?: { // Blockchain verification
network: string;
address: string;
txHash?: string;
timestamp?: string;
};
personnelStatement?: boolean; // Add duress check
}
interface CanaryStatement {
type: string;
description: string;
received: boolean;
}
Example:
discovery({
canary: {
organization: 'Example Corp',
contact: 'canary@example.com',
frequency: 'monthly', // Auto-expires in 35 days
statements: [
{ type: 'nsl', description: 'National Security Letters', received: false },
{ type: 'gag', description: 'Gag orders', received: false }
],
additionalStatement: 'We are committed to transparency.',
verification: 'PGP Signature: https://example.com/canary.txt.asc',
personnelStatement: true // Adds duress check
}
})
Frequency-based expiration:
daily: 2 daysweekly: 10 daysmonthly: 35 daysquarterly: 100 daysyearly: 380 days
Notes:
- Only non-received statements appear in output
- Statements can be a function for dynamic generation
- Generates at
/.well-known/canary.txt - See CANARY_SPEC.md for full specification
webfinger
Configuration for WebFinger resource discovery (RFC 7033).
Type:
interface WebFingerConfig {
enabled?: boolean; // Opt-in (default: false)
resources?: WebFingerResource[]; // Static resources
collections?: { // Content collection integration
name: string; // Collection name (e.g., 'team')
resourceTemplate: string; // URI template: 'acct:{slug}@example.com'
subjectTemplate?: string; // Defaults to resourceTemplate
linksBuilder?: (entry: any) => WebFingerLink[];
aliasesBuilder?: (entry: any) => string[];
propertiesBuilder?: (entry: any) => Record<string, string | null>;
}[];
}
interface WebFingerResource {
resource: string; // Resource URI (acct:, https://, etc.)
subject?: string; // Subject (defaults to resource)
aliases?: string[]; // Alternative URIs
properties?: Record<string, string | null>; // URI-based properties
links?: WebFingerLink[]; // Related links
}
interface WebFingerLink {
rel: string; // Link relation (URI or IANA type)
href?: string; // Target URI
type?: string; // Media type
titles?: Record<string, string>; // Titles with language tags
properties?: Record<string, string | null>;
}
Example (Static Resources):
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:alice@example.com',
aliases: ['https://example.com/@alice'],
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: 'https://example.com/@alice'
},
{
rel: 'self',
type: 'application/activity+json', // ActivityPub/Mastodon
href: 'https://example.com/users/alice'
}
]
}
]
}
})
Example (Content Collection):
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'team', // Astro content collection
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (member) => [
{
rel: 'http://webfinger.net/rel/profile-page',
href: `https://example.com/team/${member.slug}`,
type: 'text/html'
},
{
rel: 'http://webfinger.net/rel/avatar',
href: member.data.avatar,
type: 'image/jpeg'
}
],
propertiesBuilder: (member) => ({
'http://schema.org/name': member.data.name,
'http://schema.org/jobTitle': member.data.role
})
}]
}
})
Common Use Cases:
- ActivityPub/Mastodon: Enable federated social network discovery
- OpenID Connect: Provide issuer discovery for authentication
- Team Profiles: Make team members discoverable across services
- Author Discovery: Link blog authors to their profiles/social accounts
Query Format:
GET /.well-known/webfinger?resource=acct:alice@example.com
GET /.well-known/webfinger?resource=acct:alice@example.com&rel=self
Notes:
- Dynamic route - not prerendered
- Requires
?resource=query parameter (RFC 7033) - Optional
?rel=parameter filters links - CORS enabled (
Access-Control-Allow-Origin: *) - Media type:
application/jrd+json - Template vars:
{slug},{id},{data.fieldName},{siteURL}
sitemap
Configuration passed to @astrojs/sitemap.
Type:
interface SitemapConfig {
filter?: (page: string) => boolean;
customPages?: string[];
i18n?: {
defaultLocale: string;
locales: Record<string, string>;
};
changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
lastmod?: Date;
priority?: number;
serialize?: (item: SitemapItem) => SitemapItem | undefined;
}
Example:
discovery({
sitemap: {
filter: (page) => !page.includes('/admin') && !page.includes('/draft'),
changefreq: 'daily',
priority: 0.8
}
})
caching
Configure HTTP cache headers for discovery files.
Type:
interface CachingConfig {
robots?: number; // seconds
llms?: number;
humans?: number;
security?: number;
canary?: number;
webfinger?: number;
sitemap?: number;
}
Default:
{
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
security: 86400, // 24 hours
canary: 3600, // 1 hour (check frequently!)
webfinger: 3600, // 1 hour
sitemap: 3600 // 1 hour
}
Advanced Usage
Custom Templates
You can provide custom templates for any file:
discovery({
templates: {
robots: (config, siteURL) => `
User-agent: *
Allow: /
# Custom content
Sitemap: ${siteURL}/sitemap-index.xml
`,
llms: (config, siteURL) => `
# ${config.description}
Visit ${siteURL} for more information.
`,
security: (config, siteURL) => `
# Custom security.txt format
Contact: ${config.contact}
Expires: ${config.expires || new Date(Date.now() + 365*24*60*60*1000).toISOString()}
`,
canary: (config, siteURL) => `
# Custom canary format
Organization: ${config.organization}
Last-Updated: ${new Date().toISOString()}
`
}
})
Conditional Generation
Disable specific files in certain environments:
discovery({
robots: {
enabled: import.meta.env.PROD // Only in production
},
llms: {
enabled: true // Always generate
},
humans: {
enabled: import.meta.env.DEV // Only in development
}
})
Dynamic Content
Use functions for dynamic content:
discovery({
llms: {
description: () => {
const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf-8'));
return `${pkg.name} - ${pkg.description}`;
},
apiEndpoints: async () => {
// Load from OpenAPI spec
const spec = await loadOpenAPISpec();
return spec.paths.map(path => ({
path: path.url,
method: path.method,
description: path.summary
}));
}
}
})
Integration with Other Tools
With @astrojs/sitemap
The discovery integration automatically includes @astrojs/sitemap, so you don't need to install it separately. Configuration is passed through:
discovery({
sitemap: {
// All @astrojs/sitemap options work here
filter: (page) => !page.includes('/secret'),
changefreq: 'weekly'
}
})
With Content Collections
Automatically extract information from content collections:
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
return docs.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
With Environment Variables
Use environment variables for sensitive information:
discovery({
humans: {
team: [
{
name: 'Developer',
contact: process.env.PUBLIC_CONTACT_EMAIL
}
]
}
})
Output
The integration generates the following files:
/robots.txt
User-agent: *
Allow: /
# Sitemaps
Sitemap: https://example.com/sitemap-index.xml
# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
Allow: /llms.txt
# Crawl delay
Crawl-delay: 1
/llms.txt
# Project Name - Description
> Short tagline
## Site Information
- Name: Project Name
- Description: Full description
- URL: https://example.com
## For AI Assistants
Instructions for AI assistants...
## API Endpoints
- GET /api/endpoint - Description
/humans.txt
/* TEAM */
Name: Developer Name
Role: Position
Contact: email@example.com
/* THANKS */
- Thank you note 1
- Thank you note 2
/* SITE */
Tech stack and details...
/sitemap-index.xml
Standard XML sitemap with all your pages.
Best Practices
1. Set Your Site URL
Always configure site in your Astro config:
export default defineConfig({
site: 'https://example.com', // Required!
integrations: [discovery()]
});
2. Keep humans.txt Updated
Update your team information and tech stack regularly:
discovery({
humans: {
site: {
lastUpdate: 'auto' // Automatically uses current date
}
}
})
3. Be Specific with LLM Instructions
Provide clear, actionable instructions for AI assistants:
discovery({
llms: {
instructions: `
When helping users:
1. Always check API documentation first
2. Use the /api/search endpoint for queries
3. Format responses in markdown
4. Include relevant links
`
}
})
4. Filter Private Pages
Exclude admin, draft, and private pages:
discovery({
sitemap: {
filter: (page) => {
return !page.includes('/admin') &&
!page.includes('/draft') &&
!page.includes('/private');
}
},
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private']
}
]
}
})
5. Optimize Cache Headers
Balance freshness with server load:
discovery({
caching: {
robots: 3600, // 1 hour - changes rarely
llms: 1800, // 30 min - may update instructions
humans: 86400, // 24 hours - credits don't change often
sitemap: 3600 // 1 hour - content changes moderately
}
})
Troubleshooting
Files Not Generating
- Check your output mode:
export default defineConfig({
output: 'hybrid', // or 'server'
// ...
});
- Verify site URL is set:
export default defineConfig({
site: 'https://example.com' // Must be set!
});
- Check for conflicts:
Remove any existing
/public/robots.txtor similar static files.
Wrong URLs in Files
Make sure your site config matches your production domain:
export default defineConfig({
site: import.meta.env.PROD
? 'https://production.com'
: 'http://localhost:4321'
});
LLM Bots Not Respecting Instructions
- Ensure
/llms.txtis accessible - Check robots.txt allows LLM bots
- Verify content is properly formatted
Sitemap Issues
Check @astrojs/sitemap documentation for detailed troubleshooting:
https://docs.astro.build/en/guides/integrations-guide/sitemap/
Migration Guide
From Manual Files
If you have existing static files in /public, remove them:
rm public/robots.txt
rm public/humans.txt
rm public/sitemap.xml
Then configure the integration with your existing content:
discovery({
humans: {
team: [/* your existing team data */],
thanks: [/* your existing thanks */]
}
})
From @astrojs/sitemap
Replace:
import sitemap from '@astrojs/sitemap';
export default defineConfig({
integrations: [sitemap()]
});
With:
import discovery from '@astrojs/discovery';
export default defineConfig({
integrations: [
discovery({
sitemap: {
// Your existing sitemap config
}
})
]
});
Examples
E-commerce Site
discovery({
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: 'PriceBot',
disallow: ['/checkout', '/account']
}
]
},
llms: {
description: 'Online store for sustainable products',
keyFeatures: [
'Eco-friendly product catalog',
'Carbon footprint calculator',
'Sustainable shipping options'
],
apiEndpoints: [
{ path: '/api/products', description: 'Product catalog' },
{ path: '/api/calculate-carbon', description: 'Carbon calculator' }
]
},
sitemap: {
filter: (page) =>
!page.includes('/checkout') &&
!page.includes('/account')
}
})
Documentation Site
discovery({
llms: {
description: 'Technical documentation for our API',
instructions: `
When helping users:
1. Search documentation before answering
2. Provide code examples from /examples
3. Link to relevant API reference pages
4. Suggest similar solutions from FAQ
`,
importantPages: async () => {
const docs = await getCollection('docs');
return docs
.filter(doc => doc.data.featured)
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
},
humans: {
team: [
{
name: 'Documentation Team',
contact: 'docs@example.com'
}
],
thanks: [
'Our amazing community contributors',
'Technical writers worldwide'
]
}
})
Personal Blog
discovery({
llms: {
description: 'Personal blog about web development',
brandVoice: [
'Casual and friendly',
'Technical but accessible',
'Focus on practical examples'
]
},
humans: {
team: [
{
name: 'Jane Blogger',
role: 'Writer & Developer',
twitter: '@janeblogger',
github: 'jane-dev'
}
],
story: `
Started this blog to document my journey learning web development.
Went from tutorial hell to building real projects. Now sharing
what I've learned to help others on their journey.
`,
funFacts: [
'All posts written in markdown',
'Powered by coffee and curiosity',
'Deployed automatically on every commit'
]
}
})
Performance
The integration is designed for minimal performance impact:
- Build Time: Adds ~100-200ms to build process
- Runtime: All files are statically generated at build time
- Caching: Smart HTTP cache headers reduce server load
- Bundle Size: Zero client-side JavaScript
Contributing
We welcome contributions! See our Contributing Guide.
License
MIT
Related
Credits
Built with inspiration from:
- The Astro community
- humanstxt.org initiative
- Anthropic's llms.txt proposal
- Web standards organizations
Made with ❤️ by the Astro community