Updated README with complete WebFinger section:
## Overview & Features
- Updated tagline to include WebFinger
- Added 🔍 WebFinger to feature list with use cases
## API Reference - webfinger section
### Complete TypeScript Interfaces
- WebFingerConfig with enabled, resources, collections
- WebFingerResource (JRD structure)
- WebFingerLink (rel, href, type, titles, properties)
### Static Resources Example
- Shows simple acct: URI configuration
- ActivityPub/Mastodon integration example
- Profile page and avatar links
### Content Collection Example
- Team members as discoverable resources
- Template URI patterns: acct:{slug}@example.com
- Builder functions for links and properties
- Schema.org property integration
### Common Use Cases
- ActivityPub/Mastodon federation
- OpenID Connect issuer discovery
- Team profile discovery
- Blog author linking
### Query Format Documentation
- Required resource parameter
- Optional rel filtering
- Example queries
### Technical Notes
- Dynamic route (not prerendered)
- CORS enabled per RFC 7033
- Media type: application/jrd+json
- Template variable reference
## Caching Section
- Added webfinger: 3600 (1 hour) to defaults table
Documentation now covers all 7 discovery mechanisms with examples and best practices.
# @astrojs/discovery

> Comprehensive discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, security.txt, canary.txt, WebFinger, and sitemap generation

## Overview

This integration provides automatic generation of all standard discovery files for your Astro site, making it easily discoverable by search engines, LLMs, humans, and federated services, while providing security contact information and transparency mechanisms.

## Features

- 🤖 **robots.txt** - Dynamic generation with LLM bot support
- 🧠 **llms.txt** - AI assistant discovery and instructions
- 👥 **humans.txt** - Human-readable credits and tech stack
- 🔒 **security.txt** - RFC 9116 compliant security contact info
- 🐦 **canary.txt** - Warrant canary for transparency
- 🔍 **WebFinger** - RFC 7033 resource discovery (ActivityPub, OpenID)
- 🗺️ **sitemap.xml** - Automatic sitemap generation
- ⚡ **Dynamic URLs** - Adapts to your `site` config
- 🎯 **Smart Caching** - Optimized cache headers
- 🔧 **Fully Customizable** - Override any section

## Installation

```bash
npx astro add @astrojs/discovery
```

Or manually:

```bash
npm install @astrojs/discovery
```

## Quick Start

### Basic Setup

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery()
  ]
});
```

That's it! This will generate:

- `/robots.txt`
- `/llms.txt`
- `/humans.txt`
- `/sitemap-index.xml`

To enable security.txt and canary.txt, add their configurations:

```typescript
export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      security: {
        contact: 'security@example.com',
      },
      canary: {
        organization: 'Example Corp',
        contact: 'canary@example.com',
      }
    })
  ]
});
```

This adds:

- `/.well-known/security.txt`
- `/.well-known/canary.txt`

### With Configuration

```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      // Robots.txt configuration
      robots: {
        crawlDelay: 2,
        additionalAgents: [
          {
            userAgent: 'CustomBot',
            allow: ['/api'],
            disallow: ['/admin']
          }
        ]
      },

      // LLMs.txt configuration
      llms: {
        description: 'Your site description for AI assistants',
        apiEndpoints: [
          { path: '/api/chat', description: 'Chat endpoint' },
          { path: '/api/search', description: 'Search API' }
        ],
        instructions: `
          When helping users with our site:
          1. Check documentation first
          2. Use provided API endpoints
          3. Follow brand guidelines
        `
      },

      // Humans.txt configuration
      humans: {
        team: [
          {
            name: 'Jane Doe',
            role: 'Creator & Developer',
            contact: 'jane@example.com',
            location: 'San Francisco, CA'
          }
        ],
        thanks: [
          'The Astro team',
          'Open source community'
        ],
        site: {
          lastUpdate: 'auto', // or specific date
          language: 'English',
          doctype: 'HTML5',
          ide: 'VS Code',
          techStack: ['Astro', 'TypeScript', 'React']
        },
        story: 'Your project story...',
        funFacts: [
          'Built with love',
          'Coffee-powered development'
        ]
      },

      // Sitemap configuration
      sitemap: {
        // Passed through to @astrojs/sitemap
        filter: (page) => !page.includes('/admin'),
        changefreq: 'weekly',
        priority: 0.7
      }
    })
  ]
});
```

## API Reference

### `discovery(options?)`

#### Options

##### `robots`

Configuration for robots.txt generation.

**Type:**
```typescript
interface RobotsConfig {
  crawlDelay?: number;
  allowAllBots?: boolean;
  llmBots?: {
    enabled?: boolean;
    agents?: string[]; // Custom LLM bot names
  };
  additionalAgents?: Array<{
    userAgent: string;
    allow?: string[];
    disallow?: string[];
  }>;
  customRules?: string; // Raw robots.txt content to append
}
```

**Default:**
```typescript
{
  crawlDelay: 1,
  allowAllBots: true,
  llmBots: {
    enabled: true,
    agents: [
      'Anthropic-AI',
      'Claude-Web',
      'GPTBot',
      'ChatGPT-User',
      'cohere-ai',
      'Google-Extended'
    ]
  }
}
```

**Example:**
```typescript
discovery({
  robots: {
    crawlDelay: 2,
    llmBots: {
      enabled: true,
      agents: ['CustomAIBot', 'AnotherBot']
    },
    additionalAgents: [
      {
        userAgent: 'BadBot',
        disallow: ['/']
      }
    ]
  }
})
```

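To make the shape of `additionalAgents` more concrete, here is a minimal sketch of how such entries could be turned into robots.txt directives. The `renderAgentRules` helper is hypothetical and not part of the integration's API; the actual output layout may differ.

```typescript
// Hypothetical helper, not part of @astrojs/discovery.
interface AgentRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

// Render one robots.txt group per agent, separated by blank lines.
function renderAgentRules(rules: AgentRule[]): string {
  return rules
    .map((rule) => {
      const lines = [`User-agent: ${rule.userAgent}`];
      for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`);
      for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`);
      return lines.join('\n');
    })
    .join('\n\n');
}

const rendered = renderAgentRules([
  { userAgent: 'BadBot', disallow: ['/'] },
  { userAgent: 'CustomBot', allow: ['/api'], disallow: ['/admin'] },
]);
console.log(rendered);
```

Each entry becomes its own `User-agent` group, which keeps per-bot rules independent of the defaults.
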
##### `llms`

Configuration for llms.txt generation.

**Type:**
```typescript
interface LLMsConfig {
  enabled?: boolean;
  description?: string;
  keyFeatures?: string[];
  importantPages?: Array<{
    name: string;
    path: string;
    description?: string;
  }>;
  instructions?: string;
  apiEndpoints?: Array<{
    path: string;
    method?: string;
    description: string;
  }>;
  techStack?: {
    frontend?: string[];
    backend?: string[];
    ai?: string[];
    other?: string[];
  };
  brandVoice?: string[];
  customSections?: Record<string, string>;
}
```

**Example:**
```typescript
discovery({
  llms: {
    description: 'E-commerce platform for sustainable products',
    keyFeatures: [
      'AI-powered product recommendations',
      'Carbon footprint calculator',
      'Subscription management'
    ],
    instructions: `
      When helping users:
      1. Check product availability via API
      2. Suggest sustainable alternatives
      3. Calculate shipping costs
    `,
    apiEndpoints: [
      {
        path: '/api/products',
        method: 'GET',
        description: 'List all products'
      },
      {
        path: '/api/calculate-footprint',
        method: 'POST',
        description: 'Calculate carbon footprint'
      }
    ]
  }
})
```

##### `humans`

Configuration for humans.txt generation.

**Type:**
```typescript
interface HumansConfig {
  enabled?: boolean;
  team?: Array<{
    name: string;
    role?: string;
    contact?: string;
    location?: string;
    twitter?: string;
    github?: string;
  }>;
  thanks?: string[];
  site?: {
    lastUpdate?: string | 'auto';
    language?: string;
    doctype?: string;
    ide?: string;
    techStack?: string[];
    standards?: string[];
    components?: string[];
    software?: string[];
  };
  story?: string;
  funFacts?: string[];
  philosophy?: string[];
  customSections?: Record<string, string>;
}
```

**Example:**
```typescript
discovery({
  humans: {
    team: [
      {
        name: 'Alice Developer',
        role: 'Lead Developer',
        contact: 'alice@example.com',
        location: 'New York',
        github: 'alice-dev'
      }
    ],
    thanks: [
      'Coffee',
      'Stack Overflow community',
      'My rubber duck'
    ],
    story: `
      This project started when we realized that...
    `,
    funFacts: [
      'Written entirely on a mechanical keyboard',
      'Fueled by 347 cups of coffee',
      'Built during a 48-hour hackathon'
    ]
  }
})
```

##### `security`

Configuration for security.txt generation (RFC 9116).

**Type:**
```typescript
interface SecurityConfig {
  enabled?: boolean;
  contact: string | string[];     // Required: security contact (email or URL)
  expires?: string | 'auto';      // Expiration date (default: 1 year)
  encryption?: string | string[]; // PGP key URL(s)
  acknowledgments?: string;       // Hall of fame URL
  preferredLanguages?: string[];  // Preferred languages (e.g., ['en', 'es'])
  canonical?: string;             // Canonical URL
  policy?: string;                // Security policy URL
  hiring?: string;                // Security jobs URL
}
```

**Example:**
```typescript
discovery({
  security: {
    contact: 'security@example.com',
    expires: 'auto', // Auto-calculates 1 year from build
    encryption: 'https://example.com/pgp-key.txt',
    acknowledgments: 'https://example.com/security/hall-of-fame',
    preferredLanguages: ['en', 'es'],
    policy: 'https://example.com/security/policy'
  }
})
```

**Notes:**
- Email contacts automatically get `mailto:` prefix
- `expires: 'auto'` sets expiration to 1 year from generation
- Generates at `/.well-known/security.txt` per RFC 9116
- Canonical URL defaults to correct .well-known location

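The first two notes can be pictured with a short sketch. Both helpers below (`normalizeContact`, `resolveExpires`) are hypothetical names chosen for illustration; they approximate the described behaviour and are not the integration's actual code.

```typescript
// Hypothetical helpers illustrating the notes above; not the integration's real code.

// Prefix bare email addresses with mailto:, leave URLs untouched.
function normalizeContact(contact: string): string {
  if (/^(mailto:|https?:|tel:)/i.test(contact)) return contact;
  return contact.includes('@') ? `mailto:${contact}` : contact;
}

// Resolve 'auto' (or a missing value) to one year after `now`, as an ISO 8601 string.
function resolveExpires(expires: string | undefined, now: Date = new Date()): string {
  if (expires && expires !== 'auto') return expires;
  const oneYearLater = new Date(now.getTime());
  oneYearLater.setUTCFullYear(oneYearLater.getUTCFullYear() + 1);
  return oneYearLater.toISOString();
}

console.log(normalizeContact('security@example.com'));
console.log(resolveExpires('auto', new Date('2025-01-15T00:00:00.000Z')));
```

Passing `now` explicitly keeps the expiry calculation deterministic, which is convenient when testing.
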
##### `canary`

Configuration for warrant canary generation.

**Type:**
```typescript
interface CanaryConfig {
  enabled?: boolean;
  organization?: string;          // Organization name
  contact?: string;               // Contact email
  frequency?: 'daily' | 'weekly' | 'monthly' | 'quarterly' | 'yearly';
  expires?: string | 'auto';      // Expiration (auto-calculated from frequency)
  statements?: CanaryStatement[] | (() => CanaryStatement[]);
  additionalStatement?: string;   // Additional context
  verification?: string;          // PGP signature URL
  previousCanary?: string;        // Previous canary URL
  blockchainProof?: {             // Blockchain verification
    network: string;
    address: string;
    txHash?: string;
    timestamp?: string;
  };
  personnelStatement?: boolean;   // Add duress check
}

interface CanaryStatement {
  type: string;
  description: string;
  received: boolean;
}
```

**Example:**
```typescript
discovery({
  canary: {
    organization: 'Example Corp',
    contact: 'canary@example.com',
    frequency: 'monthly', // Auto-expires in 35 days
    statements: [
      { type: 'nsl', description: 'National Security Letters', received: false },
      { type: 'gag', description: 'Gag orders', received: false }
    ],
    additionalStatement: 'We are committed to transparency.',
    verification: 'PGP Signature: https://example.com/canary.txt.asc',
    personnelStatement: true // Adds duress check
  }
})
```

**Frequency-based expiration:**
- `daily`: 2 days
- `weekly`: 10 days
- `monthly`: 35 days
- `quarterly`: 100 days
- `yearly`: 380 days

**Notes:**
- Only non-received statements appear in output
- Statements can be a function for dynamic generation
- Generates at `/.well-known/canary.txt`
- See [CANARY_SPEC.md](./CANARY_SPEC.md) for full specification

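As an illustration of the expiration table and the first two notes, the sketch below computes an expiry date from a frequency and keeps only statements that have not been received. The names `EXPIRY_DAYS`, `canaryExpiry`, and `visibleStatements` are assumptions made for this example, not the integration's internals.

```typescript
// Hypothetical sketch of the rules listed above; names are illustrative only.
type Frequency = 'daily' | 'weekly' | 'monthly' | 'quarterly' | 'yearly';

interface CanaryStatement {
  type: string;
  description: string;
  received: boolean;
}

// Days until expiry for each frequency, copied from the table above.
const EXPIRY_DAYS: Record<Frequency, number> = {
  daily: 2,
  weekly: 10,
  monthly: 35,
  quarterly: 100,
  yearly: 380,
};

function canaryExpiry(frequency: Frequency, issued: Date): Date {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return new Date(issued.getTime() + EXPIRY_DAYS[frequency] * MS_PER_DAY);
}

// Accept either a static list or a function, then drop received statements.
function visibleStatements(
  statements: CanaryStatement[] | (() => CanaryStatement[])
): CanaryStatement[] {
  const resolved = typeof statements === 'function' ? statements() : statements;
  return resolved.filter((statement) => !statement.received);
}

const expiry = canaryExpiry('monthly', new Date('2025-01-01T00:00:00.000Z'));
console.log(expiry.toISOString());
```

The grace period beyond the nominal interval (35 days for a monthly canary) leaves room for a late update before the canary reads as expired.
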
##### `webfinger`

Configuration for WebFinger resource discovery (RFC 7033).

**Type:**
```typescript
interface WebFingerConfig {
  enabled?: boolean;                // Opt-in (default: false)
  resources?: WebFingerResource[];  // Static resources
  collections?: {                   // Content collection integration
    name: string;                   // Collection name (e.g., 'team')
    resourceTemplate: string;       // URI template: 'acct:{slug}@example.com'
    subjectTemplate?: string;       // Defaults to resourceTemplate
    linksBuilder?: (entry: any) => WebFingerLink[];
    aliasesBuilder?: (entry: any) => string[];
    propertiesBuilder?: (entry: any) => Record<string, string | null>;
  }[];
}

interface WebFingerResource {
  resource: string;                 // Resource URI (acct:, https://, etc.)
  subject?: string;                 // Subject (defaults to resource)
  aliases?: string[];               // Alternative URIs
  properties?: Record<string, string | null>; // URI-based properties
  links?: WebFingerLink[];          // Related links
}

interface WebFingerLink {
  rel: string;                      // Link relation (URI or IANA type)
  href?: string;                    // Target URI
  type?: string;                    // Media type
  titles?: Record<string, string>;  // Titles with language tags
  properties?: Record<string, string | null>;
}
```

**Example (Static Resources):**
```typescript
discovery({
  webfinger: {
    enabled: true,
    resources: [
      {
        resource: 'acct:alice@example.com',
        aliases: ['https://example.com/@alice'],
        links: [
          {
            rel: 'http://webfinger.net/rel/profile-page',
            type: 'text/html',
            href: 'https://example.com/@alice'
          },
          {
            rel: 'self',
            type: 'application/activity+json', // ActivityPub/Mastodon
            href: 'https://example.com/users/alice'
          }
        ]
      }
    ]
  }
})
```

**Example (Content Collection):**
```typescript
discovery({
  webfinger: {
    enabled: true,
    collections: [{
      name: 'team', // Astro content collection
      resourceTemplate: 'acct:{slug}@example.com',
      linksBuilder: (member) => [
        {
          rel: 'http://webfinger.net/rel/profile-page',
          href: `https://example.com/team/${member.slug}`,
          type: 'text/html'
        },
        {
          rel: 'http://webfinger.net/rel/avatar',
          href: member.data.avatar,
          type: 'image/jpeg'
        }
      ],
      propertiesBuilder: (member) => ({
        'http://schema.org/name': member.data.name,
        'http://schema.org/jobTitle': member.data.role
      })
    }]
  }
})
```

**Common Use Cases:**
- **ActivityPub/Mastodon**: Enable federated social network discovery
- **OpenID Connect**: Provide issuer discovery for authentication
- **Team Profiles**: Make team members discoverable across services
- **Author Discovery**: Link blog authors to their profiles/social accounts

**Query Format:**
```
GET /.well-known/webfinger?resource=acct:alice@example.com
GET /.well-known/webfinger?resource=acct:alice@example.com&rel=self
```

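When building these queries in code, the `resource` and `rel` values should be percent-encoded. A minimal client-side sketch using the standard `URL` API follows; `buildWebFingerUrl` is a hypothetical helper name, not something the integration exports.

```typescript
// Hypothetical client helper for constructing a WebFinger query URL.
function buildWebFingerUrl(origin: string, resource: string, rels: string[] = []): string {
  const url = new URL('/.well-known/webfinger', origin);
  // searchParams handles percent-encoding of ':' and '@' in the resource URI.
  url.searchParams.set('resource', resource);
  // rel may be repeated to request several link relations.
  for (const rel of rels) url.searchParams.append('rel', rel);
  return url.toString();
}

const queryUrl = buildWebFingerUrl('https://example.com', 'acct:alice@example.com', ['self']);
console.log(queryUrl);
```

The server decodes the parameters again, so the encoded form is equivalent to the raw queries shown above.
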
**Notes:**
- Dynamic route - not prerendered
- Requires `?resource=` query parameter (RFC 7033)
- Optional `?rel=` parameter filters links
- CORS enabled (`Access-Control-Allow-Origin: *`)
- Media type: `application/jrd+json`
- Template vars: `{slug}`, `{id}`, `{data.fieldName}`, `{siteURL}`

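To show how the template variables and `rel` filtering from the notes above might behave, here is a rough sketch. `expandTemplate` and `filterLinksByRel` are hypothetical names, and the `Entry` shape is an assumption modelled on a content collection entry; the integration's real implementation may differ.

```typescript
// Hypothetical sketch; the Entry shape and helper names are assumptions.
interface Entry {
  slug: string;
  id: string;
  data: Record<string, unknown>;
}

interface WebFingerLink {
  rel: string;
  href?: string;
  type?: string;
}

// Replace {slug}, {id}, {siteURL}, and {data.fieldName}; unknown fields are left as-is.
function expandTemplate(template: string, entry: Entry, siteURL: string): string {
  return template.replace(
    /\{(slug|id|siteURL|data\.([A-Za-z0-9_]+))\}/g,
    (match, name: string, field?: string) => {
      if (name === 'slug') return entry.slug;
      if (name === 'id') return entry.id;
      if (name === 'siteURL') return siteURL;
      const value = field !== undefined ? entry.data[field] : undefined;
      return value === undefined || value === null ? match : String(value);
    }
  );
}

// With no rel values requested, all links are returned.
function filterLinksByRel(links: WebFingerLink[], rels: string[]): WebFingerLink[] {
  return rels.length === 0 ? links : links.filter((link) => rels.includes(link.rel));
}

const member: Entry = { slug: 'alice', id: 'alice.md', data: { handle: 'al' } };
console.log(expandTemplate('acct:{slug}@example.com', member, 'https://example.com'));
```
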
##### `sitemap`

Configuration passed to `@astrojs/sitemap`.

**Type:**
```typescript
interface SitemapConfig {
  filter?: (page: string) => boolean;
  customPages?: string[];
  i18n?: {
    defaultLocale: string;
    locales: Record<string, string>;
  };
  changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  lastmod?: Date;
  priority?: number;
  serialize?: (item: SitemapItem) => SitemapItem | undefined;
}
```

**Example:**
```typescript
discovery({
  sitemap: {
    filter: (page) => !page.includes('/admin') && !page.includes('/draft'),
    changefreq: 'daily',
    priority: 0.8
  }
})
```

##### `caching`

Configure HTTP cache headers for discovery files.

**Type:**
```typescript
interface CachingConfig {
  robots?: number;    // seconds
  llms?: number;
  humans?: number;
  security?: number;
  canary?: number;
  webfinger?: number;
  sitemap?: number;
}
```

**Default:**
```typescript
{
  robots: 3600,     // 1 hour
  llms: 3600,       // 1 hour
  humans: 86400,    // 24 hours
  security: 86400,  // 24 hours
  canary: 3600,     // 1 hour (check frequently!)
  webfinger: 3600,  // 1 hour
  sitemap: 3600     // 1 hour
}
```

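As a rough picture of how these second values could translate into a response header, the sketch below merges user overrides with the defaults and formats a `Cache-Control` value. The `public, max-age=...` format and the `cacheControlFor` helper are assumptions; the directives the integration actually emits are not documented here.

```typescript
// Hypothetical sketch: the header format is an assumption, not documented output.
type FileKind = 'robots' | 'llms' | 'humans' | 'security' | 'canary' | 'webfinger' | 'sitemap';
type CachingConfig = Partial<Record<FileKind, number>>;

// Defaults copied from the table above (seconds).
const DEFAULT_TTL: Record<FileKind, number> = {
  robots: 3600,
  llms: 3600,
  humans: 86400,
  security: 86400,
  canary: 3600,
  webfinger: 3600,
  sitemap: 3600,
};

// Use the override when present, otherwise fall back to the default TTL.
function cacheControlFor(kind: FileKind, overrides: CachingConfig = {}): string {
  const seconds = overrides[kind] ?? DEFAULT_TTL[kind];
  return `public, max-age=${seconds}`;
}

console.log(cacheControlFor('humans'));
console.log(cacheControlFor('llms', { llms: 1800 }));
```
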
## Advanced Usage

### Custom Templates

You can provide custom templates for any file:

```typescript
discovery({
  templates: {
    robots: (config, siteURL) => `
User-agent: *
Allow: /

# Custom content
Sitemap: ${siteURL}/sitemap-index.xml
`,

    llms: (config, siteURL) => `
# ${config.description}

Visit ${siteURL} for more information.
`,

    security: (config, siteURL) => `
# Custom security.txt format
Contact: ${config.contact}
Expires: ${config.expires || new Date(Date.now() + 365*24*60*60*1000).toISOString()}
`,

    canary: (config, siteURL) => `
# Custom canary format
Organization: ${config.organization}
Last-Updated: ${new Date().toISOString()}
`
  }
})
```

### Conditional Generation

Disable specific files in certain environments:

```typescript
discovery({
  robots: {
    enabled: import.meta.env.PROD // Only in production
  },
  llms: {
    enabled: true // Always generate
  },
  humans: {
    enabled: import.meta.env.DEV // Only in development
  }
})
```

### Dynamic Content

Use functions for dynamic content:

```typescript
import fs from 'node:fs';

discovery({
  llms: {
    description: () => {
      const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf-8'));
      return `${pkg.name} - ${pkg.description}`;
    },
    apiEndpoints: async () => {
      // Load from OpenAPI spec (loadOpenAPISpec is your own helper)
      const spec = await loadOpenAPISpec();
      return spec.paths.map(path => ({
        path: path.url,
        method: path.method,
        description: path.summary
      }));
    }
  }
})
```

## Integration with Other Tools

### With @astrojs/sitemap

The discovery integration automatically includes `@astrojs/sitemap`, so you don't need to install it separately. Configuration is passed through:

```typescript
discovery({
  sitemap: {
    // All @astrojs/sitemap options work here
    filter: (page) => !page.includes('/secret'),
    changefreq: 'weekly'
  }
})
```

### With Content Collections

Automatically extract information from content collections:

```typescript
discovery({
  llms: {
    importantPages: async () => {
      const docs = await getCollection('docs');
      return docs.map(doc => ({
        name: doc.data.title,
        path: `/docs/${doc.slug}`,
        description: doc.data.description
      }));
    }
  }
})
```

### With Environment Variables

Use environment variables for sensitive information:

```typescript
discovery({
  humans: {
    team: [
      {
        name: 'Developer',
        contact: process.env.PUBLIC_CONTACT_EMAIL
      }
    ]
  }
})
```

## Output

The integration generates the following files:

### `/robots.txt`
```
User-agent: *
Allow: /

# Sitemaps
Sitemap: https://example.com/sitemap-index.xml

# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
Allow: /llms.txt

# Crawl delay
Crawl-delay: 1
```

### `/llms.txt`
```
# Project Name - Description

> Short tagline

## Site Information
- Name: Project Name
- Description: Full description
- URL: https://example.com

## For AI Assistants
Instructions for AI assistants...

## API Endpoints
- GET /api/endpoint - Description
```

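The `## API Endpoints` section above follows a `- METHOD path - description` pattern. A minimal sketch of rendering `apiEndpoints` entries into that shape is shown below; `renderApiEndpoints` is a hypothetical helper, and defaulting a missing `method` to `GET` is an assumption.

```typescript
// Hypothetical renderer; defaulting to GET when method is omitted is an assumption.
interface ApiEndpoint {
  path: string;
  method?: string;
  description: string;
}

function renderApiEndpoints(endpoints: ApiEndpoint[]): string {
  const lines = endpoints.map(
    (endpoint) =>
      `- ${(endpoint.method ?? 'GET').toUpperCase()} ${endpoint.path} - ${endpoint.description}`
  );
  return ['## API Endpoints', ...lines].join('\n');
}

const endpointSection = renderApiEndpoints([
  { path: '/api/products', method: 'GET', description: 'List all products' },
  { path: '/api/calculate-footprint', method: 'post', description: 'Calculate carbon footprint' },
  { path: '/api/search', description: 'Search API' },
]);
console.log(endpointSection);
```
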
### `/humans.txt`
```
/* TEAM */

Name: Developer Name
Role: Position
Contact: email@example.com

/* THANKS */
- Thank you note 1
- Thank you note 2

/* SITE */
Tech stack and details...
```

### `/sitemap-index.xml`
Standard XML sitemap with all your pages.

## Best Practices

### 1. **Set Your Site URL**

Always configure `site` in your Astro config:

```typescript
export default defineConfig({
  site: 'https://example.com', // Required!
  integrations: [discovery()]
});
```

### 2. **Keep humans.txt Updated**

Update your team information and tech stack regularly:

```typescript
discovery({
  humans: {
    site: {
      lastUpdate: 'auto' // Automatically uses current date
    }
  }
})
```

### 3. **Be Specific with LLM Instructions**

Provide clear, actionable instructions for AI assistants:

```typescript
discovery({
  llms: {
    instructions: `
      When helping users:
      1. Always check API documentation first
      2. Use the /api/search endpoint for queries
      3. Format responses in markdown
      4. Include relevant links
    `
  }
})
```

### 4. **Filter Private Pages**

Exclude admin, draft, and private pages:

```typescript
discovery({
  sitemap: {
    filter: (page) => {
      return !page.includes('/admin') &&
             !page.includes('/draft') &&
             !page.includes('/private');
    }
  },
  robots: {
    additionalAgents: [
      {
        userAgent: '*',
        disallow: ['/admin', '/draft', '/private']
      }
    ]
  }
})
```

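Because the sitemap filter and the robots rules repeat the same paths, it can help to derive both from one list so they cannot drift apart. The following is a small sketch of that idea; `PRIVATE_PREFIXES`, `isPublicPage`, and `privacyOptions` are illustrative names, not part of the integration.

```typescript
// Illustrative sketch: keep one list of private paths and reuse it in both places.
const PRIVATE_PREFIXES: string[] = ['/admin', '/draft', '/private'];

// True when the page URL contains none of the private path fragments.
const isPublicPage = (page: string): boolean =>
  !PRIVATE_PREFIXES.some((prefix) => page.includes(prefix));

// Options object in the same shape as the example above; pass it to discovery().
const privacyOptions = {
  sitemap: { filter: isPublicPage },
  robots: {
    additionalAgents: [{ userAgent: '*', disallow: PRIVATE_PREFIXES }],
  },
};

console.log(isPublicPage('https://example.com/blog/'));
console.log(isPublicPage('https://example.com/admin/users'));
```

Note that `includes` matches anywhere in the URL, exactly as in the example above, so a public page such as `/blog/draft-ideas` would also be filtered out.
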
### 5. **Optimize Cache Headers**

Balance freshness with server load:

```typescript
discovery({
  caching: {
    robots: 3600,   // 1 hour - changes rarely
    llms: 1800,     // 30 min - may update instructions
    humans: 86400,  // 24 hours - credits don't change often
    sitemap: 3600   // 1 hour - content changes moderately
  }
})
```

## Troubleshooting

### Files Not Generating

1. **Check your output mode:**
   ```typescript
   export default defineConfig({
     output: 'hybrid', // or 'server'
     // ...
   });
   ```

2. **Verify site URL is set:**
   ```typescript
   export default defineConfig({
     site: 'https://example.com' // Must be set!
   });
   ```

3. **Check for conflicts:**
   Remove any existing `/public/robots.txt` or similar static files.

### Wrong URLs in Files

Make sure your `site` config matches your production domain:

```typescript
export default defineConfig({
  site: import.meta.env.PROD
    ? 'https://production.com'
    : 'http://localhost:4321'
});
```

### LLM Bots Not Respecting Instructions

- Ensure `/llms.txt` is accessible
- Check robots.txt allows LLM bots
- Verify content is properly formatted

### Sitemap Issues

Check `@astrojs/sitemap` documentation for detailed troubleshooting:
https://docs.astro.build/en/guides/integrations-guide/sitemap/

## Migration Guide

### From Manual Files

If you have existing static files in `/public`, remove them:

```bash
rm public/robots.txt
rm public/humans.txt
rm public/sitemap.xml
```

Then configure the integration with your existing content:

```typescript
discovery({
  humans: {
    team: [/* your existing team data */],
    thanks: [/* your existing thanks */]
  }
})
```

### From @astrojs/sitemap

Replace:
```typescript
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  integrations: [sitemap()]
});
```

With:
```typescript
import discovery from '@astrojs/discovery';

export default defineConfig({
  integrations: [
    discovery({
      sitemap: {
        // Your existing sitemap config
      }
    })
  ]
});
```

## Examples

### E-commerce Site

```typescript
discovery({
  robots: {
    crawlDelay: 2,
    additionalAgents: [
      {
        userAgent: 'PriceBot',
        disallow: ['/checkout', '/account']
      }
    ]
  },
  llms: {
    description: 'Online store for sustainable products',
    keyFeatures: [
      'Eco-friendly product catalog',
      'Carbon footprint calculator',
      'Sustainable shipping options'
    ],
    apiEndpoints: [
      { path: '/api/products', description: 'Product catalog' },
      { path: '/api/calculate-carbon', description: 'Carbon calculator' }
    ]
  },
  sitemap: {
    filter: (page) =>
      !page.includes('/checkout') &&
      !page.includes('/account')
  }
})
```

### Documentation Site

```typescript
discovery({
  llms: {
    description: 'Technical documentation for our API',
    instructions: `
      When helping users:
      1. Search documentation before answering
      2. Provide code examples from /examples
      3. Link to relevant API reference pages
      4. Suggest similar solutions from FAQ
    `,
    importantPages: async () => {
      const docs = await getCollection('docs');
      return docs
        .filter(doc => doc.data.featured)
        .map(doc => ({
          name: doc.data.title,
          path: `/docs/${doc.slug}`,
          description: doc.data.description
        }));
    }
  },
  humans: {
    team: [
      {
        name: 'Documentation Team',
        contact: 'docs@example.com'
      }
    ],
    thanks: [
      'Our amazing community contributors',
      'Technical writers worldwide'
    ]
  }
})
```

### Personal Blog

```typescript
discovery({
  llms: {
    description: 'Personal blog about web development',
    brandVoice: [
      'Casual and friendly',
      'Technical but accessible',
      'Focus on practical examples'
    ]
  },
  humans: {
    team: [
      {
        name: 'Jane Blogger',
        role: 'Writer & Developer',
        twitter: '@janeblogger',
        github: 'jane-dev'
      }
    ],
    story: `
      Started this blog to document my journey learning web development.
      Went from tutorial hell to building real projects. Now sharing
      what I've learned to help others on their journey.
    `,
    funFacts: [
      'All posts written in markdown',
      'Powered by coffee and curiosity',
      'Deployed automatically on every commit'
    ]
  }
})
```

## Performance

The integration is designed for minimal performance impact:

- **Build Time**: Adds ~100-200ms to build process
- **Runtime**: Text files are generated at build time; the WebFinger endpoint is a dynamic route and is served on request
- **Caching**: Smart HTTP cache headers reduce server load
- **Bundle Size**: Zero client-side JavaScript

## Contributing

We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md).

## License

MIT

## Related

- [@astrojs/sitemap](https://docs.astro.build/en/guides/integrations-guide/sitemap/)
- [humanstxt.org](https://humanstxt.org/)
- [llms.txt spec](https://github.com/anthropics/llm-txt)
- [robots.txt spec](https://developers.google.com/search/docs/crawling-indexing/robots/intro)

## Credits

Built with inspiration from:
- The Astro community
- humanstxt.org initiative
- The llms.txt proposal
- Web standards organizations

---

**Made with ❤️ by the Astro community**