feat: initial implementation of @astrojs/discovery integration

This commit introduces a comprehensive Astro integration that automatically
generates discovery files for websites:

Features:
- robots.txt with LLM bot support (Anthropic-AI, GPTBot, etc.)
- llms.txt for AI assistant context and instructions
- humans.txt for team credits and site information
- Automatic sitemap integration via @astrojs/sitemap

Technical Details:
- TypeScript implementation with full type safety
- Configurable HTTP caching headers
- Custom template support for all generated files
- Sensible defaults with extensive customization options
- Date-based versioning (2025.11.03)

Testing:
- 34 unit tests covering all generators
- Test coverage for robots.txt, llms.txt, and humans.txt
- Integration with Vitest

Documentation:
- Comprehensive README with examples
- API reference documentation
- Contributing guidelines
- Example configurations (minimal and full)
Ryan Malloy 2025-11-03 07:36:39 -07:00
commit d25dde4627
25 changed files with 11001 additions and 0 deletions

13
.gitignore vendored Normal file

@ -0,0 +1,13 @@
node_modules/
dist/
.DS_Store
*.log
.env
.env.local
.env.*.local
coverage/
.vscode/
.idea/
*.swp
*.swo
*~

81
CHANGELOG.md Normal file

@ -0,0 +1,81 @@
# Changelog
All notable changes to @astrojs/discovery will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project uses date-based versioning (YYYY.MM.DD).
## [2025.11.03] - 2025-11-03
### Added
- Initial release of @astrojs/discovery
- Automatic robots.txt generation with LLM bot support
- Automatic llms.txt generation for AI assistant context
- Automatic humans.txt generation for team credits
- Integration with @astrojs/sitemap for sitemap-index.xml
- Configurable HTTP caching headers
- Custom template support for all generated files
- TypeScript type definitions
- Comprehensive configuration options
- Example configurations (minimal and full)
### Features
- **robots.txt**
- Default allow-all policy
- LLM-specific bot rules (Anthropic-AI, GPTBot, etc.)
- Custom agent configurations
- Crawl delay settings
- Custom rules support
- **llms.txt**
- Site description and key features
- Important pages listing
- AI assistant instructions
- API endpoint documentation
- Technology stack information
- Brand voice guidelines
- Custom sections
- **humans.txt**
- Team member information
- Thanks/credits section
- Site technical information
- Project story
- Fun facts
- Development philosophy
- Custom sections
- **Configuration**
- Sensible defaults
- Full customization options
- Environment-based toggles
- Dynamic content support
- Cache control configuration
### Documentation
- Comprehensive README with examples
- API reference documentation
- Contributing guidelines
- Example configurations
- Integration guides
## Future Enhancements
### Planned Features
- security.txt support (RFC 9116)
- ads.txt support for advertising
- manifest.json for PWA
- RSS feed integration
- OpenGraph tags injection
- Structured data (JSON-LD)
- Analytics discovery
- i18n support for multi-language sites
### Testing
- Unit tests for generators
- Integration tests
- E2E tests with real Astro projects
---
For more information, see [README.md](README.md)

169
CONTRIBUTING.md Normal file

@ -0,0 +1,169 @@
# Contributing to @astrojs/discovery
Thank you for your interest in contributing to @astrojs/discovery! This guide will help you get started.
## Development Setup
1. **Clone the repository**
```bash
git clone https://github.com/withastro/astro-discovery.git
cd astro-discovery
```
2. **Install dependencies**
```bash
npm install
```
3. **Build the project**
```bash
npm run build
```
4. **Run tests**
```bash
npm test
```
## Project Structure
```
@astrojs/discovery/
├── src/
│   ├── index.ts              # Main integration entry point
│   ├── types.ts              # TypeScript type definitions
│   ├── config-store.ts       # Global config management
│   ├── generators/
│   │   ├── robots.ts         # robots.txt generator
│   │   ├── llms.ts           # llms.txt generator
│   │   └── humans.ts         # humans.txt generator
│   ├── routes/
│   │   ├── robots.ts         # /robots.txt API route
│   │   ├── llms.ts           # /llms.txt API route
│   │   └── humans.ts         # /humans.txt API route
│   └── validators/
│       └── config.ts         # Configuration validation
├── dist/                     # Built output (generated)
├── example/                  # Example configurations
└── tests/                    # Test files
```
## Making Changes
### Adding New Features
1. Create a new branch for your feature
```bash
git checkout -b feature/your-feature-name
```
2. Make your changes following the existing code style
3. Add tests for your changes
4. Update documentation in README.md
5. Build and test
```bash
npm run build
npm test
```
### Code Style
- Use TypeScript for all code
- Follow existing naming conventions
- Add JSDoc comments for public APIs
- Keep functions focused and small
- Use meaningful variable names
### Commit Messages
Follow conventional commit format:
- `feat:` New features
- `fix:` Bug fixes
- `docs:` Documentation changes
- `test:` Test additions/changes
- `refactor:` Code refactoring
- `chore:` Maintenance tasks
Example:
```
feat: add support for custom LLM bot agents
- Added ability to specify custom LLM bot user agents
- Updated documentation with examples
- Added tests for custom agent configuration
```
## Testing
### Running Tests
```bash
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Run tests with coverage
npm run test:coverage
```
### Writing Tests
Place tests in the `tests/` directory. Follow these patterns:
```typescript
import { describe, it, expect } from 'vitest';
import { generateRobotsTxt } from '../src/generators/robots';
describe('generateRobotsTxt', () => {
it('generates basic robots.txt', () => {
const result = generateRobotsTxt({}, new URL('https://example.com'));
expect(result).toContain('User-agent: *');
expect(result).toContain('Sitemap: https://example.com/sitemap-index.xml');
});
});
```
## Pull Request Process
1. **Update documentation**: Ensure README.md reflects any changes
2. **Add tests**: All new features should have tests
3. **Update CHANGELOG**: Add your changes to CHANGELOG.md
4. **Create a pull request**:
- Use a clear, descriptive title
- Reference any related issues
- Describe your changes in detail
- Include screenshots for UI changes
5. **Address review feedback**: Be responsive to code review comments
## Release Process
(For maintainers)
1. Update version in package.json using date-based versioning (YYYY.MM.DD)
2. Update CHANGELOG.md
3. Create a git tag
4. Push to npm
```bash
npm version 2025.11.03
npm publish
```
## Questions?
- Open an issue for bugs or feature requests
- Start a discussion for questions or ideas
- Join our Discord community
## License
By contributing, you agree that your contributions will be licensed under the MIT License.

21
LICENSE Normal file

@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Ryan Malloy
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

881
README.md Normal file

@ -0,0 +1,881 @@
# @astrojs/discovery
> Comprehensive discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, and sitemap generation
## Overview
This integration provides automatic generation of all standard discovery files for your Astro site, making it easily discoverable by search engines, LLMs, and humans.
## Features
- 🤖 **robots.txt** - Dynamic generation with LLM bot support
- 🧠 **llms.txt** - AI assistant discovery and instructions
- 👥 **humans.txt** - Human-readable credits and tech stack
- 🗺️ **sitemap.xml** - Automatic sitemap generation
- ⚡ **Dynamic URLs** - Adapts to your `site` config
- 🎯 **Smart Caching** - Optimized cache headers
- 🔧 **Fully Customizable** - Override any section
## Installation
```bash
npx astro add @astrojs/discovery
```
Or manually:
```bash
npm install @astrojs/discovery
```
## Quick Start
### Basic Setup
```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery()
  ]
});
```
That's it! This will generate:
- `/robots.txt`
- `/llms.txt`
- `/humans.txt`
- `/sitemap-index.xml`
### With Configuration
```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery({
// Robots.txt configuration
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: 'CustomBot',
allow: ['/api'],
disallow: ['/admin']
}
]
},
// LLMs.txt configuration
llms: {
description: 'Your site description for AI assistants',
apiEndpoints: [
{ path: '/api/chat', description: 'Chat endpoint' },
{ path: '/api/search', description: 'Search API' }
],
instructions: `
When helping users with our site:
1. Check documentation first
2. Use provided API endpoints
3. Follow brand guidelines
`
},
// Humans.txt configuration
humans: {
team: [
{
name: 'Jane Doe',
role: 'Creator & Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA'
}
],
thanks: [
'The Astro team',
'Open source community'
],
site: {
lastUpdate: 'auto', // or specific date
language: 'English',
doctype: 'HTML5',
ide: 'VS Code',
techStack: ['Astro', 'TypeScript', 'React']
},
story: 'Your project story...',
funFacts: [
'Built with love',
'Coffee-powered development'
]
},
// Sitemap configuration
sitemap: {
// Passed through to @astrojs/sitemap
filter: (page) => !page.includes('/admin'),
changefreq: 'weekly',
priority: 0.7
}
})
]
});
```
## API Reference
### `discovery(options?)`
#### Options
##### `robots`
Configuration for robots.txt generation.
**Type:**
```typescript
interface RobotsConfig {
enabled?: boolean;
crawlDelay?: number;
allowAllBots?: boolean;
llmBots?: {
enabled?: boolean;
agents?: string[]; // Custom LLM bot names
};
additionalAgents?: Array<{
userAgent: string;
allow?: string[];
disallow?: string[];
}>;
customRules?: string; // Raw robots.txt content to append
}
```
**Default:**
```typescript
{
crawlDelay: 1,
allowAllBots: true,
llmBots: {
enabled: true,
agents: [
'Anthropic-AI',
'Claude-Web',
'GPTBot',
'ChatGPT-User',
'cohere-ai',
'Google-Extended'
]
}
}
```
**Example:**
```typescript
discovery({
robots: {
crawlDelay: 2,
llmBots: {
enabled: true,
agents: ['CustomAIBot', 'AnotherBot']
},
additionalAgents: [
{
userAgent: 'BadBot',
disallow: ['/']
}
]
}
})
```
##### `llms`
Configuration for llms.txt generation.
**Type:**
```typescript
interface LLMsConfig {
enabled?: boolean;
description?: string | (() => string); // plain value or a function returning it
keyFeatures?: string[];
importantPages?: Array<{
name: string;
path: string;
description?: string;
}>;
instructions?: string;
apiEndpoints?: Array<{
path: string;
method?: string;
description: string;
}>;
techStack?: {
frontend?: string[];
backend?: string[];
ai?: string[];
other?: string[];
};
brandVoice?: string[];
customSections?: Record<string, string>;
}
```
**Example:**
```typescript
discovery({
llms: {
description: 'E-commerce platform for sustainable products',
keyFeatures: [
'AI-powered product recommendations',
'Carbon footprint calculator',
'Subscription management'
],
instructions: `
When helping users:
1. Check product availability via API
2. Suggest sustainable alternatives
3. Calculate shipping costs
`,
apiEndpoints: [
{
path: '/api/products',
method: 'GET',
description: 'List all products'
},
{
path: '/api/calculate-footprint',
method: 'POST',
description: 'Calculate carbon footprint'
}
]
}
})
```
##### `humans`
Configuration for humans.txt generation.
**Type:**
```typescript
interface HumansConfig {
enabled?: boolean;
team?: Array<{
name: string;
role?: string;
contact?: string;
location?: string;
twitter?: string;
github?: string;
}>;
thanks?: string[];
site?: {
lastUpdate?: string | 'auto';
language?: string;
doctype?: string;
ide?: string;
techStack?: string[];
standards?: string[];
components?: string[];
software?: string[];
};
story?: string;
funFacts?: string[];
philosophy?: string[];
customSections?: Record<string, string>;
}
```
**Example:**
```typescript
discovery({
humans: {
team: [
{
name: 'Alice Developer',
role: 'Lead Developer',
contact: 'alice@example.com',
location: 'New York',
github: 'alice-dev'
}
],
thanks: [
'Coffee',
'Stack Overflow community',
'My rubber duck'
],
story: `
This project started when we realized that...
`,
funFacts: [
'Written entirely on a mechanical keyboard',
'Fueled by 347 cups of coffee',
'Built during a 48-hour hackathon'
]
}
})
```
##### `sitemap`
Configuration passed to `@astrojs/sitemap`.
**Type:**
```typescript
interface SitemapConfig {
filter?: (page: string) => boolean;
customPages?: string[];
i18n?: {
defaultLocale: string;
locales: Record<string, string>;
};
changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
lastmod?: Date;
priority?: number;
serialize?: (item: SitemapItem) => SitemapItem | undefined;
}
```
**Example:**
```typescript
discovery({
sitemap: {
filter: (page) => !page.includes('/admin') && !page.includes('/draft'),
changefreq: 'daily',
priority: 0.8
}
})
```
##### `caching`
Configure HTTP cache headers for discovery files.
**Type:**
```typescript
interface CachingConfig {
robots?: number; // seconds
llms?: number;
humans?: number;
sitemap?: number;
}
```
**Default:**
```typescript
{
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
sitemap: 3600 // 1 hour
}
```
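The values are plain seconds. How the integration turns them into a response header is not shown here, so the sketch below assumes a `Cache-Control: public, max-age=N` form. `cacheControl` is a hypothetical helper for illustration, not part of the package API.

```typescript
// Hypothetical helper: turn a cache lifetime in seconds into a
// Cache-Control header value. The `public, max-age=N` form is an assumption.
function cacheControl(seconds: number): string {
  // Clamp negative values to zero and drop any fractional part.
  const maxAge = Math.max(0, Math.floor(seconds));
  return maxAge === 0 ? 'no-cache' : `public, max-age=${maxAge}`;
}

// The documented defaults, expressed in seconds.
const defaults = { robots: 3600, llms: 3600, humans: 86400, sitemap: 3600 };

for (const [file, seconds] of Object.entries(defaults)) {
  console.log(`${file}: Cache-Control: ${cacheControl(seconds)}`);
}
```

A value of `0` maps to `no-cache` in this sketch so that a file can opt out of caching without a separate flag.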
## Advanced Usage
### Custom Templates
You can provide custom templates for any file:
```typescript
discovery({
templates: {
robots: (config, siteURL) => `
User-agent: *
Allow: /
# Custom content
Sitemap: ${new URL('sitemap-index.xml', siteURL).href}
`,
llms: (config, siteURL) => `
# ${config.description}
Visit ${siteURL} for more information.
`
}
})
```
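One detail to watch when writing templates: if `siteURL` arrives as a `URL` object (which is what the `TemplateConfig` type in the implementation guide declares), interpolating it into a string includes the trailing slash. The sketch below shows the effect and one way to build the sitemap URL without a doubled slash. `robotsTemplate` is an illustrative name, and building the body from an array is just one option for avoiding the leading blank line that a multi-line template literal produces.

```typescript
// Sketch of a custom robots template; assumes `siteURL` is a URL object.
const robotsTemplate = (_config: unknown, siteURL: URL): string =>
  [
    'User-agent: *',
    'Allow: /',
    '',
    // Resolve against the site URL instead of concatenating strings.
    `Sitemap: ${new URL('sitemap-index.xml', siteURL).href}`,
  ].join('\n') + '\n';

const site = new URL('https://example.com');

// Naive concatenation keeps the URL's trailing slash:
console.log(`${site}/sitemap-index.xml`); // https://example.com//sitemap-index.xml

console.log(robotsTemplate({}, site));
```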
### Conditional Generation
Disable specific files in certain environments:
```typescript
discovery({
robots: {
enabled: import.meta.env.PROD // Only in production
},
llms: {
enabled: true // Always generate
},
humans: {
enabled: import.meta.env.DEV // Only in development
}
})
```
### Dynamic Content
Use functions for dynamic content:
```typescript
import fs from 'node:fs';
discovery({
llms: {
description: () => {
const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf-8'));
return `${pkg.name} - ${pkg.description}`;
},
apiEndpoints: async () => {
// Load from an OpenAPI spec (loadOpenAPISpec is a placeholder for your own loader)
const spec = await loadOpenAPISpec();
return spec.paths.map(path => ({
path: path.url,
method: path.method,
description: path.summary
}));
}
}
})
```
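Options that accept either a value or a function need to be normalised before use. The following is a minimal sketch of how that could be done. `MaybeLazy` and `resolveLazy` are hypothetical names, and the package's own resolution logic may differ.

```typescript
// A config value given directly, or as a (possibly async) function.
type MaybeLazy<T> = T | (() => T | Promise<T>);

// Hypothetical helper: resolve either form to a plain value.
async function resolveLazy<T>(value: MaybeLazy<T>): Promise<T> {
  if (typeof value === 'function') {
    // Call the function; the async wrapper awaits a returned promise.
    return (value as () => T | Promise<T>)();
  }
  return value as T;
}

// Usage: all three forms resolve to the same kind of result.
resolveLazy<string>('static description').then(console.log);
resolveLazy<string>(() => 'computed description').then(console.log);
resolveLazy<string[]>(async () => ['/docs', '/api']).then(console.log);
```

Because the helper is always async, callers can treat plain values, sync functions, and async functions uniformly with a single `await`.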
## Integration with Other Tools
### With @astrojs/sitemap
The discovery integration automatically includes `@astrojs/sitemap`, so you don't need to install it separately. Configuration is passed through:
```typescript
discovery({
sitemap: {
// All @astrojs/sitemap options work here
filter: (page) => !page.includes('/secret'),
changefreq: 'weekly'
}
})
```
### With Content Collections
Automatically extract information from content collections:
```typescript
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
return docs.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
### With Environment Variables
Use environment variables for values that differ between environments. Everything written to these files is public, so do not put secrets in them:
```typescript
discovery({
humans: {
team: [
{
name: 'Developer',
contact: process.env.PUBLIC_CONTACT_EMAIL
}
]
}
})
```
## Output
The integration generates the following files:
### `/robots.txt`
```
User-agent: *
Allow: /
# Sitemaps
Sitemap: https://example.com/sitemap-index.xml
# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
Allow: /llms.txt
# Crawl delay
Crawl-delay: 1
```
### `/llms.txt`
```
# Project Name - Description
> Short tagline
## Site Information
- Name: Project Name
- Description: Full description
- URL: https://example.com
## For AI Assistants
Instructions for AI assistants...
## API Endpoints
- GET /api/endpoint - Description
```
### `/humans.txt`
```
/* TEAM */
Name: Developer Name
Role: Position
Contact: email@example.com
/* THANKS */
- Thank you note 1
- Thank you note 2
/* SITE */
Tech stack and details...
```
### `/sitemap-index.xml`
Standard XML sitemap with all your pages.
## Best Practices
### 1. **Set Your Site URL**
Always configure `site` in your Astro config:
```typescript
export default defineConfig({
site: 'https://example.com', // Required!
integrations: [discovery()]
});
```
### 2. **Keep humans.txt Updated**
Update your team information and tech stack regularly:
```typescript
discovery({
humans: {
site: {
lastUpdate: 'auto' // Automatically uses current date
}
}
})
```
### 3. **Be Specific with LLM Instructions**
Provide clear, actionable instructions for AI assistants:
```typescript
discovery({
llms: {
instructions: `
When helping users:
1. Always check API documentation first
2. Use the /api/search endpoint for queries
3. Format responses in markdown
4. Include relevant links
`
}
})
```
### 4. **Filter Private Pages**
Exclude admin, draft, and private pages:
```typescript
discovery({
sitemap: {
filter: (page) => {
return !page.includes('/admin') &&
!page.includes('/draft') &&
!page.includes('/private');
}
},
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private']
}
]
}
})
```
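Keeping the two lists in sync by hand is error-prone. As a sketch, both settings can be derived from one array. `PRIVATE_PREFIXES` and `isPublicPage` are illustrative names, and the code assumes the sitemap filter receives the full page URL, as `@astrojs/sitemap` does.

```typescript
// Single source of truth for paths that should stay out of discovery files.
const PRIVATE_PREFIXES = ['/admin', '/draft', '/private'];

// `page` is the full page URL passed to the sitemap filter.
const isPublicPage = (page: string): boolean => {
  const { pathname } = new URL(page);
  // Match whole path segments so '/blog/draft-ideas' is not excluded.
  return !PRIVATE_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`)
  );
};

// Plain options object; pass it to discovery() in astro.config.mjs.
const discoveryOptions = {
  sitemap: { filter: isPublicPage },
  robots: {
    additionalAgents: [{ userAgent: '*', disallow: PRIVATE_PREFIXES }],
  },
};

console.log(discoveryOptions.robots.additionalAgents[0].disallow.join(', '));
```

Matching on the parsed pathname is stricter than `page.includes(...)`, which would also drop any public page whose URL happens to contain one of the strings.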
### 5. **Optimize Cache Headers**
Balance freshness with server load:
```typescript
discovery({
caching: {
robots: 3600, // 1 hour - changes rarely
llms: 1800, // 30 min - may update instructions
humans: 86400, // 24 hours - credits don't change often
sitemap: 3600 // 1 hour - content changes moderately
}
})
```
## Troubleshooting
### Files Not Generating
1. **Check your output mode:**
```typescript
export default defineConfig({
output: 'hybrid', // or 'server'
// ...
});
```
2. **Verify site URL is set:**
```typescript
export default defineConfig({
site: 'https://example.com' // Must be set!
});
```
3. **Check for conflicts:**
Remove any existing `/public/robots.txt` or similar static files.
### Wrong URLs in Files
Make sure your `site` config matches your production domain:
```typescript
export default defineConfig({
site: import.meta.env.PROD
? 'https://production.com'
: 'http://localhost:4321'
});
```
### LLM Bots Not Respecting Instructions
- Ensure `/llms.txt` is accessible
- Check robots.txt allows LLM bots
- Verify content is properly formatted
### Sitemap Issues
Check `@astrojs/sitemap` documentation for detailed troubleshooting:
https://docs.astro.build/en/guides/integrations-guide/sitemap/
## Migration Guide
### From Manual Files
If you have existing static files in `/public`, remove them:
```bash
rm public/robots.txt
rm public/humans.txt
rm public/sitemap.xml
```
Then configure the integration with your existing content:
```typescript
discovery({
humans: {
team: [/* your existing team data */],
thanks: [/* your existing thanks */]
}
})
```
### From @astrojs/sitemap
Replace:
```typescript
import sitemap from '@astrojs/sitemap';
export default defineConfig({
integrations: [sitemap()]
});
```
With:
```typescript
import discovery from '@astrojs/discovery';
export default defineConfig({
integrations: [
discovery({
sitemap: {
// Your existing sitemap config
}
})
]
});
```
## Examples
### E-commerce Site
```typescript
discovery({
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: 'PriceBot',
disallow: ['/checkout', '/account']
}
]
},
llms: {
description: 'Online store for sustainable products',
keyFeatures: [
'Eco-friendly product catalog',
'Carbon footprint calculator',
'Sustainable shipping options'
],
apiEndpoints: [
{ path: '/api/products', description: 'Product catalog' },
{ path: '/api/calculate-carbon', description: 'Carbon calculator' }
]
},
sitemap: {
filter: (page) =>
!page.includes('/checkout') &&
!page.includes('/account')
}
})
```
### Documentation Site
```typescript
discovery({
llms: {
description: 'Technical documentation for our API',
instructions: `
When helping users:
1. Search documentation before answering
2. Provide code examples from /examples
3. Link to relevant API reference pages
4. Suggest similar solutions from FAQ
`,
importantPages: async () => {
const docs = await getCollection('docs');
return docs
.filter(doc => doc.data.featured)
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
},
humans: {
team: [
{
name: 'Documentation Team',
contact: 'docs@example.com'
}
],
thanks: [
'Our amazing community contributors',
'Technical writers worldwide'
]
}
})
```
### Personal Blog
```typescript
discovery({
llms: {
description: 'Personal blog about web development',
brandVoice: [
'Casual and friendly',
'Technical but accessible',
'Focus on practical examples'
]
},
humans: {
team: [
{
name: 'Jane Blogger',
role: 'Writer & Developer',
twitter: '@janeblogger',
github: 'jane-dev'
}
],
story: `
Started this blog to document my journey learning web development.
Went from tutorial hell to building real projects. Now sharing
what I've learned to help others on their journey.
`,
funFacts: [
'All posts written in markdown',
'Powered by coffee and curiosity',
'Deployed automatically on every commit'
]
}
})
```
## Performance
The integration is designed for minimal performance impact:
- **Build Time**: Adds ~100-200ms to build process
- **Runtime**: All files are statically generated at build time
- **Caching**: Smart HTTP cache headers reduce server load
- **Bundle Size**: Zero client-side JavaScript
## Contributing
We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md).
## License
MIT
## Related
- [@astrojs/sitemap](https://docs.astro.build/en/guides/integrations-guide/sitemap/)
- [humanstxt.org](https://humanstxt.org/)
- [llms.txt spec](https://llmstxt.org/)
- [robots.txt spec](https://developers.google.com/search/docs/crawling-indexing/robots/intro)
## Credits
Built with inspiration from:
- The Astro community
- humanstxt.org initiative
- The llms.txt proposal (llmstxt.org)
- Web standards organizations
---
**Made with ❤️ by the Astro community**


@ -0,0 +1,699 @@
# @astrojs/discovery - Implementation Guide
> Technical implementation details for building the Astro discovery integration
## Package Structure
```
@astrojs/discovery/
├── package.json
├── README.md
├── LICENSE
├── tsconfig.json
├── src/
│   ├── index.ts                 # Main entry point
│   ├── types.ts                 # TypeScript definitions
│   ├── generators/
│   │   ├── robots.ts            # robots.txt generation
│   │   ├── llms.ts              # llms.txt generation
│   │   ├── humans.ts            # humans.txt generation
│   │   └── utils.ts             # Shared utilities
│   ├── templates/
│   │   ├── robots.template.ts
│   │   ├── llms.template.ts
│   │   └── humans.template.ts
│   └── validators/
│       └── config.ts            # Config validation
├── dist/                        # Built output
└── tests/
    ├── robots.test.ts
    ├── llms.test.ts
    ├── humans.test.ts
    └── integration.test.ts
```
## Core Implementation
### 1. Main Integration File (`src/index.ts`)
```typescript
import type { AstroIntegration } from 'astro';
import type { DiscoveryConfig } from './types';
import sitemap from '@astrojs/sitemap';
import { generateRobotsTxt } from './generators/robots';
import { generateLLMsTxt } from './generators/llms';
import { generateHumansTxt } from './generators/humans';
import { validateConfig } from './validators/config';
export default function discovery(
userConfig: DiscoveryConfig = {}
): AstroIntegration {
// Merge with defaults
const config = validateConfig(userConfig);
return {
name: '@astrojs/discovery',
hooks: {
'astro:config:setup': ({ config: astroConfig, injectRoute, updateConfig }) => {
// Ensure site is configured
if (!astroConfig.site) {
throw new Error(
'@astrojs/discovery requires `site` to be set in astro.config.mjs'
);
}
// Add sitemap integration
updateConfig({
integrations: [
sitemap(config.sitemap || {})
]
});
// Inject dynamic routes for discovery files
if (config.robots?.enabled !== false) {
injectRoute({
pattern: '/robots.txt',
entrypoint: '@astrojs/discovery/routes/robots.ts',
prerender: true
});
}
if (config.llms?.enabled !== false) {
injectRoute({
pattern: '/llms.txt',
entrypoint: '@astrojs/discovery/routes/llms.ts',
prerender: true
});
}
if (config.humans?.enabled !== false) {
injectRoute({
pattern: '/humans.txt',
entrypoint: '@astrojs/discovery/routes/humans.ts',
prerender: true
});
}
},
'astro:build:done': () => {
// Post-build summary (logging only; nothing is validated here)
console.log('✅ Discovery files generated:');
if (config.robots?.enabled !== false) console.log(' - /robots.txt');
if (config.llms?.enabled !== false) console.log(' - /llms.txt');
if (config.humans?.enabled !== false) console.log(' - /humans.txt');
console.log(' - /sitemap-index.xml');
}
}
};
}
// Named exports
export type { DiscoveryConfig } from './types';
```
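The injected entrypoints (`routes/robots.ts` and its siblings) are not shown in this part of the guide. The sketch below is one plausible shape for such a route, following Astro's `GET` endpoint convention. To stay self-contained it uses a local stand-in for the endpoint context and a hypothetical `buildRobotsBody` in place of `generateRobotsTxt`. The header values are assumptions.

```typescript
// Stand-in for the context Astro passes to endpoints; only `site` is used.
interface EndpointContext {
  site: URL | undefined;
}

// Hypothetical body builder standing in for generateRobotsTxt().
function buildRobotsBody(site: URL): string {
  return [
    'User-agent: *',
    'Allow: /',
    '',
    `Sitemap: ${new URL('sitemap-index.xml', site).href}`,
    '',
  ].join('\n');
}

// In a real route file this function would be exported as `GET`.
function GET({ site }: EndpointContext): Response {
  if (!site) {
    return new Response('`site` is not configured', { status: 500 });
  }
  return new Response(buildRobotsBody(site), {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      // Assumed format, using the documented 1-hour default for robots.txt.
      'Cache-Control': 'public, max-age=3600',
    },
  });
}
```

For prerendered routes the response headers may not carry over into the static output, so the `Cache-Control` line is mainly relevant when the route is rendered on demand.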
### 2. Type Definitions (`src/types.ts`)
```typescript
export interface DiscoveryConfig {
robots?: RobotsConfig;
llms?: LLMsConfig;
humans?: HumansConfig;
sitemap?: SitemapConfig;
caching?: CachingConfig;
templates?: TemplateConfig;
}
export interface RobotsConfig {
enabled?: boolean;
crawlDelay?: number;
allowAllBots?: boolean;
llmBots?: {
enabled?: boolean;
agents?: string[];
};
additionalAgents?: Array<{
userAgent: string;
allow?: string[];
disallow?: string[];
}>;
customRules?: string;
}
export interface LLMsConfig {
enabled?: boolean;
description?: string | (() => string);
keyFeatures?: string[];
importantPages?: ImportantPage[] | (() => Promise<ImportantPage[]>);
instructions?: string;
apiEndpoints?: APIEndpoint[];
techStack?: TechStack;
brandVoice?: string[];
customSections?: Record<string, string>;
}
export interface HumansConfig {
enabled?: boolean;
team?: TeamMember[];
thanks?: string[];
site?: SiteInfo;
story?: string;
funFacts?: string[];
philosophy?: string[];
customSections?: Record<string, string>;
}
export interface SitemapConfig {
filter?: (page: string) => boolean;
customPages?: string[];
changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
priority?: number;
}
export interface CachingConfig {
robots?: number;
llms?: number;
humans?: number;
sitemap?: number;
}
export interface TemplateConfig {
robots?: (config: RobotsConfig, siteURL: URL) => string;
llms?: (config: LLMsConfig, siteURL: URL) => string;
humans?: (config: HumansConfig, siteURL: URL) => string;
}
export interface ImportantPage {
name: string;
path: string;
description?: string;
}
export interface APIEndpoint {
path: string;
method?: string;
description: string;
}
export interface TechStack {
frontend?: string[];
backend?: string[];
ai?: string[];
other?: string[];
}
export interface TeamMember {
name: string;
role?: string;
contact?: string;
location?: string;
twitter?: string;
github?: string;
}
export interface SiteInfo {
lastUpdate?: string | 'auto';
language?: string;
doctype?: string;
ide?: string;
techStack?: string[];
standards?: string[];
components?: string[];
software?: string[];
}
```
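`validateConfig` is imported by `src/index.ts` but its body is not shown in this part of the guide. As a rough sketch of the "merge with defaults" step, the example below validates one field and layers user values over defaults. The types are trimmed-down stand-ins for the definitions above, and `DEFAULT_ROBOTS` and the validation rule are assumptions.

```typescript
// Trimmed-down stand-ins for the types above, to keep the sketch self-contained.
interface RobotsConfig {
  enabled?: boolean;
  crawlDelay?: number;
  allowAllBots?: boolean;
  llmBots?: { enabled?: boolean; agents?: string[] };
}
interface DiscoveryConfig {
  robots?: RobotsConfig;
}

// Assumed defaults, taken from the README's documented values.
const DEFAULT_ROBOTS: RobotsConfig = {
  enabled: true,
  crawlDelay: 1,
  allowAllBots: true,
  llmBots: { enabled: true },
};

// Hypothetical validateConfig: check user values, then layer them over defaults.
function validateConfig(user: DiscoveryConfig = {}): DiscoveryConfig {
  const delay = user.robots?.crawlDelay;
  if (delay !== undefined && (!Number.isFinite(delay) || delay < 0)) {
    throw new Error('robots.crawlDelay must be a non-negative number');
  }
  return {
    ...user,
    robots: {
      ...DEFAULT_ROBOTS,
      ...user.robots,
      // Merge the nested object separately so a partial override keeps defaults.
      llmBots: { ...DEFAULT_ROBOTS.llmBots, ...user.robots?.llmBots },
    },
  };
}

console.log(validateConfig({ robots: { crawlDelay: 2 } }).robots);
```

The nested `llmBots` merge matters: with a plain shallow spread, passing only `agents` would silently drop the default `enabled: true`.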
### 3. Robots.txt Generator (`src/generators/robots.ts`)
```typescript
import type { RobotsConfig } from '../types';
const DEFAULT_LLM_BOTS = [
'Anthropic-AI',
'Claude-Web',
'GPTBot',
'ChatGPT-User',
'cohere-ai',
'Google-Extended'
];
export function generateRobotsTxt(
config: RobotsConfig,
siteURL: URL
): string {
const lines: string[] = [];
// Allow all bots by default
if (config.allowAllBots !== false) {
lines.push('User-agent: *');
lines.push('Allow: /');
lines.push('');
}
// Add sitemap
lines.push('# Sitemaps');
lines.push(`Sitemap: ${new URL('sitemap-index.xml', siteURL).href}`);
lines.push('');
// LLM-specific rules
if (config.llmBots?.enabled !== false) {
lines.push('# LLM-specific resources');
lines.push('# See: https://llmstxt.org/');
const agents = config.llmBots?.agents || DEFAULT_LLM_BOTS;
agents.forEach(agent => {
lines.push(`User-agent: ${agent}`);
});
lines.push('Allow: /llms.txt');
lines.push('');
}
// Additional agent rules
if (config.additionalAgents) {
config.additionalAgents.forEach(agent => {
lines.push(`User-agent: ${agent.userAgent}`);
if (agent.allow) {
agent.allow.forEach(path => {
lines.push(`Allow: ${path}`);
});
}
if (agent.disallow) {
agent.disallow.forEach(path => {
lines.push(`Disallow: ${path}`);
});
}
lines.push('');
});
}
// Crawl delay
if (config.crawlDelay) {
lines.push('# Crawl delay (be nice to our server)');
lines.push(`Crawl-delay: ${config.crawlDelay}`);
lines.push('');
}
// Custom rules
if (config.customRules) {
lines.push('# Custom rules');
lines.push(config.customRules);
lines.push('');
}
return lines.join('\n');
}
```
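A note on `Crawl-delay` placement: the generator above emits it after a blank line, outside any `User-agent` group. `Crawl-delay` is a non-standard directive (it is not part of RFC 9309), and crawlers that honour it generally read it as part of a group, so a free-standing line may be ignored or attached to whichever group came last. The sketch below shows one way to keep the delay inside the group it applies to. `AgentGroup` and `renderGroup` are hypothetical names, not part of the package.

```typescript
// Hypothetical shape for one robots.txt group.
interface AgentGroup {
  userAgents: string[];
  allow?: string[];
  disallow?: string[];
  crawlDelay?: number;
}

// Hypothetical helper: render one group with every directive kept together.
function renderGroup(group: AgentGroup): string {
  const lines = group.userAgents.map((agent) => `User-agent: ${agent}`);
  (group.allow ?? []).forEach((path) => lines.push(`Allow: ${path}`));
  (group.disallow ?? []).forEach((path) => lines.push(`Disallow: ${path}`));
  if (group.crawlDelay !== undefined) {
    lines.push(`Crawl-delay: ${group.crawlDelay}`);
  }
  return lines.join('\n');
}

console.log(renderGroup({ userAgents: ['*'], allow: ['/'], crawlDelay: 1 }));
```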
### 4. LLMs.txt Generator (`src/generators/llms.ts`)
```typescript
import type { LLMsConfig, ImportantPage } from '../types';
export async function generateLLMsTxt(
config: LLMsConfig,
siteURL: URL
): Promise<string> {
const lines: string[] = [];
// Header
const description = typeof config.description === 'function'
? config.description()
: config.description;
lines.push(`# ${siteURL.hostname}`);
if (description) {
lines.push('');
lines.push(`> ${description}`);
}
lines.push('');
lines.push('---');
lines.push('');
// Site Information
lines.push('## Site Information');
lines.push('');
lines.push(`- **URL**: ${siteURL.href}`);
if (description) {
lines.push(`- **Description**: ${description}`);
}
lines.push('');
// Key Features
if (config.keyFeatures && config.keyFeatures.length > 0) {
lines.push('## Key Features');
lines.push('');
config.keyFeatures.forEach(feature => {
lines.push(`- ${feature}`);
});
lines.push('');
}
// Important Pages
if (config.importantPages) {
const pages = typeof config.importantPages === 'function'
? await config.importantPages()
: config.importantPages;
if (pages.length > 0) {
lines.push('## Important Pages');
lines.push('');
pages.forEach(page => {
const url = new URL(page.path, siteURL).href;
lines.push(`- **${page.name}**: ${url}`);
if (page.description) {
lines.push(` ${page.description}`);
}
});
lines.push('');
}
}
// Instructions for AI Assistants
if (config.instructions) {
lines.push('## For AI Assistants');
lines.push('');
lines.push(config.instructions);
lines.push('');
}
// API Endpoints
if (config.apiEndpoints && config.apiEndpoints.length > 0) {
lines.push('## API Endpoints');
lines.push('');
config.apiEndpoints.forEach(endpoint => {
const method = endpoint.method || 'GET';
lines.push(`- \`${method} ${endpoint.path}\` - ${endpoint.description}`);
});
lines.push('');
}
// Tech Stack
if (config.techStack) {
lines.push('## Technical Stack');
lines.push('');
if (config.techStack.frontend) {
lines.push(`- **Frontend**: ${config.techStack.frontend.join(', ')}`);
}
if (config.techStack.backend) {
lines.push(`- **Backend**: ${config.techStack.backend.join(', ')}`);
}
if (config.techStack.ai) {
lines.push(`- **AI**: ${config.techStack.ai.join(', ')}`);
}
if (config.techStack.other) {
lines.push(`- **Other**: ${config.techStack.other.join(', ')}`);
}
lines.push('');
}
// Brand Voice
if (config.brandVoice && config.brandVoice.length > 0) {
lines.push('## Brand Voice');
lines.push('');
config.brandVoice.forEach(item => {
lines.push(`- ${item}`);
});
lines.push('');
}
// Custom Sections
if (config.customSections) {
Object.entries(config.customSections).forEach(([title, content]) => {
lines.push(`## ${title}`);
lines.push('');
lines.push(content);
lines.push('');
});
}
// Footer
lines.push('---');
lines.push('');
lines.push(`Last Updated: ${new Date().toISOString().split('T')[0]}`);
return lines.join('\n');
}
```
### 5. Humans.txt Generator (`src/generators/humans.ts`)
```typescript
import type { HumansConfig } from '../types';
export function generateHumansTxt(config: HumansConfig): string {
const lines: string[] = [];
// Team section
if (config.team && config.team.length > 0) {
lines.push('/* TEAM */');
lines.push('');
config.team.forEach((member, index) => {
if (index > 0) lines.push('');
lines.push(`Name: ${member.name}`);
if (member.role) lines.push(`Role: ${member.role}`);
if (member.contact) lines.push(`Contact: ${member.contact}`);
if (member.location) lines.push(`From: ${member.location}`);
if (member.twitter) lines.push(`Twitter: ${member.twitter}`);
if (member.github) lines.push(`GitHub: ${member.github}`);
});
lines.push('');
}
// Thanks section
if (config.thanks && config.thanks.length > 0) {
lines.push('/* THANKS */');
lines.push('');
config.thanks.forEach(thanks => {
lines.push(`- ${thanks}`);
});
lines.push('');
}
// Site section
if (config.site) {
lines.push('/* SITE */');
lines.push('');
const lastUpdate = config.site.lastUpdate === 'auto'
? new Date().toISOString().split('T')[0]
: config.site.lastUpdate;
if (lastUpdate) lines.push(`Last update: ${lastUpdate}`);
if (config.site.language) lines.push(`Language: ${config.site.language}`);
if (config.site.doctype) lines.push(`Doctype: ${config.site.doctype}`);
if (config.site.ide) lines.push(`IDE: ${config.site.ide}`);
if (config.site.techStack) {
lines.push(`Tech Stack: ${config.site.techStack.join(', ')}`);
}
if (config.site.standards) {
lines.push(`Standards: ${config.site.standards.join(', ')}`);
}
if (config.site.components) {
lines.push(`Components: ${config.site.components.join(', ')}`);
}
if (config.site.software) {
lines.push(`Software: ${config.site.software.join(', ')}`);
}
lines.push('');
}
// Story section
if (config.story) {
lines.push('/* THE STORY */');
lines.push('');
lines.push(config.story);
lines.push('');
}
// Fun Facts section
if (config.funFacts && config.funFacts.length > 0) {
lines.push('/* FUN FACTS */');
lines.push('');
config.funFacts.forEach(fact => {
lines.push(`- ${fact}`);
});
lines.push('');
}
// Philosophy section
if (config.philosophy && config.philosophy.length > 0) {
lines.push('/* PHILOSOPHY */');
lines.push('');
config.philosophy.forEach(item => {
lines.push(`"${item}"`);
});
lines.push('');
}
// Custom sections
if (config.customSections) {
Object.entries(config.customSections).forEach(([title, content]) => {
lines.push(`/* ${title.toUpperCase()} */`);
lines.push('');
lines.push(content);
lines.push('');
});
}
return lines.join('\n');
}
```
### 6. API Route Template (`routes/robots.ts`)
```typescript
import type { APIRoute } from 'astro';
import { generateRobotsTxt } from '../generators/robots';
import { getConfig } from '../config';
export const GET: APIRoute = ({ site }) => {
const config = getConfig();
const siteURL = site || new URL('http://localhost:4321');
const content = config.templates?.robots
? config.templates.robots(config.robots, siteURL)
: generateRobotsTxt(config.robots, siteURL);
return new Response(content, {
status: 200,
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': `public, max-age=${config.caching?.robots ?? 3600}`,
},
});
};
```
## Testing Strategy
### Unit Tests
```typescript
// tests/robots.test.ts
import { describe, it, expect } from 'vitest';
import { generateRobotsTxt } from '../src/generators/robots';
describe('generateRobotsTxt', () => {
it('generates basic robots.txt', () => {
const result = generateRobotsTxt({}, new URL('https://example.com'));
expect(result).toContain('User-agent: *');
expect(result).toContain('Sitemap: https://example.com/sitemap-index.xml');
});
it('includes LLM bots when enabled', () => {
const result = generateRobotsTxt(
{ llmBots: { enabled: true } },
new URL('https://example.com')
);
expect(result).toContain('Anthropic-AI');
expect(result).toContain('GPTBot');
});
it('respects custom crawl delay', () => {
const result = generateRobotsTxt(
{ crawlDelay: 5 },
new URL('https://example.com')
);
expect(result).toContain('Crawl-delay: 5');
});
});
```
### Integration Tests
```typescript
// tests/integration.test.ts
import { describe, it, expect } from 'vitest';
import { testIntegration } from '@astrojs/test-utils';
import discovery from '../src/index';
describe('discovery integration', () => {
it('generates all discovery files', async () => {
const fixture = await testIntegration({
integrations: [discovery()],
site: 'https://example.com'
});
const files = await fixture.readdir('dist');
expect(files).toContain('robots.txt');
expect(files).toContain('llms.txt');
expect(files).toContain('humans.txt');
expect(files).toContain('sitemap-index.xml');
});
});
```
## Build & Publish
### package.json
```json
{
"name": "@astrojs/discovery",
"version": "1.0.0",
"description": "Complete discovery integration for Astro",
"type": "module",
"exports": {
".": "./dist/index.js",
"./routes/*": "./dist/routes/*"
},
"files": [
"dist",
"README.md"
],
"scripts": {
"build": "tsc",
"test": "vitest",
"prepublishOnly": "npm run build && npm test"
},
"peerDependencies": {
"astro": "^5.0.0"
},
"dependencies": {
"@astrojs/sitemap": "^3.6.0"
},
"devDependencies": {
"@astrojs/test-utils": "^1.0.0",
"typescript": "^5.3.0",
"vitest": "^1.0.0"
},
"keywords": [
"astro",
"astro-integration",
"robots",
"sitemap",
"llms",
"humans",
"discovery",
"seo"
]
}
```
## Future Enhancements
1. **security.txt Support** - Add RFC 9116 security.txt generation
2. **ads.txt Support** - For sites with advertising
3. **manifest.json Support** - PWA manifest generation
4. **RSS Feed Integration** - Optional RSS feed generation
5. **OpenGraph Tags** - Meta tag injection
6. **Structured Data** - JSON-LD schema.org markup
7. **Analytics Integration** - Built-in analytics discovery
8. **i18n Support** - Multi-language discovery files
## Resources
- [Astro Integration API](https://docs.astro.build/en/reference/integrations-reference/)
- [humanstxt.org](https://humanstxt.org/)
- [robots.txt spec](https://developers.google.com/search/docs/crawling-indexing/robots/intro)
- [llms.txt proposal](https://llmstxt.org/)
---
**This integration is a proposal. Implementation details may vary based on Astro's API evolution.**


@ -0,0 +1,881 @@
# @astrojs/discovery
> Comprehensive discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, and sitemap generation
## Overview
This integration provides automatic generation of all standard discovery files for your Astro site, making it easily discoverable by search engines, LLMs, and humans.
## Features
- 🤖 **robots.txt** - Dynamic generation with LLM bot support
- 🧠 **llms.txt** - AI assistant discovery and instructions
- 👥 **humans.txt** - Human-readable credits and tech stack
- 🗺️ **sitemap-index.xml** - Automatic sitemap generation via `@astrojs/sitemap`
- ⚡ **Dynamic URLs** - Adapts to your `site` config
- 🎯 **Smart Caching** - Optimized cache headers
- 🔧 **Fully Customizable** - Override any section
## Installation
```bash
npx astro add @astrojs/discovery
```
Or manually:
```bash
npm install @astrojs/discovery
```
## Quick Start
### Basic Setup
```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery()
]
});
```
That's it! This will generate:
- `/robots.txt`
- `/llms.txt`
- `/humans.txt`
- `/sitemap-index.xml`
### With Configuration
```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery({
// Robots.txt configuration
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: 'CustomBot',
allow: ['/api'],
disallow: ['/admin']
}
]
},
// LLMs.txt configuration
llms: {
description: 'Your site description for AI assistants',
apiEndpoints: [
{ path: '/api/chat', description: 'Chat endpoint' },
{ path: '/api/search', description: 'Search API' }
],
instructions: `
When helping users with our site:
1. Check documentation first
2. Use provided API endpoints
3. Follow brand guidelines
`
},
// Humans.txt configuration
humans: {
team: [
{
name: 'Jane Doe',
role: 'Creator & Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA'
}
],
thanks: [
'The Astro team',
'Open source community'
],
site: {
lastUpdate: 'auto', // or specific date
language: 'English',
doctype: 'HTML5',
ide: 'VS Code',
techStack: ['Astro', 'TypeScript', 'React']
},
story: 'Your project story...',
funFacts: [
'Built with love',
'Coffee-powered development'
]
},
// Sitemap configuration
sitemap: {
// Passed through to @astrojs/sitemap
filter: (page) => !page.includes('/admin'),
changefreq: 'weekly',
priority: 0.7
}
})
]
});
```
## API Reference
### `discovery(options?)`
#### Options
##### `robots`
Configuration for robots.txt generation.
**Type:**
```typescript
interface RobotsConfig {
crawlDelay?: number;
allowAllBots?: boolean;
llmBots?: {
enabled?: boolean;
agents?: string[]; // Custom LLM bot names
};
additionalAgents?: Array<{
userAgent: string;
allow?: string[];
disallow?: string[];
}>;
customRules?: string; // Raw robots.txt content to append
}
```
**Default:**
```typescript
{
crawlDelay: 1,
allowAllBots: true,
llmBots: {
enabled: true,
agents: [
'Anthropic-AI',
'Claude-Web',
'GPTBot',
'ChatGPT-User',
'cohere-ai',
'Google-Extended'
]
}
}
```
**Example:**
```typescript
discovery({
robots: {
crawlDelay: 2,
llmBots: {
enabled: true,
agents: ['CustomAIBot', 'AnotherBot']
},
additionalAgents: [
{
userAgent: 'BadBot',
disallow: ['/']
}
]
}
})
```
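To make the mapping from `additionalAgents` to robots.txt lines concrete, here is a minimal sketch. `renderAgentRules` is a hypothetical helper written for illustration, not the integration's actual generator; it assumes each entry becomes one `User-agent` group with its `Allow` lines followed by its `Disallow` lines.

```typescript
// Hypothetical illustration; the shipped generator may order or format rules differently.
interface AgentRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

function renderAgentRules(rule: AgentRule): string {
  const lines = [`User-agent: ${rule.userAgent}`];
  for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`);
  for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`);
  return lines.join('\n');
}

console.log(renderAgentRules({ userAgent: 'BadBot', disallow: ['/'] }));
// User-agent: BadBot
// Disallow: /
```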
##### `llms`
Configuration for llms.txt generation.
**Type:**
```typescript
interface LLMsConfig {
enabled?: boolean;
description?: string;
keyFeatures?: string[];
importantPages?: Array<{
name: string;
path: string;
description?: string;
}>;
instructions?: string;
apiEndpoints?: Array<{
path: string;
method?: string;
description: string;
}>;
techStack?: {
frontend?: string[];
backend?: string[];
ai?: string[];
other?: string[];
};
brandVoice?: string[];
customSections?: Record<string, string>;
}
```
**Example:**
```typescript
discovery({
llms: {
description: 'E-commerce platform for sustainable products',
keyFeatures: [
'AI-powered product recommendations',
'Carbon footprint calculator',
'Subscription management'
],
instructions: `
When helping users:
1. Check product availability via API
2. Suggest sustainable alternatives
3. Calculate shipping costs
`,
apiEndpoints: [
{
path: '/api/products',
method: 'GET',
description: 'List all products'
},
{
path: '/api/calculate-footprint',
method: 'POST',
description: 'Calculate carbon footprint'
}
]
}
})
```
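Each `apiEndpoints` entry ends up as a list item in llms.txt. The sketch below shows one plausible rendering, assuming that `method` defaults to `GET` and that the path is resolved against the site URL. `renderEndpoint` is an illustrative name, not part of the public API.

```typescript
interface ApiEndpoint {
  path: string;
  method?: string;
  description: string;
}

// Hypothetical helper: turn one endpoint into the lines of a Markdown list item.
function renderEndpoint(endpoint: ApiEndpoint, siteURL: URL): string[] {
  const method = endpoint.method ?? 'GET';
  const fullUrl = new URL(endpoint.path, siteURL).href;
  return [
    `- \`${method} ${endpoint.path}\``,
    `  ${endpoint.description}`,
    `  Full URL: ${fullUrl}`,
  ];
}

const rendered = renderEndpoint(
  { path: '/api/products', description: 'List all products' },
  new URL('https://example.com'),
);
console.log(rendered.join('\n'));
```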
##### `humans`
Configuration for humans.txt generation.
**Type:**
```typescript
interface HumansConfig {
enabled?: boolean;
team?: Array<{
name: string;
role?: string;
contact?: string;
location?: string;
twitter?: string;
github?: string;
}>;
thanks?: string[];
site?: {
lastUpdate?: string | 'auto';
language?: string;
doctype?: string;
ide?: string;
techStack?: string[];
standards?: string[];
components?: string[];
software?: string[];
};
story?: string;
funFacts?: string[];
philosophy?: string[];
customSections?: Record<string, string>;
}
```
**Example:**
```typescript
discovery({
humans: {
team: [
{
name: 'Alice Developer',
role: 'Lead Developer',
contact: 'alice@example.com',
location: 'New York',
github: 'alice-dev'
}
],
thanks: [
'Coffee',
'Stack Overflow community',
'My rubber duck'
],
story: `
This project started when we realized that...
`,
funFacts: [
'Written entirely on a mechanical keyboard',
'Fueled by 347 cups of coffee',
'Built during a 48-hour hackathon'
]
}
})
```
##### `sitemap`
Configuration passed to `@astrojs/sitemap`.
**Type:**
```typescript
interface SitemapConfig {
filter?: (page: string) => boolean;
customPages?: string[];
i18n?: {
defaultLocale: string;
locales: Record<string, string>;
};
changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
lastmod?: Date;
priority?: number;
serialize?: (item: SitemapItem) => SitemapItem | undefined;
}
```
**Example:**
```typescript
discovery({
sitemap: {
filter: (page) => !page.includes('/admin') && !page.includes('/draft'),
changefreq: 'daily',
priority: 0.8
}
})
```
##### `caching`
Configure HTTP cache headers for discovery files.
**Type:**
```typescript
interface CachingConfig {
robots?: number; // seconds
llms?: number;
humans?: number;
sitemap?: number;
}
```
**Default:**
```typescript
{
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
sitemap: 3600 // 1 hour
}
```
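One way to read these values is as `max-age` directives. The helper below is a sketch rather than the integration's own code: `cacheControlHeader` is a hypothetical name, and it assumes an unset value falls back to the default while an explicit `0` is respected.

```typescript
// Hypothetical helper: turn a configured lifetime (in seconds) into a header value.
function cacheControlHeader(seconds: number | undefined, fallback: number): string {
  // `??` keeps an explicit 0, which `||` would silently replace with the fallback.
  const maxAge = seconds ?? fallback;
  return `public, max-age=${maxAge}`;
}

console.log(cacheControlHeader(undefined, 3600)); // public, max-age=3600
console.log(cacheControlHeader(0, 3600));         // public, max-age=0
```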
## Advanced Usage
### Custom Templates
You can provide custom templates for any file:
```typescript
discovery({
templates: {
robots: (config, siteURL) => `
User-agent: *
Allow: /
# Custom content
Sitemap: ${new URL('sitemap-index.xml', siteURL).href}
`,
llms: (config, siteURL) => `
# ${config.description}
Visit ${siteURL} for more information.
`
}
})
```
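When a template builds absolute URLs, keep in mind that interpolating a `URL` object yields its `href`, which ends in `/` for a bare origin. The sketch below resolves the sitemap path with the `URL` constructor instead, assuming templates receive the site as a `URL` object, as the `(config, siteURL)` signature suggests. `robotsTemplate` is an illustrative name.

```typescript
// Hypothetical template function; only the URL handling is the point here.
function robotsTemplate(_config: unknown, siteURL: URL): string {
  // Resolving against the base avoids "https://example.com//sitemap-index.xml".
  const sitemap = new URL('sitemap-index.xml', siteURL).href;
  return ['User-agent: *', 'Allow: /', '', `Sitemap: ${sitemap}`].join('\n') + '\n';
}

console.log(robotsTemplate({}, new URL('https://example.com')));
```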
### Conditional Generation
Disable specific files in certain environments:
```typescript
discovery({
robots: {
enabled: import.meta.env.PROD // Only in production
},
llms: {
enabled: true // Always generate
},
humans: {
enabled: import.meta.env.DEV // Only in development
}
})
```
### Dynamic Content
Use functions for dynamic content:
```typescript
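import fs from 'node:fs';

// `loadOpenAPISpec` stands in for your own loader; the integration does not provide it.
const spec = await loadOpenAPISpec();

discovery({
  llms: {
    // Called when llms.txt is generated
    description: () => {
      const pkg = JSON.parse(fs.readFileSync('./package.json', 'utf-8'));
      return `${pkg.name} - ${pkg.description}`;
    },
    // The generator reads `apiEndpoints` as a plain array, so resolve it up front
    apiEndpoints: spec.paths.map(path => ({
      path: path.url,
      method: path.method,
      description: path.summary
    }))
  }
})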
```
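Fields that accept either a value or a function can be normalized with a small resolver. This is a sketch of the pattern only: `resolveValue` and `MaybeLazy` are hypothetical names, and which fields actually accept functions should be checked against the integration's type definitions.

```typescript
// A value, or a function that produces it (synchronously or asynchronously).
type MaybeLazy<T> = T | (() => T | Promise<T>);

async function resolveValue<T>(value: MaybeLazy<T>): Promise<T> {
  if (typeof value === 'function') {
    // `await` handles both synchronous and asynchronous producers.
    return await (value as () => T | Promise<T>)();
  }
  return value as T;
}

resolveValue<string>(() => 'computed at build time').then((text) => console.log(text));
```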
## Integration with Other Tools
### With @astrojs/sitemap
The discovery integration automatically includes `@astrojs/sitemap`, so you don't need to install it separately. Configuration is passed through:
```typescript
discovery({
sitemap: {
// All @astrojs/sitemap options work here
filter: (page) => !page.includes('/secret'),
changefreq: 'weekly'
}
})
```
### With Content Collections
Automatically extract information from content collections:
```typescript
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
return docs.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
### With Environment Variables
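Use environment variables to keep deployment-specific values, such as a contact address, out of source control. Everything written to humans.txt is public, so do not put secrets here: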
```typescript
discovery({
humans: {
team: [
{
name: 'Developer',
contact: process.env.PUBLIC_CONTACT_EMAIL
}
]
}
})
```
## Output
The integration generates the following files:
### `/robots.txt`
```
User-agent: *
Allow: /
# Sitemaps
Sitemap: https://example.com/sitemap-index.xml
# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
Allow: /llms.txt
# Crawl delay
Crawl-delay: 1
```
### `/llms.txt`
```
# Project Name - Description
> Short tagline
## Site Information
- Name: Project Name
- Description: Full description
- URL: https://example.com
## For AI Assistants
Instructions for AI assistants...
## API Endpoints
- GET /api/endpoint - Description
```
### `/humans.txt`
```
/* TEAM */
Name: Developer Name
Role: Position
Contact: email@example.com
/* THANKS */
- Thank you note 1
- Thank you note 2
/* SITE */
Tech stack and details...
```
### `/sitemap-index.xml`
Standard XML sitemap with all your pages.
## Best Practices
### 1. **Set Your Site URL**
Always configure `site` in your Astro config:
```typescript
export default defineConfig({
site: 'https://example.com', // Required!
integrations: [discovery()]
});
```
### 2. **Keep humans.txt Updated**
Update your team information and tech stack regularly:
```typescript
discovery({
humans: {
site: {
lastUpdate: 'auto' // Automatically uses current date
}
}
})
```
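For reference, `'auto'` resolves to the current date in `YYYY-MM-DD` form. Below is a small sketch of that rule with the clock passed in so it can be tested. `resolveLastUpdate` is an illustrative name, and since `toISOString()` reports the date in UTC, the result can differ from the local date near midnight.

```typescript
// Hypothetical helper mirroring the 'auto' behaviour described above.
function resolveLastUpdate(
  value: string | undefined,
  now: Date = new Date(),
): string | undefined {
  // 'auto' becomes the UTC calendar date; anything else is passed through unchanged.
  return value === 'auto' ? now.toISOString().split('T')[0] : value;
}

console.log(resolveLastUpdate('auto', new Date('2025-11-03T12:00:00Z'))); // 2025-11-03
```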
### 3. **Be Specific with LLM Instructions**
Provide clear, actionable instructions for AI assistants:
```typescript
discovery({
llms: {
instructions: `
When helping users:
1. Always check API documentation first
2. Use the /api/search endpoint for queries
3. Format responses in markdown
4. Include relevant links
`
}
})
```
### 4. **Filter Private Pages**
Exclude admin, draft, and private pages:
```typescript
discovery({
sitemap: {
filter: (page) => {
return !page.includes('/admin') &&
!page.includes('/draft') &&
!page.includes('/private');
}
},
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private']
}
]
}
})
```
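Substring checks such as `page.includes('/admin')` also match unrelated paths like `/blog/administration`. A stricter variant is sketched below, assuming (as the `@astrojs/sitemap` documentation describes) that `filter` receives each page's full URL as a string. `excludePrefixes` is a hypothetical helper name.

```typescript
// Hypothetical helper: build a sitemap filter that excludes whole path subtrees.
function excludePrefixes(prefixes: string[]): (page: string) => boolean {
  return (page) => {
    const { pathname } = new URL(page);
    // Match the prefix itself or anything beneath it, but not look-alike siblings.
    return !prefixes.some(
      (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
    );
  };
}

const filter = excludePrefixes(['/admin', '/draft', '/private']);
console.log(filter('https://example.com/admin/users'));          // false
console.log(filter('https://example.com/blog/administration/')); // true
```

The returned function can be passed directly as `sitemap.filter`.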
### 5. **Optimize Cache Headers**
Balance freshness with server load:
```typescript
discovery({
caching: {
robots: 3600, // 1 hour - changes rarely
llms: 1800, // 30 min - may update instructions
humans: 86400, // 24 hours - credits don't change often
sitemap: 3600 // 1 hour - content changes moderately
}
})
```
## Troubleshooting
### Files Not Generating
1. **Check your output mode:**
```typescript
export default defineConfig({
output: 'static', // the default; 'hybrid' was removed in Astro 5, and 'server' requires an adapter
// ...
});
```
2. **Verify site URL is set:**
```typescript
export default defineConfig({
site: 'https://example.com' // Must be set!
});
```
3. **Check for conflicts:**
Remove any existing `/public/robots.txt` or similar static files.
### Wrong URLs in Files
Make sure your `site` config matches your production domain:
```typescript
export default defineConfig({
site: import.meta.env.PROD
? 'https://production.com'
: 'http://localhost:4321'
});
```
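Generated links are built by resolving each path against `site`, so a wrong or missing `site` shows up in every file. The snippet below uses the standard `URL` constructor to show the effect; the hosts and paths are examples only.

```typescript
const production = new URL('https://production.com');
const local = new URL('http://localhost:4321');

// The same path resolves differently depending on the base URL.
const docsInProd = new URL('/docs', production).href;
const docsLocally = new URL('/docs', local).href;

console.log(docsInProd);  // https://production.com/docs
console.log(docsLocally); // http://localhost:4321/docs
```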
### LLM Bots Not Respecting Instructions
- Ensure `/llms.txt` is accessible
- Check robots.txt allows LLM bots
- Verify content is properly formatted
### Sitemap Issues
Check `@astrojs/sitemap` documentation for detailed troubleshooting:
https://docs.astro.build/en/guides/integrations-guide/sitemap/
## Migration Guide
### From Manual Files
If you have existing static files in `/public`, remove them:
```bash
rm public/robots.txt
rm public/humans.txt
rm public/sitemap.xml
```
Then configure the integration with your existing content:
```typescript
discovery({
humans: {
team: [/* your existing team data */],
thanks: [/* your existing thanks */]
}
})
```
### From @astrojs/sitemap
Replace:
```typescript
import sitemap from '@astrojs/sitemap';
export default defineConfig({
integrations: [sitemap()]
});
```
With:
```typescript
import discovery from '@astrojs/discovery';
export default defineConfig({
integrations: [
discovery({
sitemap: {
// Your existing sitemap config
}
})
]
});
```
## Examples
### E-commerce Site
```typescript
discovery({
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: 'PriceBot',
disallow: ['/checkout', '/account']
}
]
},
llms: {
description: 'Online store for sustainable products',
keyFeatures: [
'Eco-friendly product catalog',
'Carbon footprint calculator',
'Sustainable shipping options'
],
apiEndpoints: [
{ path: '/api/products', description: 'Product catalog' },
{ path: '/api/calculate-carbon', description: 'Carbon calculator' }
]
},
sitemap: {
filter: (page) =>
!page.includes('/checkout') &&
!page.includes('/account')
}
})
```
### Documentation Site
```typescript
discovery({
llms: {
description: 'Technical documentation for our API',
instructions: `
When helping users:
1. Search documentation before answering
2. Provide code examples from /examples
3. Link to relevant API reference pages
4. Suggest similar solutions from FAQ
`,
importantPages: async () => {
const docs = await getCollection('docs');
return docs
.filter(doc => doc.data.featured)
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
},
humans: {
team: [
{
name: 'Documentation Team',
contact: 'docs@example.com'
}
],
thanks: [
'Our amazing community contributors',
'Technical writers worldwide'
]
}
})
```
### Personal Blog
```typescript
discovery({
llms: {
description: 'Personal blog about web development',
brandVoice: [
'Casual and friendly',
'Technical but accessible',
'Focus on practical examples'
]
},
humans: {
team: [
{
name: 'Jane Blogger',
role: 'Writer & Developer',
twitter: '@janeblogger',
github: 'jane-dev'
}
],
story: `
Started this blog to document my journey learning web development.
Went from tutorial hell to building real projects. Now sharing
what I've learned to help others on their journey.
`,
funFacts: [
'All posts written in markdown',
'Powered by coffee and curiosity',
'Deployed automatically on every commit'
]
}
})
```
## Performance
The integration is designed for minimal performance impact:
- **Build Time**: Adds ~100-200ms to build process
- **Runtime**: All files are statically generated at build time
- **Caching**: Smart HTTP cache headers reduce server load
- **Bundle Size**: Zero client-side JavaScript
## Contributing
We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md).
## License
MIT
## Related
- [@astrojs/sitemap](https://docs.astro.build/en/guides/integrations-guide/sitemap/)
- [humanstxt.org](https://humanstxt.org/)
- [llms.txt spec](https://llmstxt.org/)
- [robots.txt spec](https://developers.google.com/search/docs/crawling-indexing/robots/intro)
## Credits
Built with inspiration from:
- The Astro community
- humanstxt.org initiative
- The llms.txt proposal (llmstxt.org)
- Web standards organizations
---
**Made with ❤️ by the Astro community**


@ -0,0 +1,178 @@
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
// Example configuration showing all available options
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery({
// Robots.txt configuration
robots: {
crawlDelay: 2,
allowAllBots: true,
llmBots: {
enabled: true,
// Default bots are included, add custom ones here
agents: [
'Anthropic-AI',
'Claude-Web',
'GPTBot',
'ChatGPT-User',
'CustomBot',
],
},
additionalAgents: [
{
userAgent: 'BadBot',
disallow: ['/'],
},
{
userAgent: 'GoodBot',
allow: ['/api'],
disallow: ['/admin'],
},
],
customRules: `
# Custom rules
User-agent: SpecialBot
Crawl-delay: 10
`.trim(),
},
// LLMs.txt configuration
llms: {
description: 'Your site description for AI assistants',
keyFeatures: [
'Feature 1',
'Feature 2',
'Feature 3',
],
importantPages: [
{
name: 'Documentation',
path: '/docs',
description: 'Complete API documentation',
},
{
name: 'Blog',
path: '/blog',
description: 'Latest articles and tutorials',
},
],
instructions: `
When helping users with our site:
1. Check documentation first at /docs
2. Use provided API endpoints
3. Follow brand guidelines
4. Be helpful and accurate
`.trim(),
apiEndpoints: [
{
path: '/api/chat',
method: 'POST',
description: 'Chat endpoint for conversations',
},
{
path: '/api/search',
method: 'GET',
description: 'Search API for content',
},
],
techStack: {
frontend: ['Astro', 'TypeScript', 'React'],
backend: ['Node.js', 'FastAPI'],
ai: ['Claude', 'GPT-4'],
other: ['Docker', 'PostgreSQL'],
},
brandVoice: [
'Professional yet friendly',
'Technical but accessible',
'Focus on practical examples',
],
customSections: {
'Contact': 'For support, email support@example.com',
},
},
// Humans.txt configuration
humans: {
team: [
{
name: 'Jane Doe',
role: 'Creator & Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA',
twitter: '@janedoe',
github: 'janedoe',
},
{
name: 'John Smith',
role: 'Designer',
contact: 'john@example.com',
location: 'New York, NY',
},
],
thanks: [
'The Astro team for amazing tools',
'Open source community',
'Coffee ☕',
],
site: {
lastUpdate: 'auto', // or specific date like '2025-11-03'
language: 'English',
doctype: 'HTML5',
ide: 'VS Code',
techStack: ['Astro', 'TypeScript', 'React', 'Tailwind CSS'],
standards: ['HTML5', 'CSS3', 'ES2022'],
components: ['Astro Components', 'React Components'],
software: ['Node.js', 'TypeScript', 'Git'],
},
story: `
This project started when we realized there was a need for better
discovery mechanisms on the web. We wanted to make it easy for
search engines, AI assistants, and humans to understand what our
site is about and how to interact with it.
`.trim(),
funFacts: [
'Built with love and coffee',
'Over 100 commits in the first week',
'Designed with accessibility in mind',
],
philosophy: [
'Make the web more discoverable',
'Embrace open standards',
'Build with the future in mind',
],
customSections: {
'SUSTAINABILITY': 'This site is carbon neutral and hosted on green servers.',
},
},
// Sitemap configuration (passed to @astrojs/sitemap)
sitemap: {
filter: (page) =>
!page.includes('/admin') &&
!page.includes('/draft') &&
!page.includes('/private'),
changefreq: 'weekly',
priority: 0.7,
},
// HTTP caching configuration (in seconds)
caching: {
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
sitemap: 3600, // 1 hour
},
// Custom templates (optional)
// templates: {
// robots: (config, siteURL) => `Your custom robots.txt content`,
// llms: (config, siteURL) => `Your custom llms.txt content`,
// humans: (config, siteURL) => `Your custom humans.txt content`,
// },
}),
],
});


@ -0,0 +1,18 @@
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
// Minimal configuration - just provide your site URL
// Everything else uses sensible defaults
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery(),
],
});
// This will generate:
// - /robots.txt with LLM bot support
// - /llms.txt with basic site info
// - /humans.txt with basic structure
// - /sitemap-index.xml with all your pages

6611
package-lock.json generated Normal file

File diff suppressed because it is too large

62
package.json Normal file

@ -0,0 +1,62 @@
{
"name": "@astrojs/discovery",
"version": "2025.11.03",
"description": "Complete discovery integration for Astro - handles robots.txt, llms.txt, humans.txt, and sitemap generation",
"type": "module",
"exports": {
".": "./dist/index.js",
"./routes/*": "./dist/routes/*.js"
},
"files": [
"dist",
"README.md",
"LICENSE"
],
"scripts": {
"build": "tsc",
"dev": "tsc --watch",
"test": "vitest",
"test:ci": "vitest run",
"prepublishOnly": "npm run build && npm test"
},
"peerDependencies": {
"astro": "^5.0.0"
},
"dependencies": {
"@astrojs/sitemap": "^3.6.0"
},
"devDependencies": {
"@types/node": "^22.0.0",
"astro": "^5.0.0",
"typescript": "^5.7.0",
"vitest": "^2.1.0"
},
"keywords": [
"astro",
"astro-integration",
"astro-component",
"robots",
"sitemap",
"llms",
"llms-txt",
"humans",
"humans-txt",
"discovery",
"seo",
"ai",
"llm"
],
"author": {
"name": "Ryan Malloy",
"email": "ryan@supported.systems"
},
"license": "MIT",
"repository": {
"type": "git",
"url": "https://github.com/withastro/astro-discovery"
},
"bugs": {
"url": "https://github.com/withastro/astro-discovery/issues"
},
"homepage": "https://github.com/withastro/astro-discovery#readme"
}

15
src/config-store.ts Normal file

@ -0,0 +1,15 @@
import type { DiscoveryConfig } from './types.js';
/**
* Shared configuration store
* This allows the integration to pass config to route handlers
*/
let globalConfig: DiscoveryConfig = {};
export function setConfig(config: DiscoveryConfig): void {
globalConfig = config;
}
export function getConfig(): DiscoveryConfig {
return globalConfig;
}

145
src/generators/humans.ts Normal file

@ -0,0 +1,145 @@
import type { HumansConfig } from '../types.js';
/**
* Generate humans.txt content
*
* This file provides human-readable credits and information about
* the site, team, and technology stack.
*
* @param config - Humans.txt configuration
* @returns Generated humans.txt content
*/
export function generateHumansTxt(config: HumansConfig): string {
const lines: string[] = [];
// Team section
if (config.team && config.team.length > 0) {
lines.push('/* TEAM */');
lines.push('');
config.team.forEach((member, index) => {
if (index > 0) {
lines.push('');
}
lines.push(` Name: ${member.name}`);
if (member.role) {
lines.push(` Role: ${member.role}`);
}
if (member.contact) {
lines.push(` Contact: ${member.contact}`);
}
if (member.location) {
lines.push(` From: ${member.location}`);
}
if (member.twitter) {
lines.push(` Twitter: ${member.twitter}`);
}
if (member.github) {
lines.push(` GitHub: ${member.github}`);
}
});
lines.push('');
}
// Thanks section
if (config.thanks && config.thanks.length > 0) {
lines.push('/* THANKS */');
lines.push('');
config.thanks.forEach(thanks => {
lines.push(` ${thanks}`);
});
lines.push('');
}
// Site section
if (config.site) {
lines.push('/* SITE */');
lines.push('');
const lastUpdate = config.site.lastUpdate === 'auto'
? new Date().toISOString().split('T')[0]
: config.site.lastUpdate;
if (lastUpdate) {
lines.push(` Last update: ${lastUpdate}`);
}
if (config.site.language) {
lines.push(` Language: ${config.site.language}`);
}
if (config.site.doctype) {
lines.push(` Doctype: ${config.site.doctype}`);
}
if (config.site.ide) {
lines.push(` IDE: ${config.site.ide}`);
}
if (config.site.techStack && config.site.techStack.length > 0) {
lines.push(` Tech Stack: ${config.site.techStack.join(', ')}`);
}
if (config.site.standards && config.site.standards.length > 0) {
lines.push(` Standards: ${config.site.standards.join(', ')}`);
}
if (config.site.components && config.site.components.length > 0) {
lines.push(` Components: ${config.site.components.join(', ')}`);
}
if (config.site.software && config.site.software.length > 0) {
lines.push(` Software: ${config.site.software.join(', ')}`);
}
lines.push('');
}
// Story section
if (config.story) {
lines.push('/* THE STORY */');
lines.push('');
// Indent multi-line stories
const storyLines = config.story.trim().split('\n');
storyLines.forEach(line => {
lines.push(` ${line.trim()}`);
});
lines.push('');
}
// Fun Facts section
if (config.funFacts && config.funFacts.length > 0) {
lines.push('/* FUN FACTS */');
lines.push('');
config.funFacts.forEach(fact => {
lines.push(` ${fact}`);
});
lines.push('');
}
// Philosophy section
if (config.philosophy && config.philosophy.length > 0) {
lines.push('/* PHILOSOPHY */');
lines.push('');
config.philosophy.forEach(item => {
lines.push(` "${item}"`);
});
lines.push('');
}
// Custom sections
if (config.customSections) {
Object.entries(config.customSections).forEach(([title, content]) => {
lines.push(`/* ${title.toUpperCase()} */`);
lines.push('');
// Indent custom content
const contentLines = content.trim().split('\n');
contentLines.forEach(line => {
lines.push(` ${line.trim()}`);
});
lines.push('');
});
}
return lines.join('\n').trim() + '\n';
}

146
src/generators/llms.ts Normal file

@ -0,0 +1,146 @@
import type { LLMsConfig } from '../types.js';
/**
* Generate llms.txt content
*
* This file provides context and instructions for AI assistants
* following the llms.txt specification.
*
* @param config - LLMs.txt configuration
* @param siteURL - Site base URL
* @returns Generated llms.txt content
*/
export async function generateLLMsTxt(
config: LLMsConfig,
siteURL: URL
): Promise<string> {
const lines: string[] = [];
// Header with site name
const description = typeof config.description === 'function'
? config.description()
: config.description;
lines.push(`# ${siteURL.hostname}`);
if (description) {
lines.push('');
lines.push(`> ${description}`);
}
lines.push('');
lines.push('---');
lines.push('');
// Site Information
lines.push('## Site Information');
lines.push('');
lines.push(`- **URL**: ${siteURL.href}`);
if (description) {
lines.push(`- **Description**: ${description}`);
}
lines.push('');
// Key Features
if (config.keyFeatures && config.keyFeatures.length > 0) {
lines.push('## Key Features');
lines.push('');
config.keyFeatures.forEach(feature => {
lines.push(`- ${feature}`);
});
lines.push('');
}
// Important Pages
if (config.importantPages) {
const pages = typeof config.importantPages === 'function'
? await config.importantPages()
: config.importantPages;
if (pages.length > 0) {
lines.push('## Important Pages');
lines.push('');
pages.forEach(page => {
const url = new URL(page.path, siteURL).href;
lines.push(`- **[${page.name}](${url})**`);
if (page.description) {
lines.push(` ${page.description}`);
}
});
lines.push('');
}
}
// Instructions for AI Assistants
if (config.instructions) {
lines.push('## Instructions for AI Assistants');
lines.push('');
lines.push(config.instructions.trim());
lines.push('');
}
// API Endpoints
if (config.apiEndpoints && config.apiEndpoints.length > 0) {
lines.push('## API Endpoints');
lines.push('');
config.apiEndpoints.forEach(endpoint => {
const method = endpoint.method || 'GET';
const fullUrl = new URL(endpoint.path, siteURL).href;
lines.push(`- \`${method} ${endpoint.path}\``);
lines.push(` ${endpoint.description}`);
lines.push(` Full URL: ${fullUrl}`);
});
lines.push('');
}
// Tech Stack
if (config.techStack) {
const hasAnyTech = Object.values(config.techStack).some(arr => arr && arr.length > 0);
if (hasAnyTech) {
lines.push('## Technical Stack');
lines.push('');
if (config.techStack.frontend && config.techStack.frontend.length > 0) {
lines.push(`- **Frontend**: ${config.techStack.frontend.join(', ')}`);
}
if (config.techStack.backend && config.techStack.backend.length > 0) {
lines.push(`- **Backend**: ${config.techStack.backend.join(', ')}`);
}
if (config.techStack.ai && config.techStack.ai.length > 0) {
lines.push(`- **AI/ML**: ${config.techStack.ai.join(', ')}`);
}
if (config.techStack.other && config.techStack.other.length > 0) {
lines.push(`- **Other**: ${config.techStack.other.join(', ')}`);
}
lines.push('');
}
}
// Brand Voice
if (config.brandVoice && config.brandVoice.length > 0) {
lines.push('## Brand Voice & Guidelines');
lines.push('');
config.brandVoice.forEach(item => {
lines.push(`- ${item}`);
});
lines.push('');
}
// Custom Sections
if (config.customSections) {
Object.entries(config.customSections).forEach(([title, content]) => {
lines.push(`## ${title}`);
lines.push('');
lines.push(content.trim());
lines.push('');
});
}
// Footer
lines.push('---');
lines.push('');
lines.push(`**Last Updated**: ${new Date().toISOString().split('T')[0]}`);
lines.push('');
lines.push('*This file was generated by [@astrojs/discovery](https://github.com/withastro/astro-discovery)*');
return lines.join('\n').trim() + '\n';
}
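
The generator above builds absolute links with `new URL(path, siteURL)`. A minimal sketch of how that resolution behaves, assuming a site deployed under a sub-path; `resolvePageUrl` and the example URLs are hypothetical and not part of this commit:

```typescript
// Hypothetical helper (not in the commit): resolve a page path against the site URL.
function resolvePageUrl(path: string, site: URL): string {
  return new URL(path, site).href;
}

const rootSite = new URL('https://example.com');
const subdirSite = new URL('https://example.com/docs/');

// A leading slash resolves from the origin and discards the base path.
console.log(resolvePageUrl('/blog', subdirSite)); // https://example.com/blog

// A relative path resolves under the base path (the base must end in "/").
console.log(resolvePageUrl('blog', subdirSite)); // https://example.com/docs/blog

// With a root-level site, both forms point at the same place.
console.log(resolvePageUrl('/docs', rootSite)); // https://example.com/docs
```

If a site is served from a sub-path, paths in `importantPages` and `apiEndpoints` with a leading slash would therefore point outside that sub-path.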

102
src/generators/robots.ts Normal file

@ -0,0 +1,102 @@
import type { RobotsConfig } from '../types.js';
/**
* Default LLM bot user agents that should have access to llms.txt
*/
const DEFAULT_LLM_BOTS = [
'Anthropic-AI',
'Claude-Web',
'GPTBot',
'ChatGPT-User',
'cohere-ai',
'Google-Extended',
'PerplexityBot',
'Applebot-Extended',
];
/**
* Generate robots.txt content
*
* @param config - Robots.txt configuration
* @param siteURL - Site base URL
* @returns Generated robots.txt content
*/
export function generateRobotsTxt(
config: RobotsConfig,
siteURL: URL
): string {
const lines: string[] = [];
// Header comment
lines.push('# robots.txt');
lines.push(`# Generated by @astrojs/discovery for ${siteURL.hostname}`);
lines.push('');
// Allow all bots by default
if (config.allowAllBots !== false) {
lines.push('User-agent: *');
lines.push('Allow: /');
lines.push('');
}
// Add sitemap reference
lines.push('# Sitemaps');
lines.push(`Sitemap: ${new URL('sitemap-index.xml', siteURL).href}`);
lines.push('');
// LLM-specific rules
if (config.llmBots?.enabled !== false) {
lines.push('# LLM-specific resources');
lines.push('# AI assistants can find additional context at /llms.txt');
lines.push('# See: https://llmstxt.org/');
lines.push('');
const agents = config.llmBots?.agents || DEFAULT_LLM_BOTS;
agents.forEach(agent => {
lines.push(`User-agent: ${agent}`);
});
lines.push('Allow: /llms.txt');
lines.push('Allow: /llms-full.txt');
lines.push('');
}
// Additional agent rules
if (config.additionalAgents && config.additionalAgents.length > 0) {
lines.push('# Custom agent rules');
lines.push('');
config.additionalAgents.forEach(agent => {
lines.push(`User-agent: ${agent.userAgent}`);
if (agent.allow && agent.allow.length > 0) {
agent.allow.forEach(path => {
lines.push(`Allow: ${path}`);
});
}
if (agent.disallow && agent.disallow.length > 0) {
agent.disallow.forEach(path => {
lines.push(`Disallow: ${path}`);
});
}
lines.push('');
});
}
// Crawl delay (non-standard directive; some crawlers, including Googlebot, ignore it)
if (config.crawlDelay) {
lines.push('# Crawl delay (be nice to our server)');
lines.push(`Crawl-delay: ${config.crawlDelay}`);
lines.push('');
}
// Custom rules
if (config.customRules) {
lines.push('# Custom rules');
lines.push(config.customRules.trim());
lines.push('');
}
return lines.join('\n').trim() + '\n';
}
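
The LLM block above emits several `User-agent` lines followed by one shared set of `Allow` rules. Under the Robots Exclusion Protocol (RFC 9309), consecutive `User-agent` lines form a single group, so the rules apply to every listed agent. A minimal sketch of that grouping; the `AgentGroup` type and `renderGroup` function are hypothetical and simplified, not the `RobotsConfig` shape from this commit:

```typescript
// Hypothetical, simplified rule group for illustration only.
interface AgentGroup {
  agents: string[];
  allow?: string[];
  disallow?: string[];
}

// Render one robots.txt group: all User-agent lines first, then the shared rules.
function renderGroup(group: AgentGroup): string {
  const lines: string[] = [];
  for (const agent of group.agents) lines.push(`User-agent: ${agent}`);
  for (const path of group.allow ?? []) lines.push(`Allow: ${path}`);
  for (const path of group.disallow ?? []) lines.push(`Disallow: ${path}`);
  return lines.join('\n');
}

const llmGroup = renderGroup({
  agents: ['GPTBot', 'PerplexityBot'],
  allow: ['/llms.txt', '/llms-full.txt'],
});

// Prints two User-agent lines followed by two Allow lines.
console.log(llmGroup);
```

Keeping each group contiguous makes it easier to see which rules apply to which agents when several blocks are concatenated.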

127
src/index.ts Normal file

@ -0,0 +1,127 @@
import type { AstroIntegration } from 'astro';
import type { DiscoveryConfig } from './types.js';
import sitemap from '@astrojs/sitemap';
import { validateConfig } from './validators/config.js';
import { setConfig } from './config-store.js';
/**
* Astro Discovery Integration
*
* Automatically generates discovery files for your Astro site:
* - /robots.txt - Search engine and bot instructions
* - /llms.txt - AI assistant context and guidelines
* - /humans.txt - Human-readable credits and information
* - /sitemap-index.xml - Site structure for search engines
*
* @param userConfig - Optional configuration
* @returns Astro integration
*
* @example
* ```ts
* // astro.config.mjs
* import { defineConfig } from 'astro/config';
* import discovery from '@astrojs/discovery';
*
* export default defineConfig({
* site: 'https://example.com',
* integrations: [
* discovery({
* llms: {
* description: 'My awesome site',
* instructions: 'Be helpful and accurate'
* }
* })
* ]
* });
* ```
*/
export default function discovery(
userConfig: DiscoveryConfig = {}
): AstroIntegration {
// Merge with defaults and validate
const config = validateConfig(userConfig);
// Store config globally for route handlers to access
setConfig(config);
return {
name: '@astrojs/discovery',
hooks: {
'astro:config:setup': ({ config: astroConfig, injectRoute, updateConfig }) => {
// Ensure site is configured
if (!astroConfig.site) {
throw new Error(
'[@astrojs/discovery] The `site` option must be set in your Astro config.\n' +
'Example: site: "https://example.com"'
);
}
// Add sitemap integration
updateConfig({
integrations: [
sitemap(config.sitemap || {})
]
});
// Inject prerendered routes for discovery files
if (config.robots?.enabled !== false) {
injectRoute({
pattern: '/robots.txt',
entrypoint: '@astrojs/discovery/routes/robots',
prerender: true
});
}
if (config.llms?.enabled !== false) {
injectRoute({
pattern: '/llms.txt',
entrypoint: '@astrojs/discovery/routes/llms',
prerender: true
});
}
if (config.humans?.enabled !== false) {
injectRoute({
pattern: '/humans.txt',
entrypoint: '@astrojs/discovery/routes/humans',
prerender: true
});
}
},
'astro:build:done': () => {
// Post-build notification
console.log('\n✨ @astrojs/discovery - Generated files:');
if (config.robots?.enabled !== false) {
console.log(' ✅ /robots.txt');
}
if (config.llms?.enabled !== false) {
console.log(' ✅ /llms.txt');
}
if (config.humans?.enabled !== false) {
console.log(' ✅ /humans.txt');
}
console.log(' ✅ /sitemap-index.xml');
console.log('');
}
}
};
}
// Named exports
export type {
DiscoveryConfig,
RobotsConfig,
LLMsConfig,
HumansConfig,
SitemapConfig,
CachingConfig,
TemplateConfig,
ImportantPage,
APIEndpoint,
TechStack,
TeamMember,
SiteInfo,
SitemapItem,
} from './types.js';
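
The integration treats every generator as on by default: the checks of the form `config.robots?.enabled !== false` pass when the section or the flag is missing. A small sketch of that pattern in isolation; `Toggleable` and `isEnabled` are hypothetical names used only for illustration:

```typescript
// Hypothetical shape for illustration; the real config types live in src/types.ts.
interface Toggleable {
  enabled?: boolean;
}

// A feature is on unless `enabled` is explicitly false.
// A missing section or a missing flag both count as enabled.
function isEnabled(section?: Toggleable): boolean {
  return section?.enabled !== false;
}

console.log(isEnabled(undefined));          // true
console.log(isEnabled({}));                 // true
console.log(isEnabled({ enabled: true }));  // true
console.log(isEnabled({ enabled: false })); // false
```

Comparing against `false` rather than testing truthiness is what lets an empty config object keep all three files enabled.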

30
src/routes/humans.ts Normal file

@ -0,0 +1,30 @@
import type { APIRoute } from 'astro';
import { generateHumansTxt } from '../generators/humans.js';
import { getConfig } from '../config-store.js';
/**
* API route for /humans.txt
*/
export const GET: APIRoute = ({ site }) => {
const config = getConfig();
const humansConfig = config.humans || {};
const siteURL = site || new URL('http://localhost:4321');
// Use custom template if provided
const content = config.templates?.humans
? config.templates.humans(humansConfig, siteURL)
: generateHumansTxt(humansConfig);
// Get cache duration (default: 24 hours)
const cacheSeconds = config.caching?.humans ?? 86400;
return new Response(content, {
status: 200,
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': `public, max-age=${cacheSeconds}`,
},
});
};
export const prerender = true;

30
src/routes/llms.ts Normal file

@ -0,0 +1,30 @@
import type { APIRoute } from 'astro';
import { generateLLMsTxt } from '../generators/llms.js';
import { getConfig } from '../config-store.js';
/**
* API route for /llms.txt
*/
export const GET: APIRoute = async ({ site }) => {
const config = getConfig();
const llmsConfig = config.llms || {};
const siteURL = site || new URL('http://localhost:4321');
// Use custom template if provided
const content = config.templates?.llms
? await config.templates.llms(llmsConfig, siteURL)
: await generateLLMsTxt(llmsConfig, siteURL);
// Get cache duration (default: 1 hour)
const cacheSeconds = config.caching?.llms ?? 3600;
return new Response(content, {
status: 200,
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': `public, max-age=${cacheSeconds}`,
},
});
};
export const prerender = true;

30
src/routes/robots.ts Normal file

@ -0,0 +1,30 @@
import type { APIRoute } from 'astro';
import { generateRobotsTxt } from '../generators/robots.js';
import { getConfig } from '../config-store.js';
/**
* API route for /robots.txt
*/
export const GET: APIRoute = ({ site }) => {
const config = getConfig();
const robotsConfig = config.robots || {};
const siteURL = site || new URL('http://localhost:4321');
// Use custom template if provided
const content = config.templates?.robots
? config.templates.robots(robotsConfig, siteURL)
: generateRobotsTxt(robotsConfig, siteURL);
// Get cache duration (default: 1 hour)
const cacheSeconds = config.caching?.robots ?? 3600;
return new Response(content, {
status: 200,
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': `public, max-age=${cacheSeconds}`,
},
});
};
export const prerender = true;
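
The route handlers in this commit all build the same two headers and differ only in the default cache duration. A minimal sketch of that shared logic, assuming a hypothetical `textHeaders` helper that does not exist in the commit:

```typescript
// Hypothetical helper: build the headers used for a plain-text discovery file.
function textHeaders(
  cacheSeconds: number | undefined,
  fallbackSeconds: number
): Record<string, string> {
  // `??` falls back only on null/undefined, so an explicit 0 is preserved.
  const maxAge = cacheSeconds ?? fallbackSeconds;
  return {
    'Content-Type': 'text/plain; charset=utf-8',
    'Cache-Control': `public, max-age=${maxAge}`,
  };
}

console.log(textHeaders(undefined, 3600)['Cache-Control']); // public, max-age=3600
console.log(textHeaders(0, 3600)['Cache-Control']);         // public, max-age=0
console.log(textHeaders(600, 86400)['Cache-Control']);      // public, max-age=600
```

Because the routes are prerendered, these headers may not reach the client on a purely static host, where the hosting platform's own header configuration typically applies instead.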

235
src/types.ts Normal file

@ -0,0 +1,235 @@
/**
* Main configuration interface for the Astro Discovery integration
*/
export interface DiscoveryConfig {
/** Configuration for robots.txt generation */
robots?: RobotsConfig;
/** Configuration for llms.txt generation */
llms?: LLMsConfig;
/** Configuration for humans.txt generation */
humans?: HumansConfig;
/** Configuration passed to @astrojs/sitemap */
sitemap?: SitemapConfig;
/** HTTP cache control configuration */
caching?: CachingConfig;
/** Custom template functions */
templates?: TemplateConfig;
}
/**
* Configuration for robots.txt generation
*/
export interface RobotsConfig {
/** Enable/disable robots.txt generation (default: true) */
enabled?: boolean;
/** Crawl delay in seconds for polite crawlers */
crawlDelay?: number;
/** Allow all bots by default (default: true) */
allowAllBots?: boolean;
/** LLM-specific bot configuration */
llmBots?: {
/** Enable LLM bot rules (default: true) */
enabled?: boolean;
/** Custom LLM bot user agents */
agents?: string[];
};
/** Additional custom agent rules */
additionalAgents?: Array<{
/** User agent string */
userAgent: string;
/** Paths to allow */
allow?: string[];
/** Paths to disallow */
disallow?: string[];
}>;
/** Custom raw robots.txt content to append */
customRules?: string;
}
/**
* Configuration for llms.txt generation
*/
export interface LLMsConfig {
/** Enable/disable llms.txt generation (default: true) */
enabled?: boolean;
/** Site description for AI assistants (can be dynamic) */
description?: string | (() => string);
/** Key features of the site */
keyFeatures?: string[];
/** Important pages for AI to know about */
importantPages?: ImportantPage[] | (() => Promise<ImportantPage[]>);
/** Instructions for AI assistants */
instructions?: string;
/** API endpoints available */
apiEndpoints?: APIEndpoint[];
/** Technology stack information */
techStack?: TechStack;
/** Brand voice guidelines */
brandVoice?: string[];
/** Custom sections to add */
customSections?: Record<string, string>;
}
/**
* Configuration for humans.txt generation
*/
export interface HumansConfig {
/** Enable/disable humans.txt generation (default: true) */
enabled?: boolean;
/** Team members */
team?: TeamMember[];
/** Thank you notes */
thanks?: string[];
/** Site information */
site?: SiteInfo;
/** Project story/history */
story?: string;
/** Fun facts about the project */
funFacts?: string[];
/** Development philosophy */
philosophy?: string[];
/** Custom sections to add */
customSections?: Record<string, string>;
}
/**
* Configuration for sitemap generation (passed to @astrojs/sitemap)
* This is a simplified type - actual options are passed through to @astrojs/sitemap
*/
export interface SitemapConfig {
/** Filter function to exclude pages */
filter?: (page: string) => boolean;
/** Custom pages to include */
customPages?: string[];
/** Change frequency hint */
changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
/** Priority hint (0.0 - 1.0) */
priority?: number;
/** Internationalization configuration */
i18n?: {
defaultLocale: string;
locales: Record<string, string>;
};
/** Last modification date */
lastmod?: Date;
/** Allow any other sitemap options from @astrojs/sitemap */
[key: string]: any;
}
/**
* HTTP caching configuration (in seconds)
*/
export interface CachingConfig {
/** Cache duration for robots.txt (default: 3600) */
robots?: number;
/** Cache duration for llms.txt (default: 3600) */
llms?: number;
/** Cache duration for humans.txt (default: 86400) */
humans?: number;
/** Cache duration for sitemap (default: 3600) */
sitemap?: number;
}
/**
* Custom template functions
*/
export interface TemplateConfig {
/** Custom robots.txt template */
robots?: (config: RobotsConfig, siteURL: URL) => string;
/** Custom llms.txt template */
llms?: (config: LLMsConfig, siteURL: URL) => string | Promise<string>;
/** Custom humans.txt template */
humans?: (config: HumansConfig, siteURL: URL) => string;
}
/**
* Important page definition for llms.txt
*/
export interface ImportantPage {
/** Page name/title */
name: string;
/** Path relative to site root */
path: string;
/** Optional description */
description?: string;
}
/**
* API endpoint definition for llms.txt
*/
export interface APIEndpoint {
/** Endpoint path */
path: string;
/** HTTP method (default: GET) */
method?: string;
/** Endpoint description */
description: string;
}
/**
* Technology stack information for llms.txt
*/
export interface TechStack {
/** Frontend technologies */
frontend?: string[];
/** Backend technologies */
backend?: string[];
/** AI/ML technologies */
ai?: string[];
/** Other technologies */
other?: string[];
}
/**
* Team member definition for humans.txt
*/
export interface TeamMember {
/** Full name */
name: string;
/** Role/title */
role?: string;
/** Contact email */
contact?: string;
/** Location */
location?: string;
/** Twitter handle */
twitter?: string;
/** GitHub username */
github?: string;
}
/**
* Site information for humans.txt
*/
export interface SiteInfo {
/** Last update date ('auto' for current date) */
lastUpdate?: string | 'auto';
/** Primary language */
language?: string;
/** Document type */
doctype?: string;
/** IDE/editor used */
ide?: string;
/** Technology stack */
techStack?: string[];
/** Web standards followed */
standards?: string[];
/** Components/libraries used */
components?: string[];
/** Software/tools used */
software?: string[];
}
/**
* Sitemap item interface (from @astrojs/sitemap)
*/
export interface SitemapItem {
url: string;
lastmod?: Date;
changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
priority?: number;
links?: Array<{
url: string;
lang: string;
}>;
}
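
Two fields in `LLMsConfig` accept either a static value or a function: `description` (`string | (() => string)`) and `importantPages` (an array or an async function returning one). A sketch of how such unions can be resolved uniformly; `resolveValue`, `loadPages`, and the local `Page` type are hypothetical and only mirror the shapes declared above:

```typescript
// Hypothetical helper: accept either a value or a zero-argument function returning it.
function resolveValue<T>(value: T | (() => T)): T {
  return typeof value === 'function' ? (value as () => T)() : (value as T);
}

// Local stand-in for ImportantPage, reduced to the required fields.
interface Page {
  name: string;
  path: string;
}

const dynamicDescription = resolveValue<string>(() => 'Dynamic description');
const staticDescription = resolveValue<string>('Static description');

// Async variant: arrays are returned as-is, provider functions are awaited.
async function loadPages(
  source: Page[] | (() => Promise<Page[]>)
): Promise<Page[]> {
  return typeof source === 'function' ? await source() : source;
}

loadPages(async () => [{ name: 'Blog', path: '/blog' }]).then(pages => {
  console.log(dynamicDescription, staticDescription, pages.length);
});
```

The generic helper is ambiguous when `T` is itself a function type, so it is best limited to plain data such as strings and arrays.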

77
src/validators/config.ts Normal file

@ -0,0 +1,77 @@
import type { DiscoveryConfig } from '../types.js';
/**
* Default configuration values
*/
const DEFAULT_CONFIG: Required<Omit<DiscoveryConfig, 'templates'>> & { templates?: DiscoveryConfig['templates'] } = {
robots: {
enabled: true,
crawlDelay: 1,
allowAllBots: true,
llmBots: {
enabled: true,
},
},
llms: {
enabled: true,
},
humans: {
enabled: true,
},
sitemap: {},
caching: {
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
sitemap: 3600, // 1 hour
},
};
/**
* Validate and merge user configuration with defaults
*
* @param userConfig - User-provided configuration
* @returns Validated and merged configuration
*/
export function validateConfig(userConfig: DiscoveryConfig = {}): DiscoveryConfig {
const config: DiscoveryConfig = {
robots: {
...DEFAULT_CONFIG.robots,
...userConfig.robots,
llmBots: {
...DEFAULT_CONFIG.robots.llmBots,
...userConfig.robots?.llmBots,
},
},
llms: {
...DEFAULT_CONFIG.llms,
...userConfig.llms,
},
humans: {
...DEFAULT_CONFIG.humans,
...userConfig.humans,
},
sitemap: {
...DEFAULT_CONFIG.sitemap,
...userConfig.sitemap,
},
caching: {
...DEFAULT_CONFIG.caching,
...userConfig.caching,
},
templates: userConfig.templates,
};
// Validation warnings (non-breaking)
if (config.caching) {
Object.entries(config.caching).forEach(([key, value]) => {
if (value !== undefined && (!Number.isFinite(value) || value < 0 || value > 31536000)) {
console.warn(
`@astrojs/discovery: Cache duration for "${key}" should be between 0 and 31536000 seconds (1 year). Got: ${value}`
);
}
});
}
return config;
}
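
`validateConfig` merges `llmBots` explicitly because object spread is shallow: without the nested merge, a user-supplied `llmBots` object would replace the default one wholesale. A minimal sketch of the difference, using hypothetical trimmed-down types rather than the real `RobotsConfig`:

```typescript
// Hypothetical, trimmed-down config shapes for illustration only.
interface BotOptions {
  enabled?: boolean;
  agents?: string[];
}
interface RobotsOptions {
  crawlDelay?: number;
  llmBots?: BotOptions;
}

const defaults: RobotsOptions = { crawlDelay: 1, llmBots: { enabled: true } };
const user: RobotsOptions = { llmBots: { agents: ['CustomAI'] } };

// Shallow spread: the user's llmBots object replaces the default one,
// so the default `enabled: true` is lost.
const shallow: RobotsOptions = { ...defaults, ...user };

// Two-level merge: nested defaults survive alongside user overrides.
const merged: RobotsOptions = {
  ...defaults,
  ...user,
  llmBots: { ...defaults.llmBots, ...user.llmBots },
};

console.log(shallow.llmBots?.enabled); // undefined
console.log(merged.llmBots?.enabled);  // true
console.log(merged.llmBots?.agents);   // the user's custom agent list
```

Any further nested objects added to the config later would need the same explicit treatment, or a recursive merge helper.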

155
tests/humans.test.ts Normal file

@ -0,0 +1,155 @@
import { describe, it, expect } from 'vitest';
import { generateHumansTxt } from '../src/generators/humans.js';
describe('generateHumansTxt', () => {
it('generates basic humans.txt structure', () => {
const result = generateHumansTxt({});
expect(result).toBeTruthy();
});
it('includes team section', () => {
const result = generateHumansTxt({
team: [
{
name: 'Jane Doe',
role: 'Developer',
contact: 'jane@example.com',
location: 'SF',
twitter: '@jane',
github: 'jane',
},
],
});
expect(result).toContain('/* TEAM */');
expect(result).toContain('Name: Jane Doe');
expect(result).toContain('Role: Developer');
expect(result).toContain('Contact: jane@example.com');
expect(result).toContain('From: SF');
expect(result).toContain('Twitter: @jane');
expect(result).toContain('GitHub: jane');
});
it('includes multiple team members', () => {
const result = generateHumansTxt({
team: [
{ name: 'Jane Doe' },
{ name: 'John Smith' },
],
});
expect(result).toContain('Jane Doe');
expect(result).toContain('John Smith');
});
it('includes thanks section', () => {
const result = generateHumansTxt({
thanks: ['Coffee', 'Stack Overflow'],
});
expect(result).toContain('/* THANKS */');
expect(result).toContain('Coffee');
expect(result).toContain('Stack Overflow');
});
it('includes site section with auto date', () => {
const result = generateHumansTxt({
site: {
lastUpdate: 'auto',
language: 'English',
doctype: 'HTML5',
ide: 'VS Code',
},
});
const today = new Date().toISOString().split('T')[0];
expect(result).toContain('/* SITE */');
expect(result).toContain(`Last update: ${today}`);
expect(result).toContain('Language: English');
expect(result).toContain('Doctype: HTML5');
expect(result).toContain('IDE: VS Code');
});
it('includes site section with custom date', () => {
const result = generateHumansTxt({
site: {
lastUpdate: '2025-11-03',
},
});
expect(result).toContain('Last update: 2025-11-03');
});
it('includes tech stack', () => {
const result = generateHumansTxt({
site: {
techStack: ['Astro', 'TypeScript', 'React'],
standards: ['HTML5', 'CSS3'],
components: ['Astro Components'],
software: ['VS Code', 'Git'],
},
});
expect(result).toContain('Tech Stack: Astro, TypeScript, React');
expect(result).toContain('Standards: HTML5, CSS3');
expect(result).toContain('Components: Astro Components');
expect(result).toContain('Software: VS Code, Git');
});
it('includes story section', () => {
const result = generateHumansTxt({
story: 'This is our story.\nIt spans multiple lines.',
});
expect(result).toContain('/* THE STORY */');
expect(result).toContain('This is our story.');
});
it('includes fun facts', () => {
const result = generateHumansTxt({
funFacts: ['Built with love', 'Coffee powered'],
});
expect(result).toContain('/* FUN FACTS */');
expect(result).toContain('Built with love');
expect(result).toContain('Coffee powered');
});
it('includes philosophy section', () => {
const result = generateHumansTxt({
philosophy: ['Make it simple', 'Make it work'],
});
expect(result).toContain('/* PHILOSOPHY */');
expect(result).toContain('"Make it simple"');
expect(result).toContain('"Make it work"');
});
it('includes custom sections', () => {
const result = generateHumansTxt({
customSections: {
'CONTACT': 'Email: info@example.com',
'LICENSE': 'MIT License',
},
});
expect(result).toContain('/* CONTACT */');
expect(result).toContain('Email: info@example.com');
expect(result).toContain('/* LICENSE */');
expect(result).toContain('MIT License');
});
it('properly indents content', () => {
const result = generateHumansTxt({
team: [{ name: 'Jane' }],
});
expect(result).toContain(' Name: Jane');
});
it('ends with newline', () => {
const result = generateHumansTxt({});
expect(result.endsWith('\n')).toBe(true);
});
});

166
tests/llms.test.ts Normal file

@ -0,0 +1,166 @@
import { describe, it, expect } from 'vitest';
import { generateLLMsTxt } from '../src/generators/llms.js';
describe('generateLLMsTxt', () => {
const testURL = new URL('https://example.com');
it('generates basic llms.txt with site URL', async () => {
const result = await generateLLMsTxt({}, testURL);
expect(result).toContain('# example.com');
expect(result).toContain('**URL**: https://example.com/');
});
it('includes description when provided', async () => {
const result = await generateLLMsTxt(
{ description: 'Test site description' },
testURL
);
expect(result).toContain('> Test site description');
expect(result).toContain('**Description**: Test site description');
});
it('supports dynamic description function', async () => {
const result = await generateLLMsTxt(
{ description: () => 'Dynamic description' },
testURL
);
expect(result).toContain('> Dynamic description');
});
it('includes key features', async () => {
const result = await generateLLMsTxt(
{
keyFeatures: ['Feature 1', 'Feature 2', 'Feature 3'],
},
testURL
);
expect(result).toContain('## Key Features');
expect(result).toContain('- Feature 1');
expect(result).toContain('- Feature 2');
});
it('includes important pages', async () => {
const result = await generateLLMsTxt(
{
importantPages: [
{
name: 'Docs',
path: '/docs',
description: 'Documentation',
},
],
},
testURL
);
expect(result).toContain('## Important Pages');
expect(result).toContain('[Docs]');
expect(result).toContain('https://example.com/docs');
expect(result).toContain('Documentation');
});
it('supports async important pages function', async () => {
const result = await generateLLMsTxt(
{
importantPages: async () => [
{ name: 'Blog', path: '/blog' },
],
},
testURL
);
expect(result).toContain('[Blog]');
});
it('includes AI instructions', async () => {
const result = await generateLLMsTxt(
{
instructions: 'Be helpful and accurate',
},
testURL
);
expect(result).toContain('## Instructions for AI Assistants');
expect(result).toContain('Be helpful and accurate');
});
it('includes API endpoints', async () => {
const result = await generateLLMsTxt(
{
apiEndpoints: [
{
path: '/api/test',
method: 'POST',
description: 'Test endpoint',
},
],
},
testURL
);
expect(result).toContain('## API Endpoints');
expect(result).toContain('POST /api/test');
expect(result).toContain('Test endpoint');
});
it('includes tech stack', async () => {
const result = await generateLLMsTxt(
{
techStack: {
frontend: ['Astro', 'React'],
backend: ['Node.js'],
ai: ['Claude'],
},
},
testURL
);
expect(result).toContain('## Technical Stack');
expect(result).toContain('**Frontend**: Astro, React');
expect(result).toContain('**Backend**: Node.js');
expect(result).toContain('**AI/ML**: Claude');
});
it('includes brand voice', async () => {
const result = await generateLLMsTxt(
{
brandVoice: ['Professional', 'Friendly'],
},
testURL
);
expect(result).toContain('## Brand Voice & Guidelines');
expect(result).toContain('- Professional');
expect(result).toContain('- Friendly');
});
it('includes custom sections', async () => {
const result = await generateLLMsTxt(
{
customSections: {
'Contact': 'Email: test@example.com',
},
},
testURL
);
expect(result).toContain('## Contact');
expect(result).toContain('Email: test@example.com');
});
it('includes last updated date', async () => {
const result = await generateLLMsTxt({}, testURL);
const today = new Date().toISOString().split('T')[0];
expect(result).toContain(`**Last Updated**: ${today}`);
});
it('ends with newline', async () => {
const result = await generateLLMsTxt({}, testURL);
expect(result.endsWith('\n')).toBe(true);
});
});

94
tests/robots.test.ts Normal file

@ -0,0 +1,94 @@
import { describe, it, expect } from 'vitest';
import { generateRobotsTxt } from '../src/generators/robots.js';
describe('generateRobotsTxt', () => {
const testURL = new URL('https://example.com');
it('generates basic robots.txt with defaults', () => {
const result = generateRobotsTxt({}, testURL);
expect(result).toContain('User-agent: *');
expect(result).toContain('Allow: /');
expect(result).toContain('Sitemap: https://example.com/sitemap-index.xml');
});
it('includes LLM bots when enabled', () => {
const result = generateRobotsTxt(
{ llmBots: { enabled: true } },
testURL
);
expect(result).toContain('Anthropic-AI');
expect(result).toContain('GPTBot');
expect(result).toContain('Claude-Web');
expect(result).toContain('Allow: /llms.txt');
});
it('excludes LLM bots when disabled', () => {
const result = generateRobotsTxt(
{ llmBots: { enabled: false } },
testURL
);
expect(result).not.toContain('Anthropic-AI');
expect(result).not.toContain('GPTBot');
});
it('respects custom crawl delay', () => {
const result = generateRobotsTxt(
{ crawlDelay: 5 },
testURL
);
expect(result).toContain('Crawl-delay: 5');
});
it('includes custom agents', () => {
const result = generateRobotsTxt(
{
additionalAgents: [
{
userAgent: 'CustomBot',
allow: ['/api'],
disallow: ['/admin'],
},
],
},
testURL
);
expect(result).toContain('User-agent: CustomBot');
expect(result).toContain('Allow: /api');
expect(result).toContain('Disallow: /admin');
});
it('includes custom rules', () => {
const customRules = 'User-agent: SpecialBot\nCrawl-delay: 10';
const result = generateRobotsTxt(
{ customRules },
testURL
);
expect(result).toContain(customRules);
});
it('allows custom LLM bot agents', () => {
const result = generateRobotsTxt(
{
llmBots: {
enabled: true,
agents: ['CustomAI', 'AnotherBot'],
},
},
testURL
);
expect(result).toContain('CustomAI');
expect(result).toContain('AnotherBot');
});
it('ends with newline', () => {
const result = generateRobotsTxt({}, testURL);
expect(result.endsWith('\n')).toBe(true);
});
});

35
tsconfig.json Normal file

@ -0,0 +1,35 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ES2022",
"lib": ["ES2022"],
"moduleResolution": "bundler",
"resolveJsonModule": true,
"allowJs": true,
"checkJs": false,
"outDir": "./dist",
"rootDir": "./src",
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"strict": true,
"noImplicitAny": true,
"strictNullChecks": true,
"strictFunctionTypes": true,
"strictBindCallApply": true,
"strictPropertyInitialization": true,
"noImplicitThis": true,
"alwaysStrict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"noImplicitReturns": true,
"noFallthroughCasesInSwitch": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"allowSyntheticDefaultImports": true,
"types": ["node"]
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist", "**/*.test.ts"]
}