Complete all explanation pages

- why-discovery: Core rationale and evolution
- robots-explained: robots.txt mechanics and best practices
- llms-explained: AI assistant guidance and context
- humans-explained: Human-readable credits and culture
- security-explained: RFC 9116 responsible disclosure
- canary-explained: Warrant canaries and transparency
- webfinger-explained: RFC 7033 federated discovery
- seo: Discovery files impact on search optimization
- ai-integration: Strategy for AI-first discovery
- architecture: Internal design and extensibility

All pages follow Diátaxis explanation style: understanding-oriented,
provide context, explain design decisions, discuss alternatives.
Ryan Malloy 2025-11-08 23:33:54 -07:00
parent 74cffc2842
commit 0191d08d14
2 changed files with 698 additions and 40 deletions

---
title: AI Assistant Integration Strategy
description: How AI assistants use discovery files and how to optimize for them
---
The relationship between websites and AI assistants is fundamentally different from their relationship with traditional search engines. Understanding this difference is key to optimizing your site for AI-mediated discovery.
## Beyond Indexing: AI Understanding

Search engines **index** your site - they catalog what exists and where. AI assistants **understand** your site - they build mental models of what you do, why it matters, and how to help users interact with you.

This shift from retrieval to comprehension requires different discovery mechanisms.
### Traditional Search Flow

1. User searches for keywords
2. Engine returns ranked list of pages
3. User clicks and reads
4. User decides if content answers their question

### AI Assistant Flow

1. User asks conversational question
2. AI synthesizes answer from multiple sources
3. AI provides direct response with citations
4. User may or may not visit original sources
In the AI flow, your site might be the source without getting the click. Discovery files help ensure you're at least properly represented and attributed.
## The llms.txt Strategy
llms.txt is your primary tool for AI optimization. Think of it as **briefing an employee** who'll be answering questions about your company.
### What to Emphasize
**Core value proposition**: Not just what you do, but why you exist
```
We're not just another e-commerce platform - we're specifically
focused on sustainable products with carbon footprint tracking.
```
This context helps AI assistants understand when to recommend you versus competitors.
**Key differentiators**: What makes you unique
```
Unlike other platforms, we:
- Calculate carbon footprint for every purchase
- Offset shipping emissions by default
- Partner directly with sustainable manufacturers
```
This guides AI to highlight your strengths.
**Common questions**: What users typically ask
```
When users ask about sustainability, explain our carbon tracking.
When users ask about pricing, mention our price-match guarantee.
When users ask about shipping, highlight our carbon-offset program.
```
This provides explicit guidance for common scenarios.
### What to Avoid
**Overpromising**: AI will fact-check against your actual site
**Marketing fluff**: Be informative, not promotional
**Exhaustive detail**: Link to comprehensive docs instead
**Outdated info**: Keep current or use dynamic generation
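Putting these guidelines together, a minimal llms.txt might look like the following sketch (site details invented for illustration, loosely following the llmstxt.org layout):
```
# Example Store

> Sustainable e-commerce platform with carbon footprint tracking
> on every purchase.

## Key Differentiators

- Carbon footprint calculated for every order
- Shipping emissions offset by default
- Direct partnerships with sustainable manufacturers

## Guidance for Assistants

- Sustainability questions: explain our carbon tracking
- Pricing questions: mention our price-match guarantee
```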
## Coordinating Discovery Files
AI assistants use multiple discovery mechanisms together:
### robots.txt → llms.txt Flow
1. AI bot checks robots.txt for permission
2. Finds reference to llms.txt
3. Reads llms.txt for context
4. Crawls site with that context in mind
Ensure your robots.txt explicitly allows AI bots:
```
User-agent: GPTBot
User-agent: Claude-Web
User-agent: Anthropic-AI
Allow: /
```
### llms.txt → humans.txt Connection
humans.txt provides tech stack info that helps AI answer developer questions:
User: "Can I integrate this with React?"
AI: *checks humans.txt, sees React in tech stack*
AI: "Yes, it's built with React and designed for React integration."
The files complement each other.
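For reference, the tech-stack portion of a humans.txt in the conventional humanstxt.org format might look like this (entries invented for illustration):
```
/* SITE */
Standards: HTML5, CSS3
Components: React, Astro
Software: TypeScript, Vite
```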
### sitemap.xml → AI Content Discovery
Sitemaps help AI find comprehensive content:
```xml
<url>
  <loc>https://example.com/docs/api</loc>
  <priority>0.9</priority>
</url>
```
High-priority pages in your sitemap signal importance to AI crawlers.
## Dynamic Content Generation
Static llms.txt works for stable information. Dynamic generation handles changing contexts:
### API Endpoint Discovery
```typescript
llms: {
  apiEndpoints: async () => {
    const spec = await loadOpenAPISpec();
    // In an OpenAPI document, `paths` is an object keyed by URL,
    // so flatten it into one entry per path and method.
    return Object.entries(spec.paths).flatMap(([url, operations]) =>
      Object.entries(operations).map(([method, op]) => ({
        path: url,
        method: method.toUpperCase(),
        description: op.summary
      }))
    );
  }
}
```
This keeps AI's understanding of your API current without manual updates.
### Feature Flags and Capabilities
```typescript
llms: {
  instructions: () => {
    const features = getEnabledFeatures();
    return `
Current features:
${features.map(f => `- ${f.name}: ${f.description}`).join('\n')}

Note: Feature availability may change. Check /api/features for current status.
`;
  }
}
```
AI assistants know what's currently available versus planned or deprecated.
## Measuring AI Representation
Unlike traditional SEO, the impact of AI representation is harder to quantify directly:
### Qualitative Monitoring
**Ask AI assistants about your site**: Periodically query Claude, ChatGPT, and others about your product. Do they:
- Describe you accurately?
- Highlight key features?
- Use correct terminology?
- Provide appropriate warnings/caveats?
**Monitor AI-generated content**: Watch for your site being referenced in:
- AI-assisted blog posts
- Generated code examples
- Tutorial content
- Comparison tables
**Track citation patterns**: When AI cites your site, is it:
- For the right reasons?
- In appropriate contexts?
- With accurate information?
- Linking to relevant pages?
### Quantitative Signals
**Referrer analysis**: Some AI tools send referrer headers showing they're AI-mediated traffic
**API usage patterns**: AI-assisted developers may show different integration patterns than manual developers
**Support question types**: AI-informed users ask more sophisticated questions
**Time-on-site**: AI-briefed visitors may be more targeted, spending less time but converting better
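As a starting point for referrer analysis, a helper like this sketch can tag requests from known AI crawlers (the user-agent substrings are examples; check each vendor's documentation for current values):
```typescript
// Hypothetical helper: identify requests from known AI crawler user agents.
const AI_AGENTS = ['GPTBot', 'Claude-Web', 'Anthropic-AI', 'PerplexityBot'];

export function detectAIAgent(userAgent: string): string | undefined {
  return AI_AGENTS.find((agent) => userAgent.includes(agent));
}

// Usage: feed the result into your analytics pipeline, e.g.
// const agent = detectAIAgent(request.headers.get('user-agent') ?? '');
```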
## Brand Voice Consistency
AI assistants can adapt tone to match your brand if you provide guidance:
```
## Brand Voice
- Professional but approachable
- Technical accuracy over marketing speak
- Always mention privacy and security first
- Use "we" language (community-oriented)
- Avoid: corporate jargon, buzzwords, hype
```
This helps ensure AI-generated content about you feels consistent with your actual brand.
## Handling Misconceptions
Use llms.txt to correct common misunderstandings:
```
## Common Misconceptions
WRONG: "We're a general e-commerce platform"
RIGHT: "We specifically focus on sustainable products"
WRONG: "We offer all payment methods"
RIGHT: "We support major cards and PayPal, but not cryptocurrency"
WRONG: "Free shipping on all orders"
RIGHT: "Free carbon-offset shipping over $50"
```
This proactive clarification reduces AI-generated misinformation.
## Privacy and Training Data
A common concern: "Doesn't llms.txt help AI companies train on my content?"
Key points:
**Training happens regardless**: Public content is already accessible for training
**llms.txt doesn't grant permission**: It provides context, not authorization
**robots.txt controls access**: Block AI crawlers there if you don't want them
**Better representation**: Context helps AI represent you accurately when it does access your site
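If you do want to opt out, the robots.txt side looks like this (using the same agent names shown earlier):
```
User-agent: GPTBot
Disallow: /

User-agent: Claude-Web
Disallow: /
```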
Think of llms.txt as **quality control** for inevitable AI consumption, not an invitation.
## Future-Proofing
AI capabilities are evolving rapidly. Future trends:
**Agentic AI**: Assistants that take actions, not just answer questions
**Multi-modal understanding**: AI processing images, videos, and interactive content
**Real-time data**: AI querying live APIs versus static crawls
**Semantic graphs**: Deep relationship mapping between concepts
llms.txt will evolve to support these capabilities. By adopting it now, you're positioned to benefit from enhancements.
## The Long Game
AI integration is a marathon, not a sprint:
**Start simple**: Basic llms.txt with description and key features
**Monitor and refine**: See how AI represents you, adjust accordingly
**Add detail gradually**: Expand instructions as you identify gaps
**Stay current**: Update as your product evolves
**Share learnings**: The community benefits from your experience
The integration makes the technical part easy. The strategic part - what to say and how - requires ongoing attention.
## Related Topics
- [LLMs.txt Explained](/explanation/llms-explained/) - Deep dive into llms.txt
- [SEO Strategy](/explanation/seo/) - Traditional vs. AI-mediated discovery
- [Customizing Instructions](/how-to/customize-llm-instructions/) - Practical guidance optimization

---
title: Architecture & Design
description: How @astrojs/discovery works internally
---

Understanding the integration's architecture helps you customize it effectively and troubleshoot when needed. The design prioritizes simplicity, correctness, and extensibility.
## High-Level Design

The integration follows Astro's standard integration pattern:

```
astro.config.mjs
  ↓ integrates discovery()
Integration hooks into Astro lifecycle
  ↓
Injects route handlers for discovery files
  ↓
Route handlers call generators
  ↓
Generators produce discovery file content
```

Each layer has a specific responsibility, making the system modular and testable.

## The Integration Layer

`src/index.ts` implements the Astro integration interface:

```typescript
export default function discovery(config: DiscoveryConfig): AstroIntegration {
  return {
    name: '@astrojs/discovery',
    hooks: {
      'astro:config:setup': () => { /* Inject routes and sitemap */ },
      'astro:build:done': () => { /* Log generated files */ }
    }
  };
}
```
This layer:
- Validates configuration
- Merges user config with defaults
- Injects dynamic routes
- Integrates @astrojs/sitemap
- Reports build results
## Configuration Strategy
Configuration flows through several stages:
### 1. User Configuration
User provides partial configuration in astro.config.mjs:
```typescript
discovery({
llms: {
description: 'My site'
}
})
```
### 2. Validation and Defaults
`src/validators/config.ts` validates and merges with defaults:
```typescript
export function validateConfig(userConfig: DiscoveryConfig): ValidatedConfig {
  return {
    robots: mergeRobotsDefaults(userConfig.robots),
    llms: mergeLLMsDefaults(userConfig.llms),
    // ...
  };
}
```
This ensures:
- Required fields are present
- Types are correct
- Defaults fill gaps
- Invalid configs are caught early
### 3. Global Storage
`src/config-store.ts` provides global access to validated config:
```typescript
let globalConfig: DiscoveryConfig;

export function setConfig(config: DiscoveryConfig) {
  globalConfig = config;
}

export function getConfig(): DiscoveryConfig {
  return globalConfig;
}
```
This allows route handlers to access configuration without passing it through Astro's context (which has limitations).
### 4. Virtual Module
A Vite plugin provides configuration as a virtual module:
```typescript
vite: {
  plugins: [{
    name: '@astrojs/discovery:config',
    resolveId(id) {
      if (id === 'virtual:@astrojs/discovery/config') {
        return '\0' + id;
      }
    },
    load(id) {
      if (id === '\0virtual:@astrojs/discovery/config') {
        return `export default ${JSON.stringify(config)};`;
      }
    }
  }]
}
```
This makes config available during route execution.
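Route code can then import it like any other module (a sketch; in a real TypeScript project the virtual module id would also need a type declaration):
```typescript
import config from 'virtual:@astrojs/discovery/config';

// The config is the same validated object serialized by the plugin above.
console.log(config.llms?.description);
```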
## Route Injection
The integration injects routes for each enabled discovery file:
```typescript
if (config.robots?.enabled !== false) {
  injectRoute({
    pattern: '/robots.txt',
    entrypoint: '@astrojs/discovery/routes/robots',
    prerender: true
  });
}
```
**Key decisions:**
**Pattern**: The URL where the file appears
**Entrypoint**: Module that handles the route
**Prerender**: Whether to generate at build time (true) or runtime (false)
Most routes prerender (`prerender: true`) for performance. WebFinger uses `prerender: false` because it requires query parameters.
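By that reasoning, the WebFinger injection would look like this sketch (the entrypoint path is assumed by analogy with the robots route):
```typescript
injectRoute({
  pattern: '/.well-known/webfinger',
  entrypoint: '@astrojs/discovery/routes/webfinger',
  prerender: false // must answer arbitrary ?resource= queries at runtime
});
```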
## Generator Pattern
Each discovery file type has a dedicated generator:
```
src/generators/
robots.ts - robots.txt generation
llms.ts - llms.txt generation
humans.ts - humans.txt generation
security.ts - security.txt generation
canary.ts - canary.txt generation
webfinger.ts - WebFinger JRD generation
```
Generators are pure functions:
```typescript
export function generateRobotsTxt(
  config: RobotsConfig,
  siteURL: URL
): string {
  // Generate content
  return robotsTxtString;
}
```
This makes them:
- Easy to test (no side effects)
- Easy to customize (override with your own function)
- Easy to reason about (input → output)
## Route Handler Pattern
Route handlers bridge Astro routes and generators:
```typescript
// src/routes/robots.ts
import { getConfig } from '../config-store.js';
import { generateRobotsTxt } from '../generators/robots.js';
export async function GET({ site }) {
const config = getConfig();
const content = generateRobotsTxt(config.robots, new URL(site));
return new Response(content, {
headers: {
'Content-Type': 'text/plain',
'Cache-Control': `public, max-age=${config.caching?.robots || 3600}`
}
});
}
```
Responsibilities:
1. Retrieve configuration
2. Call generator with config and site URL
3. Set appropriate headers (Content-Type, Cache-Control)
4. Return response
## Type System
`src/types.ts` defines the complete type hierarchy:
```typescript
export interface DiscoveryConfig {
  robots?: RobotsConfig;
  llms?: LLMsConfig;
  humans?: HumansConfig;
  security?: SecurityConfig;
  canary?: CanaryConfig;
  webfinger?: WebFingerConfig;
  sitemap?: SitemapConfig;
  caching?: CachingConfig;
  templates?: TemplateConfig;
}
```
This provides:
- IntelliSense in editors
- Compile-time type checking
- Self-documenting configuration
- Safe refactoring
Types are exported so users can import them:
```typescript
import type { DiscoveryConfig } from '@astrojs/discovery';
```
## Dynamic Content Support
Several discovery files support dynamic generation:
### Function-based Configuration
```typescript
llms: {
  description: () => {
    // Compute at build time
    return `Generated at ${new Date()}`;
  }
}
```
### Async Functions
```typescript
llms: {
  apiEndpoints: async () => {
    const spec = await loadOpenAPISpec();
    return extractEndpoints(spec);
  }
}
```
Generators handle both static values and functions transparently.
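Internally this can be a single resolve helper; the sketch below shows the idea (the name `resolveValue` is illustrative, not the actual internal API):
```typescript
type MaybeDynamic<T> = T | (() => T | Promise<T>);

async function resolveValue<T>(value: MaybeDynamic<T>): Promise<T> {
  // Call functions (sync or async); pass static values through unchanged.
  return typeof value === 'function'
    ? await (value as () => T | Promise<T>)()
    : value;
}
```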
### Content Collection Integration
WebFinger integrates with Astro content collections:
```typescript
webfinger: {
  collections: [{
    name: 'team',
    resourceTemplate: 'acct:{slug}@example.com',
    linksBuilder: (entry) => [...]
  }]
}
```
The WebFinger route:
1. Calls `getCollection('team')`
2. Applies templates to each entry
3. Matches against query parameter
4. Generates JRD response
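A JRD response for one of those entries might look like this (RFC 7033 format; names and URLs invented for illustration):
```json
{
  "subject": "acct:alice@example.com",
  "links": [
    {
      "rel": "http://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://example.com/team/alice"
    }
  ]
}
```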
## Cache Control
Each discovery file has configurable cache duration:
```typescript
caching: {
  robots: 3600,     // 1 hour
  llms: 3600,       // 1 hour
  humans: 86400,    // 24 hours
  security: 86400,  // 24 hours
  canary: 3600,     // 1 hour
  webfinger: 3600,  // 1 hour
}
```
Routes set `Cache-Control` headers based on these values:
```typescript
headers: {
  'Cache-Control': `public, max-age=${cacheDuration}`
}
```
This balances:
- **Performance**: Cached responses serve faster
- **Freshness**: Short durations keep content current
- **Server load**: Reduces regeneration frequency
## Sitemap Integration
The integration includes @astrojs/sitemap automatically:
```typescript
updateConfig({
  integrations: [
    sitemap(config.sitemap || {})
  ]
});
```
This ensures:
- Sitemap is always present
- Configuration passes through
- robots.txt references correct sitemap URL
Users don't need to install @astrojs/sitemap separately.
## Error Handling
The integration validates aggressively at startup:
```typescript
if (!astroConfig.site) {
  throw new Error(
    '[@astrojs/discovery] The `site` option must be set in your Astro config.'
  );
}
```
This fails fast with clear error messages rather than generating incorrect output.
Generators also validate input:
```typescript
if (!config.contact) {
  throw new Error('security.txt requires a contact field');
}
```
RFC compliance is enforced at generation time.
## Extensibility Points
Users can extend the integration in several ways:
### Custom Templates
Override any generator:
```typescript
templates: {
  robots: (config, siteURL) => `
User-agent: *
Allow: /

# Custom content
Sitemap: ${siteURL}/sitemap.xml
`
}
```
### Custom Sections
Add custom content to humans.txt and llms.txt:
```typescript
humans: {
  customSections: {
    'PHILOSOPHY': 'We believe in...'
  }
}
```
### Dynamic Functions
Generate content at build time:
```typescript
canary: {
  statements: () => computeStatements()
}
```
## Build Output
At build completion, the integration logs generated files:
```
@astrojs/discovery - Generated files:
✅ /robots.txt
✅ /llms.txt
✅ /humans.txt
✅ /.well-known/security.txt
✅ /sitemap-index.xml
```
This provides immediate feedback about what was created.
## Performance Considerations
The integration is designed for minimal build impact:
**Prerendering**: Most routes prerender at build time (no runtime cost)
**Pure functions**: Generators have no side effects (safe to call multiple times)
**Caching**: HTTP caching reduces server load
**Lazy loading**: Generators only execute for enabled files
Build time impact is typically <200ms for all files.
## Testing Strategy
The codebase uses a layered testing approach:
**Unit tests**: Test generators in isolation with known inputs
**Integration tests**: Test route handlers with mock Astro context
**Type tests**: Ensure TypeScript types are correct
**E2E tests**: Deploy and verify actual output
This ensures correctness at each layer.
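For instance, a generator unit test could look like this sketch (using Vitest; the config shape passed in is assumed, not taken from the actual source):
```typescript
import { expect, it } from 'vitest';
import { generateRobotsTxt } from '../src/generators/robots.js';

it('emits rules for the configured agent', () => {
  // Config shape is illustrative; consult the RobotsConfig type for the real one.
  const output = generateRobotsTxt(
    { userAgent: '*', allow: '/' } as any,
    new URL('https://example.com')
  );
  expect(output).toContain('User-agent: *');
});
```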
## Why This Architecture?
Key design decisions:
**Separation of concerns**: Generators don't know about Astro, routes don't know about content formats
**Composability**: Each piece is independently usable
**Testability**: Pure functions are easy to test
**Type safety**: TypeScript catches errors at compile time
**Extensibility**: Users can override any behavior
**Performance**: Prerendering and caching minimize runtime cost
The architecture prioritizes **correctness** and **simplicity** over cleverness.
## Related Topics
- [API Reference](/reference/api/) - Complete API documentation
- [TypeScript Types](/reference/typescript/) - Type definitions
- [Custom Templates](/how-to/custom-templates/) - Overriding generators