Merge tutorial documentation

Complete learning-oriented tutorials and getting started guides
- 9 tutorial pages (8,816 words)
- Step-by-step hands-on guides
- Progressive complexity with verification steps
- Real-world examples
Ryan Malloy 2025-11-08 23:40:37 -07:00
commit 192ce8194f
27 changed files with 8114 additions and 505 deletions


@@ -1,31 +1,264 @@
---
title: AI Assistant Integration Strategy
description: How AI assistants use discovery files and how to optimize for them
---
The relationship between websites and AI assistants is fundamentally different from traditional search engines. Understanding this difference is key to optimizing your site for AI-mediated discovery.
## Beyond Indexing: AI Understanding
Search engines **index** your site - they catalog what exists and where. AI assistants **understand** your site - they build mental models of what you do, why it matters, and how to help users interact with you.
This shift from retrieval to comprehension requires different discovery mechanisms.
### Traditional Search Flow
1. User searches for keywords
2. Engine returns ranked list of pages
3. User clicks and reads
4. User decides if content answers their question
### AI Assistant Flow
1. User asks conversational question
2. AI synthesizes answer from multiple sources
3. AI provides direct response with citations
4. User may or may not visit original sources
In the AI flow, your site might be the source without getting the click. Discovery files help ensure you're at least properly represented and attributed.
## The llms.txt Strategy
llms.txt is your primary tool for AI optimization. Think of it as **briefing an employee** who'll be answering questions about your company.
### What to Emphasize
**Core value proposition**: Not just what you do, but why you exist
```
We're not just another e-commerce platform - we're specifically
focused on sustainable products with carbon footprint tracking.
```
This context helps AI assistants understand when to recommend you versus competitors.
**Key differentiators**: What makes you unique
```
Unlike other platforms, we:
- Calculate carbon footprint for every purchase
- Offset shipping emissions by default
- Partner directly with sustainable manufacturers
```
This guides AI to highlight your strengths.
**Common questions**: What users typically ask
```
When users ask about sustainability, explain our carbon tracking.
When users ask about pricing, mention our price-match guarantee.
When users ask about shipping, highlight our carbon-offset program.
```
This provides explicit guidance for common scenarios.
### What to Avoid
**Overpromising**: AI will fact-check against your actual site
**Marketing fluff**: Be informative, not promotional
**Exhaustive detail**: Link to comprehensive docs instead
**Outdated info**: Keep current or use dynamic generation
## Coordinating Discovery Files
AI assistants use multiple discovery mechanisms together:
### robots.txt → llms.txt Flow
1. AI bot checks robots.txt for permission
2. Finds reference to llms.txt
3. Reads llms.txt for context
4. Crawls site with that context in mind
Ensure your robots.txt explicitly allows AI bots:
```
User-agent: GPTBot
User-agent: Claude-Web
User-agent: Anthropic-AI
Allow: /
```
### llms.txt → humans.txt Connection
humans.txt provides tech stack info that helps AI answer developer questions:
User: "Can I integrate this with React?"
AI: *checks humans.txt, sees React in tech stack*
AI: "Yes, it's built with React and designed for React integration."
The files complement each other.
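A hedged sketch of feeding that stack to the integration - the `site.techStack` field mirrors the humans.txt configuration example shown later in these docs; the values themselves are illustrative:
```typescript
humans: {
  site: {
    // What an AI assistant will see when answering "does this work with React?"
    techStack: ['Astro', 'React', 'TypeScript']
  }
}
```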
### sitemap.xml → AI Content Discovery
Sitemaps help AI find comprehensive content:
```xml
<url>
  <loc>https://example.com/docs/api</loc>
  <priority>0.9</priority>
</url>
```
High-priority pages in your sitemap signal importance to AI crawlers.
## Dynamic Content Generation
Static llms.txt works for stable information. Dynamic generation handles changing contexts:
### API Endpoint Discovery
```typescript
llms: {
  apiEndpoints: async () => {
    const spec = await loadOpenAPISpec();
    return spec.paths.map(path => ({
      path: path.url,
      method: path.method,
      description: path.summary
    }));
  }
}
```
This keeps AI's understanding of your API current without manual updates.
### Feature Flags and Capabilities
```typescript
llms: {
  instructions: () => {
    const features = getEnabledFeatures();
    return `
Current features:
${features.map(f => `- ${f.name}: ${f.description}`).join('\n')}
Note: Feature availability may change. Check /api/features for current status.
`;
  }
}
```
AI assistants know what's currently available versus planned or deprecated.
## Measuring AI Representation
Unlike traditional SEO, AI impact is harder to quantify directly:
### Qualitative Monitoring
**Ask AI assistants about your site**: Periodically query Claude, ChatGPT, and others about your product. Do they:
- Describe you accurately?
- Highlight key features?
- Use correct terminology?
- Provide appropriate warnings/caveats?
**Monitor AI-generated content**: Watch for your site being referenced in:
- AI-assisted blog posts
- Generated code examples
- Tutorial content
- Comparison tables
**Track citation patterns**: When AI cites your site, is it:
- For the right reasons?
- In appropriate contexts?
- With accurate information?
- Linking to relevant pages?
### Quantitative Signals
**Referrer analysis**: Some AI tools send referrer headers that identify traffic as AI-mediated (see the middleware sketch below)
**API usage patterns**: AI-assisted developers may show different integration patterns than manual developers
**Support question types**: AI-informed users ask more sophisticated questions
**Time-on-site**: AI-briefed visitors may be more targeted, spending less time but converting better
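One low-effort way to start gathering these signals is to tag likely AI-mediated requests in middleware. This is a rough sketch and not part of the integration: the user-agent and referrer patterns are partial assumptions, and it only applies to pages rendered on demand.
```typescript
// src/middleware.ts - assumes an SSR or hybrid Astro project
import { defineMiddleware } from 'astro:middleware';

// Partial, assumed patterns for AI crawlers and AI chat referrers
const AI_USER_AGENTS = /GPTBot|Claude-Web|ClaudeBot|Anthropic|PerplexityBot/i;
const AI_REFERRERS = /chatgpt\.com|perplexity\.ai/i;

export const onRequest = defineMiddleware((context, next) => {
  const ua = context.request.headers.get('user-agent') ?? '';
  const referer = context.request.headers.get('referer') ?? '';

  if (AI_USER_AGENTS.test(ua) || AI_REFERRERS.test(referer)) {
    // Swap the console.log for your analytics call of choice
    console.log(`[ai-traffic] ${context.url.pathname} ua="${ua}" ref="${referer}"`);
  }

  return next();
});
```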
## Brand Voice Consistency
AI assistants can adapt tone to match your brand if you provide guidance:
```
## Brand Voice
- Professional but approachable
- Technical accuracy over marketing speak
- Always mention privacy and security first
- Use "we" language (community-oriented)
- Avoid: corporate jargon, buzzwords, hype
```
This helps ensure AI-generated content about you feels consistent with your actual brand.
## Handling Misconceptions
Use llms.txt to correct common misunderstandings:
```
## Common Misconceptions
WRONG: "We're a general e-commerce platform"
RIGHT: "We specifically focus on sustainable products"
WRONG: "We offer all payment methods"
RIGHT: "We support major cards and PayPal, but not cryptocurrency"
WRONG: "Free shipping on all orders"
RIGHT: "Free carbon-offset shipping over $50"
```
This proactive clarification reduces AI-generated misinformation.
## Privacy and Training Data
A common concern: "Doesn't llms.txt help AI companies train on my content?"
Key points:
**Training happens regardless**: Public content is already accessible for training
**llms.txt doesn't grant permission**: It provides context, not authorization
**robots.txt controls access**: Block AI crawlers there if you don't want them
**Better representation**: Context helps AI represent you accurately when it does access your site
Think of llms.txt as **quality control** for inevitable AI consumption, not an invitation.
## Future-Proofing
AI capabilities are evolving rapidly. Future trends:
**Agentic AI**: Assistants that take actions, not just answer questions
**Multi-modal understanding**: AI processing images, videos, and interactive content
**Real-time data**: AI querying live APIs versus static crawls
**Semantic graphs**: Deep relationship mapping between concepts
llms.txt will evolve to support these capabilities. By adopting it now, you're positioned to benefit from enhancements.
## The Long Game
AI integration is a marathon, not a sprint:
**Start simple**: Basic llms.txt with description and key features
**Monitor and refine**: See how AI represents you, adjust accordingly
**Add detail gradually**: Expand instructions as you identify gaps
**Stay current**: Update as your product evolves
**Share learnings**: The community benefits from your experience
The integration makes the technical part easy. The strategic part - what to say and how - requires ongoing attention.
## Related Topics
- [LLMs.txt Explained](/explanation/llms-explained/) - Deep dive into llms.txt
- [SEO Strategy](/explanation/seo/) - Traditional vs. AI-mediated discovery
- [Customizing Instructions](/how-to/customize-llm-instructions/) - Practical guidance optimization


@@ -3,29 +3,454 @@ title: Architecture & Design
description: How @astrojs/discovery works internally
---
Understanding the integration's architecture helps you customize it effectively and troubleshoot when needed. The design prioritizes simplicity, correctness, and extensibility.
## High-Level Design
The integration follows Astro's standard integration pattern:
```
astro.config.mjs
  ↓ integrates discovery()
Integration hooks into Astro lifecycle
  ↓
Injects route handlers for discovery files
  ↓
Route handlers call generators
  ↓
Generators produce discovery file content
```
Each layer has a specific responsibility, making the system modular and testable.
## The Integration Layer
`src/index.ts` implements the Astro integration interface:
```typescript
export default function discovery(config: DiscoveryConfig): AstroIntegration {
  return {
    name: '@astrojs/discovery',
    hooks: {
      'astro:config:setup': () => { /* Inject routes and sitemap */ },
      'astro:build:done': () => { /* Log generated files */ }
    }
  };
}
```
This layer:
- Validates configuration
- Merges user config with defaults
- Injects dynamic routes
- Integrates @astrojs/sitemap
- Reports build results
## Configuration Strategy
Configuration flows through several stages:
### 1. User Configuration
User provides partial configuration in astro.config.mjs:
```typescript
discovery({
  llms: {
    description: 'My site'
  }
})
```
### 2. Validation and Defaults
`src/validators/config.ts` validates and merges with defaults:
```typescript
export function validateConfig(userConfig: DiscoveryConfig): ValidatedConfig {
  return {
    robots: mergeRobotsDefaults(userConfig.robots),
    llms: mergeLLMsDefaults(userConfig.llms),
    // ...
  }
}
```
This ensures:
- Required fields are present
- Types are correct
- Defaults fill gaps
- Invalid configs are caught early
### 3. Global Storage
`src/config-store.ts` provides global access to validated config:
```typescript
let globalConfig: DiscoveryConfig;

export function setConfig(config: DiscoveryConfig) {
  globalConfig = config;
}

export function getConfig(): DiscoveryConfig {
  return globalConfig;
}
```
This allows route handlers to access configuration without passing it through Astro's context (which has limitations).
### 4. Virtual Module
A Vite plugin provides configuration as a virtual module:
```typescript
vite: {
  plugins: [{
    name: '@astrojs/discovery:config',
    resolveId(id) {
      if (id === 'virtual:@astrojs/discovery/config') {
        return '\0' + id;
      }
    },
    load(id) {
      if (id === '\0virtual:@astrojs/discovery/config') {
        return `export default ${JSON.stringify(config)};`;
      }
    }
  }]
}
```
This makes config available during route execution.
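Anywhere Vite processes code, the config can then be imported by that id. A small usage sketch - the module id matches the plugin above; the property access is illustrative:
```typescript
// Inside any module processed by Vite (e.g. a route handler)
import config from 'virtual:@astrojs/discovery/config';

console.log(config.llms?.description);
```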
## Route Injection
The integration injects routes for each enabled discovery file:
```typescript
if (config.robots?.enabled !== false) {
  injectRoute({
    pattern: '/robots.txt',
    entrypoint: '@astrojs/discovery/routes/robots',
    prerender: true
  });
}
```
**Key decisions:**
**Pattern**: The URL where the file appears
**Entrypoint**: Module that handles the route
**Prerender**: Whether to generate at build time (true) or runtime (false)
Most routes prerender (`prerender: true`) for performance. WebFinger uses `prerender: false` because it requires query parameters.
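The WebFinger registration would therefore look roughly like this sketch - the `/.well-known/webfinger` path comes from RFC 7033, while the entrypoint name simply mirrors the robots example above and is an assumption:
```typescript
injectRoute({
  pattern: '/.well-known/webfinger',
  entrypoint: '@astrojs/discovery/routes/webfinger', // assumed module name
  prerender: false // resolved at request time so ?resource=... can be read
});
```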
## Generator Pattern
Each discovery file type has a dedicated generator:
```
src/generators/
  robots.ts    - robots.txt generation
  llms.ts      - llms.txt generation
  humans.ts    - humans.txt generation
  security.ts  - security.txt generation
  canary.ts    - canary.txt generation
  webfinger.ts - WebFinger JRD generation
```
Generators are pure functions:
```typescript
export function generateRobotsTxt(
  config: RobotsConfig,
  siteURL: URL
): string {
  // Generate content
  return robotsTxtString;
}
```
This makes them:
- Easy to test (no side effects)
- Easy to customize (override with your own function)
- Easy to reason about (input → output)
## Route Handler Pattern
Route handlers bridge Astro routes and generators:
```typescript
// src/routes/robots.ts
import { getConfig } from '../config-store.js';
import { generateRobotsTxt } from '../generators/robots.js';

export async function GET({ site }) {
  const config = getConfig();
  const content = generateRobotsTxt(config.robots, new URL(site));

  return new Response(content, {
    headers: {
      'Content-Type': 'text/plain',
      'Cache-Control': `public, max-age=${config.caching?.robots || 3600}`
    }
  });
}
```
Responsibilities:
1. Retrieve configuration
2. Call generator with config and site URL
3. Set appropriate headers (Content-Type, Cache-Control)
4. Return response
## Type System
`src/types.ts` defines the complete type hierarchy:
```typescript
export interface DiscoveryConfig {
  robots?: RobotsConfig;
  llms?: LLMsConfig;
  humans?: HumansConfig;
  security?: SecurityConfig;
  canary?: CanaryConfig;
  webfinger?: WebFingerConfig;
  sitemap?: SitemapConfig;
  caching?: CachingConfig;
  templates?: TemplateConfig;
}
```
This provides:
- IntelliSense in editors
- Compile-time type checking
- Self-documenting configuration
- Safe refactoring
Types are exported so users can import them:
```typescript
import type { DiscoveryConfig } from '@astrojs/discovery';
```
## Dynamic Content Support
Several discovery files support dynamic generation:
### Function-based Configuration
```typescript
llms: {
  description: () => {
    // Compute at build time
    return `Generated at ${new Date()}`;
  }
}
```
### Async Functions
```typescript
llms: {
  apiEndpoints: async () => {
    const spec = await loadOpenAPISpec();
    return extractEndpoints(spec);
  }
}
```
Generators handle both static values and functions transparently.
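One way such resolution might be implemented - a minimal sketch under that assumption, not the integration's actual code:
```typescript
type MaybeDynamic<T> = T | (() => T | Promise<T>);

async function resolve<T>(value: MaybeDynamic<T>): Promise<T> {
  // Static values pass through; functions (sync or async) are invoked and awaited
  return typeof value === 'function'
    ? await (value as () => T | Promise<T>)()
    : value;
}

// Both forms end up as the same kind of value for the generator
const staticDesc = await resolve<string>('My site');
const dynamicDesc = await resolve<string>(async () => `Generated at ${new Date().toISOString()}`);
```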
### Content Collection Integration
WebFinger integrates with Astro content collections:
```typescript
webfinger: {
  collections: [{
    name: 'team',
    resourceTemplate: 'acct:{slug}@example.com',
    linksBuilder: (entry) => [...]
  }]
}
```
The WebFinger route:
1. Calls `getCollection('team')`
2. Applies templates to each entry
3. Matches against query parameter
4. Generates JRD response
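A hedged sketch of that flow - the collection name and account template echo the configuration example above, but this is illustrative rather than the integration's verbatim handler:
```typescript
import type { APIRoute } from 'astro';
import { getCollection } from 'astro:content';

export const GET: APIRoute = async ({ url }) => {
  const resource = url.searchParams.get('resource'); // e.g. acct:alice@example.com
  const team = await getCollection('team');

  // Apply the resource template to each entry and match the query parameter
  const match = team.find((entry) => `acct:${entry.slug}@example.com` === resource);
  if (!match) {
    return new Response(null, { status: 404 });
  }

  return new Response(
    JSON.stringify({
      subject: resource,
      links: [] // built from the entry via the configured linksBuilder
    }),
    { headers: { 'Content-Type': 'application/jrd+json' } }
  );
};
```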
## Cache Control
Each discovery file has configurable cache duration:
```typescript
caching: {
  robots: 3600,      // 1 hour
  llms: 3600,        // 1 hour
  humans: 86400,     // 24 hours
  security: 86400,   // 24 hours
  canary: 3600,      // 1 hour
  webfinger: 3600,   // 1 hour
}
```
Routes set `Cache-Control` headers based on these values:
```typescript
headers: {
  'Cache-Control': `public, max-age=${cacheDuration}`
}
```
This balances:
- **Performance**: Cached responses serve faster
- **Freshness**: Short durations keep content current
- **Server load**: Reduces regeneration frequency
## Sitemap Integration
The integration includes @astrojs/sitemap automatically:
```typescript
updateConfig({
  integrations: [
    sitemap(config.sitemap || {})
  ]
});
```
This ensures:
- Sitemap is always present
- Configuration passes through
- robots.txt references correct sitemap URL
Users don't need to install @astrojs/sitemap separately.
## Error Handling
The integration validates aggressively at startup:
```typescript
if (!astroConfig.site) {
  throw new Error(
    '[@astrojs/discovery] The `site` option must be set in your Astro config.'
  );
}
```
This fails fast with clear error messages rather than generating incorrect output.
Generators also validate input:
```typescript
if (!config.contact) {
  throw new Error('security.txt requires a contact field');
}
```
RFC compliance is enforced at generation time.
## Extensibility Points
Users can extend the integration in several ways:
### Custom Templates
Override any generator:
```typescript
templates: {
  robots: (config, siteURL) => `
User-agent: *
Allow: /
# Custom content
Sitemap: ${siteURL}/sitemap.xml
`
}
```
### Custom Sections
Add custom content to humans.txt and llms.txt:
```typescript
humans: {
  customSections: {
    'PHILOSOPHY': 'We believe in...'
  }
}
```
### Dynamic Functions
Generate content at build time:
```typescript
canary: {
  statements: () => computeStatements()
}
```
## Build Output
At build completion, the integration logs generated files:
```
@astrojs/discovery - Generated files:
✅ /robots.txt
✅ /llms.txt
✅ /humans.txt
✅ /.well-known/security.txt
✅ /sitemap-index.xml
```
This provides immediate feedback about what was created.
## Performance Considerations
The integration is designed for minimal build impact:
**Prerendering**: Most routes prerender at build time (no runtime cost)
**Pure functions**: Generators have no side effects (safe to call multiple times)
**Caching**: HTTP caching reduces server load
**Lazy loading**: Generators only execute for enabled files
Build time impact is typically <200ms for all files.
## Testing Strategy
The codebase uses a layered testing approach:
**Unit tests**: Test generators in isolation with known inputs
**Integration tests**: Test route handlers with mock Astro context
**Type tests**: Ensure TypeScript types are correct
**E2E tests**: Deploy and verify actual output
This ensures correctness at each layer.
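For example, a generator unit test might look like this sketch - it assumes Vitest and the `generateRobotsTxt` signature shown earlier; the expected output string is illustrative:
```typescript
import { describe, expect, it } from 'vitest';
import { generateRobotsTxt } from '../src/generators/robots.js';

describe('generateRobotsTxt', () => {
  it('references the sitemap for the configured site', () => {
    // Minimal config; real RobotsConfig likely has more optional fields
    const output = generateRobotsTxt({ enabled: true }, new URL('https://example.com'));
    expect(output).toContain('Sitemap: https://example.com/sitemap-index.xml');
  });
});
```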
## Why This Architecture?
Key design decisions:
**Separation of concerns**: Generators don't know about Astro, routes don't know about content formats
**Composability**: Each piece is independently usable
**Testability**: Pure functions are easy to test
**Type safety**: TypeScript catches errors at compile time
**Extensibility**: Users can override any behavior
**Performance**: Prerendering and caching minimize runtime cost
The architecture prioritizes **correctness** and **simplicity** over cleverness.
## Related Topics
- [API Reference](/reference/api/) - Complete API documentation
- [TypeScript Types](/reference/typescript/) - Type definitions
- [Custom Templates](/how-to/custom-templates/) - Overriding generators


@@ -1,31 +1,231 @@
---
title: Warrant Canaries
description: Understanding warrant canaries and transparency mechanisms
---
A warrant canary is a method for organizations to communicate the **absence** of secret government orders through regular public statements. The concept comes from the canaries coal miners once carried - their silence indicated danger.
## The Gag Order Problem
Certain legal instruments (National Security Letters in the US, similar mechanisms elsewhere) can compel organizations to:
1. Provide user data or access to systems
2. Never disclose that the request was made
This creates an information asymmetry - users can't know if their service provider has been compromised by government orders.
Warrant canaries address this by inverting the communication: instead of saying "we received an order" (which is forbidden), the organization regularly says "we have NOT received an order."
If the statement stops or changes, users can infer something happened.
## How It Works
A simple canary statement:
```
As of 2024-11-08, Example Corp has NOT received:
- National Security Letters
- FISA court orders
- Gag orders preventing disclosure
- Secret government requests for user data
- Requests to install surveillance capabilities
```
The organization publishes this monthly. Users monitor it. If November's update doesn't appear, or the statements change, users know to investigate.
The canary communicates through **absence** rather than disclosure.
## Legal Theory and Limitations
Warrant canaries operate in a legal gray area. The theory:
- Compelled speech (forcing you to lie) may violate free speech rights
- Choosing to remain silent is protected
- Government can prevent disclosure but cannot compel false statements
This hasn't been extensively tested in court. Canaries are no guarantee, but they provide a transparency mechanism where direct disclosure is prohibited.
Important limitations:
- **No legal precedent**: Courts haven't ruled definitively on validity
- **Jurisdictional differences**: What works in one country may not in another
- **Sophistication of threats**: Adversaries may compel continued updates
- **Interpretation challenges**: Absence could mean many things
Canaries are part of a transparency strategy, not a complete solution.
## What Goes in a Canary
The integration's default statements cover common government data requests:
**National Security Letters (NSLs)**: US administrative subpoenas for subscriber information
**FISA court orders**: Foreign Intelligence Surveillance Act orders
**Gag orders**: Any order preventing disclosure of requests
**Surveillance requests**: Secret requests for user data
**Backdoor requests**: Demands to install surveillance capabilities
You can customize these or add organization-specific concerns.
## Frequency and Expiration
Canaries must update regularly. The frequency determines trust:
**Daily**: Maximum transparency, high maintenance burden
**Weekly**: Good for high-security contexts
**Monthly**: Standard for most organizations
**Quarterly**: Minimum for credibility
**Yearly**: Too infrequent to be meaningful
The integration auto-calculates expiration based on frequency:
- Daily: 2 days
- Weekly: 10 days
- Monthly: 35 days
- Quarterly: 100 days
- Yearly: 380 days
These provide buffer time while ensuring staleness is obvious.
## The Personnel Statement
A sophisticated addition is the personnel statement:
```
Key Personnel Statement: All key personnel with access to
infrastructure remain free and under no duress.
```
This addresses scenarios where individuals are compelled to act under physical threat or coercion.
If personnel are compromised, the statement can be omitted without violating gag orders (since it's not disclosing a government request).
## Verification Mechanisms
Mere publication isn't enough - users need to verify authenticity:
### PGP Signatures
Sign canary.txt with your organization's PGP key:
```
Verification: https://example.com/canary.txt.asc
```
This proves the canary came from you and hasn't been tampered with.
### Blockchain Anchoring
Publish a hash of the canary to a blockchain:
```
Blockchain-Proof: ethereum:0x123...abc:0xdef...789
Blockchain-Timestamp: 2024-11-08T12:00:00Z
```
This creates an immutable, time-stamped record that the canary existed at a specific moment.
Anyone can verify the canary matches the blockchain hash, preventing retroactive alterations.
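That verification can be scripted. A minimal sketch using Node's built-in crypto module - the canary URL and the published hash are placeholders for your own values:
```typescript
import { createHash } from 'node:crypto';

// Fetch the currently published canary and recompute its SHA-256 digest
const canary = await fetch('https://example.com/canary.txt').then((res) => res.text());
const digest = createHash('sha256').update(canary).digest('hex');

const publishedHash = '<hash anchored on-chain>'; // placeholder for the anchored value
console.log(digest === publishedHash ? 'Canary matches the anchored hash' : 'Mismatch - investigate');
```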
### Previous Canary Links
Link to the previous canary:
```
Previous-Canary: https://example.com/canary-2024-10.txt
```
This creates a chain of trust. If an attacker compromises your site and tries to backdate canaries, the chain breaks.
## What Absence Means
If a canary stops updating or changes, it doesn't definitively mean government compromise. Possible reasons:
- Organization received a legal order (the intended signal)
- Technical failure prevented update
- Personnel forgot or were unable to update
- Organization shut down or changed practices
- Security incident prevented trusted publication
Users must interpret absence in context. Multiple verification methods help distinguish scenarios.
## Building Trust Over Time
A new canary has limited credibility. Trust builds through:
1. **Consistency**: Regular updates on schedule
2. **Verification**: Multiple cryptographic proofs
3. **Transparency**: Clear explanation of canary purpose and limitations
4. **History**: Years of reliable updates
5. **Community**: External monitoring and verification
Organizations should start canaries early, before they're needed, to build this trust.
## The Integration's Approach
This integration makes canaries accessible:
**Auto-expiration**: Calculated from frequency
**Default statements**: Cover common concerns
**Dynamic generation**: Functions can generate statements at build time
**Verification support**: Links to PGP signatures and blockchain proofs
**Update reminders**: Clear expiration in content
You configure once, the integration handles timing and formatting.
## When to Use Canaries
Canaries make sense for:
- Organizations handling sensitive user data
- Services likely to receive government data requests
- Privacy-focused companies
- Organizations operating in multiple jurisdictions
- Platforms used by activists, journalists, or vulnerable groups
They're less relevant for:
- Personal blogs without user data
- Purely informational sites
- Organizations that can't commit to regular updates
- Contexts where legal risks outweigh benefits
## Practical Considerations
**Update process**: Who's responsible for monthly updates?
**Backup procedures**: What if primary person is unavailable?
**Legal review**: Has counsel approved canary language and process?
**Monitoring**: Who watches for expiration?
**Communication**: How will users be notified of canary changes?
**Contingency**: What's the plan if you must stop publishing?
These operational questions matter as much as the canary itself.
## The Limitations
Canaries are not magic:
- They rely on legal interpretations that haven't been tested
- Sophisticated adversaries may compel continued updates
- Absence is ambiguous - could be many causes
- Only useful for orders that come with gag provisions
- Don't address technical compromises or insider threats
They're one tool in a transparency toolkit, not a complete solution.
## Real-World Examples
**Tech companies**: Some publish annual or quarterly canaries as part of transparency reports
**VPN providers**: Many use canaries to signal absence of data retention orders
**Privacy-focused services**: Canaries are common among services catering to privacy-conscious users
**Open source projects**: Some maintainers publish personal canaries about project compromise
The practice is growing as awareness of surveillance increases.
## Related Topics
- [Security.txt](/explanation/security-explained/) - Complementary transparency for security issues
- [Canary Reference](/reference/canary/) - Complete configuration options
- [Blockchain Verification](/how-to/canary-verification/) - Setting up cryptographic proofs


@@ -3,29 +3,306 @@ title: Understanding humans.txt
description: The human side of discovery files
---
In a web dominated by machine-readable metadata, humans.txt is a delightful rebellion. It's a file written by humans, for humans, about the humans who built the website you're visiting.
## The Initiative
humans.txt emerged in 2008 from a simple observation: websites have extensive metadata for machines (robots.txt, sitemaps, structured data) but nothing to credit the people who built them.
The initiative proposed a standard format for human-readable credits, transforming the impersonal `/humans.txt` URL into a space for personality, gratitude, and transparency.
## What Makes It Human
Unlike other discovery files optimized for parsing, humans.txt embraces readability and creativity:
```
/* TEAM */
Developer: Jane Doe
Role: Full-stack wizardry
Location: Portland, OR
Favorite beverage: Cold brew coffee
/* THANKS */
- Stack Overflow (for everything)
- My rubber duck debugging companion
- Coffee, obviously
/* SITE */
Built with: Blood, sweat, and JavaScript
Fun fact: Deployed 47 times before launch
```
Notice the tone - casual, personal, fun. This isn't corporate boilerplate. It's a connection between builders and users.
## Why It Matters
On the surface, humans.txt seems frivolous. Who cares about credits buried in a text file?
But consider the impact:
**Recognition**: Developers, designers, and content creators work in the shadows. Humans.txt brings them into the light.
**Transparency**: Users curious about how your site works can see the tech stack and team behind it.
**Recruitment**: Talented developers browse humans.txt files. Listing your stack and philosophy attracts aligned talent.
**Culture**: A well-crafted humans.txt reveals company culture and values better than any about page.
**Humanity**: In an increasingly automated web, humans.txt reminds us that real people built this.
## The Standard Sections
The initiative proposes several standard sections:
### TEAM
Credits for everyone who contributed:
```
/* TEAM */
Name: Alice Developer
Role: Lead Developer
Contact: alice@example.com
Twitter: @alicedev
From: Brooklyn, NY
```
List everyone - developers, designers, writers, managers. Projects are team efforts.
### THANKS
Acknowledgments for inspiration, tools, and support:
```
/* THANKS */
- The Astro community
- Open-source maintainers everywhere
- Our beta testers
- Late night playlist creators
```
This section humanizes development. We build on the work of others.
### SITE
Technical details about the project:
```
/* SITE */
Last update: 2024-11-08
Language: English / Markdown
Doctype: HTML5
IDE: VS Code with Vim keybindings
Components: Astro, React, TypeScript
Standards: HTML5, CSS3, ES2022
```
This satisfies developer curiosity and provides context for technical decisions.
## Going Beyond the Standard
The beauty of humans.txt is flexibility. Many sites add custom sections:
**STORY**: The origin story of your project
**PHILOSOPHY**: Development principles and values
**FUN FACTS**: Easter eggs and behind-the-scenes details
**COLOPHON**: Typography and design choices
**ERRORS**: Humorous changelog of mistakes
These additions transform humans.txt from credits into narrative.
## The Integration's Approach
This integration generates humans.txt with opinionated defaults but encourages customization:
**Auto-dating**: `lastUpdate: 'auto'` uses current build date
**Flexible structure**: Add any custom sections you want
**Dynamic content**: Generate team lists from content collections
**Rich metadata**: Include social links, locations, and personal touches
The goal is making credits easy enough that you'll actually maintain them.
## Real-World Examples
**Humanstxt.org** (the initiative's site):
```
/* TEAM */
Creator: Abel Cabans
Site: http://abelcabans.com
Twitter: @abelcabans
Location: Sant Cugat del Vallès, Barcelona, Spain
/* THANKS */
- All the people who have contributed
- Spread the word!
/* SITE */
Last update: 2024/01/15
Standards: HTML5, CSS3
Components: Jekyll
Software: TextMate, Git
```
Clean, simple, effective.
**Creative Agency** (fictional but typical):
```
/* TEAM */
Creative Director: Max Wilson
Role: Visionary chaos coordinator
Contact: max@agency.com
Fun fact: Has never missed a deadline (barely)
Designer: Sarah Chen
Role: Pixel perfectionist
Location: San Francisco
Tool of choice: Figma, obviously
Developer: Jordan Lee
Role: Code whisperer
From: Remote (currently Bali)
Coffee order: Oat milk cortado
/* THANKS */
- Our clients for trusting us with their dreams
- The internet for cat videos during crunch time
- Figma for not crashing during presentations
/* STORY */
We started in a garage. Not for dramatic effect - office
space in SF is expensive. Three friends with complementary
skills and a shared belief that design should be delightful.
Five years later, we're still in that garage (now with
better chairs). But we've shipped products used by millions
and worked with brands we admired as kids.
We believe in:
- Craftsmanship over shortcuts
- Accessibility as a baseline, not a feature
- Open source as community participation
- Making the web more fun
/* SITE */
Built with: Astro, Svelte, TypeScript, TailwindCSS
Deployed on: Cloudflare Pages
Font: Inter (because we're not monsters)
Colors: Custom palette inspired by Bauhaus
Last rewrite: 2024 (the third time's the charm)
```
Notice the personality, the details, the humanity.
## The "Last Update" Decision
The `lastUpdate` field presents a philosophical question: should it reflect content updates or just site updates?
**Content perspective**: Change date when humans.txt content changes
**Site perspective**: Change date when any part of the site deploys
The integration defaults to site perspective (auto-update on every build). This ensures the date always reflects current site state, even if humans.txt content stays static.
But you can override with a specific date if you prefer manual control.
## Social Links and Contact Info
humans.txt is a great place for social links:
```
/* TEAM */
Name: Developer Name
Twitter: @username
GitHub: username
LinkedIn: /in/username
Mastodon: @username@instance.social
```
This provides discoverable contact information without cluttering your UI.
It's particularly valuable for open-source projects where contributors want to connect.
## The Gratitude Practice
Writing a good THANKS section is a gratitude practice. It forces you to acknowledge the shoulders you stand on:
- Which open-source projects made your work possible?
- Who provided feedback, testing, or encouragement?
- What tools, resources, or communities helped you learn?
- Which mistakes taught you valuable lessons?
This reflection benefits you as much as it credits others.
## Humor and Personality
humans.txt invites creativity. Some examples:
```
/* FUN FACTS */
- Entire site built during one caffeinated weekend
- 437 commits with message "fix typo"
- Originally designed in Figma, rebuilt in Sketch, launched from code
- The dog in our 404 page is the CEO's actual dog
- We've used Comic Sans exactly once (regrettably)
```
This personality differentiates you and creates connection.
## When Not to Use Humor
Professional context matters. A bank's humans.txt should be more restrained than a gaming startup's.
Match the tone to your audience and brand. Personality doesn't require jokes.
Simple sincerity works too:
```
/* TEAM */
We're a team of 12 developers across 6 countries
working to make financial services more accessible.
/* THANKS */
To the users who trust us with their financial data -
we take that responsibility seriously every day.
```
## Maintenance Considerations
humans.txt requires maintenance:
- Update when team members change
- Refresh tech stack as you adopt new tools
- Add new thanks as you use new resources
- Keep contact information current
The integration helps by supporting dynamic content:
```typescript
humans: {
  team: await getCollection('team'), // Auto-sync with team content
  site: {
    lastUpdate: 'auto',              // Auto-update on each build
    techStack: Object.keys(deps)     // Extract from package.json
  }
}
```
This reduces manual maintenance burden.
## The Browse Experience
Most users never see humans.txt. And that's okay.
The file serves several audiences:
**Curious users**: The 1% who look behind the curtain
**Developers**: Evaluating tech stack for integration or inspiration
**Recruiters**: Understanding team culture and capabilities
**You**: Reflection and gratitude practice during creation
It's not about traffic - it's about transparency and humanity.
## Related Topics
- [Content Collections Integration](/how-to/content-collections/) - Auto-generate team lists
- [Humans.txt Reference](/reference/humans/) - Complete configuration options
- [Examples](/examples/blog/) - See humans.txt in context


@@ -1,31 +1,213 @@
---
title: Understanding llms.txt
description: How AI assistants discover and understand your website
---
llms.txt is the newest member of the discovery file family, emerging in response to a fundamental shift in how content is consumed on the web. While search engines index and retrieve, AI language models read, understand, and synthesize.
## Why AI Needs Different Guidance
Traditional search engines need to know **what exists and where**. They build indexes mapping keywords to pages.
AI assistants need to know **what things mean and how to use them**. They need context, instructions, and understanding of relationships between content.
Consider the difference:
**Search engine thinking**: "This page contains the word 'API' and is located at /docs/api"
**AI assistant thinking**: "This site offers a REST API at /api/endpoint that requires authentication. When users ask how to integrate, I should explain the auth flow and reference the examples at /docs/examples"
llms.txt bridges this gap by providing **semantic context** that goes beyond structural metadata.
## The Information Architecture
llms.txt follows a simple, human-readable structure:
```
# Site Description
> One-line tagline
## Site Information
Basic facts about the site
## For AI Assistants
Instructions and guidelines
## Important Pages
Key resources to know about
## API Endpoints
Available programmatic access
```
This structure mirrors how you'd brief a human assistant about your site. It's not rigid XML or JSON - it's conversational documentation optimized for language model consumption.
## What to Include
The most effective llms.txt files provide:
**Description**: Not just what your site is, but **why it exists**. "E-commerce platform" is weak. "E-commerce platform focused on sustainable products with carbon footprint tracking" gives context.
**Key Features**: The 3-5 things that make your site unique or particularly useful. These help AI assistants understand what problems you solve.
**Important Pages**: Not a sitemap (that's what sitemap.xml is for), but the **handful of pages** that provide disproportionate value. Think: getting started guide, API docs, pricing.
**Instructions**: Specific guidance on how AI should represent your content. This is where you establish voice, correct common misconceptions, and provide task-specific guidance.
**API Endpoints**: If you have programmatic access, describe it. AI assistants can help users integrate with your service if they know endpoints exist.
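A minimal configuration sketch covering a few of these - the keys shown (`description`, `instructions`, `apiEndpoints`) match examples used elsewhere in these docs; treat the exact shape as illustrative and check the reference for anything beyond them:
```typescript
llms: {
  description: 'E-commerce platform focused on sustainable products with carbon footprint tracking',
  instructions: () => `
When users ask about sustainability, explain our carbon tracking.
When users ask about pricing, mention our price-match guarantee.
`,
  // Hypothetical helpers - keep API context current without manual edits
  apiEndpoints: async () => extractEndpoints(await loadOpenAPISpec())
}
```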
## The Instruction Set Pattern
The most powerful part of llms.txt is the instructions section. This is where you teach AI assistants how to be helpful about your site.
Effective instructions are:
**Specific**: "When users ask about authentication, explain we use OAuth2 and point them to /docs/auth"
**Actionable**: "Check /api/status before suggesting users try the API"
**Context-aware**: "Remember that we're focused on accessibility - always mention a11y features"
**Preventive**: "We don't offer feature X - suggest alternatives Y or Z instead"
Think of it as training an employee who'll be answering questions about your product. What would you want them to know?
## Brand Voice and Tone
AI assistants can adapt their responses to match your brand if you provide guidance:
```
## Brand Voice
- Professional but approachable
- Technical accuracy over marketing speak
- Always mention open-source nature
- Emphasize privacy and user control
```
This helps ensure AI representations of your site feel consistent with your actual brand identity.
## Tech Stack Transparency
Including your tech stack serves multiple purposes:
1. **Helps AI assistants answer developer questions** ("Can I use this with React?" - "Yes, it's built on React")
2. **Aids troubleshooting** (knowing the framework helps diagnose integration issues)
3. **Attracts contributors** (developers interested in your stack are more likely to contribute)
Be specific but not exhaustive. "Built with Astro, TypeScript, and Tailwind" is better than listing every npm package.
## API Documentation
If your site offers APIs, llms.txt should describe them at a high level:
```
## API Endpoints
- GET /api/products - List all products
  Authentication: API key required
  Returns: JSON array of product objects

- POST /api/calculate-carbon - Calculate carbon footprint
  Authentication: Not required
  Accepts: JSON with cart data
  Returns: Carbon footprint estimate
```
This isn't meant to replace full API documentation - it's a quick reference so AI assistants know what's possible.
## The Relationship with robots.txt
robots.txt and llms.txt work together:
**robots.txt** says: "AI bots, you can access these paths"
**llms.txt** says: "Here's how to understand what you find there"
The integration coordinates them automatically:
1. robots.txt includes rules for LLM user-agents
2. Those rules reference llms.txt
3. LLM bots follow robots.txt to respect boundaries
4. Then read llms.txt for guidance on content interpretation
## Dynamic vs. Static Content
llms.txt can be either static (same content always) or dynamic (generated at build time):
**Static**: Your site description and brand voice rarely change
**Dynamic**: Current API endpoints, team members, or feature status might update frequently
The integration supports both approaches. You can provide static strings or functions that generate content at build time.
This is particularly useful for:
- Extracting API endpoints from OpenAPI specs
- Listing important pages from content collections
- Keeping tech stack synchronized with package.json
- Generating context from current deployment metadata
## What Not to Include
llms.txt should be concise and focused. Avoid:
**Comprehensive documentation**: Link to it, don't duplicate it
**Entire sitemaps**: That's what sitemap.xml is for
**Legal boilerplate**: Keep it in your terms of service
**Overly specific instructions**: Trust AI to handle common cases
**Marketing copy**: Be informative, not promotional
Think of llms.txt as **strategic context**, not exhaustive documentation.
## Measuring Impact
Unlike traditional SEO, llms.txt impact is harder to measure directly. You won't see "llms.txt traffic" in analytics.
Instead, look for:
- AI assistants correctly representing your product
- Reduction in mischaracterizations or outdated information
- Appropriate use of your APIs by AI-assisted developers
- Consistency in how different AI systems describe your site
The goal is **accurate representation**, not traffic maximization.
## Privacy and Data Concerns
A common concern: "Doesn't llms.txt help AI companies train on my content?"
Important points:
1. **AI training happens regardless** of llms.txt - they crawl public content anyway
2. **llms.txt doesn't grant permission** - it provides context for content they already access
3. **robots.txt controls access** - if you don't want AI crawlers, use robots.txt to block them
4. **llms.txt helps AI represent you accurately** - better context = better representation
Think of it this way: if someone's going to talk about you, would you rather they have accurate information or guess?
## The Evolution of AI Context
llms.txt is a living standard, evolving as AI capabilities grow:
**Current**: Basic site description and instructions
**Near future**: Structured data about capabilities, limitations, and relationships
**Long term**: Semantic graphs of site knowledge and interconnections
By adopting llms.txt now, you're positioning your site to benefit as these capabilities mature.
## Real-World Patterns
**Documentation sites**: Emphasize how to search docs, common pitfalls, and where to find examples
**E-commerce**: Describe product categories, search capabilities, and checkout process
**SaaS products**: Explain core features, authentication, and API availability
**Blogs**: Highlight author expertise, main topics, and content philosophy
The pattern that works best depends on how people use AI to interact with your type of content.
## Related Topics
- [AI Integration Strategy](/explanation/ai-integration/) - Broader AI considerations
- [Robots.txt Coordination](/explanation/robots-explained/) - How robots.txt and llms.txt work together
- [LLMs.txt Reference](/reference/llms/) - Complete configuration options


@@ -1,31 +1,182 @@
---
title: How robots.txt Works
description: Understanding robots.txt and web crawler communication
---
Robots.txt is the oldest and most fundamental discovery file on the web. Since 1994, it has served as the **polite agreement** between website owners and automated crawlers about what content can be accessed and how.
## The Gentleman's Agreement
robots.txt is not a security mechanism - it's a social contract. It tells crawlers "please don't go here" rather than "you cannot go here." Any crawler can ignore it, and malicious ones often do.
This might seem like a weakness, but it's actually a strength. The file works because the overwhelming majority of automated traffic comes from legitimate crawlers (search engines, monitoring tools, archive services) that want to be good citizens of the web.
Think of it like a "No Trespassing" sign on private property. It won't stop determined intruders, but it clearly communicates boundaries to honest visitors and provides legal/ethical grounds for addressing violations.
## What robots.txt Solves
Before robots.txt, early search engines would crawl websites aggressively, sometimes overwhelming servers or wasting bandwidth on administrative pages. Website owners had no standard way to communicate crawling preferences.
robots.txt provides three critical capabilities:
**1. Access Control**: Specify which paths crawlers can and cannot visit
**2. Resource Management**: Set crawl delays to prevent server overload
**3. Signposting**: Point crawlers to important resources like sitemaps
## The User-Agent Model
robots.txt uses a "user-agent" model where rules target specific bots:
```
User-agent: *
Disallow: /admin/
User-agent: GoogleBot
Allow: /api/
```
This allows fine-grained control. You might allow Google to index your API documentation while blocking other crawlers. Or permit archive services to access historical content while disallowing marketing bots.
The `*` wildcard matches all user-agents, providing default rules. Specific user-agents override these defaults for their particular bot.
## The LLM Bot Challenge
The emergence of AI language models created a new category of web consumers. Unlike traditional search engines that index for retrieval, LLMs process content for training data and context.
This raises different concerns:
- Training data usage and attribution
- Content representation accuracy
- Server load from context gathering
- Different resource needs (full pages vs. search snippets)
The integration addresses this by providing dedicated rules for LLM bots (GPTBot, Claude-Web, Anthropic-AI, etc.) while pointing them to llms.txt for additional context.
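In practice the generated rules look roughly like this - the comment pointing at llms.txt is illustrative, since robots.txt has no standard directive for it:
```
User-agent: GPTBot
User-agent: Claude-Web
User-agent: Anthropic-AI
Allow: /
# AI guidance: https://example.com/llms.txt
```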
## Allow vs. Disallow
A common point of confusion is the relationship between Allow and Disallow directives.
**Disallow**: Explicitly forbids access to a path
**Allow**: Creates exceptions to Disallow rules
Consider this example:
```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
```
This says "don't crawl /admin/ except for /admin/public/ which is allowed." The Allow creates a specific exception to the broader Disallow.
Without any rules, everything is implicitly allowed. You don't need `Allow: /` - that's the default state.
## Path Matching
Path patterns in robots.txt support wildcards and prefix matching:
- `/api/` matches `/api/` and everything under it
- `/api/private` matches that specific path
- `*.pdf` matches any URL containing `.pdf`
- `/page$` matches `/page` but not `/page/subpage`
The most specific matching rule wins. If both `/api/` and `/api/public/` have rules for the same user-agent, the longer path takes precedence.
## Crawl-Delay: The Double-Edged Sword
Crawl-delay tells bots to wait between requests:
```
Crawl-delay: 2
```
This means "wait 2 seconds between page requests." It's useful for:
- Protecting servers with limited resources
- Preventing rate limiting from triggering
- Managing bandwidth costs
But there's a trade-off: slower crawling means it takes longer for your content to be indexed. Set it too high and you might delay important updates from appearing in search results.
The integration defaults to 1 second - a balanced compromise between politeness and indexing speed.
## Sitemap Declaration
One of robots.txt's most valuable features is sitemap declaration:
```
Sitemap: https://example.com/sitemap-index.xml
```
This tells crawlers "here's a comprehensive list of all my pages." It's more efficient than discovering pages through link following and ensures crawlers know about pages that might not be linked from elsewhere.
The integration automatically adds your sitemap reference, keeping it synchronized with your Astro site URL.
## Common Mistakes
**Blocking CSS/JS**: Some sites block `/assets/` thinking it saves bandwidth. This prevents search engines from rendering your pages correctly, harming SEO.
**Disallowing Everything**: `Disallow: /` blocks all crawlers completely. This is rarely what you want - even internal tools need access.
**Forgetting About Dynamic Content**: If your search or API routes generate content dynamically, consider whether crawlers should access them.
**Security Through Obscurity**: Don't rely on robots.txt to hide sensitive content. Use proper authentication instead.
## Why Not Just Use Authentication?
You might wonder why we need robots.txt if we can protect content with authentication.
The answer is that most website content should be publicly accessible - that's the point. You want search engines to index your blog, documentation, and product pages.
robots.txt lets you have **public content that crawlers respect** without requiring authentication. It's about communicating intent, not enforcing access control.
## The Integration's Approach
This integration generates robots.txt with opinionated defaults:
- Allow all bots by default (the web works best when discoverable)
- Include LLM-specific bots with llms.txt guidance
- Reference your sitemap automatically
- Set a reasonable 1-second crawl delay
- Provide easy overrides for your specific needs
You can customize any aspect, but the defaults represent best practices for most sites.
## Looking at Real-World Examples
**Wikipedia** (`robots.txt`):
```
User-agent: *
Disallow: /wiki/Special:
Crawl-delay: 1
Sitemap: https://en.wikipedia.org/sitemap.xml
```
Simple and effective. Block special admin pages, allow everything else.
**GitHub** (simplified):
```
User-agent: *
Disallow: /search/
Disallow: */pull/
Allow: */pull$/
```
Notice how they block pull request search but allow individual pull request pages. This prevents crawler loops while keeping content accessible.
## Verification and Testing
After deploying, verify your robots.txt:
1. Visit `yoursite.com/robots.txt` directly
2. Use Google Search Console's robots.txt tester
3. Check specific user-agent rules with online validators
4. Monitor crawler behavior in server logs
The file is cached aggressively by crawlers, so changes may take time to propagate.
## Related Topics
- [SEO Impact](/explanation/seo/) - How robots.txt affects search rankings
- [LLMs.txt Integration](/explanation/llms-explained/) - Connecting bot control with AI guidance
- [Robots.txt Reference](/reference/robots/) - Complete configuration options


@@ -1,31 +1,277 @@
---
title: Security.txt Standard (RFC 9116)
description: Understanding RFC 9116 and responsible vulnerability disclosure
---
security.txt, standardized as RFC 9116 in 2022, solves a deceptively simple problem: when a security researcher finds a vulnerability in your website, how do they tell you about it?
## The Responsible Disclosure Problem
Before security.txt, researchers faced a frustrating journey:
1. Find vulnerability in example.com
2. Search for security contact information
3. Check footer, about page, contact page
4. Try info@, security@, admin@ email addresses
5. Hope someone reads it and knows what to do with it
6. Wait weeks for response (or get none)
7. Consider public disclosure out of frustration
This process was inefficient for researchers and dangerous for organizations. Vulnerabilities went unreported or were disclosed publicly before fixes could be deployed.
## The RFC 9116 Solution
RFC 9116 standardizes a machine-readable file at `/.well-known/security.txt` containing:
- **Contact**: How to reach your security team (required)
- **Expires**: When this information becomes stale (required)
- **Canonical**: The authoritative location of this file
- **Encryption**: PGP keys for encrypted communication
- **Acknowledgments**: Hall of fame for researchers
- **Policy**: Your disclosure policy URL
- **Preferred-Languages**: Languages you can handle reports in
- **Hiring**: Security job opportunities
This provides a **standardized, discoverable, machine-readable** security contact mechanism.
## Why .well-known?
The `/.well-known/` directory is an RFC 8615 standard for site-wide metadata. It's where clients expect to find standard configuration files.
By placing security.txt in `/.well-known/security.txt`, the RFC ensures:
- **Consistent location**: No guessing where to find it
- **Standard compliance**: Follows web architecture patterns
- **Tool support**: Security scanners can automatically check for it
The integration generates security.txt at the correct location automatically.
## The Required Fields
RFC 9116 mandates two fields:
### Contact
At least one contact method (email or URL):
```
Contact: mailto:security@example.com
Contact: https://example.com/security-contact
Contact: tel:+1-555-0100
```
Multiple contacts provide redundancy. If one channel fails, researchers have alternatives.
Email addresses automatically get `mailto:` prefixes. URLs should point to security contact forms or issue trackers.
### Expires
An ISO 8601 timestamp indicating when to stop trusting this file:
```
Expires: 2025-12-31T23:59:59Z
```
This is critical - it prevents researchers from reporting to stale contacts that are no longer monitored.
The integration defaults to `expires: 'auto'`, setting expiration to one year from build time. This ensures the field updates on every deployment.
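In this integration that typically reduces to a short config block. `expires: 'auto'` is the documented default; the `security` key and `contact` field names below are assumptions to verify against the security.txt reference page:
```typescript
// astro.config.mjs - sketch only; confirm option names in the security.txt reference
discovery({
  security: {
    contact: ['security@example.com', 'https://example.com/security-report'],
    expires: 'auto' // regenerated as one year from each build
  }
})
```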
## Optional but Valuable Fields
### Encryption
URLs to PGP public keys for encrypted vulnerability reports:
```
Encryption: https://example.com/pgp-key.txt
Encryption: openpgp4fpr:5F2DE18D3AFE0FD7A1F2F5A3E4562BB79E3B2E80
```
This enables researchers to send sensitive details securely, preventing disclosure to attackers monitoring email.
### Acknowledgments
URL to your security researcher hall of fame:
```
Acknowledgments: https://example.com/security/hall-of-fame
```
Public recognition motivates responsible disclosure. Researchers appreciate being credited for their work.
### Policy
URL to your vulnerability disclosure policy:
```
Policy: https://example.com/security/disclosure-policy
```
This clarifies expectations: response timelines, safe harbor provisions, bug bounty details, and disclosure coordination.
### Preferred-Languages
Languages your security team can handle:
```
Preferred-Languages: en, es, fr
```
This helps international researchers communicate effectively. Use ISO 639-1 language codes.
### Hiring
URL to security job openings:
```
Hiring: https://example.com/careers/security
```
Talented researchers who find vulnerabilities might be hiring prospects. This field provides a connection point.
## The Canonical Field
The Canonical field specifies the authoritative location:
```
Canonical: https://example.com/.well-known/security.txt
```
This matters for:
- **Verification**: Ensures you're reading the correct version
- **Mirrors**: Multiple domains can reference the same canonical file
- **Historical context**: Archives know which version was authoritative
The integration sets this automatically based on your site URL.
## Why Expiration Matters
The Expires field isn't bureaucracy - it's safety.
Consider a scenario:
1. Company sets up security.txt pointing to security@company.com
2. Security team disbands, email is decommissioned
3. An attacker takes over the abandoned address (for example, by re-registering a lapsed domain or recreating the decommissioned mailbox)
4. Researcher reports vulnerability to attacker's email
5. Attacker has vulnerability details before the company does
Expiration prevents this. If security.txt is expired, researchers know not to trust it and must find alternative contact methods.
Best practice: Set expiration to 1 year maximum. The integration's `'auto'` option handles this.
## Security.txt in Practice
A minimal production security.txt:
```
Canonical: https://example.com/.well-known/security.txt
Contact: mailto:security@example.com
Expires: 2025-11-08T00:00:00.000Z
```
A comprehensive implementation:
```
Canonical: https://example.com/.well-known/security.txt
Contact: mailto:security@example.com
Contact: https://example.com/security-report
Expires: 2025-11-08T00:00:00.000Z
Encryption: https://example.com/pgp-key.asc
Acknowledgments: https://example.com/security/researchers
Preferred-Languages: en, de, ja
Policy: https://example.com/security/disclosure
Hiring: https://example.com/careers/security-engineer
```
## Common Mistakes
**Using relative URLs**: All URLs must be absolute (`https://...`)
**Missing mailto: prefix**: Email addresses need `mailto:` - the integration adds this automatically
**Far-future expiration**: Don't set expiration 10 years out. Keep it to 1 year maximum.
**No monitoring**: Set up alerts when security.txt approaches expiration
**Stale contacts**: Verify listed contacts still work
## Building a Disclosure Program
security.txt is the entry point to vulnerability disclosure, but you need supporting infrastructure:
**Monitoring**: Watch the security inbox religiously
**Triage process**: Quick initial response (even if just "we're investigating")
**Fix timeline**: Clear expectations about patch development
**Disclosure coordination**: Work with researcher on public disclosure timing
**Recognition**: Credit researchers in release notes and acknowledgments page
The integration makes the entry point easy. The program around it requires organizational commitment.
## Security Through Transparency
Some organizations hesitate to publish security.txt, fearing it invites attacks.
The reality: security researchers are already looking. security.txt helps them help you.
Without it:
- Vulnerabilities go unreported
- Researchers waste time finding contacts
- Frustration leads to premature public disclosure
- You look unprofessional to the security community
With it:
- Clear channel for responsible disclosure
- Faster vulnerability reports
- Better researcher relationships
- Professional security posture
## Verification and Monitoring
After deploying security.txt:
1. Verify it's accessible at `/.well-known/security.txt`
2. Check field formatting with RFC 9116 validators
3. Test contact methods work
4. Set up monitoring for expiration date
5. Create calendar reminder to refresh before expiration
Many organizations set up automated checks that alert if security.txt will expire within 30 days.
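A scheduled job for that check can be as small as the following sketch (Node 18+ for `fetch`; the URL is a placeholder for your own site):
```typescript
// security-expiry-check.ts - fail CI if security.txt expires within 30 days
const res = await fetch('https://example.com/.well-known/security.txt');
const text = await res.text();
const match = text.match(/^Expires:\s*(.+)$/m);
if (!match) throw new Error('security.txt has no Expires field');
const daysLeft = (new Date(match[1].trim()).getTime() - Date.now()) / 86_400_000;
if (daysLeft < 30) {
  console.error(`security.txt expires in ${Math.floor(daysLeft)} days - refresh or redeploy`);
  process.exit(1);
}
console.log(`security.txt valid for another ${Math.floor(daysLeft)} days`);
```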
## Integration with Bug Bounty Programs
If you run a bug bounty program, reference it in your policy:
```
Policy: https://example.com/bug-bounty
```
This connects researchers to your incentive program immediately.
security.txt and bug bounties work together - the file provides discovery, the program provides incentive structure.
## Legal Considerations
security.txt should coordinate with your legal team's disclosure policy.
Consider including:
- Safe harbor provisions (no legal action against good-faith researchers)
- Scope definition (what systems are in/out of scope)
- Rules of engagement (don't exfiltrate data, etc.)
- Disclosure timeline expectations
These protect both your organization and researchers.
## Related Topics
- [Canary.txt Explained](/explanation/canary-explained/) - Complementary transparency mechanism
- [Security.txt Reference](/reference/security/) - Complete configuration options
- [Security Best Practices](/how-to/environment-config/) - Securing your deployment


@ -1,31 +1,327 @@
---
title: SEO & Discoverability
description: How discovery files improve search engine optimization
---
Discovery files and SEO have a symbiotic relationship. While some files (like humans.txt) don't directly impact rankings, others (robots.txt, sitemaps) are foundational to how search engines understand and index your site.
## Robots.txt: The SEO Foundation
robots.txt is one of the first files search engines request. It determines:
- Which pages can be crawled and indexed
- How aggressively to crawl (via crawl-delay)
- Where to find your sitemap
- Special instructions for specific bots
### Crawl Budget Optimization
Search engines allocate limited resources to each site - your "crawl budget." robots.txt helps you spend it wisely:
**Block low-value pages**: Admin sections, search result pages, and duplicate content waste crawl budget
**Allow high-value content**: Ensure important pages are accessible
**Set appropriate crawl-delay**: Balance thorough indexing against server load
Example SEO-optimized robots.txt:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search?
Disallow: /*?sort=*
Disallow: /api/
Crawl-delay: 1
Sitemap: https://example.com/sitemap-index.xml
```
This blocks non-content pages while allowing crawlers to efficiently index your actual content.
### The CSS/JS Trap
A common SEO mistake:
```
# DON'T DO THIS
Disallow: /assets/
Disallow: /*.css
Disallow: /*.js
```
This prevents search engines from fully rendering your pages. Modern SEO requires JavaScript execution for SPAs and interactive content.
The integration doesn't block assets by default - this is intentional and SEO-optimal.
### Sitemap Declaration
The `Sitemap:` directive in robots.txt is critical for SEO. It points crawlers to your sitemap, which tells search engines:
- All your pages exist (even if not linked)
- When pages were last modified
- Relative priority of pages
- Alternative language versions
This dramatically improves indexing coverage and freshness.
## Sitemaps: The SEO Roadmap
Sitemaps serve multiple SEO functions:
### Discoverability
Pages not linked from your navigation can still be indexed. This matters for:
- Deep content structures
- Recently published pages not yet linked
- Orphaned pages with valuable content
- Alternative language versions
### Update Frequency
The `<lastmod>` element signals content freshness:
```xml
<url>
<loc>https://example.com/article</loc>
<lastmod>2024-11-08T12:00:00Z</lastmod>
<changefreq>weekly</changefreq>
</url>
```
Search engines prioritize recently updated content. Fresh `lastmod` dates encourage re-crawling.
### Priority Hints
The `<priority>` element suggests relative importance:
```xml
<url>
<loc>https://example.com/important-page</loc>
<priority>0.9</priority>
</url>
<url>
<loc>https://example.com/minor-page</loc>
<priority>0.3</priority>
</url>
```
This is a hint, not a directive. Search engines use it along with other signals.
### International SEO
For multilingual sites, sitemaps declare language alternatives:
```xml
<url>
<loc>https://example.com/page</loc>
<xhtml:link rel="alternate" hreflang="es"
href="https://example.com/es/page"/>
<xhtml:link rel="alternate" hreflang="fr"
href="https://example.com/fr/page"/>
</url>
```
This prevents duplicate content penalties while ensuring all language versions are indexed.
## LLMs.txt: The AI SEO Frontier
Traditional SEO optimizes for search retrieval. llms.txt optimizes for AI representation - the emerging frontier of discoverability.
### AI-Generated Summaries
Search engines increasingly show AI-generated answer boxes. llms.txt helps ensure these summaries:
- Accurately represent your content
- Use your preferred terminology and brand voice
- Highlight your key differentiators
- Link to appropriate pages
### Voice Search Optimization
Voice assistants rely on AI understanding. llms.txt provides:
- Natural language context for your content
- Clarification of ambiguous terms
- Guidance on how to answer user questions
- References to authoritative pages
This improves your chances of being the source for voice search answers.
### Content Attribution
When AI systems reference your content, llms.txt helps ensure:
- Proper context is maintained
- Your brand is correctly associated
- Key features aren't misrepresented
- Updates propagate to AI models
Think of it as structured data for AI agents.
## Humans.txt: The Indirect SEO Value
humans.txt doesn't directly impact rankings, but it supports SEO indirectly:
### Technical Transparency
Developers evaluating integration with your platform check humans.txt for tech stack info. This can lead to:
- Backlinks from integration tutorials
- Technical blog posts mentioning your stack
- Developer community discussions
All of which generate valuable backlinks and traffic.
### Brand Signals
A well-crafted humans.txt signals:
- Active development and maintenance
- Professional operations
- Transparent communication
- Company culture
These contribute to overall site authority and trustworthiness.
## Security.txt: Trust Signals
Security.txt demonstrates professionalism and security-consciousness. While not a ranking factor, it:
- Builds trust with security-conscious users
- Prevents security incidents that could damage SEO (hacked site penalties)
- Shows organizational maturity
- Enables faster vulnerability fixes (preserving site integrity)
Search engines penalize compromised sites heavily. security.txt helps prevent those penalties.
## Integration SEO Benefits
This integration provides several SEO advantages:
### Consistency
All discovery files reference the same site URL from your Astro config. This prevents:
- Mixed http/https signals
- www vs. non-www confusion
- Subdomain inconsistencies
Consistency is an underrated SEO factor.
### Freshness
Auto-generated timestamps keep discovery files fresh:
- Sitemaps show current lastmod dates
- security.txt expiration updates with each build
- canary.txt timestamps reflect current build
Fresh content signals active maintenance.
### Correctness
The integration handles RFC compliance automatically:
- security.txt follows RFC 9116 exactly
- robots.txt uses correct syntax
- Sitemaps follow XML schema
- WebFinger implements RFC 7033
Malformed discovery files can harm SEO. The integration prevents errors.
## Monitoring SEO Impact
Track discovery file effectiveness:
**Google Search Console**:
- Sitemap coverage reports
- Crawl statistics
- Indexing status
- Mobile usability
**Crawl behavior analysis**:
- Server logs showing crawler patterns
- Crawl-delay effectiveness
- Blocked vs. allowed URL ratio
- Time to index new content
**AI representation monitoring**:
- How AI assistants describe your site
- Accuracy of information
- Attribution and links
- Brand voice consistency
## Common SEO Mistakes
### Over-blocking
Blocking too much harms SEO:
```
# Too restrictive
Disallow: /blog/?
Disallow: /products/?
```
This might block legitimate content URLs. Be specific:
```
# Better
Disallow: /blog?*
Disallow: /products?sort=*
```
### Sitemap bloat
Including every URL hurts more than helps:
- Don't include parameter variations
- Skip pagination (keep to representative pages)
- Exclude search result pages
- Filter out duplicate content
Quality over quantity.
### Ignoring crawl errors
Monitor Search Console for:
- 404s in sitemap
- Blocked resources search engines need
- Redirect chains
- Server errors
Fix these promptly - they impact ranking.
### Stale sitemaps
Ensure sitemaps update with your content:
- New pages appear quickly
- Deleted pages are removed
- lastmod timestamps are accurate
- Priority reflects current importance
The integration's automatic generation ensures freshness.
## Future SEO Trends
Discovery files will evolve with search:
**AI-first indexing**: Search engines will increasingly rely on structured context (llms.txt) rather than pure crawling
**Federated discovery**: WebFinger and similar protocols may influence how distributed content is discovered and indexed
**Transparency signals**: Files like security.txt and canary.txt may become trust signals in ranking algorithms
**Structured data expansion**: Discovery files complement schema.org markup as structured communication channels
By implementing comprehensive discovery now, you're positioned for these trends.
## Related Topics
- [Robots.txt Configuration](/reference/robots/) - SEO-optimized robot settings
- [Sitemap Optimization](/how-to/filter-sitemap/) - Filtering for better SEO
- [AI Integration Strategy](/explanation/ai-integration/) - Preparing for AI-first search


@ -1,31 +1,309 @@
---
title: WebFinger Protocol (RFC 7033)
description: Understanding WebFinger and federated resource discovery
---
WebFinger (RFC 7033) solves a fundamental problem of the decentralized web: how do you discover information about a resource (person, service, device) when you only have an identifier?
## The Discovery Challenge
On centralized platforms, discovery is simple. Twitter knows about @username because it's all in one database. But in decentralized systems (email, federated social networks, distributed identity), there's no central registry.
WebFinger provides a standardized way to ask: "Given this identifier (email, account name, URL), what can you tell me about it?"
## The Query Pattern
WebFinger uses a simple HTTP GET request:
```
GET /.well-known/webfinger?resource=acct:alice@example.com
```
This asks: "What do you know about alice@example.com?"
The server responds with a JSON Resource Descriptor (JRD) containing links, properties, and metadata about that resource.
## Real-World Use Cases
### ActivityPub / Mastodon
When you follow `@alice@example.com` on Mastodon, your instance:
1. Queries `example.com/.well-known/webfinger?resource=acct:alice@example.com`
2. Gets back Alice's ActivityPub profile URL
3. Fetches her profile and posts from that URL
4. Subscribes to updates
WebFinger is the discovery layer that makes federation work.
### OpenID Connect
OAuth/OpenID providers use WebFinger for issuer discovery:
1. User enters email address
2. Client extracts domain
3. Queries WebFinger for OpenID configuration
4. Discovers authentication endpoints
5. Initiates OAuth flow
This enables "email address as identity" without hardcoding provider lists.
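A rough sketch of that lookup from a client's perspective; the `rel` URI is the one defined by OpenID Connect Discovery, and the domain handling is deliberately simplified:
```typescript
// oidc-issuer-discovery.ts - simplified WebFinger-based issuer lookup (Node 18+)
const email = 'alice@example.com';
const domain = email.split('@')[1];
const url = new URL(`https://${domain}/.well-known/webfinger`);
url.searchParams.set('resource', `acct:${email}`);
url.searchParams.set('rel', 'http://openid.net/specs/connect/1.0/issuer');
const jrd = await (await fetch(url)).json();
const issuer = jrd.links?.find(
  (link: { rel: string; href?: string }) =>
    link.rel === 'http://openid.net/specs/connect/1.0/issuer'
)?.href;
console.log('OpenID issuer:', issuer); // the client then fetches issuer + '/.well-known/openid-configuration'
```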
### Contact Discovery
Email clients and contact apps use WebFinger to discover:
- Profile photos and avatars
- Public keys for encryption
- Social media profiles
- Calendar availability
- Preferred contact methods
## The JRD Response Format
A WebFinger response looks like:
```json
{
"subject": "acct:alice@example.com",
"aliases": [
"https://example.com/@alice",
"https://example.com/users/alice"
],
"properties": {
"http://schema.org/name": "Alice Developer"
},
"links": [
{
"rel": "self",
"type": "application/activity+json",
"href": "https://example.com/users/alice"
},
{
"rel": "http://webfinger.net/rel/profile-page",
"type": "text/html",
"href": "https://example.com/@alice"
},
{
"rel": "http://webfinger.net/rel/avatar",
"type": "image/jpeg",
"href": "https://example.com/avatars/alice.jpg"
}
]
}
```
**Subject**: The resource being described (often same as query)
**Aliases**: Alternative identifiers for the same resource
**Properties**: Key-value metadata (property names must be URIs)
**Links**: Related resources with relationship types
## Link Relations
The `rel` field uses standardized link relation types:
**IANA registered**: `self`, `alternate`, `canonical`, etc.
**WebFinger specific**: `http://webfinger.net/rel/profile-page`, etc.
**Custom/domain-specific**: Any URI works
This extensibility allows WebFinger to serve many use cases while remaining standardized.
## Static vs. Dynamic Resources
The integration supports both approaches:
### Static Resources
Define specific resources explicitly:
```typescript
webfinger: {
resources: [
{
resource: 'acct:alice@example.com',
links: [...]
}
]
}
```
Use this for a small, known set of identities.
### Content Collection Integration
Generate resources dynamically from Astro content collections:
```typescript
webfinger: {
collections: [{
name: 'team',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (member) => [...]
}]
}
```
This auto-generates WebFinger responses for all collection entries. Add a team member to your content collection, and they become discoverable via WebFinger automatically.
## Template Variables
Resource and subject templates support variables:
- `{slug}`: Collection entry slug
- `{id}`: Collection entry ID
- `{data.fieldName}`: Any field from entry data
- `{siteURL}`: Your configured site URL
Example:
```typescript
resourceTemplate: 'acct:{data.username}@{siteURL.hostname}'
```
For a team member with `username: 'alice'` on `example.com`, this generates:
`acct:alice@example.com`
## CORS and Security
WebFinger responses include:
```
Access-Control-Allow-Origin: *
```
This is intentional - WebFinger is designed for public discovery. If information shouldn't be public, don't put it in WebFinger.
The protocol assumes:
- Resources are intentionally discoverable
- Information is public or intended for sharing
- Authentication happens at linked resources, not discovery layer
## Rel Filtering
Clients can request specific link types:
```
GET /.well-known/webfinger?resource=acct:alice@example.com&rel=self
```
The server returns only links matching that relation type. This reduces bandwidth and focuses the response.
The integration handles this automatically.
## Why Dynamic Routes
Unlike other discovery files, WebFinger uses a dynamic route (`prerender: false`). This is because:
1. Query parameters determine the response
2. Content collection resources may be numerous
3. Responses are lightweight enough to generate on-demand
Static generation would require pre-rendering every possible query, which is impractical for collections.
## Building for Federation
If you want your site to participate in federated protocols:
**Enable WebFinger**: Makes your users/resources discoverable
**Implement ActivityPub**: Provide the linked profile/actor endpoints
**Support WebFinger lookup**: Allow others to discover your resources
WebFinger is the discovery layer; ActivityPub (or other protocols) provide the functionality.
## Team/Author Discovery
A common pattern for blogs and documentation:
```typescript
webfinger: {
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@myblog.com',
linksBuilder: (author) => [
{
rel: 'http://webfinger.net/rel/profile-page',
href: `https://myblog.com/authors/${author.slug}`,
type: 'text/html'
},
{
rel: 'http://webfinger.net/rel/avatar',
href: author.data.avatar,
type: 'image/jpeg'
}
],
propertiesBuilder: (author) => ({
'http://schema.org/name': author.data.name,
'http://schema.org/email': author.data.email
})
}]
}
```
Now `acct:alice@myblog.com` resolves to Alice's author page, avatar, and contact info.
## Testing WebFinger
After deployment:
1. Query directly: `curl 'https://example.com/.well-known/webfinger?resource=acct:alice@example.com'`
2. Use WebFinger validators/debuggers
3. Test from federated clients (Mastodon, etc.)
4. Verify CORS headers are present
5. Check rel filtering works
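For a repeatable version of checks 1, 4, and 5, a small script like this sketch does the job (Node 18+; adjust the host and account to your own):
```typescript
// webfinger-smoke-test.ts
const base = 'https://example.com';
const resource = 'acct:alice@example.com';
const res = await fetch(
  `${base}/.well-known/webfinger?resource=${encodeURIComponent(resource)}&rel=self`
);
if (!res.ok) throw new Error(`WebFinger returned HTTP ${res.status}`);
console.log('CORS header:', res.headers.get('access-control-allow-origin')); // expect "*"
const jrd = await res.json();
console.log('Subject:', jrd.subject);
console.log('Links returned:', jrd.links?.map((l: { rel: string }) => l.rel)); // only "self" if rel filtering works
```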
## Privacy Considerations
WebFinger makes information **discoverable**. Consider:
- Don't expose private email addresses or contact info
- Limit to intentionally public resources
- Understand that responses are cached
- Remember `Access-Control-Allow-Origin: *` makes responses widely accessible
If information shouldn't be public, don't include it in WebFinger responses.
## Beyond Social Networks
WebFinger isn't just for social media. Other applications:
**Device discovery**: IoT devices announcing capabilities
**Service discovery**: API endpoints and configurations
**Calendar/availability**: Free/busy status and booking links
**Payment addresses**: Cryptocurrency addresses and payment methods
**Professional profiles**: Credentials, certifications, and portfolios
The protocol is general-purpose resource discovery.
## The Integration's Approach
This integration makes WebFinger accessible without boilerplate:
- Auto-generates from content collections
- Handles template variable substitution
- Manages CORS and rel filtering
- Provides type-safe configuration
- Supports both static and dynamic resources
You define the mappings, the integration handles the protocol.
## When to Use WebFinger
Enable WebFinger if:
- You want to participate in federated protocols
- Your site has user profiles or authors
- You're building decentralized services
- You want discoverable team members
- You're implementing OAuth/OpenID
Skip it if:
- Your site is purely informational with no identity component
- You don't want to expose resource discovery
- You're not integrating with federated services
## Related Topics
- [ActivityPub Integration](/how-to/activitypub/) - Building on WebFinger for federation
- [WebFinger Reference](/reference/webfinger/) - Complete configuration options
- [Content Collections](/how-to/content-collections/) - Dynamic resource generation


@ -1,31 +1,130 @@
---
title: Why Use Discovery Files?
description: Understanding the importance of discovery files for modern websites
---
Discovery files are the polite introduction your website makes to the automated systems that visit it every day. Just as you might put up a sign directing visitors to your front door, these files tell bots, AI assistants, search engines, and other automated systems where to go and what they can do.
## The Discovery Problem
Every website faces a fundamental challenge: how do automated systems know what your site contains, where security issues should be reported, or how AI assistants should interact with your content?
Without standardized discovery mechanisms, each bot must guess. Search engines might crawl your entire site inefficiently. AI systems might misrepresent your content. Security researchers won't know how to contact you responsibly. Federated services can't find your user profiles.
Discovery files solve this by providing **machine-readable contracts** that answer specific questions:
- **robots.txt**: "What can I crawl and where?"
- **llms.txt**: "How should AI assistants understand and represent your site?"
- **humans.txt**: "Who built this and what technologies were used?"
- **security.txt**: "Where do I report security vulnerabilities?"
- **canary.txt**: "Has your organization received certain legal orders?"
- **webfinger**: "How do I discover user profiles and federated identities?"
## Why Multiple Files?
You might wonder why we need separate files instead of one unified discovery document. The answer lies in **separation of concerns** and **backwards compatibility**.
Each file serves a distinct audience and purpose:
- **robots.txt** targets web crawlers and has been the standard since 1994
- **llms.txt** addresses the new reality of AI assistants processing web content
- **humans.txt** provides transparency for developers and users curious about your stack
- **security.txt** (RFC 9116) offers a standardized security contact mechanism
- **canary.txt** enables transparency about legal obligations
- **webfinger** (RFC 7033) enables decentralized resource discovery
Different systems read different files. A search engine ignores humans.txt. A developer looking at your tech stack won't read robots.txt. A security researcher needs security.txt, not your sitemap.
This modularity also means you can adopt discovery files incrementally. Start with robots.txt and sitemap.xml, add llms.txt when you want AI assistance, enable security.txt when you're ready to accept vulnerability reports.
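In practice that incremental path is just a config that grows over time. A sketch of the idea (the `webfinger.enabled` flag matches examples elsewhere in these docs; the `security` shape is an assumption to check against the reference):
```typescript
// astro.config.mjs - start small, opt in as you go (sketch)
discovery({
  // Day one: the defaults already cover robots.txt, llms.txt, humans.txt, and the sitemap.
  // Later: make authors discoverable in the Fediverse.
  webfinger: { enabled: true, resources: [/* ... */] },
  // Later still: accept vulnerability reports (verify option names in the reference).
  security: { contact: ['security@example.com'], expires: 'auto' }
})
```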
## The Visibility Trade-off
Discovery files involve an important trade-off: **transparency versus obscurity**.
By publishing robots.txt, you tell both polite crawlers and malicious scrapers about your site structure. Security.txt reveals your security team's contact information. Humans.txt exposes your technology stack.
This is deliberate. Discovery files embrace the principle that **security through obscurity is not security**. The benefits of standardized, polite communication with automated systems outweigh the minimal risks of exposing this information.
Consider that:
- Attackers can discover your tech stack through other means (HTTP headers, page analysis, etc.)
- Security.txt makes responsible disclosure easier, reducing time-to-fix for vulnerabilities
- Robots.txt only controls *polite* bots - malicious actors ignore it anyway
- The transparency builds trust with users, developers, and security researchers
## The Evolution of Discovery
Discovery mechanisms have evolved alongside the web itself:
**1994**: robots.txt emerges as an informal standard for crawler communication
**2000s**: Sitemaps become essential for SEO as the web grows exponentially
**2008**: humans.txt proposed to add personality and transparency to websites
**2017**: security.txt proposed as an internet-draft after years of ad-hoc security contact methods; it is standardized as RFC 9116 in 2022
**2023**: llms.txt proposed as AI assistants become major consumers of web content
**2024**: Warrant canaries and WebFinger (itself standardized as RFC 7033 back in 2013) increasingly appear alongside the other discovery files for transparency and federation
Each new discovery file addresses a real need that emerged as the web ecosystem grew. The integration brings them together because **modern websites need to communicate with an increasingly diverse set of automated visitors**.
## Discovery as Infrastructure
Think of discovery files as **critical infrastructure for your website**. They're not optional extras - they're the foundation for how your site interacts with the broader web ecosystem.
Without proper discovery files:
- Search engines may crawl inefficiently, wasting your server resources
- AI assistants may misunderstand your content or ignore important context
- Security researchers may struggle to report vulnerabilities responsibly
- Developers can't easily understand your technical choices
- Federated services can't integrate with your user profiles
With comprehensive discovery:
- You control how bots interact with your site
- AI assistants have proper context for representing your content
- Security issues can be reported through established channels
- Your tech stack and team are properly credited
- Your site integrates seamlessly with federated protocols
## The Cost-Benefit Analysis
Setting up discovery files manually for each project is tedious and error-prone. You need to:
- Remember the correct format for each file type
- Keep URLs and sitemaps synchronized with your site config
- Update expiration dates for security.txt and canary.txt
- Maintain consistency across different discovery mechanisms
- Handle edge cases and RFC compliance
An integration automates all of this, ensuring:
- **Consistency**: All discovery files reference the same site URL
- **Correctness**: RFC compliance is handled automatically
- **Maintenance**: Expiration dates and timestamps update on each build
- **Flexibility**: Configuration changes propagate to all relevant files
- **Best Practices**: Sensible defaults that you can override as needed
The cost is minimal - a single integration in your Astro config. The benefit is comprehensive, standards-compliant discovery across your entire site.
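Concretely, that single integration is a few lines of config, assuming the package's default export as used throughout these docs:
```typescript
// astro.config.mjs
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
  site: 'https://example.com', // required: generated files use this to build absolute URLs
  integrations: [discovery()]
});
```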
## Looking Forward
As the web continues to evolve, discovery mechanisms will too. We're already seeing:
- AI systems becoming more sophisticated in how they consume web content
- Federated protocols gaining adoption for decentralized social networks
- Increased emphasis on security transparency and responsible disclosure
- Growing need for machine-readable metadata as automation increases
Discovery files aren't a trend - they're fundamental communication protocols that will remain relevant as long as automated systems interact with websites.
By implementing comprehensive discovery now, you're **future-proofing** your site for whatever new automated visitors emerge next.
## Related Topics
- [SEO Implications](/explanation/seo/) - How discovery files affect search rankings
- [AI Integration Strategy](/explanation/ai-integration/) - Making your content AI-friendly
- [Architecture](/explanation/architecture/) - How the integration works internally


@ -3,29 +3,190 @@ title: First Steps
description: Learn the basics of using @astrojs/discovery
---
Now that you have @astrojs/discovery installed, let's explore what it generates and how it works.
## What You Just Built
When you added the discovery integration to your Astro project, you enabled automatic generation of four essential discovery files. Let's see what each one does for your site.
## Step 1: Build Your Site
First, let's build the site to generate the discovery files:
```bash
npm run build
```
You should see output indicating that your site has been built successfully.
## Step 2: Check the Generated Files
Navigate to your `dist` folder. You'll find these new files:
```bash
ls dist/
```
You should see:
- `robots.txt`
- `llms.txt`
- `humans.txt`
- `sitemap-index.xml`
Let's look at each one!
## Step 3: Explore robots.txt
Open `dist/robots.txt` in your text editor:
```bash
cat dist/robots.txt
```
You'll see something like this:
```txt
User-agent: *
Allow: /
# Sitemaps
Sitemap: https://your-site.com/sitemap-index.xml
# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: cohere-ai
User-agent: Google-Extended
Allow: /llms.txt
Crawl-delay: 1
```
This file tells search engines and AI bots:
- All bots are allowed to crawl your site (`User-agent: *` with `Allow: /`)
- Where to find your sitemap
- AI bots can access `/llms.txt` for additional context
- To wait 1 second between requests (crawl delay)
## Step 4: Explore llms.txt
Now look at `dist/llms.txt`:
```bash
cat dist/llms.txt
```
You'll see a structured file that helps AI assistants understand your site:
```markdown
# your-site.com
> Site built with Astro
## Site Information
- Name: your-site.com
- URL: https://your-site.com
## For AI Assistants
This site is built with Astro and the @astrojs/discovery integration.
## Tech Stack
### Frontend
- Astro
## Important Pages
- Home: https://your-site.com/
```
This file provides context to AI assistants like Claude, helping them understand and reference your site correctly.
## Step 5: Explore humans.txt
Check `dist/humans.txt`:
```bash
cat dist/humans.txt
```
You'll see credit information:
```txt
/* SITE */
Last update: 2025-01-08
Language: English
Doctype: HTML5
Tech stack: Astro
```
This file credits the humans behind your site and documents your tech stack.
## Step 6: View Your Sitemap
Finally, look at `dist/sitemap-index.xml`:
```bash
cat dist/sitemap-index.xml
```
You'll see an XML file listing all your pages, helping search engines index your site.
## Step 7: Test in Development
Now let's see these files in action during development:
```bash
npm run dev
```
Once your dev server is running, open your browser and visit:
- `http://localhost:4321/robots.txt`
- `http://localhost:4321/llms.txt`
- `http://localhost:4321/humans.txt`
- `http://localhost:4321/sitemap-index.xml`
All these files are served dynamically!
## What You've Learned
You now know:
- How to build your site to generate discovery files
- What each discovery file contains
- How to view the files in both build and dev modes
- The purpose of each file
## Next Steps
Now that you understand the basics, let's customize these files:
- [Basic Setup](/tutorials/basic-setup/) - Learn to customize the integration
- [Configure robots.txt](/tutorials/configure-robots/) - Control bot access
- [Setup llms.txt](/tutorials/setup-llms/) - Provide better AI context
- [Create humans.txt](/tutorials/create-humans/) - Credit your team
## Troubleshooting
### Files Not Showing Up?
Make sure you have the `site` configured in `astro.config.mjs`:
```typescript
export default defineConfig({
site: 'https://your-site.com', // This is required!
integrations: [discovery()]
});
```
### Wrong URLs in Files?
Check that your `site` URL matches your production domain. The integration uses this URL to generate absolute links.
### Need More Help?
- Check the [FAQ](/community/faq/)
- Visit [Troubleshooting](/community/troubleshooting/)
- Open an issue on [GitHub](https://github.com/withastro/astro-discovery/issues)


@ -3,29 +3,382 @@ title: ActivityPub Integration
description: Connect with the Fediverse via WebFinger
---
Enable WebFinger to make your site discoverable on Mastodon and other ActivityPub-compatible services in the Fediverse.
## Prerequisites
- Integration installed and configured
- Understanding of ActivityPub and WebFinger protocols
- Knowledge of your site's user or author structure
- ActivityPub server endpoints (or static actor files)
## Basic Static Profile
Create a single discoverable profile:
```typescript
// astro.config.mjs
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:yourname@example.com',
subject: 'acct:yourname@example.com',
aliases: [
'https://example.com/@yourname'
],
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: 'https://example.com/@yourname'
},
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/yourname'
}
]
}
]
}
})
```
Query: `GET /.well-known/webfinger?resource=acct:yourname@example.com`
## Multiple Authors
Enable discovery for all blog authors:
```typescript
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:alice@example.com',
links: [
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/alice'
},
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://example.com/authors/alice'
}
]
},
{
resource: 'acct:bob@example.com',
links: [
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/bob'
},
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://example.com/authors/bob'
}
]
}
]
}
})
```
## Dynamic Authors from Content Collection
Load authors from Astro content collection:
**Step 1**: Create authors collection:
```typescript
// src/content.config.ts
const authorsCollection = defineCollection({
type: 'data',
schema: z.object({
name: z.string(),
email: z.string().email(),
bio: z.string(),
avatar: z.string().url(),
mastodon: z.string().optional(),
})
});
```
**Step 2**: Add author data:
```yaml
# src/content/authors/alice.yaml
name: Alice Developer
email: alice@example.com
bio: Full-stack developer and writer
avatar: https://example.com/avatars/alice.jpg
mastodon: '@alice@mastodon.social'
```
**Step 3**: Configure WebFinger collection:
```typescript
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (author) => [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: `https://example.com/authors/${author.slug}`
},
{
rel: 'http://webfinger.net/rel/avatar',
type: 'image/jpeg',
href: author.data.avatar
},
{
rel: 'self',
type: 'application/activity+json',
href: `https://example.com/users/${author.slug}`
}
],
propertiesBuilder: (author) => ({
'http://schema.org/name': author.data.name,
'http://schema.org/description': author.data.bio
}),
aliasesBuilder: (author) => [
`https://example.com/@${author.slug}`
]
}]
}
})
```
## Create ActivityPub Actor Endpoint
WebFinger discovery requires an ActivityPub actor endpoint. Create it:
```typescript
// src/pages/users/[author].ts (serves /users/{slug}, matching the WebFinger "self" links)
import type { APIRoute } from 'astro';
import { getCollection } from 'astro:content';
export async function getStaticPaths() {
const authors = await getCollection('authors');
return authors.map(author => ({
params: { author: author.slug }
}));
}
export const GET: APIRoute = async ({ params, site }) => {
const authors = await getCollection('authors');
const author = authors.find(a => a.slug === params.author);
if (!author) {
return new Response(null, { status: 404 });
}
const actor = {
'@context': [
'https://www.w3.org/ns/activitystreams',
'https://w3id.org/security/v1'
],
'type': 'Person',
'id': `${site}users/${author.slug}`,
'preferredUsername': author.slug,
'name': author.data.name,
'summary': author.data.bio,
'url': `${site}authors/${author.slug}`,
'icon': {
'type': 'Image',
'mediaType': 'image/jpeg',
'url': author.data.avatar
},
'inbox': `${site}users/${author.slug}/inbox`,
'outbox': `${site}users/${author.slug}/outbox`,
'followers': `${site}users/${author.slug}/followers`,
'following': `${site}users/${author.slug}/following`,
};
return new Response(JSON.stringify(actor, null, 2), {
status: 200,
headers: {
'Content-Type': 'application/activity+json'
}
});
};
```
## Link from Mastodon
Users can find your profile on Mastodon:
1. Go to Mastodon search
2. Enter `@yourname@example.com`
3. Mastodon queries WebFinger at your site
4. Gets ActivityPub actor URL
5. Displays profile with follow button
## Add Profile Link in Bio
Link your Mastodon profile:
```typescript
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (author) => {
const links = [
{
rel: 'self',
type: 'application/activity+json',
href: `https://example.com/users/${author.slug}`
}
];
// Add Mastodon link if available
if (author.data.mastodon) {
const mastodonUrl = author.data.mastodon.startsWith('http')
? author.data.mastodon
: `https://mastodon.social/${author.data.mastodon}`;
links.push({
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: mastodonUrl
});
}
return links;
}
}]
}
})
```
## Testing WebFinger
Test your WebFinger endpoint:
```bash
# Build the site
npm run build
npm run preview
# Test WebFinger query
curl "http://localhost:4321/.well-known/webfinger?resource=acct:alice@example.com"
```
Expected response:
```json
{
"subject": "acct:alice@example.com",
"aliases": [
"https://example.com/@alice"
],
"links": [
{
"rel": "http://webfinger.net/rel/profile-page",
"type": "text/html",
"href": "https://example.com/authors/alice"
},
{
"rel": "self",
"type": "application/activity+json",
"href": "https://example.com/users/alice"
}
]
}
```
## Test ActivityPub Actor
Verify actor endpoint:
```bash
curl "http://localhost:4321/users/alice" \
-H "Accept: application/activity+json"
```
Should return actor JSON with inbox, outbox, followers, etc.
## Configure CORS
WebFinger requires CORS headers:
The integration automatically adds:
```
Access-Control-Allow-Origin: *
```
For production with an ActivityPub server, configure appropriate CORS in your hosting.
## Implement Full ActivityPub
For complete Fediverse integration:
1. **Implement inbox**: Handle incoming activities (follows, likes, shares)
2. **Implement outbox**: Serve your posts/activities
3. **Generate keypairs**: For signing activities
4. **Handle followers**: Maintain follower/following lists
5. **Send activities**: Notify followers of new posts
This is beyond WebFinger scope. Consider using:
- [Bridgy Fed](https://fed.brid.gy/) for easy federation
- [WriteFreely](https://writefreely.org/) for federated blogging
- [GoToSocial](https://gotosocial.org/) for self-hosted instances
## Expected Result
Your site becomes discoverable in the Fediverse:
1. Users search `@yourname@example.com` on Mastodon
2. Mastodon fetches WebFinger from `/.well-known/webfinger`
3. Gets ActivityPub actor URL
4. Displays your profile
5. Users can follow/interact (if full ActivityPub implemented)
## Alternative Approaches
**Static site**: Use WebFinger for discovery only, point to external Mastodon account.
**Proxy to Mastodon**: WebFinger points to your Mastodon instance.
**Bridgy Fed**: Use Bridgy Fed to handle ActivityPub protocol, just provide WebFinger.
**Full implementation**: Build complete ActivityPub server with inbox/outbox.
## Common Issues
**WebFinger not found**: Ensure `webfinger.enabled: true` and resources/collections configured.
**CORS errors**: Integration adds CORS automatically. Check if hosting overrides headers.
**Actor URL 404**: Create the actor endpoint at the URL specified in WebFinger links.
**Mastodon can't find profile**: Ensure `rel: 'self'` link with `type: 'application/activity+json'` exists.
**Incorrect format**: WebFinger must return valid JRD JSON. Test with curl.
**Case sensitivity**: Resource URIs are case-sensitive. `acct:alice@example.com` and `acct:Alice@example.com` are not the same resource.
## Additional Resources
- [WebFinger RFC 7033](https://datatracker.ietf.org/doc/html/rfc7033)
- [ActivityPub Spec](https://www.w3.org/TR/activitypub/)
- [Mastodon Documentation](https://docs.joinmastodon.org/)
- [Bridgy Fed](https://fed.brid.gy/)


@ -3,29 +3,248 @@ title: Add Team Members
description: Add team member information to humans.txt
---
Document your team and contributors in humans.txt for public recognition.
## Prerequisites
- Integration installed and configured
- Team member information (names, roles, contact details)
- Permission from team members to share their information
## Add a Single Team Member
Configure basic team information:
```typescript
// astro.config.mjs
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com'
}
]
}
})
```
## Add Multiple Team Members
Include your full team:
```typescript
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA'
},
{
name: 'John Designer',
role: 'UI/UX Designer',
contact: 'john@example.com',
location: 'New York, NY'
},
{
name: 'Sarah Product',
role: 'Product Manager',
location: 'London, UK'
}
]
}
})
```
## Include Social Media Profiles
Add Twitter and GitHub handles:
```typescript
discovery({
humans: {
team: [
{
name: 'Alex Dev',
role: 'Full Stack Developer',
contact: 'alex@example.com',
twitter: '@alexdev',
github: 'alex-codes'
}
]
}
})
```
## Load from Content Collections
Dynamically generate team list from content:
```typescript
import { getCollection } from 'astro:content';
discovery({
humans: {
team: async () => {
const teamMembers = await getCollection('team');
return teamMembers.map(member => ({
name: member.data.name,
role: member.data.role,
contact: member.data.email,
location: member.data.city,
twitter: member.data.twitter,
github: member.data.github
}));
}
}
})
```
Create a content collection in `src/content/team/`:
```yaml
# src/content/team/jane.yaml
name: Jane Developer
role: Lead Developer
email: jane@example.com
city: San Francisco, CA
twitter: '@janedev'
github: jane-codes
```
## Load from External Source
Fetch team data from your API or database:
```typescript
discovery({
humans: {
team: async () => {
const response = await fetch('https://api.example.com/team');
const teamData = await response.json();
return teamData.members.map(member => ({
name: member.fullName,
role: member.position,
contact: member.publicEmail,
location: member.location
}));
}
}
})
```
## Add Acknowledgments
Thank contributors and inspirations:
```typescript
discovery({
humans: {
team: [/* ... */],
thanks: [
'The Astro team for the amazing framework',
'All our open source contributors',
'Stack Overflow community',
'Our beta testers',
'Coffee and late nights'
]
}
})
```
## Include Project Story
Add context about your project:
```typescript
discovery({
humans: {
team: [/* ... */],
story: `
This project was born from a hackathon in 2024. What started as
a weekend experiment grew into a tool used by thousands. Our team
came together from different timezones and backgrounds, united by
a passion for making the web more discoverable.
`.trim()
}
})
```
## Add Fun Facts
Make it personal:
```typescript
discovery({
humans: {
team: [/* ... */],
funFacts: [
'Built entirely remotely across 4 continents',
'Powered by 1,247 cups of coffee',
'Deployed on a Friday (we live dangerously)',
'First commit was at 2:47 AM',
'Named after a recurring inside joke'
]
}
})
```
## Verify Your Configuration
Build and check the output:
```bash
npm run build
npm run preview
curl http://localhost:4321/humans.txt
```
## Expected Result
Your humans.txt will contain formatted team information:
```
/* TEAM */
Name: Jane Developer
Role: Lead Developer
Contact: jane@example.com
From: San Francisco, CA
Twitter: @janedev
GitHub: jane-codes
Name: John Designer
Role: UI/UX Designer
Contact: john@example.com
From: New York, NY
/* THANKS */
The Astro team for the amazing framework
All our open source contributors
Coffee and late nights
```
## Alternative Approaches
**Privacy-first**: Use team roles without names or contact details for privacy.
**Department-based**: Group team members by department rather than listing individually.
**Rotating spotlight**: Highlight different team members each month using dynamic content.
## Common Issues
**Missing permissions**: Always get consent before publishing personal information.
**Outdated information**: Keep contact details current. Use dynamic loading to stay fresh.
**Too much detail**: Stick to professional information. Avoid personal addresses or phone numbers.
**Special characters**: Use plain ASCII in humans.txt. Avoid emojis unless necessary.


@ -1,31 +1,169 @@
---
title: Block Specific Bots
description: Control which bots can crawl your site using robots.txt rules
---
Block unwanted bots or user agents from accessing specific parts of your site.
## Prerequisites
- Integration installed and configured
- Basic familiarity with robots.txt format
- Knowledge of which bot user agents to block
## Block a Single Bot Completely
To prevent a specific bot from crawling your entire site:
```typescript
// astro.config.mjs
discovery({
robots: {
additionalAgents: [
{
userAgent: 'BadBot',
disallow: ['/']
}
]
}
})
```
This creates a rule that blocks `BadBot` from all pages.
## Block Multiple Bots
Add multiple entries to the `additionalAgents` array:
```typescript
discovery({
robots: {
additionalAgents: [
{ userAgent: 'BadBot', disallow: ['/'] },
{ userAgent: 'SpamCrawler', disallow: ['/'] },
{ userAgent: 'AnnoyingBot', disallow: ['/'] }
]
}
})
```
## Block Bots from Specific Paths
Allow a bot access to most content, but block sensitive areas:
```typescript
discovery({
robots: {
additionalAgents: [
{
userAgent: 'PriceBot',
allow: ['/'],
disallow: ['/checkout', '/account', '/api']
}
]
}
})
```
**Order matters**: Specific rules (`/checkout`) should come after general rules (`/`).
## Disable All LLM Bots
To block all AI crawler bots:
```typescript
discovery({
robots: {
llmBots: {
enabled: false
}
}
})
```
This removes the allow rules for Anthropic-AI, Claude-Web, GPTBot, and other LLM crawlers.
## Block Specific LLM Bots
Keep some LLM bots while blocking others:
```typescript
discovery({
robots: {
llmBots: {
enabled: true,
agents: ['Anthropic-AI', 'Claude-Web'] // Only allow these
},
additionalAgents: [
{ userAgent: 'GPTBot', disallow: ['/'] },
{ userAgent: 'Google-Extended', disallow: ['/'] }
]
}
})
```
## Add Custom Rules
For complex scenarios, use `customRules` to add raw robots.txt content:
```typescript
discovery({
robots: {
customRules: `
# Block aggressive crawlers
User-agent: AggressiveBot
Crawl-delay: 30
Disallow: /
# Special rule for search engine
User-agent: Googlebot
Allow: /api/public
Disallow: /api/private
`.trim()
}
})
```
## Verify Your Configuration
After configuration, build your site and check `/robots.txt`:
```bash
npm run build
npm run preview
curl http://localhost:4321/robots.txt
```
Look for your custom agent rules in the output.
## Expected Result
Your robots.txt will contain entries like:
```
User-agent: BadBot
Disallow: /
User-agent: PriceBot
Allow: /
Disallow: /checkout
Disallow: /account
```
Blocked bots should respect these rules and avoid crawling restricted areas.
## Alternative Approaches
**Server-level blocking**: For malicious bots that ignore robots.txt, consider blocking at the server/firewall level.
**User-agent detection**: Implement server-side detection to return 403 Forbidden for specific user agents (see the sketch after this list).
**Rate limiting**: Use crawl delays to slow down aggressive crawlers rather than blocking them completely.
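The user-agent detection approach can be sketched as Astro middleware. Note that this only runs for on-demand rendered routes (with an adapter), so purely static hosting needs an equivalent rule at the CDN or web server instead:
```typescript
// src/middleware.ts - sketch of returning 403 to specific user agents
import { defineMiddleware } from 'astro:middleware';
const BLOCKED_AGENTS = ['BadBot', 'SpamCrawler']; // illustrative names, not a curated list
export const onRequest = defineMiddleware((context, next) => {
  const ua = context.request.headers.get('user-agent') ?? '';
  if (BLOCKED_AGENTS.some((agent) => ua.includes(agent))) {
    return new Response('Forbidden', { status: 403 });
  }
  return next();
});
```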
## Common Issues
**Bots ignoring rules**: robots.txt is advisory only. Malicious bots may not respect it.
**Overly broad patterns**: Be specific with disallow paths. `/api` blocks `/api/public` too.
**Typos in user agents**: User agent strings are case-sensitive. Check bot documentation for exact values.


@ -3,29 +3,224 @@ title: Set Cache Headers
description: Configure HTTP caching for discovery files description: Configure HTTP caching for discovery files
--- ---
Optimize cache headers for discovery files to balance freshness and performance. Optimize cache headers for discovery files to balance freshness with server load and client performance.
## Prerequisites
- Integration installed and configured
- Understanding of HTTP caching concepts
- Knowledge of your content update frequency
## Set Cache Duration for All Files
Configure cache durations (in seconds) for each file:
```typescript
// astro.config.mjs
discovery({
caching: {
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
security: 86400, // 24 hours
canary: 3600, // 1 hour
webfinger: 3600, // 1 hour
sitemap: 3600 // 1 hour
}
})
```
These values set `Cache-Control: public, max-age=<seconds>` headers.
## Short Cache for Frequently Updated Content
Update canary.txt daily? Use short cache:
```typescript
discovery({
caching: {
canary: 1800 // 30 minutes
}
})
```
Bots will check for updates more frequently.
## Long Cache for Static Content
Rarely change humans.txt? Cache longer:
```typescript
discovery({
caching: {
humans: 604800 // 1 week (7 days)
}
})
```
Reduces server load for static content.
## Disable Caching for Development
Different caching for development vs production:
```typescript
discovery({
caching: import.meta.env.PROD
? {
// Production: aggressive caching
robots: 3600,
llms: 3600,
humans: 86400
}
: {
// Development: no caching
robots: 0,
llms: 0,
humans: 0
}
})
```
Zero seconds means no caching (always fresh).
## Match Cache to Update Frequency
Align with your content update schedule:
```typescript
discovery({
caching: {
// Updated hourly via CI/CD
llms: 3600, // 1 hour
// Updated daily
canary: 7200, // 2 hours (some buffer)
// Updated weekly
humans: 86400, // 24 hours
// Rarely changes
robots: 604800, // 1 week
security: 2592000 // 30 days
}
})
```
## Conservative Caching
When in doubt, cache shorter:
```typescript
discovery({
caching: {
robots: 1800, // 30 min
llms: 1800, // 30 min
humans: 3600, // 1 hour
sitemap: 1800 // 30 min
}
})
```
Ensures content stays relatively fresh.
## Aggressive Caching
Optimize for performance when content is stable:
```typescript
discovery({
caching: {
robots: 86400, // 24 hours
llms: 43200, // 12 hours
humans: 604800, // 1 week
security: 2592000, // 30 days
sitemap: 86400 // 24 hours
}
})
```
## Understand Cache Behavior
Different cache durations affect different use cases:
**robots.txt** (crawl bots):
- Short cache (1 hour): Quickly reflect changes to bot permissions
- Long cache (24 hours): Reduce load from frequent bot checks
**llms.txt** (AI assistants):
- Short cache (1 hour): Keep instructions current
- Medium cache (6 hours): Balance freshness and performance
**humans.txt** (curious visitors):
- Long cache (24 hours - 1 week): Team info changes rarely
**security.txt** (security researchers):
- Long cache (24 hours - 30 days): Contact info is stable
**canary.txt** (transparency):
- Short cache (30 min - 1 hour): Must be checked frequently
## Verify Cache Headers
Test with curl:
```bash
npm run build
npm run preview
# Check cache headers
curl -I http://localhost:4321/robots.txt
curl -I http://localhost:4321/llms.txt
curl -I http://localhost:4321/humans.txt
```
Look for `Cache-Control` header in the response:
```
Cache-Control: public, max-age=3600
```
## Expected Result
Browsers and CDNs will cache files according to your settings. Subsequent requests within the cache period will be served from cache, reducing server load.
For a 1-hour cache:
1. First request at 10:00 AM: Server serves fresh content
2. Request at 10:30 AM: Served from cache
3. Request at 11:01 AM: Cache expired, server serves fresh content
## Alternative Approaches
**CDN-level caching**: Configure caching at your CDN (Cloudflare, Fastly) rather than in the integration.
**Surrogate-Control header**: Use `Surrogate-Control` for CDN caching while controlling browser cache separately.
**ETags**: Add ETag support for efficient conditional requests.
**Vary header**: Consider adding `Vary: Accept-Encoding` for compressed responses.
## Common Issues
**Cache too long**: Content changes not reflected quickly. Reduce cache duration.
**Cache too short**: High server load from repeated requests. Increase cache duration.
**No caching in production**: Check if your hosting platform overrides headers.
**Stale content after updates**: Deploy a new version with a build timestamp to bust caches.
**Different behavior in CDN**: CDN may have its own caching rules. Check CDN configuration.
## Cache Duration Guidelines
**Rule of thumb**:
- Update frequency = Daily → Cache 2-6 hours
- Update frequency = Weekly → Cache 12-24 hours
- Update frequency = Monthly → Cache 1-7 days
- Update frequency = Rarely → Cache 7-30 days
**Special cases**:
- Canary.txt: Cache < update frequency (if daily, cache 2-12 hours)
- Security.txt: Cache longer (expires field handles staleness)
- Development: Cache 0 or very short (60 seconds)

---
title: Use with Content Collections
description: Integrate with Astro content collections
---
Automatically generate discovery content from your Astro content collections for dynamic, maintainable configuration.
## Prerequisites
- Integration installed and configured
- Astro content collections set up
- Understanding of async configuration functions
## Load Team from Collection
Create a team content collection and populate humans.txt:
**Step 1**: Define the collection schema:
```typescript
// src/content.config.ts
import { defineCollection, z } from 'astro:content';
const teamCollection = defineCollection({
type: 'data',
schema: z.object({
name: z.string(),
role: z.string(),
email: z.string().email(),
location: z.string().optional(),
twitter: z.string().optional(),
github: z.string().optional(),
})
});
export const collections = {
team: teamCollection
};
```
**Step 2**: Add team members:
```yaml
# src/content/team/alice.yaml
name: Alice Johnson
role: Lead Developer
email: alice@example.com
location: San Francisco, CA
github: alice-codes
```
```yaml
# src/content/team/bob.yaml
name: Bob Smith
role: Designer
email: bob@example.com
location: New York, NY
twitter: '@bobdesigns'
```
**Step 3**: Load in discovery config:
```typescript
// astro.config.mjs
import { getCollection } from 'astro:content';
discovery({
humans: {
team: async () => {
const members = await getCollection('team');
return members.map(member => ({
name: member.data.name,
role: member.data.role,
contact: member.data.email,
location: member.data.location,
twitter: member.data.twitter,
github: member.data.github
}));
}
}
})
```
## Generate Important Pages from Docs
List featured documentation pages in llms.txt:
**Step 1**: Add featured flag to doc frontmatter:
```markdown
---
# src/content/docs/getting-started.md
title: Getting Started Guide
description: Quick start guide for new users
featured: true
---
```
**Step 2**: Load featured docs:
```typescript
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
return docs
.filter(doc => doc.data.featured)
.sort((a, b) => (a.data.order || 0) - (b.data.order || 0))
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
## WebFinger from Author Collection
Make blog authors discoverable via WebFinger:
**Step 1**: Define authors collection:
```typescript
// src/content.config.ts
const authorsCollection = defineCollection({
type: 'data',
schema: z.object({
name: z.string(),
email: z.string().email(),
bio: z.string(),
avatar: z.string().url(),
mastodon: z.string().url().optional(),
website: z.string().url().optional()
})
});
```
**Step 2**: Add author data:
```yaml
# src/content/authors/alice.yaml
name: Alice Developer
email: alice@example.com
bio: Full-stack developer and open source enthusiast
avatar: https://example.com/avatars/alice.jpg
mastodon: https://mastodon.social/@alice
website: https://alice.dev
```
**Step 3**: Configure WebFinger:
```typescript
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (author) => [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: `https://example.com/authors/${author.slug}`
},
{
rel: 'http://webfinger.net/rel/avatar',
type: 'image/jpeg',
href: author.data.avatar
},
...(author.data.mastodon ? [{
rel: 'self',
type: 'application/activity+json',
href: author.data.mastodon
}] : [])
],
propertiesBuilder: (author) => ({
'http://schema.org/name': author.data.name,
'http://schema.org/description': author.data.bio
})
}]
}
})
```
Query with: `GET /.well-known/webfinger?resource=acct:alice@example.com`
## Load API Endpoints from Spec
Generate API documentation from a collection:
```typescript
// src/content.config.ts
const apiCollection = defineCollection({
type: 'data',
schema: z.object({
path: z.string(),
method: z.enum(['GET', 'POST', 'PUT', 'DELETE', 'PATCH']),
description: z.string(),
public: z.boolean().default(true)
})
});
```
```yaml
# src/content/api/search.yaml
path: /api/search
method: GET
description: Search products by name, category, or tag
public: true
```
```typescript
discovery({
llms: {
apiEndpoints: async () => {
const endpoints = await getCollection('api');
return endpoints
.filter(ep => ep.data.public)
.map(ep => ({
path: ep.data.path,
method: ep.data.method,
description: ep.data.description
}));
}
}
})
```
## Multiple Collections
Combine data from several collections:
```typescript
discovery({
humans: {
team: async () => {
const [coreTeam, contributors] = await Promise.all([
getCollection('team'),
getCollection('contributors')
]);
return [
...coreTeam.map(m => ({ ...m.data, role: `Core - ${m.data.role}` })),
...contributors.map(m => ({ ...m.data, role: `Contributor - ${m.data.role}` }))
];
},
thanks: async () => {
const sponsors = await getCollection('sponsors');
return sponsors.map(s => s.data.name);
}
}
})
```
## Filter and Sort Collections
Control which items are included:
```typescript
discovery({
llms: {
importantPages: async () => {
const allDocs = await getCollection('docs');
return allDocs
// Only published docs
.filter(doc => doc.data.published !== false)
// Only important ones
.filter(doc => doc.data.priority === 'high')
// Sort by custom order
.sort((a, b) => {
const orderA = a.data.order ?? 999;
const orderB = b.data.order ?? 999;
return orderA - orderB;
})
// Map to format
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
## Localized Content
Support multiple languages:
```typescript
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
// Group by language
const enDocs = docs.filter(d => d.slug.startsWith('en/'));
const esDocs = docs.filter(d => d.slug.startsWith('es/'));
// Return English docs, with links to translations
return enDocs.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description,
// Could add: translations: ['/docs/es/...']
}));
}
}
})
```
## Cache Collection Queries
Optimize build performance:
```typescript
// Cache at module level
let cachedTeam = null;
discovery({
humans: {
team: async () => {
if (!cachedTeam) {
const members = await getCollection('team');
cachedTeam = members.map(m => ({
name: m.data.name,
role: m.data.role,
contact: m.data.email
}));
}
return cachedTeam;
}
}
})
```
## Expected Result
Content collections automatically populate discovery files:
**Adding a team member**:
1. Create `src/content/team/new-member.yaml`
2. Run `npm run build`
3. humans.txt includes new member
**Marking a doc as featured**:
1. Add `featured: true` to frontmatter
2. Run `npm run build`
3. llms.txt lists the new important page
## Alternative Approaches
**Static data**: Use plain JavaScript objects when data rarely changes.
**External API**: Fetch from CMS or API during build instead of using collections.
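As a rough sketch, the same async function shape used above can fetch from a CMS at build time; the endpoint `https://cms.example.com/api/team` and its response fields are assumptions:
```typescript
// Sketch: load team data from a hypothetical CMS endpoint at build time
discovery({
  humans: {
    team: async () => {
      const response = await fetch('https://cms.example.com/api/team');
      const members = await response.json();
      return members.map((member) => ({
        name: member.name,     // assumed response field
        role: member.role,     // assumed response field
        contact: member.email  // assumed response field
      }));
    }
  }
})
```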
**Hybrid**: Use collections for core data, enhance with API data.
## Common Issues
**Async not awaited**: Ensure you use `async () => {}` and `await getCollection()`.
**Build-time only**: Collections are loaded at build time, not runtime.
**Type errors**: Ensure collection schema matches the data structure you're mapping.
**Missing data**: Check that collection files exist and match the schema.
**Slow builds**: Cache collection queries if used multiple times in config.

---
title: Custom Templates
description: Create custom templates for discovery files
---
Override default templates to fully customize the output format of discovery files.
## Prerequisites
- Integration installed and configured
- Understanding of the file formats (robots.txt, llms.txt, etc.)
- Knowledge of template function signatures
## Override robots.txt Template
Complete control over robots.txt output:
```typescript
// astro.config.mjs
discovery({
templates: {
robots: (config, siteURL) => {
const lines = [];
// Custom header
lines.push('# Custom robots.txt');
lines.push(`# Site: ${siteURL.hostname}`);
lines.push('# Last generated: ' + new Date().toISOString());
lines.push('');
// Default rule
lines.push('User-agent: *');
lines.push('Allow: /');
lines.push('');
// Add sitemap
lines.push(`Sitemap: ${new URL('sitemap-index.xml', siteURL).href}`);
return lines.join('\n') + '\n';
}
}
})
```
## Override llms.txt Template
Custom format for AI instructions:
```typescript
discovery({
templates: {
llms: async (config, siteURL) => {
const lines = [];
// Header
lines.push(`=`.repeat(60));
lines.push(`AI ASSISTANT GUIDE FOR ${siteURL.hostname.toUpperCase()}`);
lines.push(`=`.repeat(60));
lines.push('');
// Description
const description = typeof config.description === 'function'
? config.description()
: config.description;
if (description) {
lines.push(description);
lines.push('');
}
// Instructions
if (config.instructions) {
lines.push('IMPORTANT INSTRUCTIONS:');
lines.push(config.instructions);
lines.push('');
}
// API endpoints in custom format
if (config.apiEndpoints && config.apiEndpoints.length > 0) {
lines.push('AVAILABLE APIs:');
config.apiEndpoints.forEach(ep => {
lines.push(` [${ep.method || 'GET'}] ${ep.path}`);
lines.push(` → ${ep.description}`);
});
lines.push('');
}
// Footer
lines.push(`=`.repeat(60));
lines.push(`Generated: ${new Date().toISOString()}`);
return lines.join('\n') + '\n';
}
}
})
```
## Override humans.txt Template
Custom humans.txt format:
```typescript
discovery({
templates: {
humans: (config, siteURL) => {
const lines = [];
lines.push('========================================');
lines.push(' HUMANS BEHIND THE SITE ');
lines.push('========================================');
lines.push('');
// Team in custom format
if (config.team && config.team.length > 0) {
lines.push('OUR TEAM:');
lines.push('');
config.team.forEach((member, i) => {
if (i > 0) lines.push('---');
lines.push(`Name : ${member.name}`);
if (member.role) lines.push(`Role : ${member.role}`);
if (member.contact) lines.push(`Email : ${member.contact}`);
if (member.github) lines.push(`GitHub : https://github.com/${member.github}`);
lines.push('');
});
}
// Stack info
if (config.site?.techStack) {
lines.push('BUILT WITH:');
lines.push(config.site.techStack.join(' | '));
lines.push('');
}
return lines.join('\n') + '\n';
}
}
})
```
## Override security.txt Template
Custom security.txt with additional fields:
```typescript
discovery({
templates: {
security: (config, siteURL) => {
const lines = [];
// Canonical (required by RFC 9116)
const canonical = config.canonical ||
new URL('.well-known/security.txt', siteURL).href;
lines.push(`Canonical: ${canonical}`);
// Contact (required)
const contacts = Array.isArray(config.contact)
? config.contact
: [config.contact];
contacts.forEach(contact => {
const contactValue = contact.includes('@') && !contact.startsWith('mailto:')
? `mailto:${contact}`
: contact;
lines.push(`Contact: ${contactValue}`);
});
// Expires (recommended)
const expires = config.expires === 'auto'
? new Date(Date.now() + 365 * 24 * 60 * 60 * 1000).toISOString()
: config.expires;
if (expires) {
lines.push(`Expires: ${expires}`);
}
// Optional fields
if (config.encryption) {
const encryptions = Array.isArray(config.encryption)
? config.encryption
: [config.encryption];
encryptions.forEach(enc => lines.push(`Encryption: ${enc}`));
}
if (config.policy) {
lines.push(`Policy: ${config.policy}`);
}
if (config.acknowledgments) {
lines.push(`Acknowledgments: ${config.acknowledgments}`);
}
// Add custom comment
lines.push('');
lines.push('# Thank you for helping keep our users safe!');
return lines.join('\n') + '\n';
}
}
})
```
## Override canary.txt Template
Custom warrant canary format:
```typescript
discovery({
templates: {
canary: (config, siteURL) => {
const lines = [];
const today = new Date().toISOString().split('T')[0];
lines.push('=== WARRANT CANARY ===');
lines.push('');
lines.push(`Organization: ${config.organization || siteURL.hostname}`);
lines.push(`Date Issued: ${today}`);
lines.push('');
lines.push('As of this date, we confirm:');
lines.push('');
// List what has NOT been received
const statements = typeof config.statements === 'function'
? config.statements()
: config.statements || [];
statements
.filter(s => !s.received)
.forEach(statement => {
lines.push(`✓ NO ${statement.description} received`);
});
lines.push('');
lines.push('This canary will be updated regularly.');
lines.push('Absence of an update should be considered significant.');
lines.push('');
if (config.verification) {
lines.push(`Verification: ${config.verification}`);
}
return lines.join('\n') + '\n';
}
}
})
```
## Combine Default Generator with Custom Content
Use default generator, add custom content:
```typescript
import { generateRobotsTxt } from '@astrojs/discovery/generators';
discovery({
templates: {
robots: (config, siteURL) => {
// Generate default content
const defaultContent = generateRobotsTxt(config, siteURL);
// Add custom rules
const customRules = `
# Custom section
User-agent: MySpecialBot
Crawl-delay: 20
Allow: /special
# Rate limiting comment
# Please be respectful of our server resources
`.trim();
return defaultContent + '\n\n' + customRules + '\n';
}
}
})
```
## Load Template from File
Keep templates separate:
```typescript
// templates/robots.txt.js
export default (config, siteURL) => {
return `
User-agent: *
Allow: /
Sitemap: ${new URL('sitemap-index.xml', siteURL).href}
`.trim() + '\n';
};
```
```typescript
// astro.config.mjs
import robotsTemplate from './templates/robots.txt.js';
discovery({
templates: {
robots: robotsTemplate
}
})
```
## Conditional Template Logic
Different templates per environment:
```typescript
discovery({
templates: {
llms: import.meta.env.PROD
? (config, siteURL) => {
// Production: detailed guide
return `# Production site guide\n...detailed content...`;
}
: (config, siteURL) => {
// Development: simple warning
return `# Development environment\nThis is a development site.\n`;
}
}
})
```
## Template with External Data
Fetch additional data in template:
```typescript
discovery({
templates: {
llms: async (config, siteURL) => {
// Fetch latest API spec
const response = await fetch('https://api.example.com/openapi.json');
const spec = await response.json();
const lines = [];
lines.push(`# ${siteURL.hostname} API Guide`);
lines.push('');
lines.push('Available endpoints:');
Object.entries(spec.paths).forEach(([path, methods]) => {
Object.keys(methods).forEach(method => {
lines.push(`- ${method.toUpperCase()} ${path}`);
});
});
return lines.join('\n') + '\n';
}
}
})
```
## Verify Custom Templates
Test your templates:
```bash
npm run build
npm run preview
# Check each file
curl http://localhost:4321/robots.txt
curl http://localhost:4321/llms.txt
curl http://localhost:4321/humans.txt
curl http://localhost:4321/.well-known/security.txt
```
Ensure format is correct and content appears as expected.
## Expected Result
Your custom templates completely control output format:
**Custom robots.txt**:
```
# Custom robots.txt
# Site: example.com
# Last generated: 2025-11-08T12:00:00.000Z
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap-index.xml
```
**Custom llms.txt**:
```
============================================================
AI ASSISTANT GUIDE FOR EXAMPLE.COM
============================================================
Your site description here
IMPORTANT INSTRUCTIONS:
...
```
## Alternative Approaches
**Partial overrides**: Extend default generators rather than replacing entirely.
**Post-processing**: Generate default content, then modify it with string manipulation.
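For example, a small sketch reusing the `generateRobotsTxt` helper shown earlier; the replacement assumes the default output contains a `Crawl-delay: 1` line:
```typescript
import { generateRobotsTxt } from '@astrojs/discovery/generators';
discovery({
  templates: {
    robots: (config, siteURL) => {
      // Start from the default output, then tweak it with plain string manipulation
      const defaultContent = generateRobotsTxt(config, siteURL);
      return defaultContent.replace(/Crawl-delay: 1/g, 'Crawl-delay: 5');
    }
  }
})
```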
**Multiple templates**: Use different templates based on configuration flags.
## Common Issues
**Missing newline at end**: Ensure template returns content ending with `\n`.
**Async templates**: The llms.txt template can be async; the other templates must stay synchronous.
**Type errors**: Template signature must match: `(config: Config, siteURL: URL) => string`
**Breaking specs**: security.txt and robots.txt have specific formats. Don't break them.
**Config not available**: Only config passed to that section is available. Can't access other sections.

---
title: Customize LLM Instructions
description: Provide custom instructions for AI assistants using llms.txt
---
Configure how AI assistants interact with your site by customizing instructions in llms.txt.
## Prerequisites
- Integration installed and configured
- Understanding of your site's main use cases
- Knowledge of your API endpoints (if applicable)
## Add Basic Instructions
Provide clear guidance for AI assistants:
```typescript
// astro.config.mjs
discovery({
llms: {
description: 'Technical documentation for the Discovery API',
instructions: `
When helping users with this site:
1. Check the documentation before answering
2. Provide code examples when relevant
3. Link to specific documentation pages
4. Use the search API for queries
`.trim()
}
})
```
## Highlight Key Features
Guide AI assistants to important capabilities:
```typescript
discovery({
llms: {
description: 'E-commerce platform for sustainable products',
keyFeatures: [
'Carbon footprint calculator for all products',
'Subscription management with flexible billing',
'AI-powered product recommendations',
'Real-time inventory tracking'
]
}
})
```
## Document Important Pages
Direct AI assistants to critical resources:
```typescript
discovery({
llms: {
importantPages: [
{
name: 'API Documentation',
path: '/docs/api',
description: 'Complete API reference with examples'
},
{
name: 'Getting Started Guide',
path: '/docs/quick-start',
description: 'Step-by-step setup instructions'
},
{
name: 'FAQ',
path: '/help/faq',
description: 'Common questions and solutions'
}
]
}
})
```
## Describe Your APIs
Help AI assistants use your endpoints correctly:
```typescript
discovery({
llms: {
apiEndpoints: [
{
path: '/api/search',
method: 'GET',
description: 'Search products by name, category, or tag'
},
{
path: '/api/products/:id',
method: 'GET',
description: 'Get detailed product information'
},
{
path: '/api/calculate-carbon',
method: 'POST',
description: 'Calculate carbon footprint for a cart'
}
]
}
})
```
## Set Brand Voice Guidelines
Maintain consistent communication style:
```typescript
discovery({
llms: {
brandVoice: [
'Professional yet approachable',
'Focus on sustainability and environmental impact',
'Use concrete examples, not abstract concepts',
'Avoid jargon unless explaining technical features',
'Emphasize long-term value over short-term savings'
]
}
})
```
## Load Content Dynamically
Pull important pages from content collections:
```typescript
import { getCollection } from 'astro:content';
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
// Filter to featured pages only
return docs
.filter(doc => doc.data.featured)
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
## Add Custom Sections
Include specialized information:
```typescript
discovery({
llms: {
customSections: {
'Data Privacy': `
We are GDPR compliant. User data is encrypted at rest and in transit.
Data retention policy: 90 days for analytics, 7 years for transactions.
`.trim(),
'Rate Limits': `
API rate limits:
- Authenticated: 1000 requests/hour
- Anonymous: 60 requests/hour
- Burst: 20 requests/second
`.trim(),
'Support Channels': `
For assistance:
- Documentation: https://example.com/docs
- Email: support@example.com (response within 24h)
- Community: https://discord.gg/example
`.trim()
}
}
})
```
## Environment-Specific Instructions
Different instructions for development vs production:
```typescript
discovery({
llms: {
instructions: import.meta.env.PROD
? `Production site - use live API endpoints at https://api.example.com`
: `Development site - API endpoints may be mocked or unavailable`
}
})
```
## Verify Your Configuration
Build and check the output:
```bash
npm run build
npm run preview
curl http://localhost:4321/llms.txt
```
Look for your instructions, features, and API documentation in the formatted output.
## Expected Result
Your llms.txt will contain structured information:
```markdown
# example.com
> E-commerce platform for sustainable products
---
## Key Features
- Carbon footprint calculator for all products
- AI-powered product recommendations
## Instructions for AI Assistants
When helping users with this site:
1. Check the documentation before answering
2. Provide code examples when relevant
## API Endpoints
- `GET /api/search`
Search products by name, category, or tag
Full URL: https://example.com/api/search
```
AI assistants will use this information to provide accurate, context-aware help.
## Alternative Approaches
**Multiple llms.txt files**: Create llms-full.txt for comprehensive docs, llms.txt for summary.
**Dynamic generation**: Use a build script to extract API docs from OpenAPI specs.
**Language-specific versions**: Generate different files for different locales (llms-en.txt, llms-es.txt).
## Common Issues
**Too much information**: Keep it concise. AI assistants prefer focused, actionable guidance.
**Outdated instructions**: Use `lastUpdate: 'auto'` or automate updates from your CMS.
**Missing context**: Don't assume knowledge. Explain domain-specific terms and workflows.
**Unclear priorities**: List most important pages/features first. AI assistants may prioritize early content.

---
title: Environment-specific Configuration
description: Use different configs for dev and production
---
Configure different settings for development and production environments to optimize for local testing vs deployed sites.
## Prerequisites
- Integration installed and configured
- Understanding of Astro environment variables
- Knowledge of your deployment setup
## Basic Environment Switching
Use `import.meta.env.PROD` to detect production:
```typescript
// astro.config.mjs
discovery({
robots: {
// Block all bots in development
allowAllBots: import.meta.env.PROD
}
})
```
Development: Bots blocked. Production: Bots allowed.
## Different Site URLs
Use different domains for staging and production:
```typescript
export default defineConfig({
site: import.meta.env.PROD
? 'https://example.com'
: 'http://localhost:4321',
integrations: [
discovery({
// Config automatically uses correct site URL
})
]
})
```
## Conditional Feature Enablement
Enable security.txt and canary.txt only in production:
```typescript
discovery({
security: import.meta.env.PROD
? {
contact: 'security@example.com',
expires: 'auto'
}
: undefined, // Disabled in development
canary: import.meta.env.PROD
? {
organization: 'Example Corp',
contact: 'canary@example.com',
frequency: 'monthly'
}
: undefined // Disabled in development
})
```
## Environment-Specific Instructions
Different LLM instructions for each environment:
```typescript
discovery({
llms: {
description: import.meta.env.PROD
? 'Production e-commerce platform'
: 'Development/Staging environment - data may be test data',
instructions: import.meta.env.PROD
? `
When helping users:
1. Use production API at https://api.example.com
2. Data is live - be careful with modifications
3. Refer to https://docs.example.com for documentation
`.trim()
: `
Development environment - for testing only:
1. API endpoints may be mocked
2. Database is reset nightly
3. Some features may not work
`.trim()
}
})
```
## Custom Environment Variables
Use `.env` files for configuration:
```bash
# .env.production
PUBLIC_SECURITY_EMAIL=security@example.com
PUBLIC_CANARY_ENABLED=true
PUBLIC_CONTACT_EMAIL=contact@example.com
# .env.development
PUBLIC_SECURITY_EMAIL=dev-security@localhost
PUBLIC_CANARY_ENABLED=false
PUBLIC_CONTACT_EMAIL=dev@localhost
```
Then use in config:
```typescript
discovery({
security: import.meta.env.PUBLIC_CANARY_ENABLED === 'true'
? {
contact: import.meta.env.PUBLIC_SECURITY_EMAIL,
expires: 'auto'
}
: undefined,
humans: {
team: [
{
name: 'Team',
contact: import.meta.env.PUBLIC_CONTACT_EMAIL
}
]
}
})
```
## Staging Environment
Support three environments: dev, staging, production:
```typescript
const ENV = import.meta.env.MODE; // 'development', 'staging', or 'production'
const siteURLs = {
development: 'http://localhost:4321',
staging: 'https://staging.example.com',
production: 'https://example.com'
};
export default defineConfig({
site: siteURLs[ENV],
integrations: [
discovery({
robots: {
// Block bots in dev and staging
allowAllBots: ENV === 'production',
additionalAgents: ENV !== 'production'
? [{ userAgent: '*', disallow: ['/'] }]
: []
},
llms: {
description: ENV === 'production'
? 'Production site'
: `${ENV} environment - not for public use`
}
})
]
})
```
Run with: `astro build --mode staging`
## Different Cache Headers
Aggressive caching in production, none in development:
```typescript
discovery({
caching: import.meta.env.PROD
? {
// Production: cache aggressively
robots: 86400,
llms: 3600,
humans: 604800
}
: {
// Development: no caching
robots: 0,
llms: 0,
humans: 0
}
})
```
## Feature Flags
Use environment variables as feature flags:
```typescript
discovery({
webfinger: {
enabled: import.meta.env.PUBLIC_ENABLE_WEBFINGER === 'true',
resources: [/* ... */]
},
canary: import.meta.env.PUBLIC_ENABLE_CANARY === 'true'
? {
organization: 'Example Corp',
frequency: 'monthly'
}
: undefined
})
```
Set in `.env`:
```bash
PUBLIC_ENABLE_WEBFINGER=false
PUBLIC_ENABLE_CANARY=true
```
## Test vs Production Data
Load different team data per environment:
```typescript
import { getCollection } from 'astro:content';
discovery({
humans: {
team: import.meta.env.PROD
? await getCollection('team') // Real team
: [
{
name: 'Test Developer',
role: 'Developer',
contact: 'test@localhost'
}
]
}
})
```
## Preview Deployments
Handle preview/branch deployments:
```typescript
const isPreview = import.meta.env.PREVIEW === 'true';
const isProd = import.meta.env.PROD && !isPreview;
discovery({
robots: {
allowAllBots: isProd, // Block on previews too
additionalAgents: !isProd
? [
{
userAgent: '*',
disallow: ['/']
}
]
: []
}
})
```
## Verify Environment Config
Test each environment:
```bash
# Development
npm run dev
curl http://localhost:4321/robots.txt
# Production build
npm run build
npm run preview
curl http://localhost:4321/robots.txt
# Staging (if configured)
astro build --mode staging
```
Check that content differs appropriately.
## Expected Result
Each environment produces appropriate output:
**Development** - Block all:
```
User-agent: *
Disallow: /
```
**Production** - Allow bots:
```
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap-index.xml
```
## Alternative Approaches
**Config files per environment**: Create `astro.config.dev.mjs` and `astro.config.prod.mjs`.
**Build-time injection**: Use build tools to inject environment-specific values.
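As a sketch, `astro.config.mjs` runs in Node, so a CI-provided variable (here the hypothetical `DEPLOY_TARGET`) can be read directly at build time:
```typescript
// astro.config.mjs - sketch; DEPLOY_TARGET is a hypothetical CI variable
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
const target = process.env.DEPLOY_TARGET ?? 'development';
export default defineConfig({
  site: target === 'production' ? 'https://example.com' : 'https://staging.example.com',
  integrations: [
    discovery({
      robots: { allowAllBots: target === 'production' }
    })
  ]
});
```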
**Runtime checks**: For SSR sites, check headers or hostname at runtime.
## Common Issues
**Environment variables not available**: Ensure variables are prefixed with `PUBLIC_` for client access.
**Wrong environment detected**: `import.meta.env.PROD` is true for production builds, not preview.
**Undefined values**: Provide fallbacks for missing environment variables.
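A minimal sketch of a fallback, reusing the `PUBLIC_SECURITY_EMAIL` variable from the examples above:
```typescript
// Fall back to a safe default when the variable is missing
const securityEmail = import.meta.env.PUBLIC_SECURITY_EMAIL ?? 'security@example.com';
discovery({
  security: {
    contact: securityEmail,
    expires: 'auto'
  }
})
```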
**Inconsistent builds**: Document which environment variables affect the build for reproducibility.

---
title: Filter Sitemap Pages
description: Control which pages appear in your sitemap
---
Exclude pages from your sitemap to keep it focused on publicly accessible, valuable content.
## Prerequisites
- Integration installed and configured
- Understanding of which pages should be public
- Knowledge of your site's URL structure
## Exclude Admin Pages
Block administrative and dashboard pages:
```typescript
// astro.config.mjs
discovery({
sitemap: {
filter: (page) => !page.includes('/admin')
}
})
```
This removes all URLs containing `/admin` from the sitemap.
## Exclude Multiple Path Patterns
Filter out several types of pages:
```typescript
discovery({
sitemap: {
filter: (page) => {
return !page.includes('/admin') &&
!page.includes('/draft') &&
!page.includes('/private') &&
!page.includes('/test');
}
}
})
```
## Exclude by File Extension
Remove API endpoints or non-HTML pages:
```typescript
discovery({
sitemap: {
filter: (page) => {
return !page.endsWith('.json') &&
!page.endsWith('.xml') &&
!page.includes('/api/');
}
}
})
```
## Include Only Specific Directories
Allow only documentation and blog posts:
```typescript
discovery({
sitemap: {
filter: (page) => {
const url = new URL(page);
const path = url.pathname;
return path.startsWith('/docs/') ||
path.startsWith('/blog/') ||
path === '/';
}
}
})
```
## Exclude by Environment
Different filtering for development vs production:
```typescript
discovery({
sitemap: {
filter: (page) => {
// Exclude drafts in production
if (import.meta.env.PROD && page.includes('/draft')) {
return false;
}
// Exclude test pages in production
if (import.meta.env.PROD && page.includes('/test')) {
return false;
}
return true;
}
}
})
```
## Filter Based on Page Metadata
Use frontmatter or metadata to control inclusion:
```typescript
discovery({
sitemap: {
serialize: (item) => {
// Exclude pages marked as noindex
// Note: You'd need to access page metadata here
// This is a simplified example
return item;
},
filter: (page) => {
// Basic path-based filtering
return !page.includes('/internal/');
}
}
})
```
## Combine with Custom Pages
Add non-generated pages while filtering others:
```typescript
discovery({
sitemap: {
filter: (page) => !page.includes('/admin'),
customPages: [
'https://example.com/special-page',
'https://example.com/external-content'
]
}
})
```
## Use Regular Expressions
Advanced pattern matching:
```typescript
discovery({
sitemap: {
filter: (page) => {
// Exclude pages with query parameters
if (page.includes('?')) return false;
// Exclude paginated pages except first page
if (/\/page\/\d+/.test(page)) return false;
// Exclude temp or staging paths
if (/\/(temp|staging|wip)\//.test(page)) return false;
return true;
}
}
})
```
## Filter User-Generated Content
Exclude user profiles or dynamic content:
```typescript
discovery({
sitemap: {
filter: (page) => {
// Include main user directory page
if (page === '/users' || page === '/users/') return true;
// Exclude individual user pages
if (page.startsWith('/users/')) return false;
// Exclude comment threads
if (page.includes('/comments/')) return false;
return true;
}
}
})
```
## Verify Your Filter
Test your filter logic:
```bash
npm run build
npm run preview
# Check sitemap
curl http://localhost:4321/sitemap-index.xml
# Look for excluded pages (should not appear)
curl http://localhost:4321/sitemap-0.xml | grep '/admin'
```
If grep returns nothing, your filter is working.
## Expected Result
Your sitemap will only contain allowed pages. Excluded pages won't appear:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
</url>
<url>
<loc>https://example.com/blog/post-1</loc>
</url>
<!-- No /admin, /draft, or /private pages -->
</urlset>
```
## Alternative Approaches
**robots.txt blocking**: Block crawling entirely using robots.txt instead of just omitting from sitemap.
**Meta robots tag**: Add `<meta name="robots" content="noindex">` to pages you want excluded.
**Separate sitemaps**: Create multiple sitemap files for different sections, only submit public ones.
**Dynamic generation**: Generate sitemaps at runtime based on user permissions or content status.
## Common Issues
**Too restrictive**: Double-check your filter doesn't exclude important pages. Test thoroughly.
**Case sensitivity**: URL paths are case-sensitive. `/Admin` and `/admin` are different.
**Trailing slashes**: Be consistent. `/page` and `/page/` may both exist. Handle both.
**Query parameters**: Decide whether to include pages with query strings. Usually exclude them.
**Performance**: Complex filter functions run for every page. Keep logic simple for better build times.
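One way to keep the filter cheap is to define the exclusion patterns once, outside the function, as in this sketch:
```typescript
// Precompile exclusion patterns so the filter does minimal work per page
const EXCLUDED_PATTERNS = [/\/admin/, /\/draft/, /\/private/];
discovery({
  sitemap: {
    filter: (page) => !EXCLUDED_PATTERNS.some((pattern) => pattern.test(page))
  }
})
```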

---
title: Basic Setup
description: Set up @astrojs/discovery with default configuration
---
In this tutorial, you'll learn how to customize the basic settings of @astrojs/discovery to match your project's needs. We'll start simple and gradually add more configuration.
## What You'll Build
By the end of this tutorial, you'll have:
- A fully configured discovery integration
- Custom site description for AI assistants
- Team credits in humans.txt
- Properly configured robots.txt
## Before You Start
Make sure you have:
- @astrojs/discovery installed
- Your `site` URL configured in `astro.config.mjs`
- A working Astro project
## Step 1: Start with the Minimal Setup
Open your `astro.config.mjs` file. You should have something like this:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery()
]
});
```
This minimal setup works, but we can make it better!
## Step 2: Add a Site Description
Let's help AI assistants understand your site better. Add an `llms` configuration:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A personal blog about web development, focusing on modern JavaScript frameworks and best practices',
}
})
]
});
```
Build your site and check `dist/llms.txt`:
```bash
npm run build
cat dist/llms.txt
```
You'll now see your description in the generated file!
## Step 3: Add Your Team Information
Let's add credits to humans.txt. Update your configuration:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A personal blog about web development, focusing on modern JavaScript frameworks and best practices',
},
humans: {
team: [
{
name: 'Your Name',
role: 'Developer',
contact: 'you@example.com',
location: 'Your City',
}
],
}
})
]
});
```
Build again and check `dist/humans.txt`:
```bash
npm run build
cat dist/humans.txt
```
You should see:
```txt
/* TEAM */
Name: Your Name
Role: Developer
Contact: you@example.com
Location: Your City
/* SITE */
Last update: 2025-01-08
Language: English
Doctype: HTML5
Tech stack: Astro
```
Great! Your name is now in the credits.
## Step 4: Customize the Tech Stack
Let's document the technologies you're actually using:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A personal blog about web development, focusing on modern JavaScript frameworks and best practices',
},
humans: {
team: [
{
name: 'Your Name',
role: 'Developer',
contact: 'you@example.com',
location: 'Your City',
}
],
site: {
lastUpdate: 'auto',
language: 'English',
doctype: 'HTML5',
techStack: ['Astro', 'TypeScript', 'React', 'Tailwind CSS'],
}
}
})
]
});
```
Build and verify:
```bash
npm run build
cat dist/humans.txt
```
Now the tech stack section lists all your technologies!
## Step 5: Add Key Features for AI
Help AI assistants understand what makes your site special:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A personal blog about web development, focusing on modern JavaScript frameworks and best practices',
keyFeatures: [
'In-depth tutorials on modern web development',
'Code examples with live demos',
'Weekly newsletter with web dev tips',
'Open source project showcase',
],
},
humans: {
team: [
{
name: 'Your Name',
role: 'Developer',
contact: 'you@example.com',
location: 'Your City',
}
],
site: {
lastUpdate: 'auto',
language: 'English',
doctype: 'HTML5',
techStack: ['Astro', 'TypeScript', 'React', 'Tailwind CSS'],
}
}
})
]
});
```
Build and check `dist/llms.txt`:
```bash
npm run build
cat dist/llms.txt
```
You'll see your key features listed prominently!
## Step 6: Adjust Crawl Delay
If you want to be more or less friendly to bots, adjust the crawl delay:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
crawlDelay: 2, // Wait 2 seconds between requests
},
llms: {
description: 'A personal blog about web development, focusing on modern JavaScript frameworks and best practices',
keyFeatures: [
'In-depth tutorials on modern web development',
'Code examples with live demos',
'Weekly newsletter with web dev tips',
'Open source project showcase',
],
},
humans: {
team: [
{
name: 'Your Name',
role: 'Developer',
contact: 'you@example.com',
location: 'Your City',
}
],
site: {
lastUpdate: 'auto',
language: 'English',
doctype: 'HTML5',
techStack: ['Astro', 'TypeScript', 'React', 'Tailwind CSS'],
}
}
})
]
});
```
Check `dist/robots.txt`:
```bash
npm run build
cat dist/robots.txt
```
The crawl delay is now 2 seconds!
## Step 7: Test Everything
Start your dev server and test all the files:
```bash
npm run dev
```
Visit these URLs in your browser:
- `http://localhost:4321/robots.txt` - Should show crawl delay of 2
- `http://localhost:4321/llms.txt` - Should show your description and key features
- `http://localhost:4321/humans.txt` - Should show your team info and tech stack
## What You've Learned
You now know how to:
- Add site descriptions for AI assistants
- Credit your team in humans.txt
- Document your tech stack
- List key features for AI discovery
- Adjust bot crawl behavior
## Your Complete Configuration
Here's what you've built:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
crawlDelay: 2,
},
llms: {
description: 'A personal blog about web development, focusing on modern JavaScript frameworks and best practices',
keyFeatures: [
'In-depth tutorials on modern web development',
'Code examples with live demos',
'Weekly newsletter with web dev tips',
'Open source project showcase',
],
},
humans: {
team: [
{
name: 'Your Name',
role: 'Developer',
contact: 'you@example.com',
location: 'Your City',
}
],
site: {
lastUpdate: 'auto',
language: 'English',
doctype: 'HTML5',
techStack: ['Astro', 'TypeScript', 'React', 'Tailwind CSS'],
}
}
})
]
});
```
## Next Steps
Now that you have the basics configured, explore more advanced features:
- [Configure robots.txt](/tutorials/configure-robots/) - Control which bots can access what
- [Setup llms.txt](/tutorials/setup-llms/) - Provide detailed instructions for AI
- [Create humans.txt](/tutorials/create-humans/) - Add story and fun facts
- [Security & Canary](/tutorials/security-canary/) - Add security contact info
## Troubleshooting
### Configuration Not Taking Effect?
Make sure to rebuild after changing config:
```bash
npm run build
```
### Type Errors in Config?
Install TypeScript definitions if needed:
```bash
npm install -D @types/node
```
### Need More Options?
Check the [Configuration Reference](/reference/configuration/) for all available options.

---
title: Configure robots.txt
description: Customize your robots.txt file
---
In this tutorial, you'll learn how to configure robots.txt to control which bots can access your site and how they should behave.
## What You'll Build
By the end of this tutorial, you'll have:
- Custom bot access rules
- Specific rules for different user agents
- Protected admin and private pages
- Configured LLM bot access
## Before You Start
Make sure you have:
- Completed the [Basic Setup](/tutorials/basic-setup/) tutorial
- An understanding of what robots.txt does
- Pages you want to protect from bots
## Step 1: Understanding the Default robots.txt
Build your site and look at the default robots.txt:
```bash
npm run build
cat dist/robots.txt
```
You'll see:
```txt
User-agent: *
Allow: /
# Sitemaps
Sitemap: https://your-site.com/sitemap-index.xml
# LLM-specific resources
User-agent: Anthropic-AI
User-agent: Claude-Web
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: cohere-ai
User-agent: Google-Extended
Allow: /llms.txt
Crawl-delay: 1
```
This allows all bots full access to your site. Let's customize it!
## Step 2: Block Private Pages
Let's say you have admin and draft pages you want to protect. Update your config:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private'],
}
],
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/robots.txt
```
You'll now see:
```txt
User-agent: *
Disallow: /admin
Disallow: /draft
Disallow: /private
Allow: /
```
All bots are now blocked from those paths!
## Step 3: Allow Specific Paths Only
Maybe you want to limit a specific bot to only certain paths. Let's allow a custom bot to access only the API:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private'],
},
{
userAgent: 'MyCustomBot',
allow: ['/api'],
disallow: ['/'],
}
],
}
})
]
});
```
Build and verify:
```bash
npm run build
cat dist/robots.txt
```
Now you'll see rules for MyCustomBot:
```txt
User-agent: MyCustomBot
Allow: /api
Disallow: /
```
This bot can only access `/api` paths!
## Step 4: Block a Troublesome Bot Completely
Let's block a specific bot entirely:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private'],
},
{
userAgent: 'BadBot',
disallow: ['/'],
}
],
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/robots.txt
```
BadBot is now completely blocked!
## Step 5: Adjust Crawl Delay
If your server is getting hammered by bots, increase the crawl delay:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
crawlDelay: 5, // Wait 5 seconds between requests
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private'],
}
],
}
})
]
});
```
Build and verify:
```bash
npm run build
cat dist/robots.txt
```
You'll see `Crawl-delay: 5` in the file.
## Step 6: Customize LLM Bot Access
By default, LLM bots can access everything. Let's control this:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
llmBots: {
enabled: true,
agents: ['Anthropic-AI', 'Claude-Web'], // Only allow these AI bots
},
additionalAgents: [
{
userAgent: '*',
disallow: ['/admin', '/draft', '/private'],
},
{
userAgent: 'GPTBot', // Block OpenAI's bot specifically
disallow: ['/'],
}
],
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/robots.txt
```
Now only Anthropic AI bots have special access!
## Step 7: Disable LLM Bots Entirely
If you don't want AI bots crawling your site:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
robots: {
llmBots: {
enabled: false, // No special LLM bot rules
},
}
})
]
});
```
Build and verify - the LLM-specific section will be gone!
## Step 8: Test Your Configuration
Start the dev server:
```bash
npm run dev
```
Visit `http://localhost:4321/robots.txt` to see your rules in action.
## What You've Learned
You now know how to:
- Block specific paths from all bots
- Create bot-specific access rules
- Completely block troublesome bots
- Adjust crawl delay to protect your server
- Control LLM bot access
- Enable or disable AI bot crawling
## Common Patterns
### E-commerce Site
```typescript
discovery({
robots: {
crawlDelay: 2,
additionalAgents: [
{
userAgent: '*',
disallow: ['/checkout', '/account', '/admin'],
},
{
userAgent: 'PriceScraperBot',
disallow: ['/'],
}
],
}
})
```
### Blog with Drafts
```typescript
discovery({
robots: {
additionalAgents: [
{
userAgent: '*',
disallow: ['/draft', '/preview'],
}
],
}
})
```
### API Platform
```typescript
discovery({
robots: {
llmBots: {
enabled: true, // Let AI bots learn from docs
},
additionalAgents: [
{
userAgent: '*',
allow: ['/docs', '/api/v1'],
disallow: ['/admin', '/api/internal'],
}
],
}
})
```
## Testing Your Rules
After configuring, always test:
1. **Build and inspect:**
```bash
npm run build
cat dist/robots.txt
```
2. **Verify in dev:**
```bash
npm run dev
# Visit http://localhost:4321/robots.txt
```
3. **Test with Google's tool:**
Use Google Search Console's robots.txt tester after deploying
## Next Steps
- [Setup llms.txt](/tutorials/setup-llms/) - Configure AI assistant instructions
- [Create humans.txt](/tutorials/create-humans/) - Add team credits
- [Block Bots How-To](/how-to/block-bots/) - Advanced bot blocking patterns
## Troubleshooting
### Bots Still Accessing Blocked Pages?
Remember that robots.txt is a suggestion, not enforcement. Badly behaved bots may ignore it. Consider:
- Server-level blocking with .htaccess or nginx rules
- Rate limiting
- IP blocking
### Rules Not Taking Effect?
Make sure to:
1. Rebuild after config changes
2. Clear CDN cache if using one
3. Wait for bots to re-crawl robots.txt
4. Check that robots.txt is at the root URL
### LLM Bots Not Listed?
The default LLM bots are:
- Anthropic-AI
- Claude-Web
- GPTBot
- ChatGPT-User
- cohere-ai
- Google-Extended
New bots appear regularly - add them manually to the agents array.
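For example, a sketch extending the default list; `NewAIBot` is a placeholder for whatever user agent string the new crawler documents:
```typescript
discovery({
  robots: {
    llmBots: {
      enabled: true,
      agents: [
        'Anthropic-AI',
        'Claude-Web',
        'GPTBot',
        'ChatGPT-User',
        'cohere-ai',
        'Google-Extended',
        'NewAIBot' // placeholder - replace with the real user agent string
      ]
    }
  }
})
```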

---
title: Create humans.txt
description: Add team credits and tech stack information
---
In this tutorial, you'll learn how to create a humans.txt file to credit your team, tell your project's story, and document your technology stack.
## What You'll Build
By the end of this tutorial, you'll have:
- Team member credits with contact information
- A list of acknowledgments
- Documented tech stack
- Your project's story
- Fun facts about your project
## Before You Start
Make sure you have:
- Completed the [Basic Setup](/tutorials/basic-setup/) tutorial
- Information about your team members
- A sense of your project's story
## Step 1: Add Team Members
Open your `astro.config.mjs` and add your team:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA',
}
],
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/humans.txt
```
You'll see:
```txt
/* TEAM */
Name: Jane Developer
Role: Lead Developer
Contact: jane@example.com
Location: San Francisco, CA
```
Your first team member is credited!
## Step 2: Add Multiple Team Members
Let's add more people:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA',
twitter: '@janedev',
github: 'janedev',
},
{
name: 'Bob Designer',
role: 'UI/UX Designer',
contact: 'bob@example.com',
location: 'Austin, TX',
twitter: '@bobdesigns',
},
{
name: 'Alice DevOps',
role: 'Infrastructure Engineer',
location: 'Remote',
github: 'alice-ops',
},
],
}
})
]
});
```
Build and verify - all team members are now listed!
## Step 3: Add Thanks and Acknowledgments
Credit the people and projects that helped:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com',
}
],
thanks: [
'The amazing Astro team',
'Our supportive open source community',
'Coffee, for making this possible',
'All our beta testers',
'Stack Overflow (you know why)',
],
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/humans.txt
```
You'll see a new THANKS section!
## Step 4: Document Your Tech Stack
Let's tell people what you built with:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [/* ... */],
thanks: [/* ... */],
site: {
lastUpdate: 'auto', // Automatically uses current date
language: 'English',
doctype: 'HTML5',
ide: 'VS Code',
techStack: [
'Astro',
'TypeScript',
'React',
'Tailwind CSS',
'PostgreSQL',
],
standards: [
'HTML5',
'CSS3',
'WCAG 2.1 AA',
],
components: [
'React',
'Astro Components',
],
software: [
'Figma',
'Docker',
'GitHub Actions',
],
},
}
})
]
});
```
Build and verify - comprehensive tech documentation!
## Step 5: Tell Your Story
Add a narrative about your project:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [/* ... */],
thanks: [/* ... */],
site: {/* ... */},
story: `
This project started in early 2024 when we realized there was no simple
way to build fast, modern websites without complex tooling. We fell in
love with Astro and built this site to showcase what's possible.
Three months, 47 cups of coffee, and countless late nights later, we
launched. The response from the community has been incredible, and we're
just getting started.
`.trim(),
}
})
]
});
```
Build and check - your story is now part of humans.txt!
## Step 6: Add Fun Facts
Make it personal with fun facts:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [/* ... */],
thanks: [/* ... */],
site: {/* ... */},
story: `...`,
funFacts: [
'Built entirely on mechanical keyboards',
'Fueled by 347 cups of coffee and 128 energy drinks',
'First deployed from a coffee shop in Portland',
'Named after a joke that nobody remembers anymore',
'Our mascot is a rubber duck named Herbert',
],
}
})
]
});
```
Build and verify - personality added!
## Step 7: Add Your Philosophy
Share your project values:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [/* ... */],
philosophy: [
'Users first, always',
'Fast is a feature',
'Accessibility is not optional',
'Simple over complex',
'Open source by default',
],
}
})
]
});
```
Build and check - your values are documented!
## Step 8: Add Custom Sections
Need something specific? Add custom sections:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
humans: {
team: [/* ... */],
customSections: {
'SUSTAINABILITY': 'Hosted on 100% renewable energy, carbon-offset delivery',
'COMMUNITY': 'Join us on Discord: discord.gg/yourserver',
'HIRING': 'We\'re hiring! Check careers.example.com',
},
}
})
]
});
```
## What You've Learned
You now know how to:
- Credit team members with full details
- Add social links (Twitter, GitHub)
- Thank contributors and inspirations
- Document your complete tech stack
- Tell your project's story
- Add fun facts and personality
- Share your project philosophy
- Create custom sections
## Complete Example
Here's a full, real-world humans.txt configuration:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://awesome-project.com',
integrations: [
discovery({
humans: {
team: [
{
name: 'Sarah Chen',
role: 'Founder & Lead Developer',
contact: 'sarah@awesome-project.com',
location: 'Seattle, WA',
twitter: '@sarahchen',
github: 'sarahchen',
},
{
name: 'Marcus Johnson',
role: 'Senior Developer',
contact: 'marcus@awesome-project.com',
location: 'Austin, TX',
github: 'marcusj',
},
{
name: 'Yuki Tanaka',
role: 'Designer & UX Lead',
location: 'Tokyo, Japan',
twitter: '@yukidesigns',
},
],
thanks: [
'The Astro core team for building an amazing framework',
'Our 1,247 GitHub stargazers',
'Beta testers who found all the edge cases',
'The open source community',
'Our families for putting up with late-night deploys',
],
site: {
lastUpdate: 'auto',
language: 'English / 日本語',
doctype: 'HTML5',
ide: 'VS Code + Vim',
techStack: [
'Astro 4.0',
'TypeScript',
'React',
'Tailwind CSS',
'PostgreSQL',
'Redis',
],
standards: [
'HTML5',
'CSS3',
'WCAG 2.1 AA',
'JSON:API',
],
components: [
'React',
'Astro Islands',
'Headless UI',
],
software: [
'Figma',
'Docker',
'GitHub Actions',
'Playwright',
],
},
story: `
Awesome Project was born from a simple frustration: building modern
web apps was too complicated. We wanted something fast, simple, and
delightful to use.
In January 2024, Sarah had the idea over coffee. By March, we had a
working prototype. In June, we launched to the world. The response
was overwhelming - thousands of developers joined our community in
the first week.
Today, Awesome Project powers over 10,000 websites worldwide, from
personal blogs to enterprise applications. But we're just getting
started.
`.trim(),
funFacts: [
'Written entirely in coffee shops across 3 continents',
'Our first commit was made on a plane at 30,000 feet',
'The codebase includes exactly 42 easter eggs',
'Marcus has never used a mouse - keyboard shortcuts only',
'Yuki designed the logo in 7 minutes on a napkin',
'We\'ve gone through 23 different logo iterations',
'The project mascot is a caffeinated squirrel',
],
philosophy: [
'Users come first, always',
'Performance is a feature, not an afterthought',
'Accessibility is mandatory, not optional',
'Simple solutions beat complex ones',
'Open source by default',
'Documentation is just as important as code',
'Be kind, be curious, be helpful',
],
}
})
]
});
```
This generates a comprehensive humans.txt file!
## Testing Your humans.txt
Build and review:
```bash
npm run build
cat dist/humans.txt
```
Or in dev mode:
```bash
npm run dev
# Visit http://localhost:4321/humans.txt
```
## Tips for Great Credits
### Be Genuine
Don't add fake team members or exaggerated thanks. People appreciate authenticity.
### Update Regularly
When team members change or the tech stack evolves, update humans.txt.
### Keep It Fun
humans.txt is one place where personality is encouraged. Add jokes, fun facts, and quirks!
### Credit Everyone
Don't forget contractors, beta testers, and community contributors.
### Link Appropriately
Use Twitter and GitHub handles so people can connect with your team.
## Next Steps
- [Security & Canary](/tutorials/security-canary/) - Add security contact info
- [WebFinger](/tutorials/webfinger/) - Enable federated discovery
- [Add Team Members How-To](/how-to/add-team-members/) - Advanced team management
## Troubleshooting
### Auto Date Not Working?
Make sure you're using `'auto'` as a string:
```typescript
lastUpdate: 'auto' // Correct
lastUpdate: auto // Wrong
```
### Too Much Information?
humans.txt can be as short or long as you want. Include only what feels right for your project.
### Character Encoding Issues?
Make sure your config file is saved as UTF-8, especially if using non-ASCII characters.
### Social Links Not Showing?
Optional fields only appear if you provide them. It's fine to omit twitter or github if not applicable.
View File
@@ -3,29 +3,519 @@ title: Security & Canary Files
description: Set up security.txt and canary.txt
---
In this tutorial, you'll learn how to set up security.txt for responsible disclosure and canary.txt for transparency about government requests.
## What You'll Build
By the end of this tutorial, you'll have:
- RFC 9116 compliant security.txt
- Contact information for security researchers
- A warrant canary for transparency
- Automated expiration dates
- PGP encryption details (optional)
## Before You Start
Make sure you have:
- Completed the [Basic Setup](/tutorials/basic-setup/) tutorial
- A security contact email or URL
- Understanding of responsible disclosure
## Part 1: Setting Up security.txt
### Step 1: Add Basic Security Contact
Open your `astro.config.mjs` and add security configuration:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
security: {
contact: 'security@your-site.com',
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/.well-known/security.txt
```
You'll see:
```txt
Contact: mailto:security@your-site.com
Expires: 2026-01-08T00:00:00.000Z
Canonical: https://your-site.com/.well-known/security.txt
```
Your site now has an RFC 9116 compliant security.txt!
### Step 2: Add Expiration Date
The integration auto-generates expiration (1 year), but you can customize it:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
security: {
contact: 'security@your-site.com',
expires: '2025-12-31T23:59:59Z', // Custom expiration
}
})
]
});
```
Build and verify the custom expiration!
### Step 3: Add PGP Encryption Key
Provide your PGP key for encrypted communications:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
security: {
contact: 'security@your-site.com',
expires: 'auto', // Use auto-generation
encryption: 'https://your-site.com/pgp-key.txt',
}
})
]
});
```
Build and check - encryption URL is now included!
### Step 4: Add Acknowledgments Page
Give credit to security researchers:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
security: {
contact: 'security@your-site.com',
expires: 'auto',
encryption: 'https://your-site.com/pgp-key.txt',
acknowledgments: 'https://your-site.com/security/hall-of-fame',
}
})
]
});
```
Build and verify!
### Step 5: Add Security Policy
Link to your responsible disclosure policy:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
security: {
contact: 'security@your-site.com',
expires: 'auto',
encryption: 'https://your-site.com/pgp-key.txt',
acknowledgments: 'https://your-site.com/security/hall-of-fame',
policy: 'https://your-site.com/security/policy',
preferredLanguages: ['en', 'es'],
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/.well-known/security.txt
```
Complete security.txt is now ready!
## Part 2: Setting Up canary.txt
### Step 1: Add Basic Canary
A warrant canary signals you haven't received secret government requests:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
security: {
contact: 'security@your-site.com',
},
canary: {
organization: 'Your Company Inc',
contact: 'canary@your-site.com',
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/.well-known/canary.txt
```
You'll see:
```txt
-----BEGIN CANARY STATEMENT-----
Organization: Your Company Inc
Contact: canary@your-site.com
Issued: 2025-01-08T12:00:00.000Z
Expires: 2025-02-12T12:00:00.000Z
As of the date above, Your Company Inc has not received any national
security orders or gag orders.
-----END CANARY STATEMENT-----
```
### Step 2: Set Update Frequency
Control how often you update the canary:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
canary: {
organization: 'Your Company Inc',
contact: 'canary@your-site.com',
frequency: 'monthly', // daily, weekly, monthly, quarterly, yearly
}
})
]
});
```
The expiration auto-adjusts:
- daily: 2 days
- weekly: 10 days
- monthly: 35 days (default)
- quarterly: 100 days
- yearly: 380 days
### Step 3: Add Specific Statements
Declare what you haven't received:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
canary: {
organization: 'Your Company Inc',
contact: 'canary@your-site.com',
frequency: 'monthly',
statements: [
{
type: 'nsl',
description: 'National Security Letters',
received: false,
},
{
type: 'fisa',
description: 'FISA court orders',
received: false,
},
{
type: 'gag',
description: 'Gag orders preventing disclosure',
received: false,
},
{
type: 'subpoena',
description: 'Government subpoenas for user data',
received: false,
},
],
}
})
]
});
```
Build and verify - specific statements are listed!
### Step 4: Add Personnel Statement
Confirm no team members are under duress:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
canary: {
organization: 'Your Company Inc',
contact: 'canary@your-site.com',
frequency: 'monthly',
statements: [
{
type: 'nsl',
description: 'National Security Letters',
received: false,
},
],
personnelStatement: true, // Add duress check
}
})
]
});
```
Build and check - personnel confirmation is added!
### Step 5: Add PGP Verification
Sign your canary with PGP:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
canary: {
organization: 'Your Company Inc',
contact: 'canary@your-site.com',
frequency: 'monthly',
statements: [/* ... */],
verification: 'PGP Signature: https://your-site.com/canary.txt.asc',
}
})
]
});
```
Build and verify!
### Step 6: Add Blockchain Proof (Advanced)
For maximum transparency, add blockchain verification:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
canary: {
organization: 'Your Company Inc',
contact: 'canary@your-site.com',
frequency: 'monthly',
blockchainProof: {
network: 'Bitcoin',
address: '1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa',
txHash: '4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b',
timestamp: '2025-01-08T12:00:00Z',
},
}
})
]
});
```
This proves the canary was published at a specific time!
## Step 7: Test Both Files
Start dev server:
```bash
npm run dev
```
Visit:
- `http://localhost:4321/.well-known/security.txt`
- `http://localhost:4321/.well-known/canary.txt`
Both files should be accessible!
## What You've Learned
You now know how to:
- Create RFC 9116 compliant security.txt
- Add security contact and encryption details
- Link to security policies and acknowledgments
- Set up a warrant canary
- Configure update frequency
- Add specific statements
- Verify with PGP or blockchain
## Complete Example
Here's a comprehensive security and canary setup:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://example.com',
integrations: [
discovery({
security: {
contact: 'security@example.com',
expires: 'auto', // 1 year from now
encryption: 'https://example.com/pgp-key.txt',
acknowledgments: 'https://example.com/security/hall-of-fame',
preferredLanguages: ['en', 'es'],
policy: 'https://example.com/security/policy',
hiring: 'https://example.com/security/jobs',
},
canary: {
organization: 'Example Corp',
contact: 'canary@example.com',
frequency: 'monthly',
statements: [
{
type: 'nsl',
description: 'National Security Letters',
received: false,
},
{
type: 'fisa',
description: 'FISA court orders',
received: false,
},
{
type: 'gag',
description: 'Gag orders',
received: false,
},
{
type: 'warrant',
description: 'Government search warrants',
received: false,
},
],
additionalStatement: 'We are committed to transparency and protecting user privacy.',
personnelStatement: true,
verification: 'PGP Signature: https://example.com/canary.txt.asc',
blockchainProof: {
network: 'Bitcoin',
address: '1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa',
},
}
})
]
});
```
## Best Practices
### Security.txt
1. **Keep it updated**: Set calendar reminders before expiration
2. **Test your contact**: Make sure security@example.com works
3. **Provide encryption**: Security researchers prefer encrypted communication
4. **Be responsive**: Respond to reports within 24-48 hours
5. **Give credit**: Maintain a hall of fame for responsible disclosers
### Canary.txt
1. **Update regularly**: Stick to your frequency schedule
2. **Automate**: Set up automated deployment/updates
3. **Sign it**: Use PGP signatures for verification
4. **Be consistent**: Always update on the same day
5. **Archive old canaries**: Keep a history for transparency
6. **Don't lie**: Only use if you can commit to honesty
## Important Legal Notes
### Warrant Canaries
Consult with legal counsel before implementing a warrant canary:
- Laws vary by jurisdiction
- May not be legally effective everywhere
- Could have unintended consequences
- Should be part of broader transparency efforts
### Security.txt
- Must be accurate and up-to-date
- Contact must actually work
- Expiration date is required by RFC 9116
- Should be part of a real security program
## Troubleshooting
### security.txt Not in .well-known?
The file should be at `/.well-known/security.txt` per RFC 9116. Check:
```bash
ls dist/.well-known/
```
### Canary Expired?
Rebuild your site regularly to update timestamps:
```bash
npm run build
```
Consider automating rebuilds based on your frequency.
### Email Not Showing mailto: Prefix?
The integration auto-adds it. Just provide:
```typescript
contact: 'security@example.com' // Becomes mailto:security@example.com
```
### Want Multiple Contacts?
Use an array:
```typescript
contact: ['security@example.com', 'https://example.com/security/report']
```
## Next Steps
- [WebFinger Discovery](/tutorials/webfinger/) - Enable federated discovery
- [Environment Config](/how-to/environment-config/) - Different configs per environment
- [Security Explained](/explanation/security-explained/) - Deep dive into security.txt
- [Canary Explained](/explanation/canary-explained/) - Understanding warrant canaries
## Additional Resources
- [RFC 9116 - security.txt](https://www.rfc-editor.org/rfc/rfc9116.html)
- [securitytxt.org](https://securitytxt.org/)
- [Warrant Canary FAQ](https://www.eff.org/deeplinks/2014/04/warrant-canary-faq)
View File
@@ -3,29 +3,436 @@ title: Setup llms.txt
description: Configure AI assistant discovery and instructions
---
In this tutorial, you'll learn how to set up llms.txt to help AI assistants like Claude, ChatGPT, and others understand and interact with your site effectively.
## What You'll Build
By the end of this tutorial, you'll have:
- A comprehensive site description for AI assistants
- Specific instructions for how AI should help users
- Documented API endpoints
- Listed important pages and features
- Defined your brand voice
## Before You Start
Make sure you have:
- Completed the [Basic Setup](/tutorials/basic-setup/) tutorial
- A clear understanding of your site's purpose
- Knowledge of what AI assistants should know about your site
## Step 1: Start with a Good Description
Open your `astro.config.mjs` and add a clear, concise description:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development, featuring tutorials, code examples, and best practices for building fast, accessible websites with Astro',
}
})
]
});
```
Build and check:
```bash
npm run build
cat dist/llms.txt
```
You'll see your description prominently displayed!
## Step 2: List Key Features
Help AI assistants understand what makes your site special:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development, featuring tutorials, code examples, and best practices for building fast, accessible websites with Astro',
keyFeatures: [
'Step-by-step tutorials for beginners to advanced developers',
'Interactive code examples with live previews',
'Performance optimization guides',
'Accessibility best practices',
'Weekly newsletter with web dev tips',
],
}
})
]
});
```
Build and verify:
```bash
npm run build
cat dist/llms.txt
```
Your key features are now listed!
## Step 3: Highlight Important Pages
Tell AI assistants about your most valuable content:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development',
keyFeatures: [
'Step-by-step tutorials',
'Interactive code examples',
],
importantPages: [
{
name: 'Getting Started Guide',
path: '/getting-started',
description: 'Begin your web development journey',
},
{
name: 'Tutorial Library',
path: '/tutorials',
description: 'Comprehensive tutorials for all skill levels',
},
{
name: 'API Documentation',
path: '/api',
description: 'Complete API reference',
},
],
}
})
]
});
```
Build and check - AI assistants can now find your key pages!
## Step 4: Provide Specific Instructions
This is where you really help AI assistants. Give them clear guidance:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development',
instructions: `
When helping users with this site:
1. Start by checking the Getting Started guide for new users
2. Reference specific tutorials when answering technical questions
3. Link to the API documentation for detailed method references
4. Encourage users to try the interactive code examples
5. Suggest subscribing to the newsletter for ongoing learning
6. Always provide working code examples when possible
7. Mention performance and accessibility considerations
`.trim(),
}
})
]
});
```
Build and verify:
```bash
npm run build
cat dist/llms.txt
```
Now AI assistants have clear instructions!
## Step 5: Document API Endpoints
If your site has an API, document it:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development',
instructions: `When helping users...`,
apiEndpoints: [
{
path: '/api/tutorials',
method: 'GET',
description: 'List all available tutorials with filtering options',
},
{
path: '/api/search',
method: 'GET',
description: 'Search site content by keyword',
},
{
path: '/api/subscribe',
method: 'POST',
description: 'Subscribe to the newsletter',
},
],
}
})
]
});
```
Build and check - your API is now documented!
## Step 6: Define Your Tech Stack
Help AI assistants understand your technical foundation:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development',
techStack: {
frontend: ['Astro', 'React', 'TypeScript', 'Tailwind CSS'],
backend: ['Node.js', 'Express'],
ai: ['OpenAI API', 'Claude API'],
other: ['PostgreSQL', 'Redis', 'Docker'],
},
}
})
]
});
```
Build and verify - AI knows your stack!
## Step 7: Set Your Brand Voice
Guide AI assistants on how to communicate about your site:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development',
brandVoice: [
'Friendly and approachable, never condescending',
'Technical but accessible - explain complex topics clearly',
'Encouraging and supportive of learning',
'Practical and example-driven',
'Honest about limitations and edge cases',
],
}
})
]
});
```
Build and check the results!
## Step 8: Add Custom Sections
Need something specific? Add custom sections:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
llms: {
description: 'A comprehensive guide to modern web development',
customSections: {
'Community Guidelines': 'Be respectful, help others, share knowledge',
'Support': 'For questions, visit our Discord or open a GitHub issue',
'Contributing': 'We welcome contributions! See CONTRIBUTING.md',
},
}
})
]
});
```
## Step 9: Test with a Dev Server
Start your dev server and check the results:
```bash
npm run dev
```
Visit `http://localhost:4321/llms.txt` and review everything.
## What You've Learned
You now know how to:
- Write effective site descriptions for AI
- List key features and important pages
- Provide specific instructions to AI assistants
- Document API endpoints
- Define your tech stack
- Set your brand voice
- Add custom sections
## Complete Example
Here's a full, real-world example:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://webdev-academy.com',
integrations: [
discovery({
llms: {
description: 'WebDev Academy is an interactive learning platform for modern web development, offering hands-on tutorials, real-world projects, and a supportive community',
keyFeatures: [
'Interactive coding challenges with instant feedback',
'Project-based learning with real-world applications',
'Peer code review and mentorship program',
'Career guidance and interview preparation',
'Regularly updated with latest web technologies',
],
importantPages: [
{
name: 'Learning Paths',
path: '/paths',
description: 'Structured curricula for different skill levels and goals',
},
{
name: 'Interactive Challenges',
path: '/challenges',
description: 'Hands-on coding exercises with progressive difficulty',
},
{
name: 'Community Forum',
path: '/community',
description: 'Ask questions, share projects, get feedback',
},
],
instructions: `
When helping users with WebDev Academy:
1. Assess their skill level first - we have content for beginners to advanced
2. Recommend appropriate learning paths based on their goals
3. Encourage hands-on practice with our interactive challenges
4. Suggest joining the community forum for peer support
5. Link to relevant tutorials and documentation
6. Provide code examples that users can test in our playground
7. Emphasize learning by building real projects
8. Be patient and encouraging - everyone starts somewhere
`.trim(),
apiEndpoints: [
{
path: '/api/challenges',
method: 'GET',
description: 'List coding challenges by difficulty and topic',
},
{
path: '/api/progress',
method: 'GET',
description: 'Get user learning progress and achievements',
},
{
path: '/api/submit',
method: 'POST',
description: 'Submit challenge solutions for automated testing',
},
],
techStack: {
frontend: ['Astro', 'React', 'TypeScript', 'Tailwind CSS'],
backend: ['Node.js', 'Fastify', 'PostgreSQL'],
ai: ['Claude API for code review feedback'],
other: ['Docker', 'Redis', 'Playwright for testing'],
},
brandVoice: [
'Encouraging and supportive - learning to code is hard!',
'Clear and jargon-free explanations',
'Practical and project-focused',
'Honest about the learning curve',
'Community-oriented and collaborative',
],
}
})
]
});
```
## Next Steps
- [Create humans.txt](/tutorials/create-humans/) - Add team credits
- [Security & Canary](/tutorials/security-canary/) - Add security info
- [Customize LLM Instructions](/how-to/customize-llm-instructions/) - Advanced instruction patterns
## Tips for Great AI Instructions
### Be Specific
Bad: "Help users with the site"
Good: "Search the tutorial library first, then provide step-by-step guidance with code examples"
### Give Context
Bad: "We have docs"
Good: "Documentation is at /docs with beginner, intermediate, and advanced sections"
### Set Expectations
Bad: "Answer questions"
Good: "If the question is beyond the site's scope, acknowledge it and suggest external resources"
### Update Regularly
As your site grows, keep instructions current:
- Add new features to keyFeatures
- Update important pages
- Revise instructions based on common user questions
## Troubleshooting
### Instructions Too Long?
Keep it concise - AI assistants have token limits. Focus on:
1. Most common user needs
2. Most important pages
3. Key navigation patterns
### Not Seeing Changes?
Remember to rebuild:
```bash
npm run build
```
### Want to Test AI Understanding?
Ask an AI assistant like Claude:
"What can you tell me about [your-site.com]?"
The assistant should reference your llms.txt!
View File
@@ -3,29 +3,538 @@ title: WebFinger Discovery
description: Enable WebFinger resource discovery
---
In this tutorial, you'll learn how to set up WebFinger for federated discovery, enabling ActivityPub (Mastodon), OpenID Connect, and other federated protocols.
## What You'll Build
By the end of this tutorial, you'll have:
- WebFinger endpoint at `/.well-known/webfinger`
- Resource discovery for team members
- ActivityPub/Mastodon integration
- OpenID Connect support (optional)
- Dynamic resource lookups
## Before You Start
Make sure you have:
- Completed the [Basic Setup](/tutorials/basic-setup/) tutorial
- Understanding of what WebFinger is used for
- Knowledge of the resources you want to make discoverable
## What is WebFinger?
WebFinger (RFC 7033) lets people and services discover information about resources using simple identifiers like email addresses or usernames. It powers:
- **ActivityPub/Mastodon**: Federated social networks
- **OpenID Connect**: Identity federation
- **Team discovery**: Find team members across services
- **Resource metadata**: Link identities to profiles
## Step 1: Enable WebFinger
Open your `astro.config.mjs` and enable WebFinger:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
webfinger: {
enabled: true, // WebFinger is opt-in
}
})
]
});
```
Start dev server:
```bash
npm run dev
```
Visit: `http://localhost:4321/.well-known/webfinger?resource=test`
You'll see an empty response - let's add resources!
## Step 2: Add Your First Resource
Let's make yourself discoverable:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:you@your-site.com',
aliases: ['https://your-site.com/@you'],
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: 'https://your-site.com/@you',
}
],
}
],
}
})
]
});
```
Test it:
```bash
npm run dev
```
Visit: `http://localhost:4321/.well-known/webfinger?resource=acct:you@your-site.com`
You'll see:
```json
{
"subject": "acct:you@your-site.com",
"aliases": ["https://your-site.com/@you"],
"links": [
{
"rel": "http://webfinger.net/rel/profile-page",
"type": "text/html",
"href": "https://your-site.com/@you"
}
]
}
```
Your first WebFinger resource!
## Step 3: Add ActivityPub Support
Make yourself discoverable on Mastodon:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:you@your-site.com',
aliases: [
'https://your-site.com/@you',
'https://your-site.com/users/you',
],
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: 'https://your-site.com/@you',
},
{
rel: 'self',
type: 'application/activity+json', // ActivityPub!
href: 'https://your-site.com/users/you',
}
],
}
],
}
})
]
});
```
Build and test - Mastodon can now discover your profile!
## Step 4: Add Multiple Team Members
Let's add more people:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:alice@your-site.com',
properties: {
'http://schema.org/name': 'Alice Developer',
'http://schema.org/jobTitle': 'Lead Developer',
},
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://your-site.com/team/alice',
},
{
rel: 'self',
type: 'application/activity+json',
href: 'https://your-site.com/users/alice',
},
],
},
{
resource: 'acct:bob@your-site.com',
properties: {
'http://schema.org/name': 'Bob Designer',
'http://schema.org/jobTitle': 'UX Designer',
},
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://your-site.com/team/bob',
},
],
},
],
}
})
]
});
```
Test both:
- `?resource=acct:alice@your-site.com`
- `?resource=acct:bob@your-site.com`
## Step 5: Add Avatar Links
Link to profile pictures:
```typescript
{
resource: 'acct:alice@your-site.com',
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://your-site.com/team/alice',
},
{
rel: 'http://webfinger.net/rel/avatar',
type: 'image/jpeg',
href: 'https://your-site.com/avatars/alice.jpg',
},
{
rel: 'self',
type: 'application/activity+json',
href: 'https://your-site.com/users/alice',
},
],
}
```
Now avatars are discoverable!
## Step 6: Use Content Collections
Automatically generate resources from Astro content collections:
```typescript
export default defineConfig({
site: 'https://your-site.com',
integrations: [
discovery({
webfinger: {
enabled: true,
collections: [
{
name: 'team', // Your content collection
resourceTemplate: 'acct:{slug}@your-site.com',
linksBuilder: (member) => [
{
rel: 'http://webfinger.net/rel/profile-page',
href: `https://your-site.com/team/${member.slug}`,
type: 'text/html',
},
{
rel: 'http://webfinger.net/rel/avatar',
href: member.data.avatar,
type: 'image/jpeg',
},
{
rel: 'self',
type: 'application/activity+json',
href: `https://your-site.com/users/${member.slug}`,
},
],
propertiesBuilder: (member) => ({
'http://schema.org/name': member.data.name,
'http://schema.org/jobTitle': member.data.role,
}),
}
],
}
})
]
});
```
Now all team members from your content collection are automatically discoverable!
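For reference, here's roughly what that `team` collection could look like in `src/content/config.ts`. The field names (`name`, `role`, `avatar`) are assumptions that simply need to match what your builders read from `member.data`:
```typescript
// src/content/config.ts - a minimal sketch of the assumed 'team' collection
import { defineCollection, z } from 'astro:content';

const team = defineCollection({
  type: 'content',
  schema: z.object({
    name: z.string(),
    role: z.string(),
    avatar: z.string().url(), // read by linksBuilder for the avatar link
  }),
});

export const collections = { team };
```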
## Step 7: Filter Links with Rel Parameter
WebFinger supports filtering by link relation:
Test: `?resource=acct:alice@your-site.com&rel=self`
Only links with `rel="self"` will be returned!
## What You've Learned
You now know how to:
- Enable WebFinger on your site
- Create discoverable resources
- Add ActivityPub/Mastodon support
- Link profile pages and avatars
- Add semantic properties
- Use content collections for dynamic resources
- Filter results with rel parameter
## Complete Example: ActivityPub Site
Here's a full setup for a federated social site:
```typescript
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery';
export default defineConfig({
site: 'https://social-site.com',
integrations: [
discovery({
webfinger: {
enabled: true,
// Static resources
resources: [
{
resource: 'acct:admin@social-site.com',
subject: 'acct:admin@social-site.com',
aliases: [
'https://social-site.com/@admin',
'https://social-site.com/users/admin',
],
properties: {
'http://schema.org/name': 'Site Admin',
},
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: 'https://social-site.com/@admin',
},
{
rel: 'self',
type: 'application/activity+json',
href: 'https://social-site.com/users/admin',
},
{
rel: 'http://webfinger.net/rel/avatar',
type: 'image/png',
href: 'https://social-site.com/avatars/admin.png',
},
],
},
],
// Dynamic from content collection
collections: [
{
name: 'users',
resourceTemplate: 'acct:{slug}@social-site.com',
aliasesBuilder: (user) => [
`https://social-site.com/@${user.slug}`,
`https://social-site.com/users/${user.slug}`,
],
linksBuilder: (user) => [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: `https://social-site.com/@${user.slug}`,
},
{
rel: 'self',
type: 'application/activity+json',
href: `https://social-site.com/users/${user.slug}`,
},
{
rel: 'http://webfinger.net/rel/avatar',
type: user.data.avatarType || 'image/jpeg',
href: user.data.avatar,
},
],
propertiesBuilder: (user) => ({
'http://schema.org/name': user.data.displayName,
'http://schema.org/description': user.data.bio,
}),
},
],
}
})
]
});
```
## Use Case Examples
### Mastodon/ActivityPub Integration
```typescript
{
resource: 'acct:user@example.com',
links: [
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/user'
}
]
}
```
Mastodon will query your WebFinger, then fetch the ActivityPub actor!
### OpenID Connect
```typescript
{
resource: 'acct:user@example.com',
links: [
{
rel: 'http://openid.net/specs/connect/1.0/issuer',
href: 'https://example.com'
}
]
}
```
Enables identity federation!
### Team Directory
```typescript
{
resource: 'acct:support@example.com',
properties: {
'http://schema.org/name': 'Support Team',
'http://schema.org/email': 'support@example.com'
},
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://example.com/support'
}
]
}
```
## Testing Your WebFinger
### Manual Testing
```bash
curl "https://your-site.com/.well-known/webfinger?resource=acct:you@your-site.com"
```
### With Mastodon
1. Open Mastodon
2. Search for `you@your-site.com`
3. If WebFinger and ActivityPub are configured, you'll be discoverable!
### With WebFinger Client
Use a WebFinger client library to test programmatically.
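Or skip the library and query the endpoint directly. A quick sketch using plain `fetch`, assuming the `acct:you@your-site.com` resource from Step 2 and a TypeScript runner such as `tsx`:
```typescript
// quick-check.ts - run with `npx tsx quick-check.ts` or similar
const endpoint = 'https://your-site.com/.well-known/webfinger';
const resource = 'acct:you@your-site.com';

const res = await fetch(`${endpoint}?resource=${encodeURIComponent(resource)}&rel=self`);
if (!res.ok) throw new Error(`WebFinger lookup failed: ${res.status}`);

const jrd = await res.json();
console.log(jrd.subject); // should echo the resource you asked for
console.log(jrd.links);   // with ?rel=self, only the ActivityPub 'self' link remains
```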
## Common Link Relations
- `http://webfinger.net/rel/profile-page` - HTML profile page
- `self` - The resource itself (ActivityPub actor)
- `http://webfinger.net/rel/avatar` - Profile picture
- `http://openid.net/specs/connect/1.0/issuer` - OpenID issuer
- `http://ostatus.org/schema/1.0/subscribe` - Subscription endpoint
## Troubleshooting
### WebFinger Not Working?
Make sure:
1. **Enabled**: `enabled: true` in config
2. **Resources added**: At least one resource or collection
3. **Query parameter**: Include `?resource=acct:user@domain.com`
4. **CORS**: WebFinger includes CORS headers automatically
### Resource Not Found?
Check the exact resource URI:
```bash
curl "http://localhost:4321/.well-known/webfinger?resource=acct:exact@match.com"
```
Resource must match exactly!
### Mastodon Can't Find You?
Mastodon needs:
1. WebFinger with correct resource format
2. ActivityPub actor at the linked URL
3. Proper CORS headers (automatic)
4. HTTPS in production
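Point 2 is the piece this integration doesn't generate for you. As a rough illustration, an Astro endpoint serving a bare-bones actor document might look like the sketch below - a real Mastodon-compatible actor also needs a `publicKey` and a working `inbox`, so treat this as a starting point only:
```typescript
// src/pages/users/[username].ts - illustrative only, not a complete ActivityPub actor
// Assumes on-demand rendering; a static build would also need getStaticPaths().
import type { APIRoute } from 'astro';

export const GET: APIRoute = ({ params, site }) => {
  const base = site ?? new URL('https://your-site.com');
  const id = new URL(`/users/${params.username}`, base).href;
  const actor = {
    '@context': 'https://www.w3.org/ns/activitystreams',
    id,
    type: 'Person',
    preferredUsername: params.username,
    inbox: `${id}/inbox`,   // must exist and accept POSTs for real federation
    outbox: `${id}/outbox`,
  };
  return new Response(JSON.stringify(actor), {
    headers: { 'Content-Type': 'application/activity+json' },
  });
};
```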
### Content Collection Not Working?
Make sure:
- Collection name matches exactly
- Template variables use correct field names
- linksBuilder and propertiesBuilder return correct types
## Security Considerations
### Privacy
WebFinger makes information public. Only expose what you want discoverable:
- Don't include private email addresses
- Be careful with personal information
- Consider what's already public
### Rate Limiting
WebFinger endpoints can be queried frequently. Consider:
- Caching responses
- Rate limiting at server level
- CDN caching
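For the caching piece, one lightweight option is to set a cache header on WebFinger responses from middleware so a CDN can absorb repeat lookups. A sketch, assuming your deployment respects `Cache-Control`:
```typescript
// src/middleware.ts - add a cache header to WebFinger responses
import { defineMiddleware } from 'astro:middleware';

export const onRequest = defineMiddleware(async (context, next) => {
  const response = await next();
  if (context.url.pathname !== '/.well-known/webfinger') return response;

  const headers = new Headers(response.headers);
  headers.set('Cache-Control', 'public, max-age=3600'); // let a CDN cache lookups for an hour
  return new Response(response.body, { status: response.status, headers });
});
```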
### Validation
The integration validates resource URIs automatically, but be careful with:
- User-generated content in collections
- External URLs in links
- Property values
## Next Steps
- [ActivityPub How-To](/how-to/activitypub/) - Full ActivityPub integration
- [Content Collections How-To](/how-to/content-collections/) - Advanced collection patterns
- [WebFinger Explained](/explanation/webfinger-explained/) - Deep dive into RFC 7033
## Additional Resources
- [RFC 7033 - WebFinger](https://www.rfc-editor.org/rfc/rfc7033.html)
- [WebFinger.net](https://webfinger.net/)
- [ActivityPub Spec](https://www.w3.org/TR/activitypub/)
- [Mastodon Documentation](https://docs.joinmastodon.org/)
status.json (new file)
View File
@@ -0,0 +1,123 @@
{
"task_masters": {
"tutorials": {
"status": "ready",
"branch": "docs/tutorials-content",
"worktree": "docs-tutorials",
"pages": [
"getting-started/installation.md",
"getting-started/quick-start.md",
"getting-started/first-steps.md",
"tutorials/basic-setup.md",
"tutorials/configure-robots.md",
"tutorials/setup-llms.md",
"tutorials/create-humans.md",
"tutorials/security-canary.md",
"tutorials/webfinger.md"
],
"dependencies": [],
"completed_pages": [
"getting-started/installation.md",
"getting-started/quick-start.md",
"getting-started/first-steps.md",
"tutorials/basic-setup.md",
"tutorials/configure-robots.md",
"tutorials/setup-llms.md",
"tutorials/create-humans.md",
"tutorials/security-canary.md",
"tutorials/webfinger.md"
]
},
"howto": {
"status": "ready",
"branch": "docs/howto-content",
"worktree": "docs-howto",
"pages": [
"how-to/block-bots.md",
"how-to/customize-llm-instructions.md",
"how-to/add-team-members.md",
"how-to/filter-sitemap.md",
"how-to/cache-headers.md",
"how-to/environment-config.md",
"how-to/content-collections.md",
"how-to/custom-templates.md",
"how-to/activitypub.md"
],
"dependencies": ["tutorials"],
"completed_pages": [
"how-to/block-bots.md",
"how-to/customize-llm-instructions.md",
"how-to/add-team-members.md",
"how-to/filter-sitemap.md",
"how-to/cache-headers.md",
"how-to/environment-config.md",
"how-to/content-collections.md",
"how-to/custom-templates.md",
"how-to/activitypub.md"
]
},
"reference": {
"status": "ready",
"branch": "docs/reference-content",
"worktree": "docs-reference",
"pages": [
"reference/configuration.md",
"reference/api.md",
"reference/robots.md",
"reference/llms.md",
"reference/humans.md",
"reference/security.md",
"reference/canary.md",
"reference/webfinger.md",
"reference/sitemap.md",
"reference/cache.md",
"reference/typescript.md"
],
"dependencies": [],
"completed_pages": [
"reference/configuration.md",
"reference/api.md",
"reference/robots.md",
"reference/llms.md",
"reference/humans.md",
"reference/security.md",
"reference/canary.md",
"reference/webfinger.md",
"reference/sitemap.md",
"reference/cache.md",
"reference/typescript.md"
]
},
"explanation": {
"status": "executing",
"branch": "docs/explanation-content",
"worktree": "docs-explanation",
"pages": [
"explanation/why-discovery.md",
"explanation/robots-explained.md",
"explanation/llms-explained.md",
"explanation/humans-explained.md",
"explanation/security-explained.md",
"explanation/canary-explained.md",
"explanation/webfinger-explained.md",
"explanation/seo.md",
"explanation/ai-integration.md",
"explanation/architecture.md",
"examples/ecommerce.md",
"examples/documentation.md",
"examples/blog.md",
"examples/api-platform.md",
"examples/multilanguage.md",
"examples/federated-social.md",
"community/contributing.md",
"community/changelog.md",
"community/troubleshooting.md",
"community/faq.md"
],
"dependencies": [],
"completed_pages": []
}
},
"merge_order": ["reference", "tutorials", "howto", "explanation"],
"integration_status": "pending"
}