Complete how-to guide documentation

Add comprehensive problem-oriented how-to guides following Diátaxis framework:
- Block specific bots from crawling the site
- Customize LLM instructions for AI assistants
- Add team members to humans.txt
- Filter sitemap pages
- Configure cache headers for discovery files
- Environment-specific configuration
- Integration with Astro content collections
- Custom templates for discovery files
- ActivityPub/Fediverse integration via WebFinger

Each guide provides:
- Clear prerequisites
- Step-by-step solutions
- Multiple approaches/variations
- Expected outcomes
- Alternative approaches
- Common issues and troubleshooting

Total: 9 guides, 6,677 words
Ryan Malloy 2025-11-08 23:32:22 -07:00
parent f8d4e10ffc
commit 74cffc2842
18 changed files with 4456 additions and 337 deletions

View File

@@ -1,31 +1,231 @@
---
title: Warrant Canaries
description: Understanding warrant canaries and transparency mechanisms
---
A warrant canary is a method for organizations to communicate the **absence** of secret government orders through regular public statements. The concept comes from the canaries coal miners once carried - their silence indicated danger.
## The Gag Order Problem
Certain legal instruments (National Security Letters in the US, similar mechanisms elsewhere) can compel organizations to:
1. Provide user data or access to systems
2. Never disclose that the request was made
This creates an information asymmetry - users can't know if their service provider has been compromised by government orders.
Warrant canaries address this by inverting the communication: instead of saying "we received an order" (which is forbidden), the organization regularly says "we have NOT received an order."
If the statement stops or changes, users can infer something happened.
## How It Works
A simple canary statement:
```
As of 2024-11-08, Example Corp has NOT received:
- National Security Letters
- FISA court orders
- Gag orders preventing disclosure
- Secret government requests for user data
- Requests to install surveillance capabilities
```
The organization publishes this monthly. Users monitor it. If November's update doesn't appear, or the statements change, users know to investigate.
The canary communicates through **absence** rather than disclosure.
## Legal Theory and Limitations
Warrant canaries operate in a legal gray area. The theory:
- Compelled speech (forcing you to lie) may violate free speech rights
- Choosing to remain silent is protected
- Government can prevent disclosure but cannot compel false statements
This hasn't been extensively tested in court. Canaries are no guarantee, but they provide a transparency mechanism where direct disclosure is prohibited.
Important limitations:
- **No legal precedent**: Courts haven't ruled definitively on validity
- **Jurisdictional differences**: What works in one country may not in another
- **Sophistication of threats**: Adversaries may compel continued updates
- **Interpretation challenges**: Absence could mean many things
Canaries are part of a transparency strategy, not a complete solution.
## What Goes in a Canary
The integration's default statements cover common government data requests:
**National Security Letters (NSLs)**: US administrative subpoenas for subscriber information
**FISA court orders**: Foreign Intelligence Surveillance Act orders
**Gag orders**: Any order preventing disclosure of requests
**Surveillance requests**: Secret requests for user data
**Backdoor requests**: Demands to install surveillance capabilities
You can customize these or add organization-specific concerns.
## Frequency and Expiration
Canaries must update regularly. The frequency determines trust:
**Daily**: Maximum transparency, high maintenance burden
**Weekly**: Good for high-security contexts
**Monthly**: Standard for most organizations
**Quarterly**: Minimum for credibility
**Yearly**: Too infrequent to be meaningful
The integration auto-calculates expiration based on frequency:
- Daily: 2 days
- Weekly: 10 days
- Monthly: 35 days
- Quarterly: 100 days
- Yearly: 380 days
These provide buffer time while ensuring staleness is obvious.
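If you want to reproduce this expiration math outside the integration (for a monitoring script, say), it's a small calculation. A minimal sketch using the buffer values listed above; the type and function names are illustrative:
```typescript
type CanaryFrequency = 'daily' | 'weekly' | 'monthly' | 'quarterly' | 'yearly';
// Buffer days per frequency, mirroring the defaults listed above
const EXPIRATION_BUFFER_DAYS: Record<CanaryFrequency, number> = {
  daily: 2,
  weekly: 10,
  monthly: 35,
  quarterly: 100,
  yearly: 380,
};
// Expiration timestamp for a canary published at `publishedAt`
function canaryExpiration(frequency: CanaryFrequency, publishedAt = new Date()): Date {
  const bufferMs = EXPIRATION_BUFFER_DAYS[frequency] * 24 * 60 * 60 * 1000;
  return new Date(publishedAt.getTime() + bufferMs);
}
console.log(canaryExpiration('monthly').toISOString()); // roughly 35 days out
```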
## The Personnel Statement
A sophisticated addition is the personnel statement:
```
Key Personnel Statement: All key personnel with access to
infrastructure remain free and under no duress.
```
This addresses scenarios where individuals are compelled to act under physical threat or coercion.
If personnel are compromised, the statement can be omitted without violating gag orders (since it's not disclosing a government request).
## Verification Mechanisms
Mere publication isn't enough - users need to verify authenticity:
### PGP Signatures
Sign canary.txt with your organization's PGP key:
```
Verification: https://example.com/canary.txt.asc
```
This proves the canary came from you and hasn't been tampered with.
### Blockchain Anchoring
Publish a hash of the canary to a blockchain:
```
Blockchain-Proof: ethereum:0x123...abc:0xdef...789
Blockchain-Timestamp: 2024-11-08T12:00:00Z
```
This creates an immutable, time-stamped record that the canary existed at a specific moment.
Anyone can verify the canary matches the blockchain hash, preventing retroactive alterations.
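Verification itself is straightforward. A sketch of the client side, assuming the anchored value is a SHA-256 hex digest of the raw canary text (a common convention; your proof format may differ):
```typescript
import { createHash } from 'node:crypto';
// Re-hash the published canary and compare against the digest recorded on-chain
async function verifyCanaryHash(canaryUrl: string, anchoredHexDigest: string): Promise<boolean> {
  const response = await fetch(canaryUrl);
  const text = await response.text();
  const digest = createHash('sha256').update(text).digest('hex');
  return digest === anchoredHexDigest.toLowerCase();
}
// Usage: pass the hex digest extracted from the Blockchain-Proof field
// await verifyCanaryHash('https://example.com/canary.txt', '<sha256-hex-from-proof>');
```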
### Previous Canary Links
Link to the previous canary:
```
Previous-Canary: https://example.com/canary-2024-10.txt
```
This creates a chain of trust. If an attacker compromises your site and tries to backdate canaries, the chain breaks.
## What Absence Means
If a canary stops updating or changes, it doesn't definitively mean government compromise. Possible reasons:
- Organization received a legal order (the intended signal)
- Technical failure prevented update
- Personnel forgot or were unable to update
- Organization shut down or changed practices
- Security incident prevented trusted publication
Users must interpret absence in context. Multiple verification methods help distinguish scenarios.
## Building Trust Over Time
A new canary has limited credibility. Trust builds through:
1. **Consistency**: Regular updates on schedule
2. **Verification**: Multiple cryptographic proofs
3. **Transparency**: Clear explanation of canary purpose and limitations
4. **History**: Years of reliable updates
5. **Community**: External monitoring and verification
Organizations should start canaries early, before they're needed, to build this trust.
## The Integration's Approach
This integration makes canaries accessible:
**Auto-expiration**: Calculated from frequency
**Default statements**: Cover common concerns
**Dynamic generation**: Functions can generate statements at build time
**Verification support**: Links to PGP signatures and blockchain proofs
**Update reminders**: Clear expiration in content
You configure once, the integration handles timing and formatting.
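A configuration sketch of what that might look like. The option names here (`canary`, `frequency`, `statements`, and so on) are assumptions for illustration - see the [Canary Reference](/reference/canary/) for the integration's actual API:
```typescript
// astro.config.mjs - illustrative option names, not the confirmed API
import { defineConfig } from 'astro/config';
import discovery from 'astro-discovery'; // assumed package name

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      canary: {
        frequency: 'monthly',                 // expiration auto-calculated (~35 days)
        statements: [
          'National Security Letters',
          'FISA court orders',
          'Gag orders preventing disclosure',
        ],
        personnelStatement: true,             // include the key personnel line
        verification: 'https://example.com/canary.txt.asc', // PGP signature URL
      },
    }),
  ],
});
```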
## When to Use Canaries
Canaries make sense for:
- Organizations handling sensitive user data
- Services likely to receive government data requests
- Privacy-focused companies
- Organizations operating in multiple jurisdictions
- Platforms used by activists, journalists, or vulnerable groups
They're less relevant for:
- Personal blogs without user data
- Purely informational sites
- Organizations that can't commit to regular updates
- Contexts where legal risks outweigh benefits
## Practical Considerations
**Update process**: Who's responsible for monthly updates?
**Backup procedures**: What if primary person is unavailable?
**Legal review**: Has counsel approved canary language and process?
**Monitoring**: Who watches for expiration?
**Communication**: How will users be notified of canary changes?
**Contingency**: What's the plan if you must stop publishing?
These operational questions matter as much as the canary itself.
## The Limitations
Canaries are not magic:
- They rely on legal interpretations that haven't been tested
- Sophisticated adversaries may compel continued updates
- Absence is ambiguous - could be many causes
- Only useful for orders that come with gag provisions
- Don't address technical compromises or insider threats
They're one tool in a transparency toolkit, not a complete solution.
## Real-World Examples
**Tech companies**: Some publish annual or quarterly canaries as part of transparency reports
**VPN providers**: Many use canaries to signal absence of data retention orders
**Privacy-focused services**: Canaries are common among services catering to privacy-conscious users
**Open source projects**: Some maintainers publish personal canaries about project compromise
The practice is growing as awareness of surveillance increases.
## Related Topics
- [Security.txt](/explanation/security-explained/) - Complementary transparency for security issues
- [Canary Reference](/reference/canary/) - Complete configuration options
- [Blockchain Verification](/how-to/canary-verification/) - Setting up cryptographic proofs

View File

@@ -3,29 +3,306 @@ title: Understanding humans.txt
description: The human side of discovery files
---
In a web dominated by machine-readable metadata, humans.txt is a delightful rebellion. It's a file written by humans, for humans, about the humans who built the website you're visiting.
## The Initiative
humans.txt emerged in 2008 from a simple observation: websites have extensive metadata for machines (robots.txt, sitemaps, structured data) but nothing to credit the people who built them.
The initiative proposed a standard format for human-readable credits, transforming the impersonal `/humans.txt` URL into a space for personality, gratitude, and transparency.
## What Makes It Human
Unlike other discovery files optimized for parsing, humans.txt embraces readability and creativity:
```
/* TEAM */
Developer: Jane Doe
Role: Full-stack wizardry
Location: Portland, OR
Favorite beverage: Cold brew coffee
/* THANKS */
- Stack Overflow (for everything)
- My rubber duck debugging companion
- Coffee, obviously
/* SITE */
Built with: Blood, sweat, and JavaScript
Fun fact: Deployed 47 times before launch
```
Notice the tone - casual, personal, fun. This isn't corporate boilerplate. It's a connection between builders and users.
## Why It Matters
On the surface, humans.txt seems frivolous. Who cares about credits buried in a text file?
But consider the impact:
**Recognition**: Developers, designers, and content creators work in the shadows. Humans.txt brings them into the light.
**Transparency**: Users curious about how your site works can see the tech stack and team behind it.
**Recruitment**: Talented developers browse humans.txt files. Listing your stack and philosophy attracts aligned talent.
**Culture**: A well-crafted humans.txt reveals company culture and values better than any about page.
**Humanity**: In an increasingly automated web, humans.txt reminds us that real people built this.
## The Standard Sections
The initiative proposes several standard sections:
### TEAM
Credits for everyone who contributed:
```
/* TEAM */
Name: Alice Developer
Role: Lead Developer
Contact: alice@example.com
Twitter: @alicedev
From: Brooklyn, NY
```
List everyone - developers, designers, writers, managers. Projects are team efforts.
### THANKS
Acknowledgments for inspiration, tools, and support:
```
/* THANKS */
- The Astro community
- Open-source maintainers everywhere
- Our beta testers
- Late night playlist creators
```
This section humanizes development. We build on the work of others.
### SITE
Technical details about the project:
```
/* SITE */
Last update: 2024-11-08
Language: English / Markdown
Doctype: HTML5
IDE: VS Code with Vim keybindings
Components: Astro, React, TypeScript
Standards: HTML5, CSS3, ES2022
```
This satisfies developer curiosity and provides context for technical decisions.
## Going Beyond the Standard
The beauty of humans.txt is flexibility. Many sites add custom sections:
**STORY**: The origin story of your project
**PHILOSOPHY**: Development principles and values
**FUN FACTS**: Easter eggs and behind-the-scenes details
**COLOPHON**: Typography and design choices
**ERRORS**: Humorous changelog of mistakes
These additions transform humans.txt from credits into narrative.
## The Integration's Approach
This integration generates humans.txt with opinionated defaults but encourages customization:
**Auto-dating**: `lastUpdate: 'auto'` uses current build date
**Flexible structure**: Add any custom sections you want
**Dynamic content**: Generate team lists from content collections
**Rich metadata**: Include social links, locations, and personal touches
The goal is making credits easy enough that you'll actually maintain them.
## Real-World Examples
**Humanstxt.org** (the initiative's site):
```
/* TEAM */
Creator: Abel Cabans
Site: http://abelcabans.com
Twitter: @abelcabans
Location: Sant Cugat del Vallès, Barcelona, Spain
/* THANKS */
- All the people who have contributed
- Spread the word!
/* SITE */
Last update: 2024/01/15
Standards: HTML5, CSS3
Components: Jekyll
Software: TextMate, Git
```
Clean, simple, effective.
**Creative Agency** (fictional but typical):
```
/* TEAM */
Creative Director: Max Wilson
Role: Visionary chaos coordinator
Contact: max@agency.com
Fun fact: Has never missed a deadline (barely)
Designer: Sarah Chen
Role: Pixel perfectionist
Location: San Francisco
Tool of choice: Figma, obviously
Developer: Jordan Lee
Role: Code whisperer
From: Remote (currently Bali)
Coffee order: Oat milk cortado
/* THANKS */
- Our clients for trusting us with their dreams
- The internet for cat videos during crunch time
- Figma for not crashing during presentations
/* STORY */
We started in a garage. Not for dramatic effect - office
space in SF is expensive. Three friends with complementary
skills and a shared belief that design should be delightful.
Five years later, we're still in that garage (now with
better chairs). But we've shipped products used by millions
and worked with brands we admired as kids.
We believe in:
- Craftsmanship over shortcuts
- Accessibility as a baseline, not a feature
- Open source as community participation
- Making the web more fun
/* SITE */
Built with: Astro, Svelte, TypeScript, TailwindCSS
Deployed on: Cloudflare Pages
Font: Inter (because we're not monsters)
Colors: Custom palette inspired by Bauhaus
Last rewrite: 2024 (the third time's the charm)
```
Notice the personality, the details, the humanity.
## The "Last Update" Decision
The `lastUpdate` field presents a philosophical question: should it reflect content updates or just site updates?
**Content perspective**: Change date when humans.txt content changes
**Site perspective**: Change date when any part of the site deploys
The integration defaults to site perspective (auto-update on every build). This ensures the date always reflects current site state, even if humans.txt content stays static.
But you can override with a specific date if you prefer manual control.
## Social Links and Contact Info
humans.txt is a great place for social links:
```
/* TEAM */
Name: Developer Name
Twitter: @username
GitHub: username
LinkedIn: /in/username
Mastodon: @username@instance.social
```
This provides discoverable contact information without cluttering your UI.
It's particularly valuable for open-source projects where contributors want to connect.
## The Gratitude Practice
Writing a good THANKS section is a gratitude practice. It forces you to acknowledge the shoulders you stand on:
- Which open-source projects made your work possible?
- Who provided feedback, testing, or encouragement?
- What tools, resources, or communities helped you learn?
- Which mistakes taught you valuable lessons?
This reflection benefits you as much as it credits others.
## Humor and Personality
humans.txt invites creativity. Some examples:
```
/* FUN FACTS */
- Entire site built during one caffeinated weekend
- 437 commits with message "fix typo"
- Originally designed in Figma, rebuilt in Sketch, launched from code
- The dog in our 404 page is the CEO's actual dog
- We've used Comic Sans exactly once (regrettably)
```
This personality differentiates you and creates connection.
## When Not to Use Humor
Professional context matters. A bank's humans.txt should be more restrained than a gaming startup's.
Match the tone to your audience and brand. Personality doesn't require jokes.
Simple sincerity works too:
```
/* TEAM */
We're a team of 12 developers across 6 countries
working to make financial services more accessible.
/* THANKS */
To the users who trust us with their financial data -
we take that responsibility seriously every day.
```
## Maintenance Considerations
humans.txt requires maintenance:
- Update when team members change
- Refresh tech stack as you adopt new tools
- Add new thanks as you use new resources
- Keep contact information current
The integration helps by supporting dynamic content:
```typescript
humans: {
team: await getCollection('team'), // Auto-sync with team content
site: {
lastUpdate: 'auto', // Auto-update on each build
techStack: Object.keys(deps) // Extract from package.json
}
}
```
This reduces manual maintenance burden.
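If you go the collection route, the `team` collection needs a schema. A minimal sketch - the field names are assumptions, so align them with whatever your humans.txt output actually reads:
```typescript
// src/content/config.ts
import { defineCollection, z } from 'astro:content';
// Fields mirror the TEAM entries shown earlier; adjust as needed
const team = defineCollection({
  type: 'data',
  schema: z.object({
    name: z.string(),
    role: z.string(),
    location: z.string().optional(),
    contact: z.string().email().optional(),
    twitter: z.string().optional(),
  }),
});
export const collections = { team };
```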
## The Browse Experience
Most users never see humans.txt. And that's okay.
The file serves several audiences:
**Curious users**: The 1% who look behind the curtain
**Developers**: Evaluating tech stack for integration or inspiration
**Recruiters**: Understanding team culture and capabilities
**You**: Reflection and gratitude practice during creation
It's not about traffic - it's about transparency and humanity.
## Related Topics
- [Content Collections Integration](/how-to/content-collections/) - Auto-generate team lists
- [Humans.txt Reference](/reference/humans/) - Complete configuration options
- [Examples](/examples/blog/) - See humans.txt in context

View File

@@ -1,31 +1,213 @@
---
title: Understanding llms.txt
description: How AI assistants discover and understand your website
---
llms.txt is the newest member of the discovery file family, emerging in response to a fundamental shift in how content is consumed on the web. While search engines index and retrieve, AI language models read, understand, and synthesize.
## Why AI Needs Different Guidance
Traditional search engines need to know **what exists and where**. They build indexes mapping keywords to pages.
AI assistants need to know **what things mean and how to use them**. They need context, instructions, and understanding of relationships between content.
Consider the difference:
**Search engine thinking**: "This page contains the word 'API' and is located at /docs/api"
**AI assistant thinking**: "This site offers a REST API at /api/endpoint that requires authentication. When users ask how to integrate, I should explain the auth flow and reference the examples at /docs/examples"
llms.txt bridges this gap by providing **semantic context** that goes beyond structural metadata.
## The Information Architecture
llms.txt follows a simple, human-readable structure:
```
# Site Description
> One-line tagline
## Site Information
Basic facts about the site
## For AI Assistants
Instructions and guidelines
## Important Pages
Key resources to know about
## API Endpoints
Available programmatic access
```
This structure mirrors how you'd brief a human assistant about your site. It's not rigid XML or JSON - it's conversational documentation optimized for language model consumption.
## What to Include
The most effective llms.txt files provide:
**Description**: Not just what your site is, but **why it exists**. "E-commerce platform" is weak. "E-commerce platform focused on sustainable products with carbon footprint tracking" gives context.
**Key Features**: The 3-5 things that make your site unique or particularly useful. These help AI assistants understand what problems you solve.
**Important Pages**: Not a sitemap (that's what sitemap.xml is for), but the **handful of pages** that provide disproportionate value. Think: getting started guide, API docs, pricing.
**Instructions**: Specific guidance on how AI should represent your content. This is where you establish voice, correct common misconceptions, and provide task-specific guidance.
**API Endpoints**: If you have programmatic access, describe it. AI assistants can help users integrate with your service if they know endpoints exist.
## The Instruction Set Pattern
The most powerful part of llms.txt is the instructions section. This is where you teach AI assistants how to be helpful about your site.
Effective instructions are:
**Specific**: "When users ask about authentication, explain we use OAuth2 and point them to /docs/auth"
**Actionable**: "Check /api/status before suggesting users try the API"
**Context-aware**: "Remember that we're focused on accessibility - always mention a11y features"
**Preventive**: "We don't offer feature X - suggest alternatives Y or Z instead"
Think of it as training an employee who'll be answering questions about your product. What would you want them to know?
## Brand Voice and Tone
AI assistants can adapt their responses to match your brand if you provide guidance:
```
## Brand Voice
- Professional but approachable
- Technical accuracy over marketing speak
- Always mention open-source nature
- Emphasize privacy and user control
```
This helps ensure AI representations of your site feel consistent with your actual brand identity.
## Tech Stack Transparency
Including your tech stack serves multiple purposes:
1. **Helps AI assistants answer developer questions** ("Can I use this with React?" - "Yes, it's built on React")
2. **Aids troubleshooting** (knowing the framework helps diagnose integration issues)
3. **Attracts contributors** (developers interested in your stack are more likely to contribute)
Be specific but not exhaustive. "Built with Astro, TypeScript, and Tailwind" is better than listing every npm package.
## API Documentation
If your site offers APIs, llms.txt should describe them at a high level:
```
## API Endpoints
- GET /api/products - List all products
Authentication: API key required
Returns: JSON array of product objects
- POST /api/calculate-carbon - Calculate carbon footprint
Authentication: Not required
Accepts: JSON with cart data
Returns: Carbon footprint estimate
```
This isn't meant to replace full API documentation - it's a quick reference so AI assistants know what's possible.
## The Relationship with robots.txt
robots.txt and llms.txt work together:
**robots.txt** says: "AI bots, you can access these paths"
**llms.txt** says: "Here's how to understand what you find there"
The integration coordinates them automatically:
1. robots.txt includes rules for LLM user-agents
2. Those rules reference llms.txt
3. LLM bots follow robots.txt to respect boundaries
4. Then read llms.txt for guidance on content interpretation
## Dynamic vs. Static Content
llms.txt can be either static (same content always) or dynamic (generated at build time):
**Static**: Your site description and brand voice rarely change
**Dynamic**: Current API endpoints, team members, or feature status might update frequently
The integration supports both approaches. You can provide static strings or functions that generate content at build time.
This is particularly useful for:
- Extracting API endpoints from OpenAPI specs
- Listing important pages from content collections
- Keeping tech stack synchronized with package.json
- Generating context from current deployment metadata
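A sketch of the dynamic approach, following the same pattern as the humans.txt example elsewhere in these docs. The `llms` option names are illustrative assumptions, and `getCollection` and `deps` are assumed to be available in your config the same way that example assumes:
```typescript
// Passed to the same discovery() call - illustrative option names
llms: {
  description: 'Sustainable e-commerce platform with carbon footprint tracking', // static
  importantPages: async () => {
    // Dynamic: resolved at build time from a content collection
    const guides = await getCollection('docs');
    return guides.filter((g) => g.data.featured).map((g) => `/docs/${g.slug}/`);
  },
  techStack: Object.keys(deps), // e.g. extracted from package.json dependencies
}
```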
## What Not to Include
llms.txt should be concise and focused. Avoid:
**Comprehensive documentation**: Link to it, don't duplicate it
**Entire sitemaps**: That's what sitemap.xml is for
**Legal boilerplate**: Keep it in your terms of service
**Overly specific instructions**: Trust AI to handle common cases
**Marketing copy**: Be informative, not promotional
Think of llms.txt as **strategic context**, not exhaustive documentation.
## Measuring Impact
Unlike traditional SEO, llms.txt impact is harder to measure directly. You won't see "llms.txt traffic" in analytics.
Instead, look for:
- AI assistants correctly representing your product
- Reduction in mischaracterizations or outdated information
- Appropriate use of your APIs by AI-assisted developers
- Consistency in how different AI systems describe your site
The goal is **accurate representation**, not traffic maximization.
## Privacy and Data Concerns
A common concern: "Doesn't llms.txt help AI companies train on my content?"
Important points:
1. **AI training happens regardless** of llms.txt - they crawl public content anyway
2. **llms.txt doesn't grant permission** - it provides context for content they already access
3. **robots.txt controls access** - if you don't want AI crawlers, use robots.txt to block them
4. **llms.txt helps AI represent you accurately** - better context = better representation
Think of it this way: if someone's going to talk about you, would you rather they have accurate information or guess?
## The Evolution of AI Context
llms.txt is a living standard, evolving as AI capabilities grow:
**Current**: Basic site description and instructions
**Near future**: Structured data about capabilities, limitations, and relationships
**Long term**: Semantic graphs of site knowledge and interconnections
By adopting llms.txt now, you're positioning your site to benefit as these capabilities mature.
## Real-World Patterns
**Documentation sites**: Emphasize how to search docs, common pitfalls, and where to find examples
**E-commerce**: Describe product categories, search capabilities, and checkout process
**SaaS products**: Explain core features, authentication, and API availability
**Blogs**: Highlight author expertise, main topics, and content philosophy
The pattern that works best depends on how people use AI to interact with your type of content.
## Related Topics
- [AI Integration Strategy](/explanation/ai-integration/) - Broader AI considerations
- [Robots.txt Coordination](/explanation/robots-explained/) - How robots.txt and llms.txt work together
- [LLMs.txt Reference](/reference/llms/) - Complete configuration options

View File

@@ -1,31 +1,182 @@
---
title: How robots.txt Works
description: Understanding robots.txt and web crawler communication
---
Robots.txt is the oldest and most fundamental discovery file on the web. Since 1994, it has served as the **polite agreement** between website owners and automated crawlers about what content can be accessed and how.
## The Gentleman's Agreement
robots.txt is not a security mechanism - it's a social contract. It tells crawlers "please don't go here" rather than "you cannot go here." Any crawler can ignore it, and malicious ones often do.
This might seem like a weakness, but it's actually a strength. The file works because the overwhelming majority of automated traffic comes from legitimate crawlers (search engines, monitoring tools, archive services) that want to be good citizens of the web.
Think of it like a "No Trespassing" sign on private property. It won't stop determined intruders, but it clearly communicates boundaries to honest visitors and provides legal/ethical grounds for addressing violations.
## What robots.txt Solves
Before robots.txt, early search engines would crawl websites aggressively, sometimes overwhelming servers or wasting bandwidth on administrative pages. Website owners had no standard way to communicate crawling preferences.
robots.txt provides three critical capabilities:
**1. Access Control**: Specify which paths crawlers can and cannot visit
**2. Resource Management**: Set crawl delays to prevent server overload
**3. Signposting**: Point crawlers to important resources like sitemaps
## The User-Agent Model
robots.txt uses a "user-agent" model where rules target specific bots:
```
User-agent: *
Disallow: /admin/
User-agent: GoogleBot
Allow: /api/
```
This allows fine-grained control. You might allow Google to index your API documentation while blocking other crawlers. Or permit archive services to access historical content while disallowing marketing bots.
The `*` wildcard matches all user-agents, providing default rules. Specific user-agents override these defaults for their particular bot.
## The LLM Bot Challenge
The emergence of AI language models created a new category of web consumers. Unlike traditional search engines that index for retrieval, LLMs process content for training data and context.
This raises different concerns:
- Training data usage and attribution
- Content representation accuracy
- Server load from context gathering
- Different resource needs (full pages vs. search snippets)
The integration addresses this by providing dedicated rules for LLM bots (GPTBot, Claude-Web, Anthropic-AI, etc.) while pointing them to llms.txt for additional context.
## Allow vs. Disallow
A common point of confusion is the relationship between Allow and Disallow directives.
**Disallow**: Explicitly forbids access to a path
**Allow**: Creates exceptions to Disallow rules
Consider this example:
```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
```
This says "don't crawl /admin/ except for /admin/public/ which is allowed." The Allow creates a specific exception to the broader Disallow.
Without any rules, everything is implicitly allowed. You don't need `Allow: /` - that's the default state.
## Path Matching
Path patterns in robots.txt support wildcards and prefix matching:
- `/api/` matches `/api/` and everything under it
- `/api/private` matches that specific path
- `*.pdf` matches any URL containing `.pdf`
- `/page$` matches `/page` but not `/page/subpage`
The most specific matching rule wins. If both `/api/` and `/api/public/` have rules for the same user-agent, the longer path takes precedence.
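These rules are simple enough to approximate in a few lines. A sketch (not a full parser) that converts a robots.txt path pattern into a regular expression and applies the longest-match rule:
```typescript
// Convert a robots.txt path pattern to a RegExp: '*' is a wildcard, a trailing '$' anchors the end
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\\\$$/, '$')                 // restore a trailing '$' as an end anchor
    .replace(/\*/g, '.*');                 // '*' matches any sequence of characters
  return new RegExp('^' + escaped);
}
type Rule = { type: 'allow' | 'disallow'; pattern: string };
// Among matching rules, the most specific (longest pattern) wins; no match means allowed
function isAllowed(path: string, rules: Rule[]): boolean {
  const matching = rules.filter((r) => patternToRegExp(r.pattern).test(path));
  if (matching.length === 0) return true;
  matching.sort((a, b) => b.pattern.length - a.pattern.length);
  return matching[0].type === 'allow';
}
console.log(isAllowed('/admin/public/page', [
  { type: 'disallow', pattern: '/admin/' },
  { type: 'allow', pattern: '/admin/public/' },
])); // true - the longer Allow rule takes precedence
```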
## Crawl-Delay: The Double-Edged Sword
Crawl-delay tells bots to wait between requests:
```
Crawl-delay: 2
```
This means "wait 2 seconds between page requests." It's useful for:
- Protecting servers with limited resources
- Preventing rate limiting from triggering
- Managing bandwidth costs
But there's a trade-off: slower crawling means it takes longer for your content to be indexed. Set it too high and you might delay important updates from appearing in search results.
The integration defaults to 1 second - a balanced compromise between politeness and indexing speed.
## Sitemap Declaration
One of robots.txt's most valuable features is sitemap declaration:
```
Sitemap: https://example.com/sitemap-index.xml
```
This tells crawlers "here's a comprehensive list of all my pages." It's more efficient than discovering pages through link following and ensures crawlers know about pages that might not be linked from elsewhere.
The integration automatically adds your sitemap reference, keeping it synchronized with your Astro site URL.
## Common Mistakes
**Blocking CSS/JS**: Some sites block `/assets/` thinking it saves bandwidth. This prevents search engines from rendering your pages correctly, harming SEO.
**Disallowing Everything**: `Disallow: /` blocks all crawlers completely. This is rarely what you want - even internal tools need access.
**Forgetting About Dynamic Content**: If your search or API routes generate content dynamically, consider whether crawlers should access them.
**Security Through Obscurity**: Don't rely on robots.txt to hide sensitive content. Use proper authentication instead.
## Why Not Just Use Authentication?
You might wonder why we need robots.txt if we can protect content with authentication.
The answer is that most website content should be publicly accessible - that's the point. You want search engines to index your blog, documentation, and product pages.
robots.txt lets you have **public content that crawlers respect** without requiring authentication. It's about communicating intent, not enforcing access control.
## The Integration's Approach
This integration generates robots.txt with opinionated defaults:
- Allow all bots by default (the web works best when discoverable)
- Include LLM-specific bots with llms.txt guidance
- Reference your sitemap automatically
- Set a reasonable 1-second crawl delay
- Provide easy overrides for your specific needs
You can customize any aspect, but the defaults represent best practices for most sites.
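A sketch of overriding those defaults. As before, treat the option names as assumptions and check the [Robots.txt Reference](/reference/robots/) for the real configuration shape:
```typescript
// Passed to discovery() - illustrative option names
robots: {
  crawlDelay: 2,                 // override the default 1-second delay
  rules: [
    { userAgent: '*', disallow: ['/admin/', '/drafts/'] },
    { userAgent: 'GPTBot', allow: ['/'] }, // LLM bots still get llms.txt guidance
  ],
  sitemap: true,                 // keep the automatic sitemap reference
}
```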
## Looking at Real-World Examples
**Wikipedia** (`robots.txt`):
```
User-agent: *
Disallow: /wiki/Special:
Crawl-delay: 1
Sitemap: https://en.wikipedia.org/sitemap.xml
```
Simple and effective. Block special admin pages, allow everything else.
**GitHub** (simplified):
```
User-agent: *
Disallow: /search/
Disallow: */pull/
Allow: */pull$/
```
Notice how they block pull request search but allow individual pull request pages. This prevents crawler loops while keeping content accessible.
## Verification and Testing
After deploying, verify your robots.txt:
1. Visit `yoursite.com/robots.txt` directly
2. Use Google Search Console's robots.txt tester
3. Check specific user-agent rules with online validators
4. Monitor crawler behavior in server logs
The file is cached aggressively by crawlers, so changes may take time to propagate.
## Related Topics
- [SEO Impact](/explanation/seo/) - How robots.txt affects search rankings
- [LLMs.txt Integration](/explanation/llms-explained/) - Connecting bot control with AI guidance
- [Robots.txt Reference](/reference/robots/) - Complete configuration options

View File

@@ -1,31 +1,277 @@
---
title: Security.txt Standard (RFC 9116)
description: Understanding RFC 9116 and responsible vulnerability disclosure
---
security.txt, standardized as RFC 9116 in 2022, solves a deceptively simple problem: when a security researcher finds a vulnerability in your website, how do they tell you about it?
## The Responsible Disclosure Problem
Before security.txt, researchers faced a frustrating journey:
1. Find vulnerability in example.com
2. Search for security contact information
3. Check footer, about page, contact page
4. Try info@, security@, admin@ email addresses
5. Hope someone reads it and knows what to do with it
6. Wait weeks for response (or get none)
7. Consider public disclosure out of frustration
This process was inefficient for researchers and dangerous for organizations. Vulnerabilities went unreported or were disclosed publicly before fixes could be deployed.
## The RFC 9116 Solution
RFC 9116 standardizes a machine-readable file at `/.well-known/security.txt` containing:
- **Contact**: How to reach your security team (required)
- **Expires**: When this information becomes stale (required)
- **Canonical**: The authoritative location of this file
- **Encryption**: PGP keys for encrypted communication
- **Acknowledgments**: Hall of fame for researchers
- **Policy**: Your disclosure policy URL
- **Preferred-Languages**: Languages you can handle reports in
- **Hiring**: Security job opportunities
This provides a **standardized, discoverable, machine-readable** security contact mechanism.
## Why .well-known?
The `/.well-known/` directory is an RFC 8615 standard for site-wide metadata. It's where clients expect to find standard configuration files.
By placing security.txt in `/.well-known/security.txt`, the RFC ensures:
- **Consistent location**: No guessing where to find it
- **Standard compliance**: Follows web architecture patterns
- **Tool support**: Security scanners can automatically check for it
The integration generates security.txt at the correct location automatically.
## The Required Fields
RFC 9116 mandates two fields:
### Contact
At least one contact method (email or URL):
```
Contact: mailto:security@example.com
Contact: https://example.com/security-contact
Contact: tel:+1-555-0100
```
Multiple contacts provide redundancy. If one channel fails, researchers have alternatives.
Email addresses automatically get `mailto:` prefixes. URLs should point to security contact forms or issue trackers.
### Expires
An ISO 8601 timestamp indicating when to stop trusting this file:
```
Expires: 2025-12-31T23:59:59Z
```
This is critical - it prevents researchers from reporting to stale contacts that are no longer monitored.
The integration defaults to `expires: 'auto'`, setting expiration to one year from build time. This ensures the field updates on every deployment.
## Optional but Valuable Fields
### Encryption
URLs to PGP public keys for encrypted vulnerability reports:
```
Encryption: https://example.com/pgp-key.txt
Encryption: openpgp4fpr:5F2DE18D3AFE0FD7A1F2F5A3E4562BB79E3B2E80
```
This enables researchers to send sensitive details securely, preventing disclosure to attackers monitoring email.
### Acknowledgments
URL to your security researcher hall of fame:
```
Acknowledgments: https://example.com/security/hall-of-fame
```
Public recognition motivates responsible disclosure. Researchers appreciate being credited for their work.
### Policy
URL to your vulnerability disclosure policy:
```
Policy: https://example.com/security/disclosure-policy
```
This clarifies expectations: response timelines, safe harbor provisions, bug bounty details, and disclosure coordination.
### Preferred-Languages
Languages your security team can handle:
```
Preferred-Languages: en, es, fr
```
This helps international researchers communicate effectively. Use ISO 639-1 language codes.
### Hiring
URL to security job openings:
```
Hiring: https://example.com/careers/security
```
Talented researchers who find vulnerabilities might be hiring prospects. This field provides a connection point.
## The Canonical Field
The Canonical field specifies the authoritative location:
```
Canonical: https://example.com/.well-known/security.txt
```
This matters for:
- **Verification**: Ensures you're reading the correct version
- **Mirrors**: Multiple domains can reference the same canonical file
- **Historical context**: Archives know which version was authoritative
The integration sets this automatically based on your site URL.
## Why Expiration Matters
The Expires field isn't bureaucracy - it's safety.
Consider a scenario:
1. Company sets up security.txt pointing to security@company.com
2. Security team disbands, email is decommissioned
3. Attacker registers the lapsed domain and recreates security@company.com
4. Researcher reports vulnerability to attacker's email
5. Attacker has vulnerability details before the company does
Expiration prevents this. If security.txt is expired, researchers know not to trust it and must find alternative contact methods.
Best practice: Set expiration to 1 year maximum. The integration's `'auto'` option handles this.
## Security.txt in Practice
A minimal production security.txt:
```
Canonical: https://example.com/.well-known/security.txt
Contact: mailto:security@example.com
Expires: 2025-11-08T00:00:00.000Z
```
A comprehensive implementation:
```
Canonical: https://example.com/.well-known/security.txt
Contact: mailto:security@example.com
Contact: https://example.com/security-report
Expires: 2025-11-08T00:00:00.000Z
Encryption: https://example.com/pgp-key.asc
Acknowledgments: https://example.com/security/researchers
Preferred-Languages: en, de, ja
Policy: https://example.com/security/disclosure
Hiring: https://example.com/careers/security-engineer
```
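A configuration sketch that would produce a file like the comprehensive example above. The field names follow RFC 9116, but the exact option shape is an assumption - the [Security.txt Reference](/reference/security/) has the authoritative options:
```typescript
// Passed to discovery() - illustrative option names
security: {
  contact: ['security@example.com', 'https://example.com/security-report'], // mailto: added automatically
  expires: 'auto',                                                          // one year from build time
  encryption: 'https://example.com/pgp-key.asc',
  acknowledgments: 'https://example.com/security/researchers',
  preferredLanguages: ['en', 'de', 'ja'],
  policy: 'https://example.com/security/disclosure',
  hiring: 'https://example.com/careers/security-engineer',
}
```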
## Common Mistakes
**Using relative URLs**: All URLs must be absolute (`https://...`)
**Missing mailto: prefix**: Email addresses need `mailto:` - the integration adds this automatically
**Far-future expiration**: Don't set expiration 10 years out. Keep it to 1 year maximum.
**No monitoring**: Set up alerts when security.txt approaches expiration
**Stale contacts**: Verify listed contacts still work
## Building a Disclosure Program
security.txt is the entry point to vulnerability disclosure, but you need supporting infrastructure:
**Monitoring**: Watch the security inbox religiously
**Triage process**: Quick initial response (even if just "we're investigating")
**Fix timeline**: Clear expectations about patch development
**Disclosure coordination**: Work with researcher on public disclosure timing
**Recognition**: Credit researchers in release notes and acknowledgments page
The integration makes the entry point easy. The program around it requires organizational commitment.
## Security Through Transparency
Some organizations hesitate to publish security.txt, fearing it invites attacks.
The reality: security researchers are already looking. security.txt helps them help you.
Without it:
- Vulnerabilities go unreported
- Researchers waste time finding contacts
- Frustration leads to premature public disclosure
- You look unprofessional to security community
With it:
- Clear channel for responsible disclosure
- Faster vulnerability reports
- Better researcher relationships
- Professional security posture
## Verification and Monitoring
After deploying security.txt:
1. Verify it's accessible at `/.well-known/security.txt`
2. Check field formatting with RFC 9116 validators
3. Test contact methods work
4. Set up monitoring for expiration date
5. Create calendar reminder to refresh before expiration
Many organizations set up automated checks that alert if security.txt will expire within 30 days.
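A sketch of such a check, suitable for a scheduled CI job - it uses only `fetch` and date arithmetic, so wire the alerting into whatever tooling you already run:
```typescript
// Warn if a site's security.txt expires within the next `warnDays` days
async function checkSecurityTxt(origin: string, warnDays = 30): Promise<void> {
  const res = await fetch(new URL('/.well-known/security.txt', origin));
  if (!res.ok) throw new Error(`security.txt not found (HTTP ${res.status})`);
  const text = await res.text();
  const match = text.match(/^Expires:\s*(.+)$/im);
  if (!match) throw new Error('No Expires field - file is not RFC 9116 compliant');
  const daysLeft = (new Date(match[1].trim()).getTime() - Date.now()) / 86_400_000;
  if (daysLeft < warnDays) {
    console.warn(`security.txt expires in ${Math.floor(daysLeft)} days - refresh it`);
  } else {
    console.log(`security.txt valid for ${Math.floor(daysLeft)} more days`);
  }
}
checkSecurityTxt('https://example.com');
```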
## Integration with Bug Bounty Programs
If you run a bug bounty program, reference it in your policy:
```
Policy: https://example.com/bug-bounty
```
This connects researchers to your incentive program immediately.
security.txt and bug bounties work together - the file provides discovery, the program provides incentive structure.
## Legal Considerations
security.txt should coordinate with your legal team's disclosure policy.
Consider including:
- Safe harbor provisions (no legal action against good-faith researchers)
- Scope definition (what systems are in/out of scope)
- Rules of engagement (don't exfiltrate data, etc.)
- Disclosure timeline expectations
These protect both your organization and researchers.
## Related Topics
- [Canary.txt Explained](/explanation/canary-explained/) - Complementary transparency mechanism
- [Security.txt Reference](/reference/security/) - Complete configuration options
- [Security Best Practices](/how-to/environment-config/) - Securing your deployment

View File

@@ -1,31 +1,327 @@
---
title: SEO & Discoverability
description: How discovery files improve search engine optimization
---
Discovery files and SEO have a symbiotic relationship. While some files (like humans.txt) don't directly impact rankings, others (robots.txt, sitemaps) are foundational to how search engines understand and index your site.
## Robots.txt: The SEO Foundation
robots.txt is one of the first files search engines request. It determines:
- Which pages can be crawled and indexed
- How aggressively to crawl (via crawl-delay)
- Where to find your sitemap
- Special instructions for specific bots
### Crawl Budget Optimization
Search engines allocate limited resources to each site - your "crawl budget." robots.txt helps you spend it wisely:
**Block low-value pages**: Admin sections, search result pages, and duplicate content waste crawl budget
**Allow high-value content**: Ensure important pages are accessible
**Set appropriate crawl-delay**: Balance thorough indexing against server load
Example SEO-optimized robots.txt:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /search?
Disallow: /*?sort=*
Disallow: /api/
Crawl-delay: 1
Sitemap: https://example.com/sitemap-index.xml
```
This blocks non-content pages while allowing crawlers to efficiently index your actual content.
### The CSS/JS Trap
A common SEO mistake:
```
# DON'T DO THIS
Disallow: /assets/
Disallow: /*.css
Disallow: /*.js
```
This prevents search engines from fully rendering your pages. Modern SEO requires JavaScript execution for SPAs and interactive content.
The integration doesn't block assets by default - this is intentional and SEO-optimal.
### Sitemap Declaration
The `Sitemap:` directive in robots.txt is critical for SEO. It tells search engines:
- All your pages exist (even if not linked)
- When pages were last modified
- Relative priority of pages
- Alternative language versions
This dramatically improves indexing coverage and freshness.
## Sitemaps: The SEO Roadmap
Sitemaps serve multiple SEO functions:
### Discoverability
Pages not linked from your navigation can still be indexed. This matters for:
- Deep content structures
- Recently published pages not yet linked
- Orphaned pages with valuable content
- Alternative language versions
### Update Frequency
The `<lastmod>` element signals content freshness:
```xml
<url>
<loc>https://example.com/article</loc>
<lastmod>2024-11-08T12:00:00Z</lastmod>
<changefreq>weekly</changefreq>
</url>
```
Search engines prioritize recently updated content. Fresh `lastmod` dates encourage re-crawling.
### Priority Hints
The `<priority>` element suggests relative importance:
```xml
<url>
<loc>https://example.com/important-page</loc>
<priority>0.9</priority>
</url>
<url>
<loc>https://example.com/minor-page</loc>
<priority>0.3</priority>
</url>
```
This is a hint, not a directive. Search engines use it along with other signals.
### International SEO
For multilingual sites, sitemaps declare language alternatives:
```xml
<url>
<loc>https://example.com/page</loc>
<xhtml:link rel="alternate" hreflang="es"
href="https://example.com/es/page"/>
<xhtml:link rel="alternate" hreflang="fr"
href="https://example.com/fr/page"/>
</url>
```
This prevents duplicate content penalties while ensuring all language versions are indexed.
## LLMs.txt: The AI SEO Frontier
Traditional SEO optimizes for search retrieval. llms.txt optimizes for AI representation - the emerging frontier of discoverability.
### AI-Generated Summaries
Search engines increasingly show AI-generated answer boxes. llms.txt helps ensure these summaries:
- Accurately represent your content
- Use your preferred terminology and brand voice
- Highlight your key differentiators
- Link to appropriate pages
### Voice Search Optimization
Voice assistants rely on AI understanding. llms.txt provides:
- Natural language context for your content
- Clarification of ambiguous terms
- Guidance on how to answer user questions
- References to authoritative pages
This improves your chances of being the source for voice search answers.
### Content Attribution
When AI systems reference your content, llms.txt helps ensure:
- Proper context is maintained
- Your brand is correctly associated
- Key features aren't misrepresented
- Updates propagate to AI models
Think of it as structured data for AI agents.
## Humans.txt: The Indirect SEO Value
humans.txt doesn't directly impact rankings, but it supports SEO indirectly:
### Technical Transparency
Developers evaluating integration with your platform check humans.txt for tech stack info. This can lead to:
- Backlinks from integration tutorials
- Technical blog posts mentioning your stack
- Developer community discussions
All of which generate valuable backlinks and traffic.
### Brand Signals
A well-crafted humans.txt signals:
- Active development and maintenance
- Professional operations
- Transparent communication
- Company culture
These contribute to overall site authority and trustworthiness.
## Security.txt: Trust Signals
Security.txt demonstrates professionalism and security-consciousness. While not a ranking factor, it:
- Builds trust with security-conscious users
- Prevents security incidents that could damage SEO (hacked site penalties)
- Shows organizational maturity
- Enables faster vulnerability fixes (preserving site integrity)
Search engines penalize compromised sites heavily. security.txt helps prevent those penalties.
## Integration SEO Benefits
This integration provides several SEO advantages:
### Consistency
All discovery files reference the same site URL from your Astro config. This prevents:
- Mixed http/https signals
- www vs. non-www confusion
- Subdomain inconsistencies
Consistency is an underrated SEO factor.
### Freshness
Auto-generated timestamps keep discovery files fresh:
- Sitemaps show current lastmod dates
- security.txt expiration updates with each build
- canary.txt timestamps reflect current build
Fresh content signals active maintenance.
### Correctness
The integration handles RFC compliance automatically:
- security.txt follows RFC 9116 exactly
- robots.txt uses correct syntax
- Sitemaps follow XML schema
- WebFinger implements RFC 7033
Malformed discovery files can harm SEO. The integration prevents errors.
## Monitoring SEO Impact
Track discovery file effectiveness:
**Google Search Console**:
- Sitemap coverage reports
- Crawl statistics
- Indexing status
- Mobile usability
**Crawl behavior analysis**:
- Server logs showing crawler patterns
- Crawl-delay effectiveness
- Blocked vs. allowed URL ratio
- Time to index new content
**AI representation monitoring**:
- How AI assistants describe your site
- Accuracy of information
- Attribution and links
- Brand voice consistency
## Common SEO Mistakes
### Over-blocking
Blocking too much harms SEO:
```
# Too restrictive
Disallow: /blog/?
Disallow: /products/?
```
This might block legitimate content URLs. Be specific:
```
# Better
Disallow: /blog?*
Disallow: /products?sort=*
```
### Sitemap bloat
Including every URL hurts more than helps:
- Don't include parameter variations
- Skip pagination (keep to representative pages)
- Exclude search result pages
- Filter out duplicate content
Quality over quantity.
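A sketch of that kind of filtering, modeled on the `filter` callback pattern used by @astrojs/sitemap - whether this integration exposes the same hook is an assumption, so see [Sitemap Optimization](/how-to/filter-sitemap/) for the supported approach:
```typescript
// Exclude parameter variations, search results, and deep pagination from the sitemap
sitemap: {
  filter: (page: string) =>
    !page.includes('?') &&           // parameter variations
    !page.includes('/search/') &&    // search result pages
    !/\/page\/\d+\/$/.test(page),    // pagination beyond the first page
}
```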
### Ignoring crawl errors
Monitor Search Console for:
- 404s in sitemap
- Blocked resources search engines need
- Redirect chains
- Server errors
Fix these promptly - they impact ranking.
### Stale sitemaps
Ensure sitemaps update with your content:
- New pages appear quickly
- Deleted pages are removed
- lastmod timestamps are accurate
- Priority reflects current importance
The integration's automatic generation ensures freshness.
## Future SEO Trends
Discovery files will evolve with search:
**AI-first indexing**: Search engines will increasingly rely on structured context (llms.txt) rather than pure crawling
**Federated discovery**: WebFinger and similar protocols may influence how distributed content is discovered and indexed
**Transparency signals**: Files like security.txt and canary.txt may become trust signals in ranking algorithms
**Structured data expansion**: Discovery files complement schema.org markup as structured communication channels
By implementing comprehensive discovery now, you're positioned for these trends.
## Related Topics
- [Robots.txt Configuration](/reference/robots/) - SEO-optimized robot settings
- [Sitemap Optimization](/how-to/filter-sitemap/) - Filtering for better SEO
- [AI Integration Strategy](/explanation/ai-integration/) - Preparing for AI-first search

View File

@@ -1,31 +1,309 @@
---
title: WebFinger Protocol (RFC 7033)
description: Understanding WebFinger and federated resource discovery
---
WebFinger (RFC 7033) solves a fundamental problem of the decentralized web: how do you discover information about a resource (person, service, device) when you only have an identifier?
## The Discovery Challenge
On centralized platforms, discovery is simple. Twitter knows about @username because it's all in one database. But in decentralized systems (email, federated social networks, distributed identity), there's no central registry.
WebFinger provides a standardized way to ask: "Given this identifier (email, account name, URL), what can you tell me about it?"
## The Query Pattern
WebFinger uses a simple HTTP GET request:
```
GET /.well-known/webfinger?resource=acct:alice@example.com
```
This asks: "What do you know about alice@example.com?"
The server responds with a JSON Resource Descriptor (JRD) containing links, properties, and metadata about that resource.
## Real-World Use Cases
### ActivityPub / Mastodon
When you follow `@alice@example.com` on Mastodon, your instance:
1. Queries `example.com/.well-known/webfinger?resource=acct:alice@example.com`
2. Gets back Alice's ActivityPub profile URL
3. Fetches her profile and posts from that URL
4. Subscribes to updates
WebFinger is the discovery layer that makes federation work.
### OpenID Connect
OAuth/OpenID providers use WebFinger for issuer discovery:
1. User enters email address
2. Client extracts domain
3. Queries WebFinger for OpenID configuration
4. Discovers authentication endpoints
5. Initiates OAuth flow
This enables "email address as identity" without hardcoding provider lists.
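For illustration, an OIDC client's issuer lookup might look like the sketch below; real clients also validate the response and handle multiple links, and the email address is of course hypothetical:
```typescript
// Discover an OpenID issuer for "alice@example.com" via WebFinger (sketch)
async function discoverIssuer(email: string): Promise<string | undefined> {
  const domain = email.split('@')[1];
  const url = new URL(`https://${domain}/.well-known/webfinger`);
  url.searchParams.set('resource', `acct:${email}`);
  url.searchParams.set('rel', 'http://openid.net/specs/connect/1.0/issuer');
  const jrd = await (await fetch(url)).json();
  // The issuer link's href points at the OpenID provider's configuration
  return jrd.links?.find(
    (l: { rel: string; href?: string }) =>
      l.rel === 'http://openid.net/specs/connect/1.0/issuer'
  )?.href;
}
```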
### Contact Discovery
Email clients and contact apps use WebFinger to discover:
- Profile photos and avatars
- Public keys for encryption
- Social media profiles
- Calendar availability
- Preferred contact methods
## The JRD Response Format
A WebFinger response looks like:
```json
{
"subject": "acct:alice@example.com",
"aliases": [
"https://example.com/@alice",
"https://example.com/users/alice"
],
"properties": {
"http://schema.org/name": "Alice Developer"
},
"links": [
{
"rel": "self",
"type": "application/activity+json",
"href": "https://example.com/users/alice"
},
{
"rel": "http://webfinger.net/rel/profile-page",
"type": "text/html",
"href": "https://example.com/@alice"
},
{
"rel": "http://webfinger.net/rel/avatar",
"type": "image/jpeg",
"href": "https://example.com/avatars/alice.jpg"
}
]
}
```
**Subject**: The resource being described (often same as query)
**Aliases**: Alternative identifiers for the same resource
**Properties**: Key-value metadata (property names must be URIs)
**Links**: Related resources with relationship types
## Link Relations
The `rel` field uses standardized link relation types:
**IANA registered**: `self`, `alternate`, `canonical`, etc.
**WebFinger specific**: `http://webfinger.net/rel/profile-page`, etc.
**Custom/domain-specific**: Any URI works
This extensibility allows WebFinger to serve many use cases while remaining standardized.
## Static vs. Dynamic Resources
The integration supports both approaches:
### Static Resources
Define specific resources explicitly:
```typescript
webfinger: {
resources: [
{
resource: 'acct:alice@example.com',
links: [...]
}
]
}
```
Use this for a small, known set of identities.
### Content Collection Integration
Generate resources dynamically from Astro content collections:
```typescript
webfinger: {
collections: [{
name: 'team',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (member) => [...]
}]
}
```
This auto-generates WebFinger responses for all collection entries. Add a team member to your content collection, and they become discoverable via WebFinger automatically.
## Template Variables
Resource and subject templates support variables:
- `{slug}`: Collection entry slug
- `{id}`: Collection entry ID
- `{data.fieldName}`: Any field from entry data
- `{siteURL}`: Your configured site URL
Example:
```typescript
resourceTemplate: 'acct:{data.username}@{siteURL.hostname}'
```
For a team member with `username: 'alice'` on `example.com`, this generates:
`acct:alice@example.com`
## CORS and Security
WebFinger responses include:
```
Access-Control-Allow-Origin: *
```
This is intentional - WebFinger is designed for public discovery. If information shouldn't be public, don't put it in WebFinger.
The protocol assumes:
- Resources are intentionally discoverable
- Information is public or intended for sharing
- Authentication happens at linked resources, not discovery layer
## Rel Filtering
Clients can request specific link types:
```
GET /.well-known/webfinger?resource=acct:alice@example.com&rel=self
```
The server returns only links matching that relation type. This reduces bandwidth and focuses the response.
The integration handles this automatically.
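Conceptually, the filtering is just a match against the `rel` values in the query string - a simplified sketch of the idea, not the integration's actual code:
```typescript
// Simplified rel filtering over a JRD document (illustrative only)
interface JrdLink { rel: string; type?: string; href?: string }
interface Jrd { subject: string; links?: JrdLink[] }
function filterByRel(jrd: Jrd, requestUrl: URL): Jrd {
  const rels = requestUrl.searchParams.getAll('rel');
  if (rels.length === 0) return jrd; // no rel params: return everything
  return {
    ...jrd,
    links: (jrd.links ?? []).filter((link) => rels.includes(link.rel)),
  };
}
```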
## Why Dynamic Routes
Unlike other discovery files, WebFinger uses a dynamic route (`prerender: false`). This is because:
1. Query parameters determine the response
2. Content collection resources may be numerous
3. Responses are lightweight enough to generate on-demand
Static generation would require pre-rendering every possible query, which is impractical for collections.
## Building for Federation
If you want your site to participate in federated protocols:
**Enable WebFinger**: Makes your users/resources discoverable
**Implement ActivityPub**: Provide the linked profile/actor endpoints
**Support WebFinger lookup**: Allow others to discover your resources
WebFinger is the discovery layer; ActivityPub (or other protocols) provide the functionality.
## Team/Author Discovery
A common pattern for blogs and documentation:
```typescript
webfinger: {
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@myblog.com',
linksBuilder: (author) => [
{
rel: 'http://webfinger.net/rel/profile-page',
href: `https://myblog.com/authors/${author.slug}`,
type: 'text/html'
},
{
rel: 'http://webfinger.net/rel/avatar',
href: author.data.avatar,
type: 'image/jpeg'
}
],
propertiesBuilder: (author) => ({
'http://schema.org/name': author.data.name,
'http://schema.org/email': author.data.email
})
}]
}
```
Now `acct:alice@myblog.com` resolves to Alice's author page, avatar, and contact info.
## Testing WebFinger
After deployment:
1. Query directly: `curl 'https://example.com/.well-known/webfinger?resource=acct:alice@example.com'`
2. Use WebFinger validators/debuggers
3. Test from federated clients (Mastodon, etc.)
4. Verify CORS headers are present
5. Check rel filtering works
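A short script can automate the first two checks - a sketch only, using a hypothetical account and domain:
```typescript
// webfinger-smoke-test.ts - quick post-deploy check
const resource = 'acct:alice@example.com';
const res = await fetch(
  `https://example.com/.well-known/webfinger?resource=${encodeURIComponent(resource)}`
);
console.log('status:', res.status); // expect 200
console.log('cors:', res.headers.get('access-control-allow-origin')); // expect *
const jrd = await res.json();
console.log('subject matches:', jrd.subject === resource);
console.log('has self link:', jrd.links?.some((l: { rel: string }) => l.rel === 'self'));
```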
## Privacy Considerations
WebFinger makes information **discoverable**. Consider:
- Don't expose private email addresses or contact info
- Limit to intentionally public resources
- Understand that responses are cached
- Remember `Access-Control-Allow-Origin: *` makes responses widely accessible
If information shouldn't be public, don't include it in WebFinger responses.
## Beyond Social Networks
WebFinger isn't just for social media. Other applications:
**Device discovery**: IoT devices announcing capabilities
**Service discovery**: API endpoints and configurations
**Calendar/availability**: Free/busy status and booking links
**Payment addresses**: Cryptocurrency addresses and payment methods
**Professional profiles**: Credentials, certifications, and portfolios
The protocol is general-purpose resource discovery.
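As one example, the same `resources` configuration shown earlier could advertise a payment page using the IANA-registered `payment` link relation - the account and URLs here are purely illustrative:
```typescript
// Sketch: a WebFinger resource pointing at a payment/donation page
discovery({
  webfinger: {
    enabled: true,
    resources: [
      {
        resource: 'acct:donations@example.com', // hypothetical account
        links: [
          {
            rel: 'payment', // IANA-registered link relation
            type: 'text/html',
            href: 'https://example.com/donate'
          }
        ]
      }
    ]
  }
})
```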
## The Integration's Approach
This integration makes WebFinger accessible without boilerplate:
- Auto-generates from content collections
- Handles template variable substitution
- Manages CORS and rel filtering
- Provides type-safe configuration
- Supports both static and dynamic resources
You define the mappings, the integration handles the protocol.
## When to Use WebFinger
Enable WebFinger if:
- You want to participate in federated protocols
- Your site has user profiles or authors
- You're building decentralized services
- You want discoverable team members
- You're implementing OAuth/OpenID
Skip it if:
- Your site is purely informational with no identity component
- You don't want to expose resource discovery
- You're not integrating with federated services
## Related Topics
- [ActivityPub Integration](/how-to/activitypub/) - Building on WebFinger for federation
- [WebFinger Reference](/reference/webfinger/) - Complete configuration options
- [Content Collections](/how-to/content-collections/) - Dynamic resource generation

@@ -1,31 +1,130 @@
---
title: Why Use Discovery Files?
description: Understanding the importance of discovery files for modern websites
---
Discovery files are the polite introduction your website makes to the automated systems that visit it every day. Just as you might put up a sign directing visitors to your front door, these files tell bots, AI assistants, search engines, and other automated systems where to go and what they can do.
## The Discovery Problem
Every website faces a fundamental challenge: how do automated systems know what your site contains, where security issues should be reported, or how AI assistants should interact with your content?
Without standardized discovery mechanisms, each bot must guess. Search engines might crawl your entire site inefficiently. AI systems might misrepresent your content. Security researchers won't know how to contact you responsibly. Federated services can't find your user profiles.
Discovery files solve this by providing **machine-readable contracts** that answer specific questions:
- **robots.txt**: "What can I crawl and where?"
- **llms.txt**: "How should AI assistants understand and represent your site?"
- **humans.txt**: "Who built this and what technologies were used?"
- **security.txt**: "Where do I report security vulnerabilities?"
- **canary.txt**: "Has your organization received certain legal orders?"
- **webfinger**: "How do I discover user profiles and federated identities?"
## Why Multiple Files?
You might wonder why we need separate files instead of one unified discovery document. The answer lies in **separation of concerns** and **backwards compatibility**.
Each file serves a distinct audience and purpose:
- **robots.txt** targets web crawlers and has been the standard since 1994
- **llms.txt** addresses the new reality of AI assistants processing web content
- **humans.txt** provides transparency for developers and users curious about your stack
- **security.txt** (RFC 9116) offers a standardized security contact mechanism
- **canary.txt** enables transparency about legal obligations
- **webfinger** (RFC 7033) enables decentralized resource discovery
Different systems read different files. A search engine ignores humans.txt. A developer looking at your tech stack won't read robots.txt. A security researcher needs security.txt, not your sitemap.
This modularity also means you can adopt discovery files incrementally. Start with robots.txt and sitemap.xml, add llms.txt when you want AI assistance, enable security.txt when you're ready to accept vulnerability reports.
## The Visibility Trade-off
Discovery files involve an important trade-off: **transparency versus obscurity**.
By publishing robots.txt, you tell both polite crawlers and malicious scrapers about your site structure. Security.txt reveals your security team's contact information. Humans.txt exposes your technology stack.
This is deliberate. Discovery files embrace the principle that **security through obscurity is not security**. The benefits of standardized, polite communication with automated systems outweigh the minimal risks of exposing this information.
Consider that:
- Attackers can discover your tech stack through other means (HTTP headers, page analysis, etc.)
- Security.txt makes responsible disclosure easier, reducing time-to-fix for vulnerabilities
- Robots.txt only controls *polite* bots - malicious actors ignore it anyway
- The transparency builds trust with users, developers, and security researchers
## The Evolution of Discovery
Discovery mechanisms have evolved alongside the web itself:
**1994**: robots.txt emerges as an informal standard for crawler communication
**2000s**: Sitemaps become essential for SEO as the web grows exponentially
**2008**: humans.txt proposed to add personality and transparency to websites
**2017**: RFC 9116 standardizes security.txt after years of ad-hoc security contact methods
**2023**: llms.txt proposed as AI assistants become major consumers of web content
**2024**: Warrant canaries and webfinger integration emerge for transparency and federation
Each new discovery file addresses a real need that emerged as the web ecosystem grew. The integration brings them together because **modern websites need to communicate with an increasingly diverse set of automated visitors**.
## Discovery as Infrastructure
Think of discovery files as **critical infrastructure for your website**. They're not optional extras - they're the foundation for how your site interacts with the broader web ecosystem.
Without proper discovery files:
- Search engines may crawl inefficiently, wasting your server resources
- AI assistants may misunderstand your content or ignore important context
- Security researchers may struggle to report vulnerabilities responsibly
- Developers can't easily understand your technical choices
- Federated services can't integrate with your user profiles
With comprehensive discovery:
- You control how bots interact with your site
- AI assistants have proper context for representing your content
- Security issues can be reported through established channels
- Your tech stack and team are properly credited
- Your site integrates seamlessly with federated protocols
## The Cost-Benefit Analysis
Setting up discovery files manually for each project is tedious and error-prone. You need to:
- Remember the correct format for each file type
- Keep URLs and sitemaps synchronized with your site config
- Update expiration dates for security.txt and canary.txt
- Maintain consistency across different discovery mechanisms
- Handle edge cases and RFC compliance
An integration automates all of this, ensuring:
- **Consistency**: All discovery files reference the same site URL
- **Correctness**: RFC compliance is handled automatically
- **Maintenance**: Expiration dates and timestamps update on each build
- **Flexibility**: Configuration changes propagate to all relevant files
- **Best Practices**: Sensible defaults that you can override as needed
The cost is minimal - a single integration in your Astro config. The benefit is comprehensive, standards-compliant discovery across your entire site.
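In practice, that looks something like the sketch below. The package name and option shapes are assumptions here - follow the integration's install guide for the exact import:
```typescript
// astro.config.mjs - sketch of wiring the integration into an Astro project
import { defineConfig } from 'astro/config';
import discovery from '@astrojs/discovery'; // assumed package name
export default defineConfig({
  site: 'https://example.com', // discovery files derive their URLs from this
  integrations: [
    discovery({
      robots: { /* crawler rules */ },
      llms: { description: 'What this site is about' },
      security: { contact: 'security@example.com' },
    }),
  ],
});
```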
## Looking Forward
As the web continues to evolve, discovery mechanisms will too. We're already seeing:
- AI systems becoming more sophisticated in how they consume web content
- Federated protocols gaining adoption for decentralized social networks
- Increased emphasis on security transparency and responsible disclosure
- Growing need for machine-readable metadata as automation increases
Discovery files aren't a trend - they're fundamental communication protocols that will remain relevant as long as automated systems interact with websites.
By implementing comprehensive discovery now, you're **future-proofing** your site for whatever new automated visitors emerge next.
## Related Topics
- [SEO Implications](/explanation/seo/) - How discovery files affect search rankings
- [AI Integration Strategy](/explanation/ai-integration/) - Making your content AI-friendly
- [Architecture](/explanation/architecture/) - How the integration works internally

@@ -3,29 +3,382 @@ title: ActivityPub Integration
description: Connect with the Fediverse via WebFinger
---
Enable WebFinger to make your site discoverable on Mastodon and other ActivityPub-compatible services in the Fediverse.
## Prerequisites
- Integration installed and configured
- Understanding of ActivityPub and WebFinger protocols
- Knowledge of your site's user or author structure
- ActivityPub server endpoints (or static actor files)
## Basic Static Profile
Create a single discoverable profile:
```typescript
// astro.config.mjs
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:yourname@example.com',
subject: 'acct:yourname@example.com',
aliases: [
'https://example.com/@yourname'
],
links: [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: 'https://example.com/@yourname'
},
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/yourname'
}
]
}
]
}
})
```
Query: `GET /.well-known/webfinger?resource=acct:yourname@example.com`
## Multiple Authors
Enable discovery for all blog authors:
```typescript
discovery({
webfinger: {
enabled: true,
resources: [
{
resource: 'acct:alice@example.com',
links: [
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/alice'
},
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://example.com/authors/alice'
}
]
},
{
resource: 'acct:bob@example.com',
links: [
{
rel: 'self',
type: 'application/activity+json',
href: 'https://example.com/users/bob'
},
{
rel: 'http://webfinger.net/rel/profile-page',
href: 'https://example.com/authors/bob'
}
]
}
]
}
})
```
## Dynamic Authors from Content Collection
Load authors from Astro content collection:
**Step 1**: Create authors collection:
```typescript
// src/content.config.ts
const authorsCollection = defineCollection({
type: 'data',
schema: z.object({
name: z.string(),
email: z.string().email(),
bio: z.string(),
avatar: z.string().url(),
mastodon: z.string().optional(),
})
});
```
**Step 2**: Add author data:
```yaml
# src/content/authors/alice.yaml
name: Alice Developer
email: alice@example.com
bio: Full-stack developer and writer
avatar: https://example.com/avatars/alice.jpg
mastodon: '@alice@mastodon.social'
```
**Step 3**: Configure WebFinger collection:
```typescript
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (author) => [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: `https://example.com/authors/${author.slug}`
},
{
rel: 'http://webfinger.net/rel/avatar',
type: 'image/jpeg',
href: author.data.avatar
},
{
rel: 'self',
type: 'application/activity+json',
href: `https://example.com/users/${author.slug}`
}
],
propertiesBuilder: (author) => ({
'http://schema.org/name': author.data.name,
'http://schema.org/description': author.data.bio
}),
aliasesBuilder: (author) => [
`https://example.com/@${author.slug}`
]
}]
}
})
```
## Create ActivityPub Actor Endpoint
WebFinger discovery requires an ActivityPub actor endpoint. Create it:
```typescript
// src/pages/users/[author].json.ts
import type { APIRoute } from 'astro';
import { getCollection } from 'astro:content';
export async function getStaticPaths() {
const authors = await getCollection('authors');
return authors.map(author => ({
params: { author: author.slug }
}));
}
export const GET: APIRoute = async ({ params, site }) => {
const authors = await getCollection('authors');
const author = authors.find(a => a.slug === params.author);
if (!author) {
return new Response(null, { status: 404 });
}
const actor = {
'@context': [
'https://www.w3.org/ns/activitystreams',
'https://w3id.org/security/v1'
],
'type': 'Person',
'id': `${site}users/${author.slug}`,
'preferredUsername': author.slug,
'name': author.data.name,
'summary': author.data.bio,
'url': `${site}authors/${author.slug}`,
'icon': {
'type': 'Image',
'mediaType': 'image/jpeg',
'url': author.data.avatar
},
'inbox': `${site}users/${author.slug}/inbox`,
'outbox': `${site}users/${author.slug}/outbox`,
'followers': `${site}users/${author.slug}/followers`,
'following': `${site}users/${author.slug}/following`,
};
return new Response(JSON.stringify(actor, null, 2), {
status: 200,
headers: {
'Content-Type': 'application/activity+json'
}
});
};
```
## Link from Mastodon
Users can find your profile on Mastodon:
1. Go to Mastodon search
2. Enter `@yourname@example.com`
3. Mastodon queries WebFinger at your site
4. Gets ActivityPub actor URL
5. Displays profile with follow button
## Add Profile Link in Bio
Link your Mastodon profile:
```typescript
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (author) => {
const links = [
{
rel: 'self',
type: 'application/activity+json',
href: `https://example.com/users/${author.slug}`
}
];
// Add Mastodon link if available
if (author.data.mastodon) {
const mastodonUrl = author.data.mastodon.startsWith('http')
? author.data.mastodon
: `https://mastodon.social/${author.data.mastodon}`;
links.push({
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: mastodonUrl
});
}
return links;
}
}]
}
})
```
## Testing WebFinger
Test your WebFinger endpoint:
```bash
# Build the site
npm run build
npm run preview
# Test WebFinger query
curl "http://localhost:4321/.well-known/webfinger?resource=acct:alice@example.com"
```
Expected response:
```json
{
"subject": "acct:alice@example.com",
"aliases": [
"https://example.com/@alice"
],
"links": [
{
"rel": "http://webfinger.net/rel/profile-page",
"type": "text/html",
"href": "https://example.com/authors/alice"
},
{
"rel": "self",
"type": "application/activity+json",
"href": "https://example.com/users/alice"
}
]
}
```
## Test ActivityPub Actor
Verify actor endpoint:
```bash
curl "http://localhost:4321/users/alice" \
-H "Accept: application/activity+json"
```
Should return actor JSON with inbox, outbox, followers, etc.
## Configure CORS
WebFinger requires CORS headers:
The integration automatically adds:
```
Access-Control-Allow-Origin: *
```
For production with an ActivityPub server, configure appropriate CORS in your hosting.
## Implement Full ActivityPub
For complete Fediverse integration:
1. **Implement inbox**: Handle incoming activities (follows, likes, shares)
2. **Implement outbox**: Serve your posts/activities
3. **Generate keypairs**: For signing activities
4. **Handle followers**: Maintain follower/following lists
5. **Send activities**: Notify followers of new posts
This is beyond WebFinger scope. Consider using:
- [Bridgy Fed](https://fed.brid.gy/) for easy federation
- [WriteFreely](https://writefreely.org/) for federated blogging
- [GoToSocial](https://gotosocial.org/) for self-hosted instances
## Expected Result
Your site becomes discoverable in the Fediverse:
1. Users search `@yourname@example.com` on Mastodon
2. Mastodon fetches WebFinger from `/.well-known/webfinger`
3. Gets ActivityPub actor URL
4. Displays your profile
5. Users can follow/interact (if full ActivityPub implemented)
## Alternative Approaches
**Static site**: Use WebFinger for discovery only, point to external Mastodon account.
**Proxy to Mastodon**: WebFinger points to your Mastodon instance.
**Bridgy Fed**: Use Bridgy Fed to handle ActivityPub protocol, just provide WebFinger.
**Full implementation**: Build complete ActivityPub server with inbox/outbox.
## Common Issues
**WebFinger not found**: Ensure `webfinger.enabled: true` and resources/collections configured.
**CORS errors**: Integration adds CORS automatically. Check if hosting overrides headers.
**Actor URL 404**: Create the actor endpoint at the URL specified in WebFinger links.
**Mastodon can't find profile**: Ensure `rel: 'self'` link with `type: 'application/activity+json'` exists.
**Incorrect format**: WebFinger must return valid JRD JSON. Test with curl.
**Case sensitivity**: Resource URIs are case-sensitive. `acct:alice@example.com` ≠ `acct:Alice@example.com`
## Additional Resources
- [WebFinger RFC 7033](https://datatracker.ietf.org/doc/html/rfc7033)
- [ActivityPub Spec](https://www.w3.org/TR/activitypub/)
- [Mastodon Documentation](https://docs.joinmastodon.org/)
- [Bridgy Fed](https://fed.brid.gy/)

@@ -3,29 +3,248 @@ title: Add Team Members
description: Add team member information to humans.txt
---
Document your team and contributors in humans.txt for public recognition.
## Prerequisites
- Integration installed and configured
- Team member information (names, roles, contact details)
- Permission from team members to share their information
## Add a Single Team Member
Configure basic team information:
```typescript
// astro.config.mjs
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com'
}
]
}
})
```
## Add Multiple Team Members
Include your full team:
```typescript
discovery({
humans: {
team: [
{
name: 'Jane Developer',
role: 'Lead Developer',
contact: 'jane@example.com',
location: 'San Francisco, CA'
},
{
name: 'John Designer',
role: 'UI/UX Designer',
contact: 'john@example.com',
location: 'New York, NY'
},
{
name: 'Sarah Product',
role: 'Product Manager',
location: 'London, UK'
}
]
}
})
```
## Include Social Media Profiles
Add Twitter and GitHub handles:
```typescript
discovery({
humans: {
team: [
{
name: 'Alex Dev',
role: 'Full Stack Developer',
contact: 'alex@example.com',
twitter: '@alexdev',
github: 'alex-codes'
}
]
}
})
```
## Load from Content Collections
Dynamically generate team list from content:
```typescript
import { getCollection } from 'astro:content';
discovery({
humans: {
team: async () => {
const teamMembers = await getCollection('team');
return teamMembers.map(member => ({
name: member.data.name,
role: member.data.role,
contact: member.data.email,
location: member.data.city,
twitter: member.data.twitter,
github: member.data.github
}));
}
}
})
```
Create a content collection in `src/content/team/`:
```yaml
# src/content/team/jane.yaml
name: Jane Developer
role: Lead Developer
email: jane@example.com
city: San Francisco, CA
twitter: '@janedev'
github: jane-codes
```
## Load from External Source
Fetch team data from your API or database:
```typescript
discovery({
humans: {
team: async () => {
const response = await fetch('https://api.example.com/team');
const teamData = await response.json();
return teamData.members.map(member => ({
name: member.fullName,
role: member.position,
contact: member.publicEmail,
location: member.location
}));
}
}
})
```
## Add Acknowledgments
Thank contributors and inspirations:
```typescript
discovery({
humans: {
team: [/* ... */],
thanks: [
'The Astro team for the amazing framework',
'All our open source contributors',
'Stack Overflow community',
'Our beta testers',
'Coffee and late nights'
]
}
})
```
## Include Project Story
Add context about your project:
```typescript
discovery({
humans: {
team: [/* ... */],
story: `
This project was born from a hackathon in 2024. What started as
a weekend experiment grew into a tool used by thousands. Our team
came together from different timezones and backgrounds, united by
a passion for making the web more discoverable.
`.trim()
}
})
```
## Add Fun Facts
Make it personal:
```typescript
discovery({
humans: {
team: [/* ... */],
funFacts: [
'Built entirely remotely across 4 continents',
'Powered by 1,247 cups of coffee',
'Deployed on a Friday (we live dangerously)',
'First commit was at 2:47 AM',
'Named after a recurring inside joke'
]
}
})
```
## Verify Your Configuration
Build and check the output:
```bash
npm run build
npm run preview
curl http://localhost:4321/humans.txt
```
## Expected Result
Your humans.txt will contain formatted team information:
```
/* TEAM */
Name: Jane Developer
Role: Lead Developer
Contact: jane@example.com
From: San Francisco, CA
Twitter: @janedev
GitHub: jane-codes
Name: John Designer
Role: UI/UX Designer
Contact: john@example.com
From: New York, NY
/* THANKS */
The Astro team for the amazing framework
All our open source contributors
Coffee and late nights
```
## Alternative Approaches
**Privacy-first**: Use team roles without names or contact details for privacy.
**Department-based**: Group team members by department rather than listing individually.
**Rotating spotlight**: Highlight different team members each month using dynamic content.
## Common Issues
**Missing permissions**: Always get consent before publishing personal information.
**Outdated information**: Keep contact details current. Use dynamic loading to stay fresh.
**Too much detail**: Stick to professional information. Avoid personal addresses or phone numbers.
**Special characters**: Use plain ASCII in humans.txt. Avoid emojis unless necessary.

@@ -1,31 +1,169 @@
---
title: Block Specific Bots
description: Control which bots can crawl your site using robots.txt rules
---
Block unwanted bots or user agents from accessing specific parts of your site.
## Prerequisites
- Integration installed and configured
- Basic familiarity with robots.txt format
- Knowledge of which bot user agents to block
## Block a Single Bot Completely
To prevent a specific bot from crawling your entire site:
```typescript
// astro.config.mjs
discovery({
robots: {
additionalAgents: [
{
userAgent: 'BadBot',
disallow: ['/']
}
]
}
})
```
This creates a rule that blocks `BadBot` from all pages.
## Block Multiple Bots
Add multiple entries to the `additionalAgents` array:
```typescript
discovery({
robots: {
additionalAgents: [
{ userAgent: 'BadBot', disallow: ['/'] },
{ userAgent: 'SpamCrawler', disallow: ['/'] },
{ userAgent: 'AnnoyingBot', disallow: ['/'] }
]
}
})
```
## Block Bots from Specific Paths
Allow a bot access to most content, but block sensitive areas:
```typescript
discovery({
robots: {
additionalAgents: [
{
userAgent: 'PriceBot',
allow: ['/'],
disallow: ['/checkout', '/account', '/api']
}
]
}
})
```
**Order matters**: Specific rules (`/checkout`) should come after general rules (`/`).
## Disable All LLM Bots
To block all AI crawler bots:
```typescript
discovery({
robots: {
llmBots: {
enabled: false
}
}
})
```
This removes the allow rules for Anthropic-AI, Claude-Web, GPTBot, and other LLM crawlers.
## Block Specific LLM Bots
Keep some LLM bots while blocking others:
```typescript
discovery({
robots: {
llmBots: {
enabled: true,
agents: ['Anthropic-AI', 'Claude-Web'] // Only allow these
},
additionalAgents: [
{ userAgent: 'GPTBot', disallow: ['/'] },
{ userAgent: 'Google-Extended', disallow: ['/'] }
]
}
})
```
## Add Custom Rules
For complex scenarios, use `customRules` to add raw robots.txt content:
```typescript
discovery({
robots: {
customRules: `
# Block aggressive crawlers
User-agent: AggressiveBot
Crawl-delay: 30
Disallow: /
# Special rule for search engine
User-agent: Googlebot
Allow: /api/public
Disallow: /api/private
`.trim()
}
})
```
## Verify Your Configuration
After configuration, build your site and check `/robots.txt`:
```bash
npm run build
npm run preview
curl http://localhost:4321/robots.txt
```
Look for your custom agent rules in the output.
## Expected Result
Your robots.txt will contain entries like:
```
User-agent: BadBot
Disallow: /
User-agent: PriceBot
Allow: /
Disallow: /checkout
Disallow: /account
```
Blocked bots should respect these rules and avoid crawling restricted areas.
## Alternative Approaches
**Server-level blocking**: For malicious bots that ignore robots.txt, consider blocking at the server/firewall level.
**User-agent detection**: Implement server-side detection to return 403 Forbidden for specific user agents.
**Rate limiting**: Use crawl delays to slow down aggressive crawlers rather than blocking them completely.
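For the user-agent detection approach, Astro middleware can refuse requests before they reach your pages. A sketch for server-rendered or hybrid sites - static-only builds would need equivalent rules at the CDN or web-server layer, and the bot names here are placeholders:
```typescript
// src/middleware.ts - return 403 for known-bad user agents (example list)
import { defineMiddleware } from 'astro:middleware';
const BLOCKED_AGENTS = ['BadBot', 'SpamCrawler']; // hypothetical bot names
export const onRequest = defineMiddleware((context, next) => {
  const ua = context.request.headers.get('user-agent') ?? '';
  if (BLOCKED_AGENTS.some((agent) => ua.includes(agent))) {
    return new Response('Forbidden', { status: 403 });
  }
  return next();
});
```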
## Common Issues
**Bots ignoring rules**: robots.txt is advisory only. Malicious bots may not respect it.
**Overly broad patterns**: Be specific with disallow paths. `/api` blocks `/api/public` too.
**Typos in user agents**: User agent strings are case-sensitive. Check bot documentation for exact values.

@@ -3,29 +3,224 @@ title: Set Cache Headers
description: Configure HTTP caching for discovery files
---
Optimize cache headers for discovery files to balance freshness with server load and client performance.
## Prerequisites
- Integration installed and configured
- Understanding of HTTP caching concepts
- Knowledge of your content update frequency
## Set Cache Duration for All Files
Configure caching in seconds:
```typescript
// astro.config.mjs
discovery({
caching: {
robots: 3600, // 1 hour
llms: 3600, // 1 hour
humans: 86400, // 24 hours
security: 86400, // 24 hours
canary: 3600, // 1 hour
webfinger: 3600, // 1 hour
sitemap: 3600 // 1 hour
}
})
```
These values set `Cache-Control: public, max-age=<seconds>` headers.
## Short Cache for Frequently Updated Content
Update canary.txt daily? Use short cache:
```typescript
discovery({
caching: {
canary: 1800 // 30 minutes
}
})
```
Bots will check for updates more frequently.
## Long Cache for Static Content
Rarely change humans.txt? Cache longer:
```typescript
discovery({
caching: {
humans: 604800 // 1 week (7 days)
}
})
```
Reduces server load for static content.
## Disable Caching for Development
Different caching for development vs production:
```typescript
discovery({
caching: import.meta.env.PROD
? {
// Production: aggressive caching
robots: 3600,
llms: 3600,
humans: 86400
}
: {
// Development: no caching
robots: 0,
llms: 0,
humans: 0
}
})
```
Zero seconds means no caching (always fresh).
## Match Cache to Update Frequency
Align with your content update schedule:
```typescript
discovery({
caching: {
// Updated hourly via CI/CD
llms: 3600, // 1 hour
// Updated daily
canary: 7200, // 2 hours (some buffer)
// Updated weekly
humans: 86400, // 24 hours
// Rarely changes
robots: 604800, // 1 week
security: 2592000 // 30 days
}
})
```
## Conservative Caching
When in doubt, cache shorter:
```typescript
discovery({
caching: {
robots: 1800, // 30 min
llms: 1800, // 30 min
humans: 3600, // 1 hour
sitemap: 1800 // 30 min
}
})
```
Ensures content stays relatively fresh.
## Aggressive Caching
Optimize for performance when content is stable:
```typescript
discovery({
caching: {
robots: 86400, // 24 hours
llms: 43200, // 12 hours
humans: 604800, // 1 week
security: 2592000, // 30 days
sitemap: 86400 // 24 hours
}
})
```
## Understand Cache Behavior
Different cache durations affect different use cases:
**robots.txt** (crawl bots):
- Short cache (1 hour): Quickly reflect changes to bot permissions
- Long cache (24 hours): Reduce load from frequent bot checks
**llms.txt** (AI assistants):
- Short cache (1 hour): Keep instructions current
- Medium cache (6 hours): Balance freshness and performance
**humans.txt** (curious visitors):
- Long cache (24 hours - 1 week): Team info changes rarely
**security.txt** (security researchers):
- Long cache (24 hours - 30 days): Contact info is stable
**canary.txt** (transparency):
- Short cache (30 min - 1 hour): Must be checked frequently
## Verify Cache Headers
Test with curl:
```bash
npm run build
npm run preview
# Check cache headers
curl -I http://localhost:4321/robots.txt
curl -I http://localhost:4321/llms.txt
curl -I http://localhost:4321/humans.txt
```
Look for `Cache-Control` header in the response:
```
Cache-Control: public, max-age=3600
```
## Expected Result
Browsers and CDNs will cache files according to your settings. Subsequent requests within the cache period will be served from cache, reducing server load.
For a 1-hour cache:
1. First request at 10:00 AM: Server serves fresh content
2. Request at 10:30 AM: Served from cache
3. Request at 11:01 AM: Cache expired, server serves fresh content
## Alternative Approaches
**CDN-level caching**: Configure caching at your CDN (Cloudflare, Fastly) rather than in the integration.
**Surrogate-Control header**: Use `Surrogate-Control` for CDN caching while controlling browser cache separately.
**ETags**: Add ETag support for efficient conditional requests.
**Vary header**: Consider adding `Vary: Accept-Encoding` for compressed responses.
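If your deployment runs through an adapter (SSR or hybrid), one way to layer these on is middleware - a sketch only; the exact header names and values depend on your CDN, and prerendered static files are better handled in the hosting platform's header config:
```typescript
// src/middleware.ts - add CDN-oriented headers to discovery files (sketch)
import { defineMiddleware } from 'astro:middleware';
const DISCOVERY_PATHS = new Set(['/robots.txt', '/llms.txt', '/humans.txt']);
export const onRequest = defineMiddleware(async (context, next) => {
  const response = await next();
  if (!DISCOVERY_PATHS.has(context.url.pathname)) return response;
  const headers = new Headers(response.headers);
  headers.set('Surrogate-Control', 'max-age=86400'); // CDN cache: 24 hours
  headers.set('Vary', 'Accept-Encoding');
  return new Response(response.body, { status: response.status, headers });
});
```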
## Common Issues
**Cache too long**: Content changes not reflected quickly. Reduce cache duration.
**Cache too short**: High server load from repeated requests. Increase cache duration.
**No caching in production**: Check if your hosting platform overrides headers.
**Stale content after updates**: Deploy a new version with a build timestamp to bust caches.
**Different behavior in CDN**: CDN may have its own caching rules. Check CDN configuration.
## Cache Duration Guidelines
**Rule of thumb**:
- Update frequency = Daily → Cache 2-6 hours
- Update frequency = Weekly → Cache 12-24 hours
- Update frequency = Monthly → Cache 1-7 days
- Update frequency = Rarely → Cache 7-30 days
**Special cases**:
- Canary.txt: Cache < update frequency (if daily, cache 2-12 hours)
- Security.txt: Cache longer (expires field handles staleness)
- Development: Cache 0 or very short (60 seconds)

@@ -3,29 +3,376 @@ title: Use with Content Collections
description: Integrate with Astro content collections
---
Automatically generate discovery content from your Astro content collections for dynamic, maintainable configuration.
## Prerequisites
- Integration installed and configured
- Astro content collections set up
- Understanding of async configuration functions
## Load Team from Collection
Create a team content collection and populate humans.txt:
**Step 1**: Define the collection schema:
```typescript
// src/content.config.ts
import { defineCollection, z } from 'astro:content';
const teamCollection = defineCollection({
type: 'data',
schema: z.object({
name: z.string(),
role: z.string(),
email: z.string().email(),
location: z.string().optional(),
twitter: z.string().optional(),
github: z.string().optional(),
})
});
export const collections = {
team: teamCollection
};
```
**Step 2**: Add team members:
```yaml
# src/content/team/alice.yaml
name: Alice Johnson
role: Lead Developer
email: alice@example.com
location: San Francisco, CA
github: alice-codes
```
```yaml
# src/content/team/bob.yaml
name: Bob Smith
role: Designer
email: bob@example.com
location: New York, NY
twitter: '@bobdesigns'
```
**Step 3**: Load in discovery config:
```typescript
// astro.config.mjs
import { getCollection } from 'astro:content';
discovery({
humans: {
team: async () => {
const members = await getCollection('team');
return members.map(member => ({
name: member.data.name,
role: member.data.role,
contact: member.data.email,
location: member.data.location,
twitter: member.data.twitter,
github: member.data.github
}));
}
}
})
```
## Generate Important Pages from Docs
List featured documentation pages in llms.txt:
**Step 1**: Add featured flag to doc frontmatter:
```markdown
---
# src/content/docs/getting-started.md
title: Getting Started Guide
description: Quick start guide for new users
featured: true
---
```
**Step 2**: Load featured docs:
```typescript
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
return docs
.filter(doc => doc.data.featured)
.sort((a, b) => (a.data.order || 0) - (b.data.order || 0))
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
## WebFinger from Author Collection
Make blog authors discoverable via WebFinger:
**Step 1**: Define authors collection:
```typescript
// src/content.config.ts
const authorsCollection = defineCollection({
type: 'data',
schema: z.object({
name: z.string(),
email: z.string().email(),
bio: z.string(),
avatar: z.string().url(),
mastodon: z.string().url().optional(),
website: z.string().url().optional()
})
});
```
**Step 2**: Add author data:
```yaml
# src/content/authors/alice.yaml
name: Alice Developer
email: alice@example.com
bio: Full-stack developer and open source enthusiast
avatar: https://example.com/avatars/alice.jpg
mastodon: https://mastodon.social/@alice
website: https://alice.dev
```
**Step 3**: Configure WebFinger:
```typescript
discovery({
webfinger: {
enabled: true,
collections: [{
name: 'authors',
resourceTemplate: 'acct:{slug}@example.com',
linksBuilder: (author) => [
{
rel: 'http://webfinger.net/rel/profile-page',
type: 'text/html',
href: `https://example.com/authors/${author.slug}`
},
{
rel: 'http://webfinger.net/rel/avatar',
type: 'image/jpeg',
href: author.data.avatar
},
...(author.data.mastodon ? [{
rel: 'self',
type: 'application/activity+json',
href: author.data.mastodon
}] : [])
],
propertiesBuilder: (author) => ({
'http://schema.org/name': author.data.name,
'http://schema.org/description': author.data.bio
})
}]
}
})
```
Query with: `GET /.well-known/webfinger?resource=acct:alice@example.com`
## Load API Endpoints from Spec
Generate API documentation from a collection:
```typescript
// src/content.config.ts
const apiCollection = defineCollection({
type: 'data',
schema: z.object({
path: z.string(),
method: z.enum(['GET', 'POST', 'PUT', 'DELETE', 'PATCH']),
description: z.string(),
public: z.boolean().default(true)
})
});
```
```yaml
# src/content/api/search.yaml
path: /api/search
method: GET
description: Search products by name, category, or tag
public: true
```
```typescript
discovery({
llms: {
apiEndpoints: async () => {
const endpoints = await getCollection('api');
return endpoints
.filter(ep => ep.data.public)
.map(ep => ({
path: ep.data.path,
method: ep.data.method,
description: ep.data.description
}));
}
}
})
```
## Multiple Collections
Combine data from several collections:
```typescript
discovery({
humans: {
team: async () => {
const [coreTeam, contributors] = await Promise.all([
getCollection('team'),
getCollection('contributors')
]);
return [
...coreTeam.map(m => ({ ...m.data, role: `Core - ${m.data.role}` })),
...contributors.map(m => ({ ...m.data, role: `Contributor - ${m.data.role}` }))
];
},
thanks: async () => {
const sponsors = await getCollection('sponsors');
return sponsors.map(s => s.data.name);
}
}
})
```
## Filter and Sort Collections
Control which items are included:
```typescript
discovery({
llms: {
importantPages: async () => {
const allDocs = await getCollection('docs');
return allDocs
// Only published docs
.filter(doc => doc.data.published !== false)
// Only important ones
.filter(doc => doc.data.priority === 'high')
// Sort by custom order
.sort((a, b) => {
const orderA = a.data.order ?? 999;
const orderB = b.data.order ?? 999;
return orderA - orderB;
})
// Map to format
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
## Localized Content
Support multiple languages:
```typescript
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
// Group by language
const enDocs = docs.filter(d => d.slug.startsWith('en/'));
const esDocs = docs.filter(d => d.slug.startsWith('es/'));
// Return English docs, with links to translations
return enDocs.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description,
// Could add: translations: ['/docs/es/...']
}));
}
}
})
```
## Cache Collection Queries
Optimize build performance:
```typescript
// Cache at module level
let cachedTeam = null;
discovery({
humans: {
team: async () => {
if (!cachedTeam) {
const members = await getCollection('team');
cachedTeam = members.map(m => ({
name: m.data.name,
role: m.data.role,
contact: m.data.email
}));
}
return cachedTeam;
}
}
})
```
## Expected Result
Content collections automatically populate discovery files:
**Adding a team member**:
1. Create `src/content/team/new-member.yaml`
2. Run `npm run build`
3. humans.txt includes new member
**Marking a doc as featured**:
1. Add `featured: true` to frontmatter
2. Run `npm run build`
3. llms.txt lists the new important page
## Alternative Approaches
**Static data**: Use plain JavaScript objects when data rarely changes.
**External API**: Fetch from CMS or API during build instead of using collections.
**Hybrid**: Use collections for core data, enhance with API data.
## Common Issues
**Async not awaited**: Ensure you use `async () => {}` and `await getCollection()`.
**Build-time only**: Collections are loaded at build time, not runtime.
**Type errors**: Ensure collection schema matches the data structure you're mapping.
**Missing data**: Check that collection files exist and match the schema.
**Slow builds**: Cache collection queries if used multiple times in config.

@@ -3,29 +3,417 @@ title: Custom Templates
description: Create custom templates for discovery files
---
Override default templates to fully customize the output format of discovery files.
## Prerequisites
- Integration installed and configured
- Understanding of the file formats (robots.txt, llms.txt, etc.)
- Knowledge of template function signatures
## Override robots.txt Template
Complete control over robots.txt output:
```typescript
// astro.config.mjs
discovery({
templates: {
robots: (config, siteURL) => {
const lines = [];
// Custom header
lines.push('# Custom robots.txt');
lines.push(`# Site: ${siteURL.hostname}`);
lines.push('# Last generated: ' + new Date().toISOString());
lines.push('');
// Default rule
lines.push('User-agent: *');
lines.push('Allow: /');
lines.push('');
// Add sitemap
lines.push(`Sitemap: ${new URL('sitemap-index.xml', siteURL).href}`);
return lines.join('\n') + '\n';
}
}
})
```
## Override llms.txt Template
Custom format for AI instructions:
```typescript
discovery({
templates: {
llms: async (config, siteURL) => {
const lines = [];
// Header
lines.push(`=`.repeat(60));
lines.push(`AI ASSISTANT GUIDE FOR ${siteURL.hostname.toUpperCase()}`);
lines.push(`=`.repeat(60));
lines.push('');
// Description
const description = typeof config.description === 'function'
? config.description()
: config.description;
if (description) {
lines.push(description);
lines.push('');
}
// Instructions
if (config.instructions) {
lines.push('IMPORTANT INSTRUCTIONS:');
lines.push(config.instructions);
lines.push('');
}
// API endpoints in custom format
if (config.apiEndpoints && config.apiEndpoints.length > 0) {
lines.push('AVAILABLE APIs:');
config.apiEndpoints.forEach(ep => {
lines.push(` [${ep.method || 'GET'}] ${ep.path}`);
lines.push(` → ${ep.description}`);
});
lines.push('');
}
// Footer
lines.push(`=`.repeat(60));
lines.push(`Generated: ${new Date().toISOString()}`);
return lines.join('\n') + '\n';
}
}
})
```
## Override humans.txt Template
Custom humans.txt format:
```typescript
discovery({
templates: {
humans: (config, siteURL) => {
const lines = [];
lines.push('========================================');
lines.push(' HUMANS BEHIND THE SITE ');
lines.push('========================================');
lines.push('');
// Team in custom format
if (config.team && config.team.length > 0) {
lines.push('OUR TEAM:');
lines.push('');
config.team.forEach((member, i) => {
if (i > 0) lines.push('---');
lines.push(`Name : ${member.name}`);
if (member.role) lines.push(`Role : ${member.role}`);
if (member.contact) lines.push(`Email : ${member.contact}`);
if (member.github) lines.push(`GitHub : https://github.com/${member.github}`);
lines.push('');
});
}
// Stack info
if (config.site?.techStack) {
lines.push('BUILT WITH:');
lines.push(config.site.techStack.join(' | '));
lines.push('');
}
return lines.join('\n') + '\n';
}
}
})
```
## Override security.txt Template
Custom security.txt with additional fields:
```typescript
discovery({
templates: {
security: (config, siteURL) => {
const lines = [];
// Canonical (required by RFC 9116)
const canonical = config.canonical ||
new URL('.well-known/security.txt', siteURL).href;
lines.push(`Canonical: ${canonical}`);
// Contact (required)
const contacts = Array.isArray(config.contact)
? config.contact
: [config.contact];
contacts.forEach(contact => {
const contactValue = contact.includes('@') && !contact.startsWith('mailto:')
? `mailto:${contact}`
: contact;
lines.push(`Contact: ${contactValue}`);
});
// Expires (recommended)
const expires = config.expires === 'auto'
? new Date(Date.now() + 365 * 24 * 60 * 60 * 1000).toISOString()
: config.expires;
if (expires) {
lines.push(`Expires: ${expires}`);
}
// Optional fields
if (config.encryption) {
const encryptions = Array.isArray(config.encryption)
? config.encryption
: [config.encryption];
encryptions.forEach(enc => lines.push(`Encryption: ${enc}`));
}
if (config.policy) {
lines.push(`Policy: ${config.policy}`);
}
if (config.acknowledgments) {
lines.push(`Acknowledgments: ${config.acknowledgments}`);
}
// Add custom comment
lines.push('');
lines.push('# Thank you for helping keep our users safe!');
return lines.join('\n') + '\n';
}
}
})
```
## Override canary.txt Template
Custom warrant canary format:
```typescript
discovery({
templates: {
canary: (config, siteURL) => {
const lines = [];
const today = new Date().toISOString().split('T')[0];
lines.push('=== WARRANT CANARY ===');
lines.push('');
lines.push(`Organization: ${config.organization || siteURL.hostname}`);
lines.push(`Date Issued: ${today}`);
lines.push('');
lines.push('As of this date, we confirm:');
lines.push('');
// List what has NOT been received
const statements = typeof config.statements === 'function'
? config.statements()
: config.statements || [];
statements
.filter(s => !s.received)
.forEach(statement => {
lines.push(`✓ NO ${statement.description} received`);
});
lines.push('');
lines.push('This canary will be updated regularly.');
lines.push('Absence of an update should be considered significant.');
lines.push('');
if (config.verification) {
lines.push(`Verification: ${config.verification}`);
}
return lines.join('\n') + '\n';
}
}
})
```
## Combine Default Generator with Custom Content
Use default generator, add custom content:
```typescript
import { generateRobotsTxt } from '@astrojs/discovery/generators';
discovery({
templates: {
robots: (config, siteURL) => {
// Generate default content
const defaultContent = generateRobotsTxt(config, siteURL);
// Add custom rules
const customRules = `
# Custom section
User-agent: MySpecialBot
Crawl-delay: 20
Allow: /special
# Rate limiting comment
# Please be respectful of our server resources
`.trim();
return defaultContent + '\n\n' + customRules + '\n';
}
}
})
```
## Load Template from File
Keep templates separate:
```typescript
// templates/robots.txt.js
export default (config, siteURL) => {
return `
User-agent: *
Allow: /
Sitemap: ${new URL('sitemap-index.xml', siteURL).href}
`.trim() + '\n';
};
```
```typescript
// astro.config.mjs
import robotsTemplate from './templates/robots.txt.js';
discovery({
templates: {
robots: robotsTemplate
}
})
```
## Conditional Template Logic
Different templates per environment:
```typescript
discovery({
templates: {
llms: import.meta.env.PROD
? (config, siteURL) => {
// Production: detailed guide
return `# Production site guide\n...detailed content...`;
}
: (config, siteURL) => {
// Development: simple warning
return `# Development environment\nThis is a development site.\n`;
}
}
})
```
## Template with External Data
Fetch additional data in template:
```typescript
discovery({
templates: {
llms: async (config, siteURL) => {
// Fetch latest API spec
const response = await fetch('https://api.example.com/openapi.json');
const spec = await response.json();
const lines = [];
lines.push(`# ${siteURL.hostname} API Guide`);
lines.push('');
lines.push('Available endpoints:');
Object.entries(spec.paths).forEach(([path, methods]) => {
Object.keys(methods).forEach(method => {
lines.push(`- ${method.toUpperCase()} ${path}`);
});
});
return lines.join('\n') + '\n';
}
}
})
```
## Verify Custom Templates
Test your templates:
```bash
npm run build
npm run preview
# Check each file
curl http://localhost:4321/robots.txt
curl http://localhost:4321/llms.txt
curl http://localhost:4321/humans.txt
curl http://localhost:4321/.well-known/security.txt
```
Ensure format is correct and content appears as expected.
## Expected Result
Your custom templates completely control output format:
**Custom robots.txt**:
```
# Custom robots.txt
# Site: example.com
# Last generated: 2025-11-08T12:00:00.000Z
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap-index.xml
```
**Custom llms.txt**:
```
============================================================
AI ASSISTANT GUIDE FOR EXAMPLE.COM
============================================================
Your site description here
IMPORTANT INSTRUCTIONS:
...
```
## Alternative Approaches
**Partial overrides**: Extend default generators rather than replacing entirely.
**Post-processing**: Generate default content, then modify it with string manipulation.
**Multiple templates**: Use different templates based on configuration flags.
## Common Issues
**Missing newline at end**: Ensure template returns content ending with `\n`.
**Async templates**: llms.txt template can be async, others are sync. Don't mix.
**Type errors**: Template signature must match: `(config: Config, siteURL: URL) => string`
**Breaking specs**: security.txt and robots.txt have specific formats. Don't break them.
**Config not available**: Only config passed to that section is available. Can't access other sections.

@@ -1,31 +1,255 @@
---
title: Customize LLM Instructions
description: Provide custom instructions for AI assistants using llms.txt
---
Configure how AI assistants interact with your site by customizing instructions in llms.txt.
## Prerequisites
- Integration installed and configured
- Understanding of your site's main use cases
- Knowledge of your API endpoints (if applicable)
## Add Basic Instructions
Provide clear guidance for AI assistants:
```typescript
// astro.config.mjs
discovery({
llms: {
description: 'Technical documentation for the Discovery API',
instructions: `
When helping users with this site:
1. Check the documentation before answering
2. Provide code examples when relevant
3. Link to specific documentation pages
4. Use the search API for queries
`.trim()
}
})
```
## Highlight Key Features
Guide AI assistants to important capabilities:
```typescript
discovery({
llms: {
description: 'E-commerce platform for sustainable products',
keyFeatures: [
'Carbon footprint calculator for all products',
'Subscription management with flexible billing',
'AI-powered product recommendations',
'Real-time inventory tracking'
]
}
})
```
## Document Important Pages
Direct AI assistants to critical resources:
```typescript
discovery({
llms: {
importantPages: [
{
name: 'API Documentation',
path: '/docs/api',
description: 'Complete API reference with examples'
},
{
name: 'Getting Started Guide',
path: '/docs/quick-start',
description: 'Step-by-step setup instructions'
},
{
name: 'FAQ',
path: '/help/faq',
description: 'Common questions and solutions'
}
]
}
})
```
## Describe Your APIs
Help AI assistants use your endpoints correctly:
```typescript
discovery({
llms: {
apiEndpoints: [
{
path: '/api/search',
method: 'GET',
description: 'Search products by name, category, or tag'
},
{
path: '/api/products/:id',
method: 'GET',
description: 'Get detailed product information'
},
{
path: '/api/calculate-carbon',
method: 'POST',
description: 'Calculate carbon footprint for a cart'
}
]
}
})
```
## Set Brand Voice Guidelines
Maintain consistent communication style:
```typescript
discovery({
llms: {
brandVoice: [
'Professional yet approachable',
'Focus on sustainability and environmental impact',
'Use concrete examples, not abstract concepts',
'Avoid jargon unless explaining technical features',
'Emphasize long-term value over short-term savings'
]
}
})
```
## Load Content Dynamically
Pull important pages from content collections:
```typescript
import { getCollection } from 'astro:content';
discovery({
llms: {
importantPages: async () => {
const docs = await getCollection('docs');
// Filter to featured pages only
return docs
.filter(doc => doc.data.featured)
.map(doc => ({
name: doc.data.title,
path: `/docs/${doc.slug}`,
description: doc.data.description
}));
}
}
})
```
## Add Custom Sections
Include specialized information:
```typescript
discovery({
llms: {
customSections: {
'Data Privacy': `
We are GDPR compliant. User data is encrypted at rest and in transit.
Data retention policy: 90 days for analytics, 7 years for transactions.
`.trim(),
'Rate Limits': `
API rate limits:
- Authenticated: 1000 requests/hour
- Anonymous: 60 requests/hour
- Burst: 20 requests/second
`.trim(),
'Support Channels': `
For assistance:
- Documentation: https://example.com/docs
- Email: support@example.com (response within 24h)
- Community: https://discord.gg/example
`.trim()
}
}
})
```
## Environment-Specific Instructions
Different instructions for development vs production:
```typescript
discovery({
llms: {
instructions: import.meta.env.PROD
? `Production site - use live API endpoints at https://api.example.com`
: `Development site - API endpoints may be mocked or unavailable`
}
})
```
## Verify Your Configuration
Build and check the output:
```bash
npm run build
npm run preview
curl http://localhost:4321/llms.txt
```
Look for your instructions, features, and API documentation in the formatted output.
## Expected Result
Your llms.txt will contain structured information:
```markdown
# example.com
> E-commerce platform for sustainable products
---
## Key Features
- Carbon footprint calculator for all products
- AI-powered product recommendations
## Instructions for AI Assistants
When helping users with this site:
1. Check the documentation before answering
2. Provide code examples when relevant
## API Endpoints
- `GET /api/search`
Search products by name, category, or tag
Full URL: https://example.com/api/search
```
AI assistants will use this information to provide accurate, context-aware help.
## Alternative Approaches
**Multiple llms.txt files**: Create llms-full.txt for comprehensive docs, llms.txt for summary.
**Dynamic generation**: Use a build script to extract API docs from OpenAPI specs.
**Language-specific versions**: Generate different files for different locales (llms-en.txt, llms-es.txt).
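As a sketch of the dynamic-generation idea, the `apiEndpoints` list could be derived from an OpenAPI document at build time. The spec URL and the use of each operation's `summary` field are assumptions; adapt them to your API:
```typescript
// astro.config.mjs — derive apiEndpoints from an OpenAPI spec (illustrative)
const spec = await fetch('https://api.example.com/openapi.json').then((r) => r.json());

const apiEndpoints = Object.entries(spec.paths).flatMap(([path, methods]) =>
  Object.entries(methods).map(([method, operation]) => ({
    path,
    method: method.toUpperCase(),
    description: operation.summary ?? ''
  }))
);

discovery({
  llms: { apiEndpoints }
})
```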
## Common Issues
**Too much information**: Keep it concise. AI assistants prefer focused, actionable guidance.
**Outdated instructions**: Use `lastUpdate: 'auto'` or automate updates from your CMS.
**Missing context**: Don't assume knowledge. Explain domain-specific terms and workflows.
**Unclear priorities**: List most important pages/features first. AI assistants may prioritize early content.
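For the outdated-instructions issue above, the automatic timestamp mentioned there is a one-line setting:
```typescript
discovery({
  llms: {
    lastUpdate: 'auto' // regenerate the timestamp on every build
  }
})
```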


@@ -3,29 +3,322 @@ title: Environment-specific Configuration
description: Use different configs for dev and production
---
Configure different settings for development and production environments to optimize for local testing vs deployed sites.
## Prerequisites
- Integration installed and configured
- Understanding of Astro environment variables
- Knowledge of your deployment setup
## Basic Environment Switching
Use `import.meta.env.PROD` to detect production:
```typescript
// astro.config.mjs
discovery({
robots: {
// Block all bots in development
allowAllBots: import.meta.env.PROD
}
})
```
Development: Bots blocked. Production: Bots allowed.
## Different Site URLs
Use different domains for staging and production:
```typescript
export default defineConfig({
site: import.meta.env.PROD
? 'https://example.com'
: 'http://localhost:4321',
integrations: [
discovery({
// Config automatically uses correct site URL
})
]
})
```
## Conditional Feature Enablement
Enable security.txt and canary.txt only in production:
```typescript
discovery({
security: import.meta.env.PROD
? {
contact: 'security@example.com',
expires: 'auto'
}
: undefined, // Disabled in development
canary: import.meta.env.PROD
? {
organization: 'Example Corp',
contact: 'canary@example.com',
frequency: 'monthly'
}
: undefined // Disabled in development
})
```
## Environment-Specific Instructions
Different LLM instructions for each environment:
```typescript
discovery({
llms: {
description: import.meta.env.PROD
? 'Production e-commerce platform'
: 'Development/Staging environment - data may be test data',
instructions: import.meta.env.PROD
? `
When helping users:
1. Use production API at https://api.example.com
2. Data is live - be careful with modifications
3. Refer to https://docs.example.com for documentation
`.trim()
: `
Development environment - for testing only:
1. API endpoints may be mocked
2. Database is reset nightly
3. Some features may not work
`.trim()
}
})
```
## Custom Environment Variables
Use `.env` files for configuration:
```bash
# .env.production
PUBLIC_SECURITY_EMAIL=security@example.com
PUBLIC_CANARY_ENABLED=true
PUBLIC_CONTACT_EMAIL=contact@example.com
# .env.development
PUBLIC_SECURITY_EMAIL=dev-security@localhost
PUBLIC_CANARY_ENABLED=false
PUBLIC_CONTACT_EMAIL=dev@localhost
```
Then use in config:
```typescript
discovery({
security: import.meta.env.PUBLIC_SECURITY_EMAIL
? {
contact: import.meta.env.PUBLIC_SECURITY_EMAIL,
expires: 'auto'
}
: undefined,
humans: {
team: [
{
name: 'Team',
contact: import.meta.env.PUBLIC_CONTACT_EMAIL
}
]
}
})
```
## Staging Environment
Support three environments: dev, staging, production:
```typescript
const ENV = import.meta.env.MODE; // 'development', 'staging', or 'production'
const siteURLs = {
development: 'http://localhost:4321',
staging: 'https://staging.example.com',
production: 'https://example.com'
};
export default defineConfig({
site: siteURLs[ENV],
integrations: [
discovery({
robots: {
// Block bots in dev and staging
allowAllBots: ENV === 'production',
additionalAgents: ENV !== 'production'
? [{ userAgent: '*', disallow: ['/'] }]
: []
},
llms: {
description: ENV === 'production'
? 'Production site'
: `${ENV} environment - not for public use`
}
})
]
})
```
Run with: `astro build --mode staging`
## Different Cache Headers
Aggressive caching in production, none in development:
```typescript
discovery({
caching: import.meta.env.PROD
? {
// Production: cache aggressively
robots: 86400,
llms: 3600,
humans: 604800
}
: {
// Development: no caching
robots: 0,
llms: 0,
humans: 0
}
})
```
## Feature Flags
Use environment variables as feature flags:
```typescript
discovery({
webfinger: {
enabled: import.meta.env.PUBLIC_ENABLE_WEBFINGER === 'true',
resources: [/* ... */]
},
canary: import.meta.env.PUBLIC_ENABLE_CANARY === 'true'
? {
organization: 'Example Corp',
frequency: 'monthly'
}
: undefined
})
```
Set in `.env`:
```bash
PUBLIC_ENABLE_WEBFINGER=false
PUBLIC_ENABLE_CANARY=true
```
## Test vs Production Data
Load different team data per environment:
```typescript
import { getCollection } from 'astro:content';
discovery({
humans: {
team: import.meta.env.PROD
? await getCollection('team') // Real team
: [
{
name: 'Test Developer',
role: 'Developer',
contact: 'test@localhost'
}
]
}
})
```
## Preview Deployments
Handle preview/branch deployments:
```typescript
const isPreview = import.meta.env.PREVIEW === 'true';
const isProd = import.meta.env.PROD && !isPreview;
discovery({
robots: {
allowAllBots: isProd, // Block on previews too
additionalAgents: !isProd
? [
{
userAgent: '*',
disallow: ['/']
}
]
: []
}
})
```
## Verify Environment Config
Test each environment:
```bash
# Development
npm run dev
curl http://localhost:4321/robots.txt
# Production build
npm run build
npm run preview
curl http://localhost:4321/robots.txt
# Staging (if configured)
astro build --mode staging
```
Check that content differs appropriately.
## Expected Result
Each environment produces appropriate output:
**Development** - Block all:
```
User-agent: *
Disallow: /
```
**Production** - Allow bots:
```
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap-index.xml
```
## Alternative Approaches
**Config files per environment**: Create `astro.config.dev.mjs` and `astro.config.prod.mjs`.
**Build-time injection**: Use build tools to inject environment-specific values.
**Runtime checks**: For SSR sites, check headers or hostname at runtime.
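A sketch of the config-files-per-environment approach: keep one file per environment and pick it with Astro's `--config` flag. The file name and the integration's package name below are assumptions; mirror whatever your main config imports:
```typescript
// astro.config.prod.mjs — production-only config (file name is illustrative)
import { defineConfig } from 'astro/config';
import discovery from 'astro-discovery'; // adjust to the actual package name

export default defineConfig({
  site: 'https://example.com',
  integrations: [
    discovery({
      robots: { allowAllBots: true },
      security: { contact: 'security@example.com', expires: 'auto' }
    })
  ]
});
```
Build with `astro build --config astro.config.prod.mjs`, and keep a matching `astro.config.dev.mjs` for local work.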
## Common Issues
**Environment variables not available**: Ensure variables are prefixed with `PUBLIC_` for client access.
**Wrong environment detected**: `import.meta.env.PROD` is true for any production build, including preview/branch deployments of that build — use an explicit flag (like `PREVIEW` above) to tell them apart.
**Undefined values**: Provide fallbacks for missing environment variables.
**Inconsistent builds**: Document which environment variables affect the build for reproducibility.
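A minimal fallback pattern for missing variables, using the names from earlier in this guide:
```typescript
// astro.config.mjs — fall back to a safe default when a variable is unset
const securityEmail = import.meta.env.PUBLIC_SECURITY_EMAIL ?? 'security@example.com';

discovery({
  security: { contact: securityEmail, expires: 'auto' }
})
```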


@@ -3,29 +3,240 @@ title: Filter Sitemap Pages
description: Control which pages appear in your sitemap
---
Exclude pages from your sitemap to keep it focused on publicly accessible, valuable content.
## Prerequisites
- Integration installed and configured
- Understanding of which pages should be public
- Knowledge of your site's URL structure
## Exclude Admin Pages
Block administrative and dashboard pages:
```typescript
// astro.config.mjs
discovery({
sitemap: {
filter: (page) => !page.includes('/admin')
}
})
```
This removes all URLs containing `/admin` from the sitemap.
## Exclude Multiple Path Patterns
Filter out several types of pages:
```typescript
discovery({
sitemap: {
filter: (page) => {
return !page.includes('/admin') &&
!page.includes('/draft') &&
!page.includes('/private') &&
!page.includes('/test');
}
}
})
```
## Exclude by File Extension
Remove API endpoints or non-HTML pages:
```typescript
discovery({
sitemap: {
filter: (page) => {
return !page.endsWith('.json') &&
!page.endsWith('.xml') &&
!page.includes('/api/');
}
}
})
```
## Include Only Specific Directories
Allow only documentation and blog posts:
```typescript
discovery({
sitemap: {
filter: (page) => {
const url = new URL(page);
const path = url.pathname;
return path.startsWith('/docs/') ||
path.startsWith('/blog/') ||
path === '/';
}
}
})
```
## Exclude by Environment
Different filtering for development vs production:
```typescript
discovery({
sitemap: {
filter: (page) => {
// Exclude drafts in production
if (import.meta.env.PROD && page.includes('/draft')) {
return false;
}
// Exclude test pages in production
if (import.meta.env.PROD && page.includes('/test')) {
return false;
}
return true;
}
}
})
```
## Filter Based on Page Metadata
Use frontmatter or metadata to control inclusion:
```typescript
discovery({
sitemap: {
serialize: (item) => {
// Exclude pages marked as noindex
// Note: You'd need to access page metadata here
// This is a simplified example
return item;
},
filter: (page) => {
// Basic path-based filtering
return !page.includes('/internal/');
}
}
})
```
## Combine with Custom Pages
Add non-generated pages while filtering others:
```typescript
discovery({
sitemap: {
filter: (page) => !page.includes('/admin'),
customPages: [
'https://example.com/special-page',
'https://example.com/external-content'
]
}
})
```
## Use Regular Expressions
Advanced pattern matching:
```typescript
discovery({
sitemap: {
filter: (page) => {
// Exclude pages with query parameters
if (page.includes('?')) return false;
// Exclude paginated pages except first page
if (/\/page\/\d+/.test(page)) return false;
// Exclude temp or staging paths
if (/\/(temp|staging|wip)\//.test(page)) return false;
return true;
}
}
})
```
## Filter User-Generated Content
Exclude user profiles or dynamic content:
```typescript
discovery({
sitemap: {
filter: (page) => {
  const path = new URL(page).pathname;
  // Include the main user directory page
  if (path === '/users' || path === '/users/') return true;
  // Exclude individual user pages
  if (path.startsWith('/users/')) return false;
  // Exclude comment threads
  if (path.includes('/comments/')) return false;
  return true;
}
}
})
```
## Verify Your Filter
Test your filter logic:
```bash
npm run build
npm run preview
# Check sitemap
curl http://localhost:4321/sitemap-index.xml
# Look for excluded pages (should not appear)
curl http://localhost:4321/sitemap-0.xml | grep '/admin'
```
If grep returns nothing, your filter is working.
## Expected Result
Your sitemap will only contain allowed pages. Excluded pages won't appear:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
</url>
<url>
<loc>https://example.com/blog/post-1</loc>
</url>
<!-- No /admin, /draft, or /private pages -->
</urlset>
```
## Alternative Approaches
**robots.txt blocking**: Block crawling entirely using robots.txt instead of just omitting from sitemap.
**Meta robots tag**: Add `<meta name="robots" content="noindex">` to pages you want excluded.
**Separate sitemaps**: Create multiple sitemap files for different sections, only submit public ones.
**Dynamic generation**: Generate sitemaps at runtime based on user permissions or content status.
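For instance, the robots.txt approach listed first can sit alongside the sitemap filter, reusing the `additionalAgents` shape shown in the environment guide (the paths are illustrative):
```typescript
discovery({
  robots: {
    // Ask crawlers to stay out entirely, not just omit the pages from the sitemap
    additionalAgents: [
      { userAgent: '*', disallow: ['/admin', '/draft', '/private'] }
    ]
  },
  sitemap: {
    filter: (page) => !page.includes('/admin')
  }
})
```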
## Common Issues
**Too restrictive**: Double-check your filter doesn't exclude important pages. Test thoroughly.
**Case sensitivity**: URL paths are case-sensitive. `/Admin` and `/admin` are different.
**Trailing slashes**: Be consistent. `/page` and `/page/` may both exist. Handle both.
**Query parameters**: Decide whether to include pages with query strings. Usually exclude them.
**Performance**: Complex filter functions run for every page. Keep logic simple for better build times.
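To sidestep the case-sensitivity and trailing-slash pitfalls above, normalize the path once before testing it (a small illustrative sketch):
```typescript
discovery({
  sitemap: {
    filter: (page) => {
      // Lowercase and strip any trailing slash so '/Admin/' and '/admin' hit the same rules
      const path = new URL(page).pathname.toLowerCase().replace(/\/$/, '');
      return !path.startsWith('/admin') && !path.startsWith('/draft');
    }
  }
})
```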


@@ -29,7 +29,7 @@
]
},
"howto": {
"status": "ready",
"branch": "docs/howto-content",
"worktree": "docs-howto",
"pages": [
@@ -44,10 +44,20 @@
"how-to/activitypub.md"
],
"dependencies": ["tutorials"],
"completed_pages": [
"how-to/block-bots.md",
"how-to/customize-llm-instructions.md",
"how-to/add-team-members.md",
"how-to/filter-sitemap.md",
"how-to/cache-headers.md",
"how-to/environment-config.md",
"how-to/content-collections.md",
"how-to/custom-templates.md",
"how-to/activitypub.md"
]
},
"reference": {
"status": "ready",
"branch": "docs/reference-content",
"worktree": "docs-reference",
"pages": [
@@ -64,7 +74,19 @@
"reference/typescript.md"
],
"dependencies": [],
"completed_pages": [
"reference/configuration.md",
"reference/api.md",
"reference/robots.md",
"reference/llms.md",
"reference/humans.md",
"reference/security.md",
"reference/canary.md",
"reference/webfinger.md",
"reference/sitemap.md",
"reference/cache.md",
"reference/typescript.md"
]
},
"explanation": {
"status": "executing",