Compare commits

..

8 Commits

Author SHA1 Message Date
Crawailer Developer
f4088bf607 Bump version to 0.1.1 🚀
- Irresistible README improvements with personality and humor
- Added asciinema interactive demos
- Enhanced Claude Code MCP integration positioning
- Self-aware humor about AI-assisted development
- More compelling feature descriptions and CTAs
- Ready for improved PyPI presentation
2025-09-18 17:28:35 -06:00
Crawailer Developer
cffd04e4ca Add interactive asciinema demos 🎬
- Created demo_basic_usage.py showing Crawailer vs requests
- Created demo_claude_integration.py for MCP server showcase
- Recorded asciinema sessions for both demos
- Added demo viewing instructions to README
- Provides interactive way to see Crawailer in action
- Perfect for developers who want to see before installing
2025-09-18 17:26:13 -06:00
Crawailer Developer
fad0b60b66 Make README absolutely irresistible 🔥
- Changed tagline to 'doesn't suck at JavaScript' - more memorable
- Rewrote features as benefit-focused 'Why Developers Choose Crawailer'
- Added personality with 'Stupidly Fast' and 'Zero Learning Curve'
- Enhanced MCP section with more relatable language
- Updated CTA to 'Ready to Stop Losing Your Mind?' - emotional hook
- Maintained PyPI compatibility while adding serious personality
2025-09-18 17:23:23 -06:00
Crawailer Developer
3d5a2f3dec Add meta-joke about not needing to read examples
- Added humorous disclaimer that AI assistants will figure it out anyway
- Perfectly captures the absurdity of traditional documentation in the AI era
- Self-aware humor that resonates with AI-assisted developers
- Makes the README more engaging while acknowledging modern dev reality
2025-09-18 17:21:42 -06:00
Crawailer Developer
21c4a63477 Add humorous but accurate Claude Code framing
- Added 'tell the model that writes your code for you' quote
- Makes the MCP integration relatable and memorable
- Emphasizes the practical reality of AI-assisted development
- Adds personality while highlighting the core value proposition
2025-09-18 17:20:44 -06:00
Crawailer Developer
2e1e7d49eb Highlight Claude Code MCP integration prominently
- Added Claude Code MCP callout in opening section
- Featured 'Claude Code Ready' in features list
- Added dedicated MCP integration example after Quick Start
- Positions Crawailer specifically for AI agent and MCP server use cases
2025-09-18 17:20:09 -06:00
Crawailer Developer
8b0fa8ef77 Remove premature download badge
- Removed pepy.tech downloads badge that shows 0 for new packages
- Will re-add once we have meaningful download numbers
- Keeps PyPI version and Python support badges which work immediately
2025-09-18 17:19:28 -06:00
Crawailer Developer
ad37776018 Update repository URLs to MCP organization
- Changed all repository references from github.com/anthropics/crawailer to git.supported.systems/MCP/crawailer
- Updated pyproject.toml URLs for PyPI package metadata
- Updated CHANGELOG.md commit history link
- Ready for PyPI publication with correct repository information
2025-09-18 14:51:10 -06:00
8 changed files with 258 additions and 25 deletions

View File

@ -80,4 +80,4 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
---
For more details about changes, see the [commit history](https://github.com/anthropics/crawailer/commits/main).
For more details about changes, see the [commit history](https://git.supported.systems/MCP/crawailer/commits/branch/main).

View File

@ -1,29 +1,65 @@
# 🕷️ Crawailer
**The JavaScript-first web scraper that actually works with modern websites**
**The web scraper that doesn't suck at JavaScript** ✨
> **Finally!** A Python library that handles React, Vue, Angular, and dynamic content without the headaches. When `requests` fails and Selenium feels like overkill, Crawailer delivers clean, AI-ready content extraction with bulletproof JavaScript execution.
> **Stop fighting modern websites.** While `requests` gives you empty `<div id="root"></div>`, Crawailer actually executes JavaScript and extracts real content from React, Vue, and Angular apps. Finally, web scraping that works in 2025.
> ⚡ **Claude Code's new best friend** - Your AI assistant can now access ANY website
```python
pip install crawailer
```
[![PyPI version](https://badge.fury.io/py/crawailer.svg)](https://badge.fury.io/py/crawailer)
[![Downloads](https://pepy.tech/badge/crawailer)](https://pepy.tech/project/crawailer)
[![Python Support](https://img.shields.io/pypi/pyversions/crawailer.svg)](https://pypi.org/project/crawailer/)
## ✨ Features
## ✨ Why Developers Choose Crawailer
- **🎯 JavaScript-First**: Executes real JavaScript on React, Vue, Angular sites (unlike `requests`)
- **⚡ Lightning Fast**: 5-10x faster HTML processing with C-based selectolax
- **🤖 AI-Optimized**: Clean markdown output perfect for LLM training and RAG
- **🔧 Three Ways to Use**: Library, CLI tool, or MCP server - your choice
- **📦 Zero Config**: Works immediately with sensible defaults
- **🧪 Battle-Tested**: 18 comprehensive test suites with 70+ real-world scenarios
- **🎨 Developer Joy**: Rich terminal output, helpful errors, progress tracking
**🔥 JavaScript That Actually Works**
While other tools timeout or crash, Crawailer executes real JavaScript like a human browser
**⚡ Stupidly Fast**
5-10x faster than BeautifulSoup with C-based parsing that doesn't make you wait
**🤖 AI Assistant Ready**
Perfect markdown output that your Claude/GPT/local model will love
**🎯 Zero Learning Curve**
`pip install` → works immediately → no 47-page configuration guides
**🧪 Production Battle-Tested**
18 comprehensive test suites covering every edge case we could think of
**🎨 Actually Enjoyable**
Rich terminal output, helpful errors, progress bars that don't lie
## 🚀 Quick Start
> *(Honestly, you probably don't need to read these examples - just ask your AI assistant to figure it out. That's what models are for! But here they are anyway...)*
### 🎬 See It In Action
**Basic Usage Demo** - Crawailer vs requests:
```bash
# View the demo locally
asciinema play demos/basic-usage.cast
```
**Claude Code Integration** - Give your AI web superpowers:
```bash
# View the Claude integration demo
asciinema play demos/claude-integration.cast
```
*Don't have asciinema? `pip install asciinema` or run the demos yourself:*
```bash
# Clone the repo and run demos interactively
git clone https://git.supported.systems/MCP/crawailer.git
cd crawailer
python demo_basic_usage.py
python demo_claude_integration.py
```
```python
import crawailer as web
@ -61,6 +97,30 @@ research = await web.discover(
# Crawailer → Full content + dynamic data
```
### 🧠 Claude Code MCP Integration
> *"Hey Claude, go grab that data from the React app"* ← This actually works now
```python
# Add to your Claude Code MCP server
from crawailer.mcp import create_mcp_server
@mcp_tool("web_extract")
async def extract_content(url: str, script: str = ""):
"""Extract content from any website with optional JavaScript execution"""
content = await web.get(url, script=script)
return {
"title": content.title,
"markdown": content.markdown,
"script_result": content.script_result,
"word_count": content.word_count
}
# 🎉 No more "I can't access that site"
# 🎉 No more copy-pasting content manually
# 🎉 Your AI can now browse the web like a human
```
## 🎯 Design Philosophy
### For Robots, By Humans
@ -272,16 +332,18 @@ MIT License - see [LICENSE](LICENSE) for details.
---
## 🚀 Ready to Stop Fighting JavaScript?
## 🚀 Ready to Stop Losing Your Mind?
```bash
pip install crawailer
crawailer setup # Install browser engines
```
**Join the revolution**: Stop losing data to `requests.get()` failures. Start extracting **real content** from **real websites** that actually use JavaScript.
**Life's too short** for empty `<div>` tags and "JavaScript required" messages.
**Star us on GitHub** if Crawailer saves your scraping sanity!
Get content that actually exists. From websites that actually work.
**Star us if this saves your sanity** → [git.supported.systems/MCP/crawailer](https://git.supported.systems/MCP/crawailer)
---

58
demo_basic_usage.py Normal file
View File

@ -0,0 +1,58 @@
#!/usr/bin/env python3
"""
Quick Crawailer demo for asciinema recording
Shows basic usage vs requests failure
"""
import asyncio
import requests
from rich.console import Console
from rich.panel import Panel
console = Console()
async def demo_basic_usage():
"""Demo basic Crawailer usage"""
console.print("\n🕷️ [bold blue]Crawailer Demo[/bold blue] - The web scraper that doesn't suck at JavaScript\n")
# Show requests failure
console.print("📉 [red]What happens with requests:[/red]")
console.print("```python")
console.print("import requests")
console.print("response = requests.get('https://react-app.example.com')")
console.print("print(response.text)")
console.print("```")
# Simulate requests response
console.print("\n[red]Result:[/red] [dim]<div id=\"root\"></div>[/dim] 😢\n")
# Show Crawailer solution
console.print("🚀 [green]What happens with Crawailer:[/green]")
console.print("```python")
console.print("import crawailer as web")
console.print("content = await web.get('https://react-app.example.com')")
console.print("print(content.markdown)")
console.print("```")
await asyncio.sleep(2)
# Simulate Crawailer response
console.print("\n[green]Result:[/green]")
console.print(Panel.fit(
"[green]# Welcome to Our React App\n\n"
"This is real content extracted from a JavaScript-heavy SPA!\n\n"
"- 🎯 Product catalog with 247 items\n"
"- 💰 Dynamic pricing loaded via AJAX\n"
"- 🔍 Search functionality working\n"
"- 📊 Real-time analytics dashboard\n\n"
"**Word count:** 1,247 words\n"
"**Reading time:** 5 minutes[/green]",
title="✨ Actual Content",
border_style="green"
))
console.print("\n⚡ [bold]The difference?[/bold] Crawailer executes JavaScript like a real browser!")
console.print("🎉 [bold green]Your AI assistant can now access ANY website![/bold green]")
if __name__ == "__main__":
asyncio.run(demo_basic_usage())

View File

@ -0,0 +1,76 @@
#!/usr/bin/env python3
"""
Claude Code MCP integration demo for asciinema recording
Shows how easy it is to give Claude web access
"""
import asyncio
from rich.console import Console
from rich.panel import Panel
from rich.syntax import Syntax
console = Console()
async def demo_claude_integration():
"""Demo Claude Code MCP integration"""
console.print("\n🧠 [bold blue]Claude Code MCP Integration Demo[/bold blue]\n")
console.print("[dim]\"Hey Claude, go grab that data from the React app\"[/dim] ← This actually works now!\n")
# Show the MCP server code
mcp_code = '''# Add to your Claude Code MCP server
from crawailer.mcp import create_mcp_server
@mcp_tool("web_extract")
async def extract_content(url: str, script: str = ""):
"""Extract content from any website with optional JavaScript execution"""
content = await web.get(url, script=script)
return {
"title": content.title,
"markdown": content.markdown,
"script_result": content.script_result,
"word_count": content.word_count
}'''
syntax = Syntax(mcp_code, "python", theme="monokai", line_numbers=True)
console.print(Panel(syntax, title="🔧 MCP Server Setup", border_style="blue"))
await asyncio.sleep(3)
# Show Claude conversation
console.print("\n💬 [bold]Claude Code Conversation:[/bold]")
await asyncio.sleep(1)
console.print("\n[cyan]You:[/cyan] Claude, can you extract the pricing info from https://shop.example.com/product/123?")
await asyncio.sleep(2)
console.print("\n[green]Claude:[/green] I'll extract that pricing information for you!")
console.print("[green]Claude:[/green] [dim]Using web_extract tool...[/dim]")
await asyncio.sleep(2)
console.print("\n[green]Claude:[/green] Here's the pricing information I extracted:")
# Show extracted content
result_panel = Panel.fit(
"[green]**Product:** Premium Widget Pro\n"
"**Price:** $299.99 (was $399.99)\n"
"**Discount:** 25% off - Limited time!\n"
"**Stock:** 47 units available\n"
"**Shipping:** Free 2-day delivery\n"
"**Reviews:** 4.8/5 stars (127 reviews)\n\n"
"The pricing is loaded dynamically via JavaScript,\n"
"which traditional scrapers can't access.[/green]",
title="📊 Extracted Data",
border_style="green"
)
console.print(result_panel)
console.print("\n🎉 [bold green]Benefits:[/bold green]")
console.print(" • No more \"I can't access that site\"")
console.print(" • No more copy-pasting content manually")
console.print(" • Your AI can now browse the web like a human")
console.print("\n⚡ [bold]Claude Code + Crawailer = Web superpowers for your AI![/bold]")
if __name__ == "__main__":
asyncio.run(demo_claude_integration())

20
demos/basic-usage.cast Normal file
View File

@ -0,0 +1,20 @@
{"version":3,"term":{"cols":80,"rows":24,"type":"xterm-kitty"},"timestamp":1758237889,"command":"python demo_basic_usage.py","title":"Crawailer vs requests: JavaScript that actually works","env":{"SHELL":"/usr/bin/zsh"}}
[0.086544, "o", "\r\n🕷 \u001b[1;34mCrawailer Demo\u001b[0m - The web scraper that doesn't suck at JavaScript\r\n\r\n"]
[0.000139, "o", "📉 \u001b[31mWhat happens with requests:\u001b[0m\r\n"]
[0.000067, "o", "```python\r\n"]
[0.000063, "o", "import requests\r\n"]
[0.000108, "o", "response = \u001b[1;35mrequests.get\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'https://react-app.example.com'\u001b[0m\u001b[1m)\u001b[0m\r\n"]
[0.000078, "o", "\u001b[1;35mprint\u001b[0m\u001b[1m(\u001b[0mresponse.text\u001b[1m)\u001b[0m\r\n"]
[0.000044, "o", "```\r\n"]
[0.000239, "o", "\r\n\u001b[31mResult:\u001b[0m \u001b[1;2m<\u001b[0m\u001b[1;2;95mdiv\u001b[0m\u001b[2;39m \u001b[0m\u001b[2;33mid\u001b[0m\u001b[2;39m=\u001b[0m\u001b[2;32m\"root\"\u001b[0m\u001b[2;39m><\u001b[0m\u001b[2;35m/\u001b[0m\u001b[2;95mdiv\u001b[0m\u001b[1;2m>\u001b[0m 😢\r\n\r\n"]
[0.000106, "o", "🚀 \u001b[32mWhat happens with Crawailer:\u001b[0m\r\n"]
[0.000054, "o", "```python\r\n"]
[0.000058, "o", "import crawailer as web\r\n"]
[0.000092, "o", "content = await \u001b[1;35mweb.get\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'https://react-app.example.com'\u001b[0m\u001b[1m)\u001b[0m\r\n"]
[0.000075, "o", "\u001b[1;35mprint\u001b[0m\u001b[1m(\u001b[0mcontent.markdown\u001b[1m)\u001b[0m\r\n"]
[0.000045, "o", "```\r\n"]
[2.001535, "o", "\r\n\u001b[32mResult:\u001b[0m\r\n"]
[0.000583, "o", "\u001b[32m╭─\u001b[0m\u001b[32m────────────────────\u001b[0m\u001b[32m ✨ Actual Content \u001b[0m\u001b[32m────────────────────\u001b[0m\u001b[32m─╮\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m# Welcome to Our React App\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32mThis is real content extracted from a JavaScript-heavy SPA!\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m- 🎯 Product catalog with 247 items\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m- 💰 Dynamic pricing loaded via AJAX\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m- 🔍 Search functionality working\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m- 📊 Real-time analytics dashboard\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m**Word count:** 1,247 words\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m│\u001b[0m \u001b[32m**Reading time:** 5 minutes\u001b[0m \u001b[32m│\u001b[0m\r\n\u001b[32m╰─────────────────────────────────────────────────────────────╯\u001b[0m\r\n"]
[0.000174, "o", "\r\n⚡ \u001b[1mThe difference?\u001b[0m Crawailer executes JavaScript like a real browser!\r\n"]
[0.000134, "o", "🎉 \u001b[1;32mYour AI assistant can now access ANY website!\u001b[0m\r\n"]
[0.017314, "x", "0"]

File diff suppressed because one or more lines are too long

View File

@ -115,15 +115,15 @@ all = [
]
[project.urls]
Homepage = "https://github.com/anthropics/crawailer"
Repository = "https://github.com/anthropics/crawailer"
Documentation = "https://github.com/anthropics/crawailer/blob/main/docs/README.md"
"Bug Tracker" = "https://github.com/anthropics/crawailer/issues"
"Source Code" = "https://github.com/anthropics/crawailer"
"API Reference" = "https://github.com/anthropics/crawailer/blob/main/docs/API_REFERENCE.md"
"JavaScript Guide" = "https://github.com/anthropics/crawailer/blob/main/docs/JAVASCRIPT_API.md"
"Benchmarks" = "https://github.com/anthropics/crawailer/blob/main/docs/BENCHMARKS.md"
Changelog = "https://github.com/anthropics/crawailer/releases"
Homepage = "https://git.supported.systems/MCP/crawailer"
Repository = "https://git.supported.systems/MCP/crawailer"
Documentation = "https://git.supported.systems/MCP/crawailer/src/branch/main/docs/README.md"
"Bug Tracker" = "https://git.supported.systems/MCP/crawailer/issues"
"Source Code" = "https://git.supported.systems/MCP/crawailer"
"API Reference" = "https://git.supported.systems/MCP/crawailer/src/branch/main/docs/API_REFERENCE.md"
"JavaScript Guide" = "https://git.supported.systems/MCP/crawailer/src/branch/main/docs/JAVASCRIPT_API.md"
"Benchmarks" = "https://git.supported.systems/MCP/crawailer/src/branch/main/docs/BENCHMARKS.md"
Changelog = "https://git.supported.systems/MCP/crawailer/releases"
[project.scripts]
crawailer = "crawailer.cli:main"

View File

@ -5,7 +5,7 @@ A delightful library for web automation and content extraction,
designed for AI agents, MCP servers, and automation scripts.
"""
__version__ = "0.1.0"
__version__ = "0.1.1"
# Core browser control
from .browser import Browser