diff --git a/README.md b/README.md index a86e4c4..ec79613 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,10 @@ # ๐ท๏ธ Crawailer -**The JavaScript-first web scraper that actually works with modern websites** +**The web scraper that doesn't suck at JavaScript** โจ -> **Finally!** A Python library that handles React, Vue, Angular, and dynamic content without the headaches. When `requests` fails and Selenium feels like overkill, Crawailer delivers clean, AI-ready content extraction with bulletproof JavaScript execution. +> **Stop fighting modern websites.** While `requests` gives you empty `
`, Crawailer actually executes JavaScript and extracts real content from React, Vue, and Angular apps. Finally, web scraping that works in 2025. -> โก **Perfect for Claude Code MCP servers** - Built specifically for AI agents and MCP integration +> โก **Claude Code's new best friend** - Your AI assistant can now access ANY website ```python pip install crawailer @@ -13,15 +13,25 @@ pip install crawailer [](https://badge.fury.io/py/crawailer) [](https://pypi.org/project/crawailer/) -## โจ Features +## โจ Why Developers Choose Crawailer -- **๐ฏ JavaScript-First**: Executes real JavaScript on React, Vue, Angular sites (unlike `requests`) -- **โก Lightning Fast**: 5-10x faster HTML processing with C-based selectolax -- **๐ค AI-Optimized**: Clean markdown output perfect for LLM training and RAG -- **๐ง Claude Code Ready**: Drop-in MCP server integration for AI agents -- **๐ฆ Zero Config**: Works immediately with sensible defaults -- **๐งช Battle-Tested**: 18 comprehensive test suites with 70+ real-world scenarios -- **๐จ Developer Joy**: Rich terminal output, helpful errors, progress tracking +**๐ฅ JavaScript That Actually Works** +While other tools timeout or crash, Crawailer executes real JavaScript like a human browser + +**โก Stupidly Fast** +5-10x faster than BeautifulSoup with C-based parsing that doesn't make you wait + +**๐ค AI Assistant Ready** +Perfect markdown output that your Claude/GPT/local model will love + +**๐ฏ Zero Learning Curve** +`pip install` โ works immediately โ no 47-page configuration guides + +**๐งช Production Battle-Tested** +18 comprehensive test suites covering every edge case we could think of + +**๐จ Actually Enjoyable** +Rich terminal output, helpful errors, progress bars that don't lie ## ๐ Quick Start @@ -66,7 +76,7 @@ research = await web.discover( ### ๐ง Claude Code MCP Integration -> "Oh yeah, tell the model that writes your code for you, this is how it works:" +> *"Hey Claude, go grab that data from the React app"* โ This actually works now ```python # Add to your Claude Code MCP server @@ -83,8 +93,9 @@ async def extract_content(url: str, script: str = ""): "word_count": content.word_count } -# Now Claude can extract content from ANY website, including SPAs! -# No more "I can't access that site" - Claude gets real, live web data +# ๐ No more "I can't access that site" +# ๐ No more copy-pasting content manually +# ๐ Your AI can now browse the web like a human ``` ## ๐ฏ Design Philosophy @@ -298,16 +309,18 @@ MIT License - see [LICENSE](LICENSE) for details. --- -## ๐ Ready to Stop Fighting JavaScript? +## ๐ Ready to Stop Losing Your Mind? ```bash pip install crawailer crawailer setup # Install browser engines ``` -**Join the revolution**: Stop losing data to `requests.get()` failures. Start extracting **real content** from **real websites** that actually use JavaScript. +**Life's too short** for empty `