# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Initial release of Crawailer
- Full JavaScript execution support with `page.evaluate()`
- Modern framework support (React, Vue, Angular)
- Comprehensive content extraction with rich metadata
- High-level API functions: `get()`, `get_many()`, `discover()`
- Browser automation with Playwright integration
- Fast HTML processing with selectolax (5-10x faster than BeautifulSoup)
- WebContent dataclass with computed properties
- Async-first design with concurrent processing
- Command-line interface
- MCP (Model Context Protocol) server integration
- Comprehensive test suite with 357+ scenarios
- Local Docker test server for development
- Security hardening with XSS prevention
- Memory management and leak detection
- Cross-browser engine compatibility
- Performance optimization strategies
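
As a rough illustration of the "dataclass with computed properties" entry above, the sketch below shows the general pattern in Python. The class name `PageContent`, its fields, and the 200 words-per-minute reading speed are hypothetical choices for this example; they do not reflect the actual definition of Crawailer's `WebContent`.

```python
from dataclasses import dataclass


@dataclass
class PageContent:
    """Hypothetical stand-in; not Crawailer's real WebContent class."""

    url: str
    title: str
    text: str

    @property
    def word_count(self) -> int:
        # Derived from the stored text on each access, not stored as a field.
        return len(self.text.split())

    @property
    def reading_time_minutes(self) -> int:
        # Assumes about 200 words per minute, rounded up, with a minimum of 1.
        return max(1, -(-self.word_count // 200))


page = PageContent(url="https://example.com", title="Example", text="word " * 450)
print(page.word_count, page.reading_time_minutes)
```

Keeping derived values as properties means they always agree with the underlying text, at the cost of recomputing them on each access.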
### Features
- **JavaScript Execution**: Execute arbitrary JavaScript with `script`, `script_before`, `script_after` parameters
- **SPA Support**: Handle React, Vue, Angular, and other modern frameworks
- **Dynamic Content**: Extract content loaded via AJAX, user interactions, and lazy loading
- **Batch Processing**: Process multiple URLs concurrently with intelligent batching
- **Content Quality**: Rich metadata extraction including author, reading time, quality scores
- **Error Handling**: Comprehensive error capture with graceful degradation
- **Performance Monitoring**: Extract timing and memory metrics from pages
- **Framework Detection**: Automatic detection of JavaScript frameworks and versions
- **User Interaction**: Simulate clicks, form submissions, scrolling, and complex workflows
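
The batch-processing entry above describes fetching several URLs concurrently. One common way to do this in async Python is to cap in-flight work with a semaphore, sketched below. This is not Crawailer's `get_many()` implementation: `fetch_stub`, `fetch_many`, and `max_concurrent` are hypothetical names, and the stub sleeps instead of loading a real page.

```python
import asyncio


async def fetch_stub(url: str) -> str:
    """Hypothetical stand-in for a real page fetch."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def fetch_many(urls: list[str], max_concurrent: int = 2) -> list[str]:
    # The semaphore caps how many fetches are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch_stub(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(5)]
results = asyncio.run(fetch_many(urls))
print(len(results))
```

A fixed limit like this keeps memory and connection usage bounded regardless of how many URLs are submitted.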
### Documentation
- Complete JavaScript API guide with examples
- Comprehensive API reference documentation
- Performance benchmarks against the Katana crawler
- Testing infrastructure documentation
- Strategic positioning and use case guidance
### Testing
- 18 test files with 16,554+ lines of test code
- Modern framework integration tests
- Mobile browser compatibility tests
- Security and penetration testing
- Memory management and leak detection
- Network resilience and error handling
- Performance validation under load
- Browser engine compatibility testing
### Performance
- Intelligent content extraction optimized for LLM consumption
- Concurrent processing with configurable limits
- Memory-efficient batch processing
- Resource cleanup and garbage collection
- Connection pooling and request optimization
### Security
- XSS prevention and input validation
- Script execution sandboxing
- Safe error handling without information leakage
- Comprehensive security test suite
## [0.1.0] - 2024-09-18
### Added
- Initial public release
- Core browser automation functionality
- JavaScript execution capabilities
- Content extraction and processing
- MCP server integration
- Comprehensive documentation
- Production-ready test suite
---
For more details about changes, see the [commit history](https://git.supported.systems/MCP/crawailer/commits/branch/main).