# Crawailer Testing Infrastructure
## Overview
Crawailer maintains a comprehensive testing suite designed to validate JavaScript execution capabilities, content extraction quality, and production-ready performance characteristics. The testing infrastructure includes local test servers, comprehensive test scenarios, and automated benchmarking.
## Test Suite Architecture
### Test Coverage Statistics
- **18 test files** with **16,554+ lines of test code**
- **357+ test scenarios** covering **~92% production coverage**
- **Comprehensive validation** from basic functionality to complex edge cases
### Test Categories
#### Core Functionality Tests
```
tests/
├── test_javascript_api.py # 700+ lines - JavaScript execution
├── test_basic.py # Basic content extraction
├── test_browser_integration.py # Browser automation
├── test_content_extraction.py # Content processing
└── test_api_functionality.py # High-level API
```
#### Modern Framework Integration
```
├── test_modern_frameworks.py # React, Vue, Angular compatibility
├── test_mobile_browser_compatibility.py # Mobile device testing
└── test_advanced_user_interactions.py # Complex user workflows
```
#### Production Optimization
```
├── test_production_network_resilience.py # Enterprise network conditions
├── test_platform_edge_cases.py # Linux-specific behaviors
├── test_performance_under_pressure.py # CPU stress, resource exhaustion
├── test_browser_engine_compatibility.py # Cross-engine consistency
└── test_memory_management.py # Memory leak detection
```
#### Security and Edge Cases
```
├── test_security_penetration.py # Security hardening
├── test_regression_suite.py # Regression prevention
└── conftest.py # Test configuration
```
## Local Test Server
### Docker-Based Test Environment
The test infrastructure includes a complete local test server with controlled content:
```yaml
# test-server/docker-compose.yml
services:
caddy:
image: caddy:2-alpine
ports:
- "8083:80"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile
- ./sites:/var/www/html
```
### Test Sites Structure
```
test-server/sites/
├── react/ # React demo application
│ ├── index.html # Complete React app with hooks
│ └── components/ # TodoList, Dashboard, Controls
├── vue/ # Vue 3 demo application
│ ├── index.html # Composition API demo
│ └── components/ # Reactive components
├── angular/ # Angular 17 demo application
│ ├── index.html # TypeScript-like features
│ └── services/ # RxJS and dependency injection
├── ecommerce/ # E-commerce simulation
│ ├── products.html # Product listings
│ └── checkout.html # Purchase workflow
├── api/ # API endpoint simulation
│ ├── rest.json # REST API responses
│ └── graphql.json # GraphQL responses
└── docs/ # Documentation site
├── tutorial.html # Tutorial content
└── reference.html # API reference
```
### Starting Test Infrastructure
```bash
# Start local test server
cd test-server
docker compose up -d
# Verify server is running
curl http://localhost:8083/health
# Run comprehensive test suite
cd ../
pytest tests/ -v
# Run specific test categories
pytest tests/test_javascript_api.py -v
pytest tests/test_modern_frameworks.py -v
pytest tests/test_memory_management.py -v
```
## JavaScript API Testing
### Test Categories
#### Basic JavaScript Execution
```python
# tests/test_javascript_api.py:68-128
async def test_basic_script_execution():
"""Test basic JavaScript execution with result capture"""
content = await get(
"http://localhost:8083/react/",
script="document.title"
)
assert content.has_script_result
assert content.script_result is not None
assert not content.has_script_error
```
#### Dynamic Content Extraction
```python
async def test_dynamic_content_extraction():
"""Test extraction of JavaScript-loaded content"""
content = await get(
"http://localhost:8083/spa/",
script="window.testData?.framework || 'not detected'",
wait_for="[data-app]"
)
assert content.script_result == "react"
```
#### Before/After Script Patterns
```python
async def test_before_after_scripts():
"""Test script execution before and after content extraction"""
content = await get(
"http://localhost:8083/ecommerce/",
script_before="document.querySelector('.load-more')?.click()",
script_after="document.querySelectorAll('.product').length"
)
assert isinstance(content.script_result, dict)
assert 'script_before' in content.script_result
assert 'script_after' in content.script_result
```
#### Error Handling Validation
```python
async def test_javascript_error_handling():
"""Test graceful handling of JavaScript errors"""
content = await get(
"http://localhost:8083/",
script="document.querySelector('.nonexistent').click()"
)
assert content.has_script_error
assert content.script_error is not None
assert content.content is not None # Static content still available
```
### Batch Processing Tests
#### Same Script for Multiple URLs
```python
async def test_batch_same_script():
"""Test applying same script to multiple URLs"""
urls = [
"http://localhost:8083/react/",
"http://localhost:8083/vue/",
"http://localhost:8083/angular/"
]
results = await get_many(
urls,
script="window.testData?.framework || 'unknown'"
)
assert len(results) == 3
assert all(r.has_script_result for r in results if r)
```
#### Per-URL Custom Scripts
```python
async def test_batch_custom_scripts():
"""Test different scripts for different URLs"""
urls = ["http://localhost:8083/react/", "http://localhost:8083/vue/"]
scripts = [
"React.version || 'React not found'",
"Vue.version || 'Vue not found'"
]
results = await get_many(urls, script=scripts)
assert results[0].script_result != results[1].script_result
```
## Modern Framework Testing
### React Application Testing
```python
# tests/test_modern_frameworks.py:45-89
async def test_react_component_detection():
"""Test React application analysis and component detection"""
content = await get(
"http://localhost:8083/react/",
script="""
({
framework: window.testData?.framework,
version: window.React?.version,
componentCount: window.testData?.componentCount(),
features: window.testData?.detectReactFeatures()
})
"""
)
result = content.script_result
assert result['framework'] == 'react'
assert 'version' in result
assert result['componentCount'] > 0
assert 'hooks' in result['features']
```
### Vue Application Testing
```python
async def test_vue_reactivity_system():
"""Test Vue reactivity and composition API"""
content = await get(
"http://localhost:8083/vue/",
script="""
({
framework: window.testData?.framework,
hasCompositionAPI: typeof window.Vue?.ref === 'function',
reactiveFeatures: window.testData?.checkReactivity()
})
"""
)
result = content.script_result
assert result['framework'] == 'vue'
assert result['hasCompositionAPI'] is True
```
### Angular Application Testing
```python
async def test_angular_dependency_injection():
"""Test Angular service injection and RxJS integration"""
content = await get(
"http://localhost:8083/angular/",
script="""
({
framework: window.testData?.framework,
hasServices: window.testData?.hasServices(),
rxjsIntegration: window.testData?.checkRxJS()
})
"""
)
result = content.script_result
assert result['framework'] == 'angular'
assert result['hasServices'] is True
```
## Performance Testing
### Memory Management Tests
```python
# tests/test_memory_management.py:68-128
class TestMemoryBaseline:
async def test_memory_baseline_establishment(self):
"""Test establishing memory usage baseline"""
initial_memory = memory_profiler.get_memory_usage()
content = await get("http://localhost:8083/memory-test")
final_memory = memory_profiler.get_memory_usage()
memory_growth = final_memory - initial_memory
# Memory growth should be reasonable (under 5MB for single page)
assert memory_growth < 5_000_000
```
### Performance Under Pressure
```python
# tests/test_performance_under_pressure.py:112-165
async def test_cpu_stress_with_web_workers():
"""Test handling CPU stress from Web Workers"""
stress_script = """
// Create multiple Web Workers for CPU stress
const workers = [];
for (let i = 0; i < 4; i++) {
const worker = new Worker('data:application/javascript,' +
encodeURIComponent(`
let result = 0;
for (let j = 0; j < 1000000; j++) {
result += Math.sqrt(j);
}
postMessage(result);
`)
);
workers.push(worker);
}
return 'stress test initiated';
"""
content = await get("http://localhost:8083/stress-test", script=stress_script)
assert content.script_result == 'stress test initiated'
```
### Network Resilience Testing
```python
# tests/test_production_network_resilience.py:89-142
async def test_enterprise_proxy_configuration():
"""Test handling enterprise proxy configurations"""
# Simulate enterprise network conditions
proxy_config = {
'http_proxy': 'http://proxy.company.com:8080',
'https_proxy': 'https://proxy.company.com:8080',
'no_proxy': 'localhost,127.0.0.1,.company.com'
}
# Test with proxy simulation
content = await get(
"http://localhost:8083/enterprise-test",
script="navigator.connection?.effectiveType || 'unknown'"
)
assert content.script_result in ['4g', '3g', 'slow-2g', 'unknown']
```
## Browser Engine Compatibility
### Cross-Engine Testing
```python
# tests/test_browser_engine_compatibility.py:67-120
async def test_engine_detection_accuracy():
"""Test accurate detection of browser engines"""
engines = ['chromium', 'firefox', 'safari', 'edge']
for engine in engines:
content = await get(
"http://localhost:8083/engine-test",
script="""
({
userAgent: navigator.userAgent,
vendor: navigator.vendor,
engine: typeof chrome !== 'undefined' ? 'chromium' :
typeof InstallTrigger !== 'undefined' ? 'firefox' :
/constructor/i.test(window.HTMLElement) ? 'safari' :
'unknown'
})
"""
)
result = content.script_result
assert 'engine' in result
assert result['userAgent'] is not None
```
### JavaScript API Compatibility
```python
async def test_javascript_api_compatibility():
"""Test JavaScript API consistency across engines"""
api_test_script = """
({
asyncAwait: typeof async function() {} === 'function',
promises: typeof Promise !== 'undefined',
fetch: typeof fetch !== 'undefined',
webWorkers: typeof Worker !== 'undefined',
localStorage: typeof localStorage !== 'undefined',
sessionStorage: typeof sessionStorage !== 'undefined',
indexedDB: typeof indexedDB !== 'undefined'
})
"""
content = await get("http://localhost:8083/api-test", script=api_test_script)
result = content.script_result
assert result['asyncAwait'] is True
assert result['promises'] is True
assert result['fetch'] is True
```
## Security Testing
### XSS Prevention
```python
# tests/test_security_penetration.py:78-125
async def test_xss_script_injection_prevention():
"""Test prevention of XSS through script injection"""
malicious_script = """
try {
eval('');
return 'XSS_SUCCESSFUL';
} catch (e) {
return 'XSS_BLOCKED';
}
"""
content = await get("http://localhost:8083/security-test", script=malicious_script)
# Should block or safely handle malicious scripts
assert content.script_result == 'XSS_BLOCKED'
```
### Input Validation
```python
async def test_javascript_input_validation():
"""Test validation of JavaScript input parameters"""
# Test with various malicious inputs
malicious_inputs = [
"'; DROP TABLE users; --",
"",
"javascript:alert('xss')",
"eval('malicious code')"
]
for malicious_input in malicious_inputs:
content = await get(
"http://localhost:8083/validation-test",
script=f"document.querySelector('.safe').textContent = '{malicious_input}'; 'input processed'"
)
# Should handle safely without execution
assert content.script_result == 'input processed'
assert '