# Crawailer Testing Infrastructure ## Overview Crawailer maintains a comprehensive testing suite designed to validate JavaScript execution capabilities, content extraction quality, and production-ready performance characteristics. The testing infrastructure includes local test servers, comprehensive test scenarios, and automated benchmarking. ## Test Suite Architecture ### Test Coverage Statistics - **18 test files** with **16,554+ lines of test code** - **357+ test scenarios** covering **~92% production coverage** - **Comprehensive validation** from basic functionality to complex edge cases ### Test Categories #### Core Functionality Tests ``` tests/ ├── test_javascript_api.py # 700+ lines - JavaScript execution ├── test_basic.py # Basic content extraction ├── test_browser_integration.py # Browser automation ├── test_content_extraction.py # Content processing └── test_api_functionality.py # High-level API ``` #### Modern Framework Integration ``` ├── test_modern_frameworks.py # React, Vue, Angular compatibility ├── test_mobile_browser_compatibility.py # Mobile device testing └── test_advanced_user_interactions.py # Complex user workflows ``` #### Production Optimization ``` ├── test_production_network_resilience.py # Enterprise network conditions ├── test_platform_edge_cases.py # Linux-specific behaviors ├── test_performance_under_pressure.py # CPU stress, resource exhaustion ├── test_browser_engine_compatibility.py # Cross-engine consistency └── test_memory_management.py # Memory leak detection ``` #### Security and Edge Cases ``` ├── test_security_penetration.py # Security hardening ├── test_regression_suite.py # Regression prevention └── conftest.py # Test configuration ``` ## Local Test Server ### Docker-Based Test Environment The test infrastructure includes a complete local test server with controlled content: ```yaml # test-server/docker-compose.yml services: caddy: image: caddy:2-alpine ports: - "8083:80" volumes: - ./Caddyfile:/etc/caddy/Caddyfile - ./sites:/var/www/html ``` ### Test Sites Structure ``` test-server/sites/ ├── react/ # React demo application │ ├── index.html # Complete React app with hooks │ └── components/ # TodoList, Dashboard, Controls ├── vue/ # Vue 3 demo application │ ├── index.html # Composition API demo │ └── components/ # Reactive components ├── angular/ # Angular 17 demo application │ ├── index.html # TypeScript-like features │ └── services/ # RxJS and dependency injection ├── ecommerce/ # E-commerce simulation │ ├── products.html # Product listings │ └── checkout.html # Purchase workflow ├── api/ # API endpoint simulation │ ├── rest.json # REST API responses │ └── graphql.json # GraphQL responses └── docs/ # Documentation site ├── tutorial.html # Tutorial content └── reference.html # API reference ``` ### Starting Test Infrastructure ```bash # Start local test server cd test-server docker compose up -d # Verify server is running curl http://localhost:8083/health # Run comprehensive test suite cd ../ pytest tests/ -v # Run specific test categories pytest tests/test_javascript_api.py -v pytest tests/test_modern_frameworks.py -v pytest tests/test_memory_management.py -v ``` ## JavaScript API Testing ### Test Categories #### Basic JavaScript Execution ```python # tests/test_javascript_api.py:68-128 async def test_basic_script_execution(): """Test basic JavaScript execution with result capture""" content = await get( "http://localhost:8083/react/", script="document.title" ) assert content.has_script_result assert content.script_result is not None assert not content.has_script_error ``` #### Dynamic Content Extraction ```python async def test_dynamic_content_extraction(): """Test extraction of JavaScript-loaded content""" content = await get( "http://localhost:8083/spa/", script="window.testData?.framework || 'not detected'", wait_for="[data-app]" ) assert content.script_result == "react" ``` #### Before/After Script Patterns ```python async def test_before_after_scripts(): """Test script execution before and after content extraction""" content = await get( "http://localhost:8083/ecommerce/", script_before="document.querySelector('.load-more')?.click()", script_after="document.querySelectorAll('.product').length" ) assert isinstance(content.script_result, dict) assert 'script_before' in content.script_result assert 'script_after' in content.script_result ``` #### Error Handling Validation ```python async def test_javascript_error_handling(): """Test graceful handling of JavaScript errors""" content = await get( "http://localhost:8083/", script="document.querySelector('.nonexistent').click()" ) assert content.has_script_error assert content.script_error is not None assert content.content is not None # Static content still available ``` ### Batch Processing Tests #### Same Script for Multiple URLs ```python async def test_batch_same_script(): """Test applying same script to multiple URLs""" urls = [ "http://localhost:8083/react/", "http://localhost:8083/vue/", "http://localhost:8083/angular/" ] results = await get_many( urls, script="window.testData?.framework || 'unknown'" ) assert len(results) == 3 assert all(r.has_script_result for r in results if r) ``` #### Per-URL Custom Scripts ```python async def test_batch_custom_scripts(): """Test different scripts for different URLs""" urls = ["http://localhost:8083/react/", "http://localhost:8083/vue/"] scripts = [ "React.version || 'React not found'", "Vue.version || 'Vue not found'" ] results = await get_many(urls, script=scripts) assert results[0].script_result != results[1].script_result ``` ## Modern Framework Testing ### React Application Testing ```python # tests/test_modern_frameworks.py:45-89 async def test_react_component_detection(): """Test React application analysis and component detection""" content = await get( "http://localhost:8083/react/", script=""" ({ framework: window.testData?.framework, version: window.React?.version, componentCount: window.testData?.componentCount(), features: window.testData?.detectReactFeatures() }) """ ) result = content.script_result assert result['framework'] == 'react' assert 'version' in result assert result['componentCount'] > 0 assert 'hooks' in result['features'] ``` ### Vue Application Testing ```python async def test_vue_reactivity_system(): """Test Vue reactivity and composition API""" content = await get( "http://localhost:8083/vue/", script=""" ({ framework: window.testData?.framework, hasCompositionAPI: typeof window.Vue?.ref === 'function', reactiveFeatures: window.testData?.checkReactivity() }) """ ) result = content.script_result assert result['framework'] == 'vue' assert result['hasCompositionAPI'] is True ``` ### Angular Application Testing ```python async def test_angular_dependency_injection(): """Test Angular service injection and RxJS integration""" content = await get( "http://localhost:8083/angular/", script=""" ({ framework: window.testData?.framework, hasServices: window.testData?.hasServices(), rxjsIntegration: window.testData?.checkRxJS() }) """ ) result = content.script_result assert result['framework'] == 'angular' assert result['hasServices'] is True ``` ## Performance Testing ### Memory Management Tests ```python # tests/test_memory_management.py:68-128 class TestMemoryBaseline: async def test_memory_baseline_establishment(self): """Test establishing memory usage baseline""" initial_memory = memory_profiler.get_memory_usage() content = await get("http://localhost:8083/memory-test") final_memory = memory_profiler.get_memory_usage() memory_growth = final_memory - initial_memory # Memory growth should be reasonable (under 5MB for single page) assert memory_growth < 5_000_000 ``` ### Performance Under Pressure ```python # tests/test_performance_under_pressure.py:112-165 async def test_cpu_stress_with_web_workers(): """Test handling CPU stress from Web Workers""" stress_script = """ // Create multiple Web Workers for CPU stress const workers = []; for (let i = 0; i < 4; i++) { const worker = new Worker('data:application/javascript,' + encodeURIComponent(` let result = 0; for (let j = 0; j < 1000000; j++) { result += Math.sqrt(j); } postMessage(result); `) ); workers.push(worker); } return 'stress test initiated'; """ content = await get("http://localhost:8083/stress-test", script=stress_script) assert content.script_result == 'stress test initiated' ``` ### Network Resilience Testing ```python # tests/test_production_network_resilience.py:89-142 async def test_enterprise_proxy_configuration(): """Test handling enterprise proxy configurations""" # Simulate enterprise network conditions proxy_config = { 'http_proxy': 'http://proxy.company.com:8080', 'https_proxy': 'https://proxy.company.com:8080', 'no_proxy': 'localhost,127.0.0.1,.company.com' } # Test with proxy simulation content = await get( "http://localhost:8083/enterprise-test", script="navigator.connection?.effectiveType || 'unknown'" ) assert content.script_result in ['4g', '3g', 'slow-2g', 'unknown'] ``` ## Browser Engine Compatibility ### Cross-Engine Testing ```python # tests/test_browser_engine_compatibility.py:67-120 async def test_engine_detection_accuracy(): """Test accurate detection of browser engines""" engines = ['chromium', 'firefox', 'safari', 'edge'] for engine in engines: content = await get( "http://localhost:8083/engine-test", script=""" ({ userAgent: navigator.userAgent, vendor: navigator.vendor, engine: typeof chrome !== 'undefined' ? 'chromium' : typeof InstallTrigger !== 'undefined' ? 'firefox' : /constructor/i.test(window.HTMLElement) ? 'safari' : 'unknown' }) """ ) result = content.script_result assert 'engine' in result assert result['userAgent'] is not None ``` ### JavaScript API Compatibility ```python async def test_javascript_api_compatibility(): """Test JavaScript API consistency across engines""" api_test_script = """ ({ asyncAwait: typeof async function() {} === 'function', promises: typeof Promise !== 'undefined', fetch: typeof fetch !== 'undefined', webWorkers: typeof Worker !== 'undefined', localStorage: typeof localStorage !== 'undefined', sessionStorage: typeof sessionStorage !== 'undefined', indexedDB: typeof indexedDB !== 'undefined' }) """ content = await get("http://localhost:8083/api-test", script=api_test_script) result = content.script_result assert result['asyncAwait'] is True assert result['promises'] is True assert result['fetch'] is True ``` ## Security Testing ### XSS Prevention ```python # tests/test_security_penetration.py:78-125 async def test_xss_script_injection_prevention(): """Test prevention of XSS through script injection""" malicious_script = """ try { eval(''); return 'XSS_SUCCESSFUL'; } catch (e) { return 'XSS_BLOCKED'; } """ content = await get("http://localhost:8083/security-test", script=malicious_script) # Should block or safely handle malicious scripts assert content.script_result == 'XSS_BLOCKED' ``` ### Input Validation ```python async def test_javascript_input_validation(): """Test validation of JavaScript input parameters""" # Test with various malicious inputs malicious_inputs = [ "'; DROP TABLE users; --", "", "javascript:alert('xss')", "eval('malicious code')" ] for malicious_input in malicious_inputs: content = await get( "http://localhost:8083/validation-test", script=f"document.querySelector('.safe').textContent = '{malicious_input}'; 'input processed'" ) # Should handle safely without execution assert content.script_result == 'input processed' assert '