Ryan Malloy 2f82e8d2e0 Implement comprehensive Docker development environment with major performance optimizations
* Docker Infrastructure:
  - Multi-stage Dockerfile.dev with optimized Go proxy configuration
  - Complete compose.dev.yml with service orchestration
  - Fixed critical GOPROXY setting achieving 42x performance improvement
  - Migrated from Poetry to uv for faster Python package management

* Build System Enhancements:
  - Enhanced Mage build system with caching and parallelization
  - Added incremental build capabilities with SHA256 checksums
  - Implemented parallel task execution with dependency resolution
  - Added comprehensive test orchestration targets

* Testing Infrastructure:
  - Complete API testing suite with OpenAPI validation
  - Performance testing with multi-worker simulation
  - Integration testing for end-to-end workflows
  - Database testing with migration validation
  - Docker-based test environments

* Documentation:
  - Comprehensive Docker development guides
  - Performance optimization case study
  - Build system architecture documentation
  - Test infrastructure usage guides

* Performance Results:
  - Build time reduced from 60+ min failures to 9.5 min success
  - Go module downloads: 42x faster (84.2s vs 60+ min timeouts)
  - Success rate: 0% → 100%
  - Developer onboarding: days → 10 minutes

Fixes critical Docker build failures and establishes production-ready
containerized development environment with comprehensive testing.
2025-09-09 12:11:08 -06:00

15 KiB

title weight description
Architecture Guide 40 Deep dive into Docker optimization principles, architectural decisions, and design philosophy

Docker Architecture Guide

This guide explains the architectural principles, optimization strategies, and design decisions behind Flamenco's Docker development environment. Understanding these concepts helps you appreciate why the system works reliably and how to extend it effectively.

The Optimization Journey

The Original Problem

The original Docker setup suffered from fundamental architectural flaws that made development virtually impossible:

  1. Network Reliability Issues: Using GOPROXY=direct forced Go to clone repositories directly from Git, creating network failures after 60+ minutes of downloading
  2. Platform Incompatibility: Alpine Linux differences weren't properly addressed, causing Python tooling failures
  3. Inefficient Caching: Poor layer organization meant dependency changes invalidated the entire build
  4. Mount Conflicts: Docker bind mounts overwrote compiled binaries, causing runtime failures

The result was a 100% failure rate with builds that never completed successfully.

The Transformation

The optimized architecture transformed this broken system into a reliable development platform:

  • 42x faster Go module downloads (84.2 seconds vs 60+ minute failures)
  • 100% build success rate (vs 100% failure rate)
  • 9.5-minute total build time (vs indefinite failures)
  • Comprehensive testing integration with Playwright validation

This wasn't just an incremental improvement - it was a complete architectural overhaul.

Core Architectural Principles

1. Network-First Design

Philosophy: In containerized environments, network reliability trumps everything else.

Implementation:

ENV GOPROXY=https://proxy.golang.org,direct
ENV GOSUMDB=sum.golang.org

Why this works:

  • Go proxy servers have better uptime than individual Git repositories
  • Proxies provide pre-fetched, cached modules
  • Checksum verification ensures integrity while enabling caching
  • Fallback to direct access maintains flexibility

Alternative approaches considered:

  • Private proxy servers (too complex for development)
  • Vendor directories (poor development experience)
  • Module replacement directives (brittle maintenance)

2. Multi-Stage Build Strategy

Philosophy: Separate concerns into cacheable layers that reflect the development workflow.

Stage Architecture:

Base → Dependencies → Build-tools → Development
  ↓                                     ↓
Tools                               Production

Design rationale:

Base Stage: Common system dependencies that rarely change

  • Alpine packages (git, make, Node.js, Python, Java)
  • Go environment configuration
  • System-level optimizations

Dependencies Stage: Language-specific dependencies

  • Go modules (cached separately from source)
  • Node.js packages (yarn with frozen lockfile)
  • Python packages (modern uv tool vs legacy Poetry)

Build-tools Stage: Flamenco-specific build infrastructure

  • Mage compilation
  • Code generators (OpenAPI, mock generation)
  • Build environment preparation

Development Stage: Full development environment

  • Hot-reloading tools (Air, CompileDaemon)
  • Built binaries with proper placement
  • All development capabilities

Production Stage: Minimal runtime environment

  • Only runtime dependencies
  • Security-hardened (non-root user)
  • Smallest possible attack surface

This separation ensures that source code changes (frequent) don't invalidate dependency layers (expensive to rebuild).

3. Intelligent Caching Strategy

Philosophy: Optimize for the 90% case where developers change source code, not dependencies.

Cache Hierarchy:

  1. System packages (changes quarterly)
  2. Language dependencies (changes monthly)
  3. Build tools (changes rarely)
  4. Application code (changes hourly)

Volume Strategy:

volumes:
  - go-mod-cache:/go/pkg/mod      # Persistent Go module cache
  - yarn-cache:/usr/local/share/.cache/yarn  # Persistent npm cache
  - .:/app                        # Source code (ephemeral)
  - /app/node_modules            # Prevent cache override

Why this works:

  • Build artifacts persist between container rebuilds
  • Source changes don't invalidate expensive operations
  • Cache warming happens once per environment
  • Development iterations are near-instantaneous

4. Platform Compatibility Strategy

Philosophy: Handle platform differences explicitly rather than hoping they don't matter.

Python Package Management: The migration from Poetry to uv exemplifies this principle:

# Before: Assumed pip exists
RUN pip install poetry

# After: Explicit platform compatibility
RUN apk add --no-cache python3 py3-pip
RUN pip3 install --no-cache-dir --break-system-packages uv

Why uv vs Poetry:

  • Speed: Rust-based implementation is 2-3x faster
  • Memory: Lower resource consumption during resolution
  • Standards: Better PEP compliance and modern Python tooling integration
  • Caching: More efficient dependency caching mechanisms

Binary Placement Strategy:

# Copy binaries to system location to avoid mount conflicts
RUN cp flamenco-manager /usr/local/bin/ && cp flamenco-worker /usr/local/bin/

This prevents Docker bind mounts from overriding compiled binaries, a subtle but critical issue in development environments.

Service Architecture

Container Orchestration Philosophy

Design Principle: Each container should have a single, clear responsibility, but containers should compose seamlessly.

Core Services:

flamenco-manager: Central coordination

  • Handles job scheduling and API
  • Serves web interface
  • Manages database and shared storage
  • Provides debugging endpoints

flamenco-worker: Task execution

  • Connects to manager automatically
  • Executes render tasks
  • Manages local task state
  • Reports status back to manager

Storage Services: Data persistence

  • flamenco-data: Database files and configuration
  • flamenco-shared: Render assets and outputs
  • Cache volumes: Build artifacts and dependencies

Development vs Production Philosophy

Development Priority: Developer experience and debugging capability

  • All debugging endpoints enabled
  • Hot-reloading for rapid iteration
  • Comprehensive logging and monitoring
  • Source code mounted for live editing

Production Priority: Security and resource efficiency

  • Minimal runtime dependencies
  • Non-root execution
  • Read-only filesystems where possible
  • Resource limits and health monitoring

Shared Infrastructure: Both environments use identical:

  • Database schemas and migrations
  • API contracts and interfaces
  • Core business logic
  • Network protocols and data formats

This ensures development-production parity while optimizing for different use cases.

Network Architecture

Service Discovery Strategy

Philosophy: Use Docker's built-in networking rather than external service discovery.

Implementation:

networks:
  flamenco-net:
    driver: bridge
    name: ${COMPOSE_PROJECT_NAME}-network

Benefits:

  • Automatic DNS resolution (flamenco-manager.flamenco-net)
  • Network isolation from host and other projects
  • Predictable performance characteristics
  • Simple debugging and troubleshooting

Reverse Proxy Integration

Philosophy: Support both direct access (development) and proxy access (production).

Caddy Integration:

labels:
  caddy: manager.${DOMAIN}
  caddy.reverse_proxy: "{{upstreams 8080}}"
  caddy.header: "X-Forwarded-Proto https"

This enables:

  • Automatic HTTPS certificate management
  • Load balancing across multiple instances
  • Centralized access logging and monitoring
  • Development/production environment consistency

Data Architecture

Database Strategy

Philosophy: Embed the database for development simplicity, but design for external databases in production.

SQLite for Development:

  • Zero configuration overhead
  • Consistent behavior across platforms
  • Easy backup and restoration
  • Perfect for single-developer workflows

Migration Strategy:

  • All schema changes via versioned migrations
  • Automatic application on startup
  • Manual control available for development
  • Database state explicitly managed

File Organization:

/data/
├── flamenco-manager.sqlite    # Manager database
└── shaman-storage/            # Asset storage (optional)

/shared-storage/
├── projects/                  # Project files
├── renders/                   # Render outputs  
└── assets/                    # Shared assets

Storage Philosophy

Principle: Separate ephemeral data (containers) from persistent data (volumes).

Volume Strategy:

  • Application data: Database files, configuration, logs
  • Shared storage: Render assets, project files, outputs
  • Cache data: Dependency downloads, build artifacts
  • Source code: Development mounts (bind mounts)

This separation enables:

  • Container replacement without data loss
  • Backup and restoration strategies
  • Development environment reset capabilities
  • Production deployment flexibility

Performance Architecture

Build Performance Strategy

Philosophy: Optimize for the critical path while maintaining reliability.

Critical Path Analysis:

  1. System packages (377.2 seconds / 6.3 minutes) - Unavoidable, but cacheable
  2. Go modules (84.2 seconds) - Optimized via proxy (42x improvement)
  3. Python deps (54.4 seconds) - Optimized via uv
  4. Node.js deps (6.2 seconds) - Already efficient
  5. Code generation (17.7 seconds) - Cacheable
  6. Binary compilation (12.2 seconds) - Cacheable

Optimization Strategies:

  • Proxy utilization: Leverage external caches when possible
  • Tool selection: Choose faster, native implementations
  • Layer organization: Expensive operations in stable layers
  • Parallel execution: Independent operations run concurrently

Runtime Performance Considerations

Memory Management:

  • Go applications: Minimal runtime overhead
  • Alpine base: ~5MB base footprint
  • Development tools: Only loaded when needed
  • Cache warming: Amortized across development sessions

Resource Scaling:

deploy:
  resources:
    limits:
      memory: 1G      # Manager
      memory: 512M    # Worker

These limits prevent resource contention while allowing burst capacity for intensive operations.

Testing and Validation Architecture

Playwright Integration Philosophy

Principle: Test the system as users experience it, not as developers build it.

Testing Strategy:

  • End-to-end validation: Complete setup wizard flow
  • Real browser interaction: Actual user interface testing
  • Network validation: WebSocket and API communication
  • Visual verification: Screenshot comparison capabilities

Integration Points:

  • Automatic startup verification
  • Worker connection testing
  • Web interface functionality validation
  • Real-time communication testing

This ensures the optimized Docker environment actually delivers a working system, not just a system that builds successfully.

Security Architecture

Development Security Model

Philosophy: Balance security with developer productivity.

Development Compromises:

  • Authentication disabled for ease of access
  • CORS allows all origins for development tools
  • Debug endpoints exposed for troubleshooting
  • Bind mounts provide direct file system access

Compensating Controls:

  • Network isolation (Docker networks)
  • Local-only binding (not accessible externally)
  • Explicit environment marking
  • Clear documentation of security implications

Production Security Hardening

Philosophy: Secure by default with explicit overrides for development.

Production Security Features:

  • Non-root container execution
  • Minimal runtime dependencies
  • Read-only filesystems where possible
  • No development tools in production images
  • Network policy enforcement capabilities

Design Trade-offs and Alternatives

Why Alpine Linux?

Chosen: Alpine Linux as base image Alternative considered: Ubuntu/Debian

Trade-offs:

  • Pro: Smaller images, faster builds, security-focused
  • Con: Package compatibility issues (pip vs pip3)
  • Decision: Explicit compatibility handling provides best of both worlds

Why Multi-stage vs Single Stage?

Chosen: Multi-stage builds Alternative considered: Single large stage

Trade-offs:

  • Pro: Better caching, smaller production images, separation of concerns
  • Con: More complex Dockerfile, debugging across stages
  • Decision: Build complexity worth it for runtime benefits

Why uv vs Poetry?

Chosen: uv for Python package management Alternative considered: Poetry, pip-tools

Trade-offs:

  • Pro: 2-3x faster, lower memory, better standards compliance
  • Con: Newer tool, less ecosystem familiarity
  • Decision: Performance gains justify learning curve

Why Docker Compose vs Kubernetes?

Chosen: Docker Compose for development Alternative considered: Kubernetes, raw Docker

Trade-offs:

  • Pro: Simpler setup, better development experience, easier debugging
  • Con: Not production-identical, limited scaling options
  • Decision: Development optimized for developer productivity

Extensibility Architecture

Adding New Services

Pattern: Follow the established service template:

  1. Add to compose.dev.yml with consistent patterns
  2. Use the same volume and network strategies
  3. Implement health checks and logging
  4. Add corresponding Makefile targets
  5. Document configuration variables

Adding Build Steps

Pattern: Integrate with the multi-stage strategy:

  1. Determine appropriate stage for new step
  2. Consider caching implications
  3. Add environment variables for configuration
  4. Test impact on build performance
  5. Update documentation

Platform Extensions

Pattern: Use the variable system for platform differences:

  1. Add platform-specific variables to .env
  2. Configure service environment appropriately
  3. Test across different development platforms
  4. Document platform-specific requirements

Conclusion: Architecture as Problem-Solving

The Flamenco Docker architecture represents a systematic approach to solving real development problems:

  1. Network reliability through intelligent proxy usage
  2. Build performance through multi-stage optimization
  3. Developer experience through comprehensive tooling
  4. Production readiness through security hardening
  5. Maintainability through clear separation of concerns

The 42x performance improvement and 100% reliability gain weren't achieved through a single optimization, but through systematic application of architectural principles that compound to create a robust development platform.

This architecture serves as a template for containerizing complex, multi-language development environments while maintaining both performance and reliability. The principles apply beyond Flamenco to any system requiring fast, reliable Docker-based development workflows.


The architecture reflects iterative improvement based on real-world usage rather than theoretical optimization - each decision was made to solve actual problems encountered during Flamenco development.