# Database Schema Documentation This document describes the SQLite database schema for the Claude Code Project Tracker. ## Overview The database is designed to capture comprehensive development workflow data through a normalized relational structure. All timestamps are stored in UTC format. ## Entity Relationship Diagram ``` ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ projects │ │ sessions │ │conversations│ │ │ │ │ │ │ │ id (PK) │◀──┤ project_id │ │ session_id │ │ name │ │ id (PK) │◀──┤ id (PK) │ │ path │ │ start_time │ │ timestamp │ │ git_repo │ │ end_time │ │ user_prompt │ │ created_at │ │ type │ │ claude_resp │ │ ... │ │ ... │ │ ... │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ ┌─────────────┐ ┌─────────────┐ │ activities │ │waiting_perds│ │ │ │ │ │ session_id │ │ session_id │ │ id (PK) │ │ id (PK) │ │ tool_name │ │ start_time │ │ file_path │ │ end_time │ │ timestamp │ │ duration │ │ ... │ │ ... │ └─────────────┘ └─────────────┘ │ │ ┌─────────────┐ │git_operations│ │ │ │ session_id │ │ id (PK) │ │ operation │ │ command │ │ timestamp │ │ ... │ └─────────────┘ ``` ## Table Definitions ### projects Stores metadata about tracked projects. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | id | INTEGER | PRIMARY KEY | Unique project identifier | | name | VARCHAR(255) | NOT NULL | Project display name | | path | TEXT | NOT NULL UNIQUE | Absolute filesystem path | | git_repo | VARCHAR(500) | NULL | Git repository URL if applicable | | languages | JSON | NULL | Array of detected programming languages | | created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | First time project was tracked | | last_session | TIMESTAMP | NULL | Most recent session timestamp | | total_sessions | INTEGER | NOT NULL DEFAULT 0 | Count of development sessions | | total_time_minutes | INTEGER | NOT NULL DEFAULT 0 | Cumulative session duration | | files_modified_count | INTEGER | NOT NULL DEFAULT 0 | Total unique files changed | | lines_changed_count | INTEGER | NOT NULL DEFAULT 0 | Total lines added + removed | **Indexes:** - `idx_projects_path` ON (path) - `idx_projects_last_session` ON (last_session) ### sessions Individual development sessions within projects. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | id | INTEGER | PRIMARY KEY | Unique session identifier | | project_id | INTEGER | NOT NULL FK(projects.id) | Associated project | | start_time | TIMESTAMP | NOT NULL | Session start timestamp | | end_time | TIMESTAMP | NULL | Session end timestamp (NULL if active) | | session_type | VARCHAR(50) | NOT NULL | startup, resume, clear | | working_directory | TEXT | NOT NULL | Directory path when session started | | git_branch | VARCHAR(255) | NULL | Active git branch | | environment | JSON | NULL | System environment details | | duration_minutes | INTEGER | NULL | Calculated session length | | activity_count | INTEGER | NOT NULL DEFAULT 0 | Number of tool uses | | conversation_count | INTEGER | NOT NULL DEFAULT 0 | Number of exchanges | | files_touched | JSON | NULL | Array of file paths accessed | **Indexes:** - `idx_sessions_project_start` ON (project_id, start_time) - `idx_sessions_active` ON (end_time) WHERE end_time IS NULL ### conversations Dialogue exchanges between user and Claude. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | id | INTEGER | PRIMARY KEY | Unique conversation identifier | | session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session | | timestamp | TIMESTAMP | NOT NULL | When exchange occurred | | user_prompt | TEXT | NULL | User's input message | | claude_response | TEXT | NULL | Claude's response | | tools_used | JSON | NULL | Array of tools used in response | | files_affected | JSON | NULL | Array of files mentioned/modified | | context | JSON | NULL | Additional context metadata | | tokens_input | INTEGER | NULL | Estimated input token count | | tokens_output | INTEGER | NULL | Estimated output token count | | exchange_type | VARCHAR(50) | NOT NULL | user_prompt, claude_response | **Indexes:** - `idx_conversations_session_time` ON (session_id, timestamp) - `idx_conversations_search` ON (user_prompt, claude_response) USING fts5 ### activities Tool usage and file operations during development. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | id | INTEGER | PRIMARY KEY | Unique activity identifier | | session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session | | conversation_id | INTEGER | NULL FK(conversations.id) | Associated exchange | | timestamp | TIMESTAMP | NOT NULL | When activity occurred | | tool_name | VARCHAR(50) | NOT NULL | Edit, Write, Read, Bash, etc. | | action | VARCHAR(100) | NOT NULL | Specific action taken | | file_path | TEXT | NULL | Target file path if applicable | | metadata | JSON | NULL | Tool-specific data | | success | BOOLEAN | NOT NULL DEFAULT true | Whether operation succeeded | | error_message | TEXT | NULL | Error details if failed | | lines_added | INTEGER | NULL | Lines added (for Edit/Write) | | lines_removed | INTEGER | NULL | Lines removed (for Edit) | **Indexes:** - `idx_activities_session_time` ON (session_id, timestamp) - `idx_activities_tool_file` ON (tool_name, file_path) ### waiting_periods Time intervals when Claude is waiting for user input. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | id | INTEGER | PRIMARY KEY | Unique waiting period identifier | | session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session | | start_time | TIMESTAMP | NOT NULL | When waiting began | | end_time | TIMESTAMP | NULL | When user responded | | duration_seconds | INTEGER | NULL | Calculated wait duration | | context_before | TEXT | NULL | Claude's last message | | context_after | TEXT | NULL | User's next message | | likely_activity | VARCHAR(50) | NULL | thinking, research, external_work, break | **Indexes:** - `idx_waiting_session_start` ON (session_id, start_time) - `idx_waiting_duration` ON (duration_seconds) ### git_operations Git commands and repository state changes. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | id | INTEGER | PRIMARY KEY | Unique git operation identifier | | session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session | | timestamp | TIMESTAMP | NOT NULL | When operation occurred | | operation | VARCHAR(50) | NOT NULL | commit, push, pull, branch, etc. | | command | TEXT | NOT NULL | Full git command executed | | result | TEXT | NULL | Command output | | success | BOOLEAN | NOT NULL | Whether command succeeded | | files_changed | JSON | NULL | Array of affected files | | lines_added | INTEGER | NULL | Lines added in commit | | lines_removed | INTEGER | NULL | Lines removed in commit | | commit_hash | VARCHAR(40) | NULL | Git commit SHA | | branch_from | VARCHAR(255) | NULL | Source branch | | branch_to | VARCHAR(255) | NULL | Target branch | **Indexes:** - `idx_git_session_time` ON (session_id, timestamp) - `idx_git_operation` ON (operation) - `idx_git_commit` ON (commit_hash) ## Analytics Views ### project_productivity_summary Aggregated productivity metrics per project. ```sql CREATE VIEW project_productivity_summary AS SELECT p.id, p.name, p.path, COUNT(DISTINCT s.id) as total_sessions, SUM(s.duration_minutes) as total_time_minutes, AVG(s.duration_minutes) as avg_session_minutes, COUNT(DISTINCT a.file_path) as unique_files_modified, SUM(a.lines_added + a.lines_removed) as total_lines_changed, AVG(wp.duration_seconds) as avg_think_time_seconds, MAX(s.start_time) as last_activity FROM projects p LEFT JOIN sessions s ON p.id = s.project_id LEFT JOIN activities a ON s.id = a.session_id LEFT JOIN waiting_periods wp ON s.id = wp.session_id GROUP BY p.id, p.name, p.path; ``` ### daily_productivity_metrics Daily productivity trends across all projects. ```sql CREATE VIEW daily_productivity_metrics AS SELECT DATE(s.start_time) as date, COUNT(DISTINCT s.id) as sessions_count, SUM(s.duration_minutes) as total_time_minutes, COUNT(DISTINCT a.file_path) as files_modified, SUM(a.lines_added + a.lines_removed) as lines_changed, AVG(wp.duration_seconds) as avg_think_time, COUNT(DISTINCT s.project_id) as projects_worked_on FROM sessions s LEFT JOIN activities a ON s.id = a.session_id AND a.tool_name IN ('Edit', 'Write') LEFT JOIN waiting_periods wp ON s.id = wp.session_id WHERE s.end_time IS NOT NULL GROUP BY DATE(s.start_time) ORDER BY date DESC; ``` ## Data Retention - **Conversation Full Text**: Stored indefinitely for search and analysis - **Activity Details**: Kept for all operations to maintain complete audit trail - **Analytics Aggregations**: Computed on-demand from source data - **Cleanup**: Manual cleanup tools provided, no automatic data expiration ## Performance Considerations - Database file size grows approximately 1-5MB per day of active development - Full-text search indexes require periodic optimization (`PRAGMA optimize`) - Analytics queries use covering indexes to avoid table scans - Large file content is not stored, only file paths and change metrics ## Migration Strategy Schema migrations are handled through versioned SQL scripts in `/migrations/`: - Each migration has up/down scripts - Version tracking in `schema_versions` table - Automatic backup before migrations - Rollback capability for failed migrations