claude-code-tracker/docs/database-schema.md
Ryan Malloy 44ed9936b7 Initial commit: Claude Code Project Tracker
Add comprehensive development intelligence system that tracks:
- Development sessions with automatic start/stop
- Full conversation history with semantic search
- Tool usage and file operation analytics
- Think time and engagement analysis
- Git activity correlation
- Learning pattern recognition
- Productivity insights and metrics

Features:
- FastAPI backend with SQLite database
- Modern web dashboard with interactive charts
- Claude Code hook integration for automatic tracking
- Comprehensive test suite with 100+ tests
- Complete API documentation (OpenAPI/Swagger)
- Privacy-first design with local data storage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 02:59:21 -06:00

252 lines
11 KiB
Markdown

# Database Schema Documentation
This document describes the SQLite database schema for the Claude Code Project Tracker.
## Overview
The database is designed to capture comprehensive development workflow data through a normalized relational structure. All timestamps are stored in UTC format.
## Entity Relationship Diagram
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ projects │ │ sessions │ │conversations│
│ │ │ │ │ │
│ id (PK) │◀──┤ project_id │ │ session_id │
│ name │ │ id (PK) │◀──┤ id (PK) │
│ path │ │ start_time │ │ timestamp │
│ git_repo │ │ end_time │ │ user_prompt │
│ created_at │ │ type │ │ claude_resp │
│ ... │ │ ... │ │ ... │
└─────────────┘ └─────────────┘ └─────────────┘
┌─────────────┐ ┌─────────────┐
│ activities │ │waiting_perds│
│ │ │ │
│ session_id │ │ session_id │
│ id (PK) │ │ id (PK) │
│ tool_name │ │ start_time │
│ file_path │ │ end_time │
│ timestamp │ │ duration │
│ ... │ │ ... │
└─────────────┘ └─────────────┘
┌─────────────┐
│git_operations│
│ │
│ session_id │
│ id (PK) │
│ operation │
│ command │
│ timestamp │
│ ... │
└─────────────┘
```
## Table Definitions
### projects
Stores metadata about tracked projects.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Unique project identifier |
| name | VARCHAR(255) | NOT NULL | Project display name |
| path | TEXT | NOT NULL UNIQUE | Absolute filesystem path |
| git_repo | VARCHAR(500) | NULL | Git repository URL if applicable |
| languages | JSON | NULL | Array of detected programming languages |
| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | First time project was tracked |
| last_session | TIMESTAMP | NULL | Most recent session timestamp |
| total_sessions | INTEGER | NOT NULL DEFAULT 0 | Count of development sessions |
| total_time_minutes | INTEGER | NOT NULL DEFAULT 0 | Cumulative session duration |
| files_modified_count | INTEGER | NOT NULL DEFAULT 0 | Total unique files changed |
| lines_changed_count | INTEGER | NOT NULL DEFAULT 0 | Total lines added + removed |
**Indexes:**
- `idx_projects_path` ON (path)
- `idx_projects_last_session` ON (last_session)
### sessions
Individual development sessions within projects.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Unique session identifier |
| project_id | INTEGER | NOT NULL FK(projects.id) | Associated project |
| start_time | TIMESTAMP | NOT NULL | Session start timestamp |
| end_time | TIMESTAMP | NULL | Session end timestamp (NULL if active) |
| session_type | VARCHAR(50) | NOT NULL | startup, resume, clear |
| working_directory | TEXT | NOT NULL | Directory path when session started |
| git_branch | VARCHAR(255) | NULL | Active git branch |
| environment | JSON | NULL | System environment details |
| duration_minutes | INTEGER | NULL | Calculated session length |
| activity_count | INTEGER | NOT NULL DEFAULT 0 | Number of tool uses |
| conversation_count | INTEGER | NOT NULL DEFAULT 0 | Number of exchanges |
| files_touched | JSON | NULL | Array of file paths accessed |
**Indexes:**
- `idx_sessions_project_start` ON (project_id, start_time)
- `idx_sessions_active` ON (end_time) WHERE end_time IS NULL
### conversations
Dialogue exchanges between user and Claude.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Unique conversation identifier |
| session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session |
| timestamp | TIMESTAMP | NOT NULL | When exchange occurred |
| user_prompt | TEXT | NULL | User's input message |
| claude_response | TEXT | NULL | Claude's response |
| tools_used | JSON | NULL | Array of tools used in response |
| files_affected | JSON | NULL | Array of files mentioned/modified |
| context | JSON | NULL | Additional context metadata |
| tokens_input | INTEGER | NULL | Estimated input token count |
| tokens_output | INTEGER | NULL | Estimated output token count |
| exchange_type | VARCHAR(50) | NOT NULL | user_prompt, claude_response |
**Indexes:**
- `idx_conversations_session_time` ON (session_id, timestamp)
- `idx_conversations_search` ON (user_prompt, claude_response) USING fts5
### activities
Tool usage and file operations during development.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Unique activity identifier |
| session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session |
| conversation_id | INTEGER | NULL FK(conversations.id) | Associated exchange |
| timestamp | TIMESTAMP | NOT NULL | When activity occurred |
| tool_name | VARCHAR(50) | NOT NULL | Edit, Write, Read, Bash, etc. |
| action | VARCHAR(100) | NOT NULL | Specific action taken |
| file_path | TEXT | NULL | Target file path if applicable |
| metadata | JSON | NULL | Tool-specific data |
| success | BOOLEAN | NOT NULL DEFAULT true | Whether operation succeeded |
| error_message | TEXT | NULL | Error details if failed |
| lines_added | INTEGER | NULL | Lines added (for Edit/Write) |
| lines_removed | INTEGER | NULL | Lines removed (for Edit) |
**Indexes:**
- `idx_activities_session_time` ON (session_id, timestamp)
- `idx_activities_tool_file` ON (tool_name, file_path)
### waiting_periods
Time intervals when Claude is waiting for user input.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Unique waiting period identifier |
| session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session |
| start_time | TIMESTAMP | NOT NULL | When waiting began |
| end_time | TIMESTAMP | NULL | When user responded |
| duration_seconds | INTEGER | NULL | Calculated wait duration |
| context_before | TEXT | NULL | Claude's last message |
| context_after | TEXT | NULL | User's next message |
| likely_activity | VARCHAR(50) | NULL | thinking, research, external_work, break |
**Indexes:**
- `idx_waiting_session_start` ON (session_id, start_time)
- `idx_waiting_duration` ON (duration_seconds)
### git_operations
Git commands and repository state changes.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PRIMARY KEY | Unique git operation identifier |
| session_id | INTEGER | NOT NULL FK(sessions.id) | Associated session |
| timestamp | TIMESTAMP | NOT NULL | When operation occurred |
| operation | VARCHAR(50) | NOT NULL | commit, push, pull, branch, etc. |
| command | TEXT | NOT NULL | Full git command executed |
| result | TEXT | NULL | Command output |
| success | BOOLEAN | NOT NULL | Whether command succeeded |
| files_changed | JSON | NULL | Array of affected files |
| lines_added | INTEGER | NULL | Lines added in commit |
| lines_removed | INTEGER | NULL | Lines removed in commit |
| commit_hash | VARCHAR(40) | NULL | Git commit SHA |
| branch_from | VARCHAR(255) | NULL | Source branch |
| branch_to | VARCHAR(255) | NULL | Target branch |
**Indexes:**
- `idx_git_session_time` ON (session_id, timestamp)
- `idx_git_operation` ON (operation)
- `idx_git_commit` ON (commit_hash)
## Analytics Views
### project_productivity_summary
Aggregated productivity metrics per project.
```sql
CREATE VIEW project_productivity_summary AS
SELECT
p.id,
p.name,
p.path,
COUNT(DISTINCT s.id) as total_sessions,
SUM(s.duration_minutes) as total_time_minutes,
AVG(s.duration_minutes) as avg_session_minutes,
COUNT(DISTINCT a.file_path) as unique_files_modified,
SUM(a.lines_added + a.lines_removed) as total_lines_changed,
AVG(wp.duration_seconds) as avg_think_time_seconds,
MAX(s.start_time) as last_activity
FROM projects p
LEFT JOIN sessions s ON p.id = s.project_id
LEFT JOIN activities a ON s.id = a.session_id
LEFT JOIN waiting_periods wp ON s.id = wp.session_id
GROUP BY p.id, p.name, p.path;
```
### daily_productivity_metrics
Daily productivity trends across all projects.
```sql
CREATE VIEW daily_productivity_metrics AS
SELECT
DATE(s.start_time) as date,
COUNT(DISTINCT s.id) as sessions_count,
SUM(s.duration_minutes) as total_time_minutes,
COUNT(DISTINCT a.file_path) as files_modified,
SUM(a.lines_added + a.lines_removed) as lines_changed,
AVG(wp.duration_seconds) as avg_think_time,
COUNT(DISTINCT s.project_id) as projects_worked_on
FROM sessions s
LEFT JOIN activities a ON s.id = a.session_id AND a.tool_name IN ('Edit', 'Write')
LEFT JOIN waiting_periods wp ON s.id = wp.session_id
WHERE s.end_time IS NOT NULL
GROUP BY DATE(s.start_time)
ORDER BY date DESC;
```
## Data Retention
- **Conversation Full Text**: Stored indefinitely for search and analysis
- **Activity Details**: Kept for all operations to maintain complete audit trail
- **Analytics Aggregations**: Computed on-demand from source data
- **Cleanup**: Manual cleanup tools provided, no automatic data expiration
## Performance Considerations
- Database file size grows approximately 1-5MB per day of active development
- Full-text search indexes require periodic optimization (`PRAGMA optimize`)
- Analytics queries use covering indexes to avoid table scans
- Large file content is not stored, only file paths and change metrics
## Migration Strategy
Schema migrations are handled through versioned SQL scripts in `/migrations/`:
- Each migration has up/down scripts
- Version tracking in `schema_versions` table
- Automatic backup before migrations
- Rollback capability for failed migrations