mcghidra/CHANGELOG.md
Ryan Malloy 5300fb24b8 refactor: Remove wait/timeout params from docker_auto_start
The wait parameter was a convenience anti-pattern that caused LLMs
to block on a single tool call for up to 5 minutes with no visibility
into progress.

Now docker_auto_start always returns immediately. Clients should use
docker_wait(port) separately to poll for container readiness. This
gives visibility into progress and allows early bailout.
2026-02-06 00:44:44 -07:00


# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Added
- **Symbol CRUD Operations:** Full create/rename/delete support for symbols and labels:
  - `symbols_create` - Create new label/symbol at an address
  - `symbols_rename` - Rename existing symbol
  - `symbols_delete` - Delete symbol at an address
  - `symbols_imports` - List imported symbols with pagination
  - `symbols_exports` - List exported symbols with pagination
- **Bookmark Management:** Tools for managing Ghidra bookmarks:
  - `bookmarks_list` - List bookmarks with type/category filtering
  - `bookmarks_create` - Create bookmark at address (Note, Warning, Error, Info types)
  - `bookmarks_delete` - Delete bookmarks at an address
- **Enum & Typedef Creation:** Data type creation tools:
  - `enums_create` - Create new enum data type
  - `enums_list` - List enum types with members
  - `typedefs_create` - Create new typedef
  - `typedefs_list` - List typedef data types
- **Variable Management:** Enhanced variable operations:
  - `variables_list` - List variables with global_only filter
  - `variables_rename` - Rename and retype function variables
  - `functions_variables` - List local variables and parameters for a function
- **Namespace & Class Tools:**
  - `namespaces_list` - List all non-global namespaces
  - `classes_list` - List class namespaces with qualified names
- **Memory Segment Tools:**
  - `segments_list` - List memory segments with R/W/X permissions and size info
- **Progress Reporting for Long Operations:** 7 MCP prompts now report real-time progress during multi-step scanning operations:
  - `malware_triage` - Reports progress across 21 scanning steps
  - `analyze_imports` - Reports progress across 12 capability categories
  - `identify_crypto` - Reports progress across 20 pattern scans
  - `find_authentication` - Reports progress across 30 auth pattern scans
  - `find_main_logic` - Reports progress across 22 entry point searches
  - `find_error_handlers` - Reports progress across 35 error pattern scans
  - `find_config_parsing` - Reports progress across 23 config pattern scans
  - Uses FastMCP's `Context.report_progress()` for numeric progress updates
  - Uses `Context.info()` for descriptive step notifications
  - Helper functions `report_step()` and `report_progress()` for consistent reporting
- **Specialized Analysis Prompts:** 13 new MCP prompts for common reverse engineering workflows:
  - `analyze_strings` - String analysis with categorization and cross-reference guidance
  - `trace_data_flow` - Data flow and taint analysis through functions
  - `identify_crypto` - Cryptographic function and constant identification
  - `malware_triage` - Quick malware analysis with capability assessment checklist
  - `analyze_protocol` - Network/file protocol reverse engineering framework
  - `find_main_logic` - Navigate past CRT initialization to find actual program logic
  - `analyze_imports` - Categorize imports by capability with suspicious pattern detection
  - `find_authentication` - Locate auth, license checks, and credential handling code
  - `analyze_switch_table` - Reverse engineer command dispatchers and jump tables
  - `find_config_parsing` - Identify configuration file parsing and settings management
  - `compare_functions` - Compare two functions for similarity (patches, variants, libraries)
  - `document_struct` - Comprehensively document data structure fields and usage
  - `find_error_handlers` - Map error handling, cleanup routines, and exit paths
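
The progress reporting described above might look roughly like the following sketch. `RecordingContext` is a stand-in for FastMCP's `Context` so the example runs without an MCP session, and the `report_step()` signature and `scan_patterns()` function are assumptions for illustration, not the project's actual helpers.

```python
import asyncio


class RecordingContext:
    """Stand-in for FastMCP's Context; records calls instead of sending them."""

    def __init__(self):
        self.events = []

    async def report_progress(self, progress, total=None):
        self.events.append(("progress", progress, total))

    async def info(self, message):
        self.events.append(("info", message))


async def report_step(ctx, step, total, description):
    """Emit one descriptive message and one numeric update for a scan step."""
    await ctx.info(f"Step {step}/{total}: {description}")
    await ctx.report_progress(step, total)


async def scan_patterns(ctx, patterns):
    """Hypothetical multi-step scan that reports after each pattern."""
    results = []
    for i, pattern in enumerate(patterns, start=1):
        results.append(pattern.upper())  # placeholder for real scanning work
        await report_step(ctx, i, len(patterns), f"scanned {pattern}")
    return results
```

Pairing a numeric update with a short message per step lets a client show both a progress bar and what the prompt is currently doing.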
### Changed
- **Docker Port Allocation:** Ports are now auto-allocated from a pool (8192-8223) instead of being client-specified. This prevents session collisions in multi-agent environments.
- **docker_auto_start:** Removed the `wait` and `timeout` parameters entirely. The tool always returns immediately after starting the container. Use `docker_wait(port)` separately to poll for readiness. This prevents LLMs from blocking on a single tool call for minutes.
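
A client-side loop for the start-then-poll pattern could be sketched as below. The `check_ready` callable is a hypothetical stand-in for one short readiness check (for example, a single `docker_wait(port)` call); its signature, the `report` callback, and the default interval and attempt count are assumptions.

```python
import time


def wait_for_ready(check_ready, port, interval=2.0, max_attempts=30,
                   sleep=time.sleep, report=print):
    """Poll check_ready(port) until it returns True or attempts run out.

    Returns the number of attempts used on success, or None if the
    container never became ready.
    """
    for attempt in range(1, max_attempts + 1):
        if check_ready(port):
            return attempt
        # Each iteration is short, so the caller sees progress and can bail out.
        report(f"port {port}: not ready (attempt {attempt}/{max_attempts})")
        if attempt < max_attempts:
            sleep(interval)
    return None
```

Because each check returns quickly, the caller decides between attempts whether to keep waiting, inspect logs, or give up.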
### Fixed
- **instances_use Hanging:** Eliminated 4+ hour hangs by removing a blocking HTTP call. The tool now uses lazy registration: it creates a stub entry and validates the instance on the first real tool call.
- **All Docker Operations Non-Blocking:** All Docker subprocess calls (`docker ps`, `docker run`, `docker stop`, etc.) now run in a thread executor via `run_in_executor()`. Previously only `docker_health` had been fixed; `docker_status`, `docker_start`, `docker_stop`, `docker_logs`, `docker_build`, and `docker_cleanup` still blocked the event loop, which caused `docker_auto_start(wait=True)` to freeze the MCP server.
- **Session Isolation:** `docker_stop` now validates that the container belongs to the current session before stopping it. `docker_cleanup` defaults to `session_only=True` to prevent cross-session interference.
- **Background Discovery Thread:** Reduced the port-scanning timeout from 30s to 0.5s, cutting the discovery cycle from 300s+ to ~15s.
- **Typedef/Variable Type Resolution:** Fixed `handle_typedef_create` and `handle_variable_rename` to use shared `resolve_data_type()` for builtin types (int, char, etc.).
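
The non-blocking subprocess fix can be illustrated with a minimal sketch. It assumes the default thread pool executor and a hypothetical `run_command()` helper; `sys.executable` stands in for the `docker` CLI so the example runs on any machine.

```python
import asyncio
import subprocess
import sys
from functools import partial


async def run_command(args, timeout=30.0):
    """Run a blocking subprocess call in the default thread executor."""
    loop = asyncio.get_running_loop()
    call = partial(
        subprocess.run, args, capture_output=True, text=True, timeout=timeout
    )
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # event loop keeps serving other requests while the command runs.
    result = await loop.run_in_executor(None, call)
    return result.returncode, result.stdout.strip()


async def main():
    # Hypothetical usage: a real caller would pass e.g. ["docker", "ps"].
    return await run_command([sys.executable, "-c", "print('ok')"])
```

Calling `subprocess.run()` directly inside a coroutine would stall every other tool call until the command finished; handing it to an executor avoids that.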
## [2025.12.1] - 2025-12-01
### Added
- **Cursor-Based Pagination System:** Implemented efficient pagination for large responses (10K+ items) without filling context windows.
  - `page_size` parameter (default: 50, max: 500) for controlling items per page
  - `cursor_id` returned for navigating to subsequent pages
  - Session isolation prevents cursor cross-contamination between MCP clients
  - TTL-based cursor expiration (5 minutes) with LRU eviction (max 100 cursors)
- **Grep/Regex Filtering:** Added `grep` and `grep_ignorecase` parameters to filter results with regex patterns before pagination.
- **Bypass Option:** Added `return_all` parameter to retrieve complete datasets (with large response warnings).
- **Cursor Management Tools:** New MCP tools for cursor lifecycle management:
  - `cursor_next(cursor_id)` - Fetch next page of results
  - `cursor_list()` - List active cursors for current session
  - `cursor_delete(cursor_id)` - Delete specific cursor
  - `cursor_delete_all()` - Delete all session cursors
- **Enumeration Resources:** New lightweight MCP resources for quick data enumeration (more efficient than tool calls):
  - `ghidra://instances` - List all active Ghidra instances
  - `ghidra://instance/{port}/summary` - Program overview with statistics
  - `ghidra://instance/{port}/functions` - List functions (capped at 1000)
  - `ghidra://instance/{port}/strings` - List strings (capped at 500)
  - `ghidra://instance/{port}/data` - List data items (capped at 1000)
  - `ghidra://instance/{port}/structs` - List struct types (capped at 500)
  - `ghidra://instance/{port}/xrefs/to/{address}` - Cross-references to an address
  - `ghidra://instance/{port}/xrefs/from/{address}` - Cross-references from an address
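
As an illustration of the cursor-based pagination described above, here is a minimal sketch of a cursor store with TTL expiry, LRU eviction, and a per-session ownership check. The `CursorStore` class, its method names, and the choice to refresh the TTL on each access are assumptions, not the project's implementation.

```python
import time
import uuid
from collections import OrderedDict


class CursorStore:
    """Hold paginated result sets with TTL expiry and LRU eviction."""

    def __init__(self, ttl=300.0, max_cursors=100, clock=time.monotonic):
        self.ttl = ttl
        self.max_cursors = max_cursors
        self.clock = clock
        self._cursors = OrderedDict()  # cursor_id -> state dict, oldest first

    def _purge_expired(self):
        now = self.clock()
        expired = [cid for cid, c in self._cursors.items() if c["expires"] <= now]
        for cid in expired:
            del self._cursors[cid]

    def first_page(self, session, items, page_size=50):
        """Return (page, cursor_id); cursor_id is None if everything fit."""
        page_size = max(1, min(page_size, 500))
        self._purge_expired()
        page = items[:page_size]
        if len(items) <= page_size:
            return page, None
        cursor_id = uuid.uuid4().hex
        self._cursors[cursor_id] = {
            "session": session, "items": items, "offset": page_size,
            "page_size": page_size, "expires": self.clock() + self.ttl,
        }
        while len(self._cursors) > self.max_cursors:
            self._cursors.popitem(last=False)  # evict least recently used
        return page, cursor_id

    def next_page(self, session, cursor_id):
        """Return (page, cursor_id or None); raise KeyError if unusable."""
        self._purge_expired()
        cursor = self._cursors.get(cursor_id)
        if cursor is None or cursor["session"] != session:
            raise KeyError(cursor_id)  # unknown, expired, or another session's
        start = cursor["offset"]
        page = cursor["items"][start:start + cursor["page_size"]]
        cursor["offset"] = start + len(page)
        if cursor["offset"] >= len(cursor["items"]):
            del self._cursors[cursor_id]  # exhausted
            return page, None
        cursor["expires"] = self.clock() + self.ttl
        self._cursors.move_to_end(cursor_id)  # mark as most recently used
        return page, cursor_id
```

Storing the session alongside each cursor is what keeps one client from reading another client's pages, even if it guesses a valid `cursor_id`.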
### Changed
- **MCP Dependency Upgrade:** Updated from `mcp==1.6.0` to `mcp>=1.22.0` for FastMCP Context support.
- **Version Strategy:** Switched to date-based versioning (YYYY.MM.D format).
- **Tool Updates:** 11 tools now support pagination with grep filtering:
  - `functions_list` - List functions with pagination
  - `functions_decompile` - Decompiled code with line pagination (grep for code patterns)
  - `functions_disassemble` - Assembly with instruction pagination (grep for opcodes)
  - `functions_get_variables` - Function variables with pagination
  - `data_list` - List data items with pagination
  - `data_list_strings` - List strings with pagination
  - `xrefs_list` - List cross-references with pagination
  - `structs_list` - List struct types with pagination
  - `structs_get` - Struct fields with pagination (grep for field names/types)
  - `analysis_get_callgraph` - Call graph edges with pagination
  - `analysis_get_dataflow` - Data flow steps with pagination
- **LLM-Friendly Responses:** Added prominent `_message` field to guide LLMs on cursor continuation.
### Fixed
- **FastMCP Compatibility:** Removed deprecated `version` parameter from FastMCP constructor.
### Security
- **ReDoS Protection:** Added validation for grep regex patterns to prevent catastrophic backtracking attacks.
  - Pattern length limit (500 chars)
  - Repetition operator limit (15 max)
  - Detection of dangerous nested quantifier patterns like `(a+)+`
- **Session Spoofing Prevention:** Removed user-controllable `session_id` parameter from all tools.
  - Sessions now derived from FastMCP context (`ctx.session`, `ctx.client_id`)
  - Prevents users from accessing or manipulating other sessions' cursors
- **Recursion Depth Limit:** Added depth limit (10) to grep matching to prevent stack overflow on deeply nested data.
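
The ReDoS checks listed above could be approximated with the sketch below. The `validate_grep_pattern()` name and both helper regexes are assumptions; the heuristics are deliberately coarse (for example, escaped `\+` and the `?` in `(?:` also count as operators) and only illustrate the three limits named in this entry.

```python
import re

MAX_PATTERN_LENGTH = 500
MAX_REPETITION_OPERATORS = 15

# A group that contains a quantifier and is itself quantified, e.g. (a+)+
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)\s*[+*?{]")
# Rough count of repetition operators: +, *, ?, {n}, {n,}, {n,m}
REPETITION_OPERATOR = re.compile(r"[+*?]|\{\d+(?:,\d*)?\}")


def validate_grep_pattern(pattern):
    """Return a compiled regex, or raise ValueError if the pattern looks risky."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    if len(REPETITION_OPERATOR.findall(pattern)) > MAX_REPETITION_OPERATORS:
        raise ValueError("too many repetition operators")
    if NESTED_QUANTIFIER.search(pattern):
        raise ValueError("nested quantifier")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex: {exc}") from exc
```

Rejecting a pattern before compiling it is cheap, whereas a pathological pattern applied to a large decompilation listing could otherwise pin a CPU for a long time.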
## [2.0.0] - 2025-11-11
### Added
- **MCP Integration Refactor:** Refactored the Python bridge for improved MCP integration. (337f89e)
  - Introduced MCP resources for loading context (e.g., instances, functions, disassembly).
  - Added namespaced tools (e.g., `instance.*`, `function.*`, `data.*`) for better organization and discoverability.
  - Implemented a "current working instance" concept to simplify commands by implicitly targeting the active Ghidra instance.
- **Analysis Prompts:** Added pre-defined prompts for common analysis tasks, including `reverse_engineer_binary` for comprehensive analysis. (337f89e, 3134581)
- **String Data Listing:** Added a new endpoint to list string data in the binary, with pagination and filtering by content. Python bridge support via `list_strings()` function. (f71f4aa)
- **Comprehensive Data Manipulation:** Added tools/endpoints for creating (`create_data`), deleting (`delete_data`), renaming (`rename_data`), changing type (`set_data_type`), and combined updates (`update_data`) for data items. Supports common types (byte, word, dword, string, etc.). (6c28553, 5797fb3, 28870e9)
- **Enhanced Cross-Reference (Xrefs) Analysis:** Implemented accurate xref tools (`get_references_to`, `get_references_from`) using Ghidra's ReferenceManager. Features include detailed info, bi-directional search, type filtering, and simplified bridge output. (96788f3)
- **Memory Operations:** Added tools/endpoints for reading (`read_memory`) and writing (`write_memory`) to program memory. (454c739)
- **Function Addressing Flexibility:** MCP bridge now supports addressing functions by name or address. (4f3042f)
- **API Version Check:** Bridge script now verifies compatibility with the Java plugin (expects API v2). (fedd2d0)
- **Enhanced Decompiler Controls:** Added options for raw vs. clean pseudocode output and multiple simplification styles. (454c739)
### Changed
- **Bridge Refactor & Namespacing:** Reorganized bridge tools into namespaces (e.g., `instance.list_instances`, `function.get_function_details`) as part of the MCP integration refactor. (337f89e)
- **Breaking: HATEOAS API v2 & Bridge Update:** Migrated fully to a HATEOAS-driven API (v2). The Python bridge (`bridge_mcp_hydra.py`) now *exclusively* uses this API, removing legacy support. Responses are simplified for AI agents, including text representations for structured data (e.g., disassembly). All endpoints require HATEOAS compliance (e.g., `_links`). (4bc2267, 4f3042f)
- **Optimized Variable Listing:** Improved performance of the `/variables` endpoint with efficient pagination and a `globalOnly` filter. (6c865c4)
- **Standardized Responses:** Unified all endpoints to use structured JSON and standardized HATEOAS links. (454c739, 4bc2267)
- **Improved Error Handling:** Enhanced error reporting and parameter validation across the API and bridge. (454c739, 4f3042f, 3df129f)
- **API Documentation:** Updated documentation to reflect the HATEOAS v2 API and new features. (28870e9, 3fd0cf4)
### Fixed
- **Real Instruction Disassembly:** The `/disassembly` endpoint now provides actual instruction disassembly instead of placeholders. (3df129f)
- **Ghidra 11+ Compatibility:** Resolved various API compatibility issues, particularly for cross-references (`XrefsEndpoints`). (5dc59ce, 2b1fe6c, 0eaa19a, 9443101)
- **Data Operations:** Fixed issues with HTTP request body consumption, parameter naming (`type` vs `dataType`), and name preservation during type changes. (28870e9)
- **Function Commenting:** Corrected `set_decompiler_comment` to apply comments at the function level. (2a1607c)
- **Call Graph Parameter Handling:** Updated the CallGraph endpoint to properly accept both function name and address parameters for flexibility. (fa8cc64)
- **Endpoint Functionality:** Addressed various issues including endpoint registration, handling of program-dependent endpoints, URL encoding, transaction management, and inconsistent response formats. (various commits, e.g., 4bc2267)
## [1.4.0] - 2025-04-08
### Added
- Structured JSON communication between Python bridge and Java plugin
- Consistent response format with metadata (timestamp, port, instance type)
- Comprehensive test suites for HTTP API and MCP bridge
- Test runner script for easy test execution
- Detailed testing documentation in TESTING.md
- Origin checking for API requests
- Mutating tests for API functionality
### Changed
- Improved error handling in API responses
- Enhanced JSON parsing in the Java plugin
- Updated documentation with JSON communication details
- Standardized API responses across all endpoints
- Improved version handling in build system
### Fixed
- Build complete package in `package` phase
- Versioning and naming of JAR files
- GitHub Actions workflow permissions
- Extension ZIP inclusion in complete package
- ProgramManager requirement
- Git tag fetching functionality
- MCP bridge test failures
## [1.3.0] - 2025-04-02
### Added
- Docstrings for all `@mcp.tool` functions
- Variable manipulation tools (rename/retype variables)
- New endpoints for function variable management
- Dynamic version output in API responses
- Enhanced function analysis capabilities
- Support for searching variables by name
- New tools for working with function variables:
  - `get_function_by_address`
  - `get_current_address`
  - `get_current_function`
  - `decompile_function_by_address`
  - `disassemble_function`
  - `set_decompiler_comment`
  - `set_disassembly_comment`
  - `rename_local_variable`
  - `rename_function_by_address`
  - `set_function_prototype`
  - `set_local_variable_type`
### Changed
- Improved version handling in build system
- Reorganized imports in bridge_mcp_hydra.py
- Updated MANIFEST.MF with more detailed description
## [1.2] - 2025-03-30
### Added
- Enhanced function analysis capabilities
- Additional variable manipulation tools
- Support for multiple Ghidra instances
### Changed
- Improved error handling in API calls
- Optimized performance for large binaries
## [1.1] - 2025-03-30
### Added
- Initial release of GhydraMCP bridge
- Basic Ghidra instance management tools
- Function analysis tools
- Variable manipulation tools
## [1.0] - 2025-03-24
### Added
- Initial project setup
- Basic MCP bridge functionality
[unreleased]: https://github.com/teal-bauer/GhydraMCP/compare/v2025.12.1...HEAD
[2025.12.1]: https://github.com/teal-bauer/GhydraMCP/compare/v2.0.0...v2025.12.1
[2.0.0]: https://github.com/teal-bauer/GhydraMCP/compare/v1.4.0...v2.0.0
[1.4.0]: https://github.com/teal-bauer/GhydraMCP/compare/v1.3.0...v1.4.0
[1.3.0]: https://github.com/teal-bauer/GhydraMCP/compare/v1.2...v1.3.0
[1.2]: https://github.com/teal-bauer/GhydraMCP/compare/v1.1...v1.2
[1.1]: https://github.com/teal-bauer/GhydraMCP/compare/1.0...v1.1
[1.0]: https://github.com/teal-bauer/GhydraMCP/releases/tag/1.0