mcghidra/CHANGELOG.md
Ryan Malloy d1750cb339
2026-02-06 04:50:47 -07:00

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

Unreleased

Added

  • Symbol CRUD Operations: Full create/rename/delete support for symbols and labels:
    • symbols_create - Create new label/symbol at an address
    • symbols_rename - Rename existing symbol
    • symbols_delete - Delete symbol at an address
    • symbols_imports - List imported symbols with pagination
    • symbols_exports - List exported symbols with pagination
  • Bookmark Management: Tools for managing Ghidra bookmarks:
    • bookmarks_list - List bookmarks with type/category filtering
    • bookmarks_create - Create bookmark at address (Note, Warning, Error, Info types)
    • bookmarks_delete - Delete bookmarks at an address
  • Enum & Typedef Creation: Data type creation tools:
    • enums_create - Create new enum data type
    • enums_list - List enum types with members
    • typedefs_create - Create new typedef
    • typedefs_list - List typedef data types
  • Variable Management: Enhanced variable operations:
    • variables_list - List variables with global_only filter
    • variables_rename - Rename and retype function variables
    • functions_variables - List local variables and parameters for a function
  • Namespace & Class Tools:
    • namespaces_list - List all non-global namespaces
    • classes_list - List class namespaces with qualified names
  • Memory Segment Tools:
    • segments_list - List memory segments with R/W/X permissions and size info
  • Progress Reporting for Long Operations: 7 MCP prompts now report real-time progress during multi-step scanning operations:
    • malware_triage - Reports progress across 21 scanning steps
    • analyze_imports - Reports progress across 12 capability categories
    • identify_crypto - Reports progress across 20 pattern scans
    • find_authentication - Reports progress across 30 auth pattern scans
    • find_main_logic - Reports progress across 22 entry point searches
    • find_error_handlers - Reports progress across 35 error pattern scans
    • find_config_parsing - Reports progress across 23 config pattern scans
    • Uses FastMCP's Context.report_progress() for numeric progress updates
    • Uses Context.info() for descriptive step notifications
    • Helper functions report_step() and report_progress() for consistent reporting
  • Specialized Analysis Prompts: 13 new MCP prompts for common reverse engineering workflows:
    • analyze_strings - String analysis with categorization and cross-reference guidance
    • trace_data_flow - Data flow and taint analysis through functions
    • identify_crypto - Cryptographic function and constant identification
    • malware_triage - Quick malware analysis with capability assessment checklist
    • analyze_protocol - Network/file protocol reverse engineering framework
    • find_main_logic - Navigate past CRT initialization to find actual program logic
    • analyze_imports - Categorize imports by capability with suspicious pattern detection
    • find_authentication - Locate auth, license checks, and credential handling code
    • analyze_switch_table - Reverse engineer command dispatchers and jump tables
    • find_config_parsing - Identify configuration file parsing and settings management
    • compare_functions - Compare two functions for similarity (patches, variants, libraries)
    • document_struct - Comprehensively document data structure fields and usage
    • find_error_handlers - Map error handling, cleanup routines, and exit paths
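
The progress reporting described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `FakeContext` stands in for a FastMCP `Context`, and the signature of `report_step()` and the `scan()` function are assumptions made for the example.

```python
import asyncio


class FakeContext:
    """Stand-in for a FastMCP Context; records calls instead of sending them."""

    def __init__(self):
        self.events = []

    async def report_progress(self, progress, total=None):
        self.events.append(("progress", progress, total))

    async def info(self, message):
        self.events.append(("info", message))


async def report_step(ctx, step, total, description):
    """Hypothetical helper: one descriptive message plus one numeric update."""
    if ctx is None:  # assumed: a prompt may run without a context
        return
    await ctx.info(f"[{step}/{total}] {description}")
    await ctx.report_progress(step, total)


async def scan(ctx, patterns):
    """Run a multi-step scan, reporting after each step."""
    hits = []
    for i, pattern in enumerate(patterns, start=1):
        hits.append(pattern.upper())  # placeholder for real scanning work
        await report_step(ctx, i, len(patterns), f"scanned {pattern}")
    return hits


ctx = FakeContext()
result = asyncio.run(scan(ctx, ["aes", "rc4", "sha256"]))
```

Each step produces one `info` event and one `progress` event, so a client sees both a human-readable description and a numeric position.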

Changed

  • Docker Port Allocation: Ports are now auto-allocated from a pool (8192-8223) instead of being specified by the client. This prevents session collisions in multi-agent environments.
  • docker_auto_start: Removed the wait and timeout parameters. The tool now always returns immediately after starting the container.
  • Removed docker_wait tool: This tool blocked for up to 5 minutes in a single call. LLMs should poll docker_health(port) in their own loop instead, which gives visibility into progress and the ability to check logs between polls.
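
A client-side polling loop of the kind recommended above might look like this sketch. `check_health` is a hypothetical callable standing in for a `docker_health` tool call, and the `"healthy"` key in its return value is an assumption, not the tool's documented response shape.

```python
import time


def poll_until_healthy(check_health, port, attempts=30, interval=2.0, sleep=time.sleep):
    """Call check_health(port) until it reports healthy or attempts run out.

    Returns the number of polls needed, or None if the limit was reached.
    """
    for attempt in range(1, attempts + 1):
        status = check_health(port)
        if status.get("healthy"):
            return attempt
        sleep(interval)  # a client could fetch container logs here instead
    return None


# Simulated health check that succeeds on the third poll.
responses = iter([{"healthy": False}, {"healthy": False}, {"healthy": True}])
polls_needed = poll_until_healthy(lambda port: next(responses), 8192, sleep=lambda s: None)
```

Because each poll is a short, separate call, the caller stays in control between polls rather than waiting on one long blocking request.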

Fixed

  • instances_use Hanging: Eliminated hangs of 4+ hours by removing the blocking HTTP call. The tool now uses lazy registration: it creates a stub entry and validates the instance on the first real tool call.
  • All Docker Operations Non-Blocking: All Docker subprocess calls (docker ps, docker run, docker stop, etc.) now run in a thread executor via run_in_executor(). Previously only docker_health was fixed, while docker_status, docker_start, docker_stop, docker_logs, docker_build, and docker_cleanup still blocked the event loop. This caused docker_auto_start(wait=True) to freeze the MCP server.
  • Session Isolation: docker_stop now validates that the container belongs to the current session before stopping it. docker_cleanup defaults to session_only=True to prevent cross-session interference.
  • Background Discovery Thread: Reduced the port-scanning timeout from 30s to 0.5s, cutting the discovery cycle from 300s+ to ~15s.
  • Typedef/Variable Type Resolution: Fixed handle_typedef_create and handle_variable_rename to use shared resolve_data_type() for builtin types (int, char, etc.).
  • DockerMixin Inheritance: Fixed a crash when DockerMixin called get_instance_port(), caused by the mixin inheriting from the wrong base class.
  • Deprecated asyncio API: Replaced asyncio.get_event_loop() with asyncio.get_running_loop() for Python 3.10+ compatibility.
  • HTTP Client Data Mutation: safe_post, safe_put, and safe_patch no longer mutate the caller's data dict via .pop().
  • Race Condition in Discovery: Initial instance discovery in main() now uses _instances_lock for thread safety.
  • Silent Exception Handling: Added debug logging to PortPool exception handlers and analysis fallback paths.
  • File Descriptor Leak: Fixed potential leak in PortPool._try_acquire_port() if write operations fail after lock acquisition.
  • Hash Algorithm Consistency: Changed query hash from MD5 to SHA-256 in pagination module for consistency with cursor ID generation.
  • Lazy PortPool Initialization: PortPool is now created on first use, avoiding creation of the /tmp/ghydramcp-ports directory when Docker tools are never used.
  • Logging Configuration: configure_logging() is now called during server startup, so debug messages are actually emitted.
  • Type Hint Consistency: Aligned filtering.py to use List[T] from the typing module, like the rest of the codebase.
  • Parameter Naming: Renamed project_fields to fields in structs_get() for consistency with other tools.
  • Import Path: Fixed logging.py to import Context from fastmcp (not deprecated mcp.server.fastmcp path).
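
The non-blocking subprocess fix above relies on a standard asyncio pattern, sketched here under assumptions. `run_blocking` and `run_in_thread` are illustrative names, not the project's functions, and the Python interpreter stands in for a `docker` command so the example runs without Docker.

```python
import asyncio
import subprocess
import sys


def run_blocking(cmd):
    """Blocking call: waits for the child process to finish."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=30)


async def run_in_thread(cmd):
    """Offload the blocking call to the default thread pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, run_blocking, cmd)


async def main():
    ticks = 0

    async def heartbeat():
        # Keeps counting only if the event loop is not blocked.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    # Stand-in for a slow `docker ...` invocation.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2); print('done')"]
    proc = await run_in_thread(cmd)
    beat.cancel()
    return proc.stdout.strip(), ticks


output, ticks = asyncio.run(main())
```

The heartbeat task keeps ticking while the child process runs, which is the behavior the event loop loses when `subprocess.run()` is called directly inside a coroutine.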

Added

  • Debug Logging Environment Variable: Set GHYDRAMCP_DEBUG=1 to enable DEBUG-level logging for troubleshooting.
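
As an illustration of how an environment variable can drive the log level, here is a minimal sketch. It is not the project's `configure_logging()`: the `env` parameter, the logger name, and the INFO default are assumptions made for the example.

```python
import logging
import os


def configure_logging(env=None):
    """Pick the log level from GHYDRAMCP_DEBUG (sketch, not the project's code)."""
    env = os.environ if env is None else env
    debug = env.get("GHYDRAMCP_DEBUG") == "1"
    level = logging.DEBUG if debug else logging.INFO  # INFO default is assumed
    logger = logging.getLogger("ghydramcp_example")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_logging({"GHYDRAMCP_DEBUG": "1"})
logger.debug("debug logging is enabled")
```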

2025.12.1 - 2025-12-01

Added

  • Cursor-Based Pagination System: Implemented efficient pagination for large responses (10K+ items) without filling context windows.
    • page_size parameter (default: 50, max: 500) for controlling items per page
    • cursor_id returned for navigating to subsequent pages
    • Session isolation prevents cursor cross-contamination between MCP clients
    • TTL-based cursor expiration (5 minutes) with LRU eviction (max 100 cursors)
  • Grep/Regex Filtering: Added grep and grep_ignorecase parameters to filter results with regex patterns before pagination.
  • Bypass Option: Added return_all parameter to retrieve complete datasets (with large response warnings).
  • Cursor Management Tools: New MCP tools for cursor lifecycle management:
    • cursor_next(cursor_id) - Fetch next page of results
    • cursor_list() - List active cursors for current session
    • cursor_delete(cursor_id) - Delete specific cursor
    • cursor_delete_all() - Delete all session cursors
  • Enumeration Resources: New lightweight MCP resources for quick data enumeration (more efficient than tool calls):
    • ghidra://instances - List all active Ghidra instances
    • ghidra://instance/{port}/summary - Program overview with statistics
    • ghidra://instance/{port}/functions - List functions (capped at 1000)
    • ghidra://instance/{port}/strings - List strings (capped at 500)
    • ghidra://instance/{port}/data - List data items (capped at 1000)
    • ghidra://instance/{port}/structs - List struct types (capped at 500)
    • ghidra://instance/{port}/xrefs/to/{address} - Cross-references to an address
    • ghidra://instance/{port}/xrefs/from/{address} - Cross-references from an address
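
The cursor behavior listed above (page size limits, session isolation, TTL expiry, LRU eviction) can be sketched with a small in-memory store. `CursorStore` and its methods are hypothetical names for illustration, and the real system may differ, for example in how the first page is returned.

```python
import time
import uuid
from collections import OrderedDict


class CursorStore:
    """Toy cursor store: TTL expiry plus LRU eviction, checked per session."""

    def __init__(self, ttl=300.0, max_cursors=100, clock=time.monotonic):
        self.ttl = ttl
        self.max_cursors = max_cursors
        self.clock = clock
        self._cursors = OrderedDict()  # cursor_id -> state dict

    def create(self, session_id, items, page_size=50):
        page_size = max(1, min(page_size, 500))  # clamp to the documented maximum
        cursor_id = uuid.uuid4().hex
        self._cursors[cursor_id] = {
            "session": session_id, "items": list(items),
            "offset": 0, "page_size": page_size, "touched": self.clock(),
        }
        while len(self._cursors) > self.max_cursors:
            self._cursors.popitem(last=False)  # evict the least recently used
        return cursor_id

    def next_page(self, session_id, cursor_id):
        state = self._cursors.get(cursor_id)
        if state is None or state["session"] != session_id:
            raise KeyError("unknown cursor for this session")
        if self.clock() - state["touched"] > self.ttl:
            del self._cursors[cursor_id]
            raise KeyError("cursor expired")
        start = state["offset"]
        page = state["items"][start:start + state["page_size"]]
        state["offset"] = start + len(page)
        state["touched"] = self.clock()
        self._cursors.move_to_end(cursor_id)  # mark as most recently used
        has_more = state["offset"] < len(state["items"])
        if not has_more:
            del self._cursors[cursor_id]  # exhausted cursors are dropped
        return page, has_more


store = CursorStore()
cursor_id = store.create("session-a", range(120))
first_page, has_more = store.next_page("session-a", cursor_id)
```

Checking the session on every lookup is what keeps one client from reading another client's cursor, even if it guesses the cursor ID.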

Changed

  • MCP Dependency Upgrade: Updated from mcp==1.6.0 to mcp>=1.22.0 for FastMCP Context support.
  • Version Strategy: Switched to date-based versioning (YYYY.MM.D format).
  • Tool Updates: 11 tools now support pagination with grep filtering:
    • functions_list - List functions with pagination
    • functions_decompile - Decompiled code with line pagination (grep for code patterns)
    • functions_disassemble - Assembly with instruction pagination (grep for opcodes)
    • functions_get_variables - Function variables with pagination
    • data_list - List data items with pagination
    • data_list_strings - List strings with pagination
    • xrefs_list - List cross-references with pagination
    • structs_list - List struct types with pagination
    • structs_get - Struct fields with pagination (grep for field names/types)
    • analysis_get_callgraph - Call graph edges with pagination
    • analysis_get_dataflow - Data flow steps with pagination
  • LLM-Friendly Responses: Added prominent _message field to guide LLMs on cursor continuation.

Fixed

  • FastMCP Compatibility: Removed deprecated version parameter from FastMCP constructor.

Security

  • ReDoS Protection: Added validation for grep regex patterns to prevent catastrophic backtracking attacks.
    • Pattern length limit (500 chars)
    • Repetition operator limit (15 max)
    • Detection of dangerous nested quantifier patterns like (a+)+
  • Session Spoofing Prevention: Removed user-controllable session_id parameter from all tools.
    • Sessions now derived from FastMCP context (ctx.session, ctx.client_id)
    • Prevents users from accessing or manipulating other sessions' cursors
  • Recursion Depth Limit: Added depth limit (10) to grep matching to prevent stack overflow on deeply nested data.
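
The ReDoS checks above could be approximated as in the sketch below. The limits come from this changelog, but the function name and the detection heuristics are assumptions for illustration. The project's actual validation may be stricter or structured differently.

```python
import re

MAX_PATTERN_LENGTH = 500
MAX_REPETITION_OPERATORS = 15
# Heuristic: a group containing a quantifier that is itself quantified, e.g. (a+)+
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def validate_grep_pattern(pattern):
    """Return a compiled regex, or raise ValueError if the pattern looks unsafe."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern too long")
    # Rough count: also counts escaped or non-quantifier uses of these characters.
    repetitions = len(re.findall(r"[+*?]|\{\d+(?:,\d*)?\}", pattern))
    if repetitions > MAX_REPETITION_OPERATORS:
        raise ValueError("too many repetition operators")
    if NESTED_QUANTIFIER.search(pattern):
        raise ValueError("nested quantifiers are not allowed")
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regex: {exc}") from exc


safe = validate_grep_pattern("mem(cpy|set)")
```

Rejecting patterns before compiling them is deliberately conservative: some harmless patterns will be refused, which is usually an acceptable trade for bounded matching time.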

2.0.0 - 2025-11-11

Added

  • MCP Integration Refactor: Refactored the Python bridge for improved MCP integration. (337f89e)
    • Introduced MCP resources for loading context (e.g., instances, functions, disassembly).
    • Added namespaced tools (e.g., instance.*, function.*, data.*) for better organization and discoverability.
    • Implemented a "current working instance" concept to simplify commands by implicitly targeting the active Ghidra instance.
  • Analysis Prompts: Added pre-defined prompts for common analysis tasks, including reverse_engineer_binary for comprehensive analysis. (337f89e, 3134581)
  • String Data Listing: Added a new endpoint to list string data in the binary, with pagination and filtering by content. The Python bridge exposes it through the list_strings() function. (f71f4aa)
  • Comprehensive Data Manipulation: Added tools/endpoints for creating (create_data), deleting (delete_data), renaming (rename_data), changing type (set_data_type), and combined updates (update_data) for data items. Supports common types (byte, word, dword, string, etc.). (6c28553, 5797fb3, 28870e9)
  • Enhanced Cross-Reference (Xrefs) Analysis: Implemented accurate xref tools (get_references_to, get_references_from) using Ghidra's ReferenceManager. Features include detailed info, bi-directional search, type filtering, and simplified bridge output. (96788f3)
  • Memory Operations: Added tools/endpoints for reading (read_memory) and writing (write_memory) to program memory. (454c739)
  • Function Addressing Flexibility: MCP bridge now supports addressing functions by name or address. (4f3042f)
  • API Version Check: Bridge script now verifies compatibility with the Java plugin (expects API v2). (fedd2d0)
  • Enhanced Decompiler Controls: Added options for raw vs. clean pseudocode output and multiple simplification styles. (454c739)

Changed

  • Bridge Refactor & Namespacing: Reorganized bridge tools into namespaces (e.g., instance.list_instances, function.get_function_details) as part of the MCP integration refactor. (337f89e)
  • Breaking: HATEOAS API v2 & Bridge Update: Migrated fully to a HATEOAS-driven API (v2). The Python bridge (bridge_mcp_hydra.py) now exclusively uses this API, removing legacy support. Responses are simplified for AI agents, including text representations for structured data (e.g., disassembly). All endpoints require HATEOAS compliance (e.g., _links). (4bc2267, 4f3042f)
  • Optimized Variable Listing: Improved performance of the /variables endpoint with efficient pagination and a globalOnly filter. (6c865c4)
  • Standardized Responses: Unified all endpoints to use structured JSON and standardized HATEOAS links. (454c739, 4bc2267)
  • Improved Error Handling: Enhanced error reporting and parameter validation across the API and bridge. (454c739, 4f3042f, 3df129f)
  • API Documentation: Updated documentation to reflect the HATEOAS v2 API and new features. (28870e9, 3fd0cf4)
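
To illustrate what HATEOAS compliance means for a client, the sketch below follows a link relation instead of building a URL by hand. Only the `_links` key comes from this changelog. The `href` field, the relation names, and the paths are hypothetical and may not match the real API.

```python
def follow_link(response, rel):
    """Return the target of a link relation in a HATEOAS-style response."""
    links = response.get("_links", {})
    if rel not in links:
        raise KeyError(f"no '{rel}' link; available: {sorted(links)}")
    return links[rel]["href"]  # "href" is an assumed field name


# Hypothetical response shape, for illustration only.
example_response = {
    "result": {"name": "main", "address": "0x401000"},
    "_links": {
        "self": {"href": "/functions/0x401000"},
        "decompile": {"href": "/functions/0x401000/decompile"},
    },
}

next_url = follow_link(example_response, "decompile")
```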

Fixed

  • Real Instruction Disassembly: The /disassembly endpoint now provides actual instruction disassembly instead of placeholders. (3df129f)
  • Ghidra 11+ Compatibility: Resolved various API compatibility issues, particularly for cross-references (XrefsEndpoints). (5dc59ce, 2b1fe6c, 0eaa19a, 9443101)
  • Data Operations: Fixed issues with HTTP request body consumption, parameter naming (type vs dataType), and name preservation during type changes. (28870e9)
  • Function Commenting: Corrected set_decompiler_comment to apply comments at the function level. (2a1607c)
  • Call Graph Parameter Handling: Updated the CallGraph endpoint to properly accept both function name and address parameters for flexibility. (fa8cc64)
  • Endpoint Functionality: Addressed various issues including endpoint registration, handling of program-dependent endpoints, URL encoding, transaction management, and inconsistent response formats. (various commits, e.g., 4bc2267)

1.4.0 - 2025-04-08

Added

  • Structured JSON communication between Python bridge and Java plugin
  • Consistent response format with metadata (timestamp, port, instance type)
  • Comprehensive test suites for HTTP API and MCP bridge
  • Test runner script for easy test execution
  • Detailed testing documentation in TESTING.md
  • Origin checking for API requests
  • Mutating tests for API functionality

Changed

  • Improved error handling in API responses
  • Enhanced JSON parsing in the Java plugin
  • Updated documentation with JSON communication details
  • Standardized API responses across all endpoints
  • Improved version handling in build system

Fixed

  • Build complete package in package phase
  • Versioning and naming of JAR files
  • GitHub Actions workflow permissions
  • Extension ZIP inclusion in complete package
  • ProgramManager requirement
  • Git tag fetching functionality
  • MCP bridge test failures

1.3.0 - 2025-04-02

Added

  • Added docstrings for all @mcp.tool functions
  • Variable manipulation tools (rename/retype variables)
  • New endpoints for function variable management
  • Dynamic version output in API responses
  • Enhanced function analysis capabilities
  • Support for searching variables by name
  • New tools for working with functions and their variables:
    • get_function_by_address
    • get_current_address
    • get_current_function
    • decompile_function_by_address
    • disassemble_function
    • set_decompiler_comment
    • set_disassembly_comment
    • rename_local_variable
    • rename_function_by_address
    • set_function_prototype
    • set_local_variable_type

Changed

  • Improved version handling in build system
  • Reorganized imports in bridge_mcp_hydra.py
  • Updated MANIFEST.MF with more detailed description

1.2 - 2025-03-30

Added

  • Enhanced function analysis capabilities
  • Additional variable manipulation tools
  • Support for multiple Ghidra instances

Changed

  • Improved error handling in API calls
  • Optimized performance for large binaries

1.1 - 2025-03-30

Added

  • Initial release of GhydraMCP bridge
  • Basic Ghidra instance management tools
  • Function analysis tools
  • Variable manipulation tools

1.0 - 2025-03-24

Added

  • Initial project setup
  • Basic MCP bridge functionality