From 6b2e572bd4dc7040765dbc0a4f204e282974d9a4 Mon Sep 17 00:00:00 2001 From: Teal Bauer Date: Wed, 9 Apr 2025 20:52:55 +0200 Subject: [PATCH] API evolution plan --- GHIDRA_HTTP_API.md | 373 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 311 insertions(+), 62 deletions(-) diff --git a/GHIDRA_HTTP_API.md b/GHIDRA_HTTP_API.md index 8dbb19f..d056d80 100644 --- a/GHIDRA_HTTP_API.md +++ b/GHIDRA_HTTP_API.md @@ -1,73 +1,322 @@ -# Ghidra HTTP API Documentation +# GhydraMCP Ghidra Plugin HTTP API v1 -## Base URL -`http://{host}:{port}/` (default port: 8192) +## Overview -## Endpoints +This API provides a Hypermedia-driven interface (HATEOAS) to interact with Ghidra's CodeBrowser, enabling AI-driven and automated reverse engineering workflows. It allows interaction with Ghidra projects, programs (binaries), functions, symbols, data, memory segments, cross-references, and analysis features. Programs are addressed by their unique identifier within Ghidra (`project:/path/to/file`). -### Instance Management -- `GET /instances` - List active instances -- `GET /info` - Get project information -- `GET /` - Root endpoint with basic info +## General Concepts -### Program Analysis -- `GET /functions` - List functions - - Parameters: - - `offset` - Pagination offset - - `limit` - Max items to return - - `query` - Search string for function names +### Request Format -- `GET /functions/{name}` - Get function details - - Parameters: - - `cCode` - Return C-style code (true/false) - - `syntaxTree` - Include syntax tree (true/false) - - `simplificationStyle` - Decompiler style +- Use standard HTTP verbs: + - `GET`: Retrieve resources or lists. + - `POST`: Create new resources. + - `PATCH`: Modify existing resources partially. + - `PUT`: Replace existing resources entirely (Use with caution, `PATCH` is often preferred). + - `DELETE`: Remove resources. +- Request bodies for `POST`, `PUT`, `PATCH` should be JSON (`Content-Type: application/json`). +- Include an optional `X-Request-ID` header with a unique identifier for correlation. -- `GET /get_function_by_address` - Get function by address - - Parameters: - - `address` - Memory address in hex +### Response Format -- `GET /classes` - List classes -- `GET /segments` - List memory segments -- `GET /symbols/imports` - List imported symbols -- `GET /symbols/exports` - List exported symbols -- `GET /namespaces` - List namespaces -- `GET /data` - List data items -- `GET /variables` - List global variables -- `GET /functions/{name}/variables` - List function variables +All non-error responses are JSON (`Content-Type: application/json`) containing at least the following keys: -### Modifications -- `POST /functions/{name}` - Rename function - - Body: `{"newName": string}` - -- `POST /data` - Rename data at address - - Body: `{"address": string, "newName": string}` - -- `POST /set_decompiler_comment` - Add decompiler comment - - Body: `{"address": string, "comment": string}` - -- `POST /set_disassembly_comment` - Add disassembly comment - - Body: `{"address": string, "comment": string}` - -- `POST /rename_local_variable` - Rename local variable - - Body: `{"functionAddress": string, "oldName": string, "newName": string}` - -- `POST /rename_function_by_address` - Rename function by address - - Body: `{"functionAddress": string, "newName": string}` - -- `POST /set_function_prototype` - Update function prototype - - Body: `{"functionAddress": string, "prototype": string}` - -- `POST /set_local_variable_type` - Change variable type - - Body: `{"functionAddress": string, "variableName": string, "newType": string}` - -## Response Format -All endpoints return JSON with standard structure: ```json { - "success": boolean, - "result": object|array, - "error": string, // if success=false - "timestamp": number, - "port": number + "id": "[correlation identifier]", + "instance": "[instance url]", + "success": true, + "result": Object | Array, + "_links": { // Optional: HATEOAS links + "self": { "href": "/path/to/current/resource" }, + "related_resource": { "href": "/path/to/related" } + // ... other relevant links + } } +``` + +- `id`: The identifier from the `X-Request-ID` header if provided, or a random opaque identifier otherwise. +- `instance`: The URL of the Ghidra plugin instance that handled the request. +- `success`: Boolean `true` for successful operations. +- `result`: The main data payload, either a single JSON object or an array of objects for lists. +- `_links`: (Optional) Contains HATEOAS-style links to related resources or actions, facilitating discovery. + +#### List Responses + +List results (arrays in `result`) will typically include pagination information and a total count: + +```json +{ + "id": "req-123", + "instance": "http://localhost:1337", + "success": true, + "result": [ ... objects ... ], + "size": 150, // Total number of items matching the query across all pages + "offset": 0, + "limit": 50, + "_links": { + "self": { "href": "/programs/proj:/file.bin/functions?offset=0&limit=50" }, + "next": { "href": "/programs/proj:/file.bin/functions?offset=50&limit=50" }, // Present if more items exist + "prev": { "href": "/programs/proj:/file.bin/functions?offset=0&limit=50" } // Present if not the first page + } +} +``` + +### Error Responses + +Errors use appropriate HTTP status codes (4xx, 5xx) and have a JSON payload with an `error` key: + +```json +{ + "id": "[correlation identifier]", + "instance": "[instance url]", + "success": false, + "error": { + "code": "RESOURCE_NOT_FOUND", // Optional: Machine-readable code + "message": "Descriptive error message" + // Potentially other details like invalid parameters + } +} +``` + +Common HTTP Status Codes: +- `200 OK`: Successful `GET`, `PATCH`, `PUT`, `DELETE`. +- `201 Created`: Successful `POST` resulting in resource creation. +- `204 No Content`: Successful `DELETE` or `PATCH`/`PUT` where no body is returned. +- `400 Bad Request`: Invalid syntax, missing required parameters, invalid data format. +- `401 Unauthorized`: Authentication required or failed (if implemented). +- `403 Forbidden`: Authenticated user lacks permission (if implemented). +- `404 Not Found`: Resource or endpoint does not exist, or query yielded no results. +- `405 Method Not Allowed`: HTTP verb not supported for this endpoint. +- `500 Internal Server Error`: Unexpected error within the Ghidra plugin. + +### Addressing and Searching + +Resources like functions, data, and symbols often exist at specific memory addresses and may have names. The primary identifier for a program is its Ghidra path, e.g., `myproject:/path/to/mybinary.exe`. + +- **By Address:** Use the resource's path with the address (hexadecimal, e.g., `0x401000` or `08000004`). + - Example: `GET /programs/myproject:/mybinary.exe/functions/0x401000` +- **Querying Lists:** List endpoints (e.g., `/functions`, `/symbols`, `/data`) support filtering via query parameters: + - `?addr=[address in hex]`: Find item at a specific address. + - `?name=[full_name]`: Find item(s) with an exact name match (case-sensitive). + - `?name_contains=[substring]`: Find item(s) whose name contains the substring (case-insensitive). + - `?name_matches_regex=[regex]`: Find item(s) whose name matches the Java-compatible regular expression. + +### Pagination + +List endpoints support pagination using query parameters: +- `?offset=[int]`: Number of items to skip (default: 0). +- `?limit=[int]`: Maximum number of items to return (default: implementation-defined, e.g., 100). + +## Meta Endpoints + +### `GET /plugin-version` +Returns the version of the running Ghidra plugin and its API. Essential for compatibility checks by clients like the MCP bridge. +```json +{ + "id": "req-meta-ver", + "instance": "http://localhost:1337", + "success": true, + "result": { + "plugin_version": "v1.4.0", // Example plugin build version + "api_version": 1 // Ordinal API version + }, + "_links": { + "self": { "href": "/plugin-version" } + } +} +``` + +## Resource Types + +Base path for all program-specific resources: `/programs/{program_id}` where `program_id` is the URL-encoded Ghidra identifier (e.g., `myproject%3A%2Fpath%2Fto%2Fmybinary.exe`). + +### 1. Projects + +Represents Ghidra projects, containers for programs. + +- **`GET /projects`**: List all available Ghidra projects. +- **`POST /projects`**: Create a new Ghidra project. Request body should specify `name` and optionally `directory`. +- **`GET /projects/{project_name}`**: Get details about a specific project (e.g., location, list of open programs within it via links). + +### 2. Programs + +Represents individual binaries loaded in Ghidra projects. + +- **`GET /programs`**: List all programs across all projects. Can be filtered by project (`?project={project_name}`). +- **`POST /programs`**: Load/import a new binary into a specified project. Request body needs `project_name`, `file_path`, and optionally `language_id`, `compiler_spec_id`, and loader options. Returns the newly created program resource details upon successful import and analysis (which might take time). +- **`GET /programs/{program_id}`**: Get metadata for a specific program (e.g., name, architecture, memory layout, analysis status). + ```json + // Example Response Fragment for GET /programs/myproject%3A%2Fmybinary.exe + "result": { + "program_id": "myproject:/mybinary.exe", + "name": "mybinary.exe", + "project": "myproject", + "language_id": "x86:LE:64:default", + "compiler_spec_id": "gcc", + "image_base": "0x400000", + "memory_size": 1048576, + "is_open": true, + "analysis_complete": true + // ... other metadata + }, + "_links": { + "self": { "href": "/programs/myproject%3A%2Fmybinary.exe" }, + "project": { "href": "/projects/myproject" }, + "functions": { "href": "/programs/myproject%3A%2Fmybinary.exe/functions" }, + "symbols": { "href": "/programs/myproject%3A%2Fmybinary.exe/symbols" }, + "data": { "href": "/programs/myproject%3A%2Fmybinary.exe/data" }, + "segments": { "href": "/programs/myproject%3A%2Fmybinary.exe/segments" }, + "memory": { "href": "/programs/myproject%3A%2Fmybinary.exe/memory" }, + "xrefs": { "href": "/programs/myproject%3A%2Fmybinary.exe/xrefs" }, + "analysis": { "href": "/programs/myproject%3A%2Fmybinary.exe/analysis" } + // Potentially actions like "close", "analyze" + } + ``` +- **`DELETE /programs/{program_id}`**: Close and potentially remove a program from its project (behavior depends on Ghidra state). + +### 3. Functions + +Represents functions within a program. Base path: `/programs/{program_id}/functions`. + +- **`GET /functions`**: List functions. Supports searching (by name/address/regex) and pagination. + ```json + // Example Response Fragment + "result": [ + { "name": "FUN_08000004", "address": "08000004", "_links": { "self": { "href": "/programs/proj%3A%2Ffile.bin/functions/08000004" } } }, + { "name": "init_peripherals", "address": "08001cf0", "_links": { "self": { "href": "/programs/proj%3A%2Ffile.bin/functions/08001cf0" } } } + ] + ``` +- **`POST /functions`**: Create a function at a specific address. Requires `address` in the request body. Returns the created function resource. +- **`GET /functions/{address}`**: Get details for a specific function (name, signature, size, stack info, etc.). + ```json + // Example Response Fragment for GET /programs/proj%3A%2Ffile.bin/functions/0x4010a0 + "result": { + "name": "process_data", + "address": "0x4010a0", + "signature": "int process_data(char * data, int size)", + "size": 128, + "stack_depth": 16, + "has_varargs": false, + "calling_convention": "__stdcall" + // ... other details + }, + "_links": { + "self": { "href": "/programs/proj%3A%2Ffile.bin/functions/0x4010a0" }, + "decompile": { "href": "/programs/proj%3A%2Ffile.bin/functions/0x4010a0/decompile" }, + "disassembly": { "href": "/programs/proj%3A%2Ffile.bin/functions/0x4010a0/disassembly" }, + "variables": { "href": "/programs/proj%3A%2Ffile.bin/functions/0x4010a0/variables" }, + "xrefs_to": { "href": "/programs/proj%3A%2Ffile.bin/xrefs?to_addr=0x4010a0" }, + "xrefs_from": { "href": "/programs/proj%3A%2Ffile.bin/xrefs?from_addr=0x4010a0" } + } + ``` +- **`PATCH /functions/{address}`**: Modify a function. Addressable only by address. Payload can contain: + - `name`: New function name. + - `signature`: Full function signature string (e.g., `void my_func(int p1, char * p2)`). + - `comment`: Set/update the function's primary comment. + ```json + // Example PATCH payload + { "name": "calculate_checksum", "signature": "uint32_t calculate_checksum(uint8_t* buffer, size_t length)" } + ``` +- **`DELETE /functions/{address}`**: Delete the function definition at the specified address. + +#### Function Sub-Resources + +- **`GET /functions/{address}/decompile`**: Get decompiled C-like code for the function. + - Query Parameters: + - `?syntax_tree=true`: Include the decompiler's internal syntax tree (JSON). + - `?style=[style_name]`: Apply a specific decompiler simplification style (e.g., `normalize`, `paramid`). + - `?timeout=[seconds]`: Set a timeout for the decompilation process. + ```json + // Example Response Fragment (without syntax tree) + "result": { + "address": "0x4010a0", + "ccode": "int process_data(char *param_1, int param_2)\n{\n // ... function body ...\n return result;\n}\n" + } + ``` +- **`GET /functions/{address}/disassembly`**: Get assembly listing for the function. Supports pagination (`?offset=`, `?limit=`). + ```json + // Example Response Fragment + "result": [ + { "address": "0x4010a0", "mnemonic": "PUSH", "operands": "RBP", "bytes": "55" }, + { "address": "0x4010a1", "mnemonic": "MOV", "operands": "RBP, RSP", "bytes": "4889E5" }, + // ... more instructions + ] + ``` +- **`GET /functions/{address}/variables`**: List local variables defined within the function. Supports searching by name. +- **`PATCH /functions/{address}/variables/{variable_name}`**: Modify a local variable (rename, change type). Requires `name` and/or `type` in the payload. + +### 4. Symbols & Labels + +Represents named locations (functions, data, labels). Base path: `/programs/{program_id}/symbols`. + +- **`GET /symbols`**: List all symbols in the program. Supports searching (by name/address/regex) and pagination. Can filter by type (`?type=function`, `?type=data`, `?type=label`). +- **`POST /symbols`**: Create or rename a symbol at a specific address. Requires `address` and `name` in the payload. If a symbol exists, it's renamed; otherwise, a new label is created. +- **`GET /symbols/{address}`**: Get details of the symbol at the specified address. +- **`PATCH /symbols/{address}`**: Modify properties of the symbol (e.g., set as primary, change namespace). Payload specifies changes. +- **`DELETE /symbols/{address}`**: Remove the symbol at the specified address. + +### 5. Data + +Represents defined data items in memory. Base path: `/programs/{program_id}/data`. + +- **`GET /data`**: List defined data items. Supports searching (by name/address/regex) and pagination. Can filter by type (`?type=string`, `?type=dword`, etc.). +- **`POST /data`**: Define a new data item. Requires `address`, `type`, and optionally `size` or `length` in the payload. +- **`GET /data/{address}`**: Get details of the data item at the specified address (type, size, value representation). +- **`PATCH /data/{address}`**: Modify a data item (e.g., change `name`, `type`, `comment`). Payload specifies changes. +- **`DELETE /data/{address}`**: Undefine the data item at the specified address. + +### 6. Memory Segments + +Represents memory blocks/sections defined in the program. Base path: `/programs/{program_id}/segments`. + +- **`GET /segments`**: List all memory segments (e.g., `.text`, `.data`, `.bss`). +- **`GET /segments/{segment_name}`**: Get details for a specific segment (address range, permissions, size). + +### 7. Memory Access + +Provides raw memory access. Base path: `/programs/{program_id}/memory`. + +- **`GET /memory/{address}`**: Read bytes from memory. + - Query Parameters: + - `?length=[bytes]`: Number of bytes to read (required, max limit applies). + - `?format=[hex|base64|string]`: How to encode the returned bytes (default: hex). + ```json + // Example Response Fragment for GET /programs/proj%3A%2Ffile.bin/memory/0x402000?length=16&format=hex + "result": { + "address": "0x402000", + "length": 16, + "format": "hex", + "bytes": "48656C6C6F20576F726C642100000000" // "Hello World!...." + } + ``` +- **`PATCH /memory/{address}`**: Write bytes to memory. Requires `bytes` (in specified `format`) and `format` in the payload. Use with extreme caution. + +### 8. Cross-References (XRefs) + +Provides information about references to/from addresses. Base path: `/programs/{program_id}/xrefs`. + +- **`GET /xrefs`**: Search for cross-references. Supports pagination. + - Query Parameters (at least one required): + - `?to_addr=[address]`: Find references *to* this address. + - `?from_addr=[address]`: Find references *from* this address or within the function/data at this address. + - `?type=[CALL|READ|WRITE|DATA|POINTER|...]`: Filter by reference type. +- **`GET /functions/{address}/xrefs`**: Convenience endpoint, equivalent to `GET /xrefs?to_addr={address}` and potentially `GET /xrefs?from_addr={address}` combined or linked. + +### 9. Analysis + +Provides access to Ghidra's analysis results. Base path: `/programs/{program_id}/analysis`. + +- **`GET /analysis/callgraph`**: Retrieve the function call graph (potentially filtered or paginated). Format might be nodes/edges JSON or a standard graph format like DOT. +- **`GET /analysis/dataflow/{address}`**: Perform data flow analysis starting from a specific address or instruction. Requires parameters specifying forward/backward, context, etc. (Details TBD). +- **`POST /analysis/analyze`**: Trigger a full or partial re-analysis of the program. + +## Design Considerations for AI Usage + +- **Structured responses**: JSON format ensures predictable parsing by AI agents. +- **HATEOAS Links**: `_links` allow agents to discover available actions and related resources without hardcoding paths. +- **Address and Name Resolution**: Key elements like functions and symbols are addressable by both memory address and name where applicable. +- **Explicit Operations**: Actions like decompilation, disassembly, and analysis are distinct endpoints. +- **Pagination & Filtering**: Essential for handling potentially large datasets (symbols, functions, xrefs, disassembly). +- **Clear Error Reporting**: `success: false` and the `error` object provide actionable feedback. +- **No Injected Summaries**: The API should return raw or structured Ghidra data, leaving interpretation and summarization to the AI agent.