mcghidra/GHIDRA_HTTP_API.md
Teal Bauer 30d9bb17da feat: add struct data type management API
Add endpoints and MCP tools to create, read, update, and delete struct
data types in Ghidra's data type manager. Enables programmatic definition
of complex data structures for reverse engineering workflows.

Includes pagination, category filtering, and field-level operations
(add, update by name or offset).
2025-11-14 12:10:34 +01:00

742 lines
25 KiB
Markdown

# GhydraMCP Ghidra Plugin HTTP API v2
## Overview
This API provides a Hypermedia-driven interface (HATEOAS) to interact with Ghidra's CodeBrowser, enabling AI-driven and automated reverse engineering workflows. It allows interaction with Ghidra projects, programs (binaries), functions, symbols, data, memory segments, cross-references, and analysis features. Each program open in Ghidra will have its own plugin instance, so all resources are specific to that program.
## General Concepts
### Request Format
- Use standard HTTP verbs:
- `GET`: Retrieve resources or lists.
- `POST`: Create new resources.
- `PATCH`: Modify existing resources partially.
- `PUT`: Replace existing resources entirely (Use with caution, `PATCH` is often preferred).
- `DELETE`: Remove resources.
- Request bodies for `POST`, `PUT`, `PATCH` should be JSON (`Content-Type: application/json`).
- Include an optional `X-Request-ID` header with a unique identifier for correlation.
### Response Format
All non-error responses are JSON (`Content-Type: application/json`) containing at least the following keys:
```json
{
"id": "[correlation identifier]",
"instance": "[instance url]",
"success": true,
"result": Object | Array<Object>,
"_links": { // Optional: HATEOAS links
"self": { "href": "/path/to/current/resource" },
"related_resource": { "href": "/path/to/related" }
// ... other relevant links
}
}
```
- `id`: The identifier from the `X-Request-ID` header if provided, or a random opaque identifier otherwise.
- `instance`: The URL of the Ghidra plugin instance that handled the request.
- `success`: Boolean `true` for successful operations.
- `result`: The main data payload, either a single JSON object or an array of objects for lists.
- `_links`: (Optional) Contains HATEOAS-style links to related resources or actions, facilitating discovery.
#### List Responses
List results (arrays in `result`) will typically include pagination information and a total count:
```json
{
"id": "req-123",
"instance": "http://localhost:8192",
"success": true,
"result": [ ... objects ... ],
"size": 150, // Total number of items matching the query across all pages
"offset": 0,
"limit": 50,
"_links": {
"self": { "href": "/functions?offset=0&limit=50" },
"next": { "href": "/functions?offset=50&limit=50" }, // Present if more items exist
"prev": { "href": "/functions?offset=0&limit=50" } // Present if not the first page
}
}
```
### Error Responses
Errors use appropriate HTTP status codes (4xx, 5xx) and have a JSON payload with an `error` key:
```json
{
"id": "[correlation identifier]",
"instance": "[instance url]",
"success": false,
"error": {
"code": "RESOURCE_NOT_FOUND", // Optional: Machine-readable code
"message": "Descriptive error message"
// Potentially other details like invalid parameters
}
}
```
Common HTTP Status Codes:
- `200 OK`: Successful `GET`, `PATCH`, `PUT`, `DELETE`.
- `201 Created`: Successful `POST` resulting in resource creation.
- `204 No Content`: Successful `DELETE` or `PATCH`/`PUT` where no body is returned.
- `400 Bad Request`: Invalid syntax, missing required parameters, invalid data format.
- `401 Unauthorized`: Authentication required or failed (if implemented).
- `403 Forbidden`: Authenticated user lacks permission (if implemented).
- `404 Not Found`: Resource or endpoint does not exist, or query yielded no results.
- `405 Method Not Allowed`: HTTP verb not supported for this endpoint.
- `500 Internal Server Error`: Unexpected error within the Ghidra plugin.
### Addressing and Searching
Resources like functions, data, and symbols often exist at specific memory addresses and may have names.
- **By Address:** Use the resource's path with the address (hexadecimal, e.g., `0x401000` or `08000004`).
- Example: `GET /functions/0x401000`
- **Querying Lists:** List endpoints (e.g., `/functions`, `/symbols`, `/data`) support filtering via query parameters:
- `?addr=[address in hex]`: Find item at a specific address.
- `?name=[full_name]`: Find item(s) with an exact name match (case-sensitive).
- `?name_contains=[substring]`: Find item(s) whose name contains the substring (case-insensitive).
- `?name_matches_regex=[regex]`: Find item(s) whose name matches the Java-compatible regular expression.
### Pagination
List endpoints support pagination using query parameters:
- `?offset=[int]`: Number of items to skip (default: 0).
- `?limit=[int]`: Maximum number of items to return (default: implementation-defined, e.g., 100).
## Meta Endpoints
### `GET /plugin-version`
Returns the version of the running Ghidra plugin and its API. Essential for compatibility checks by clients like the MCP bridge.
```json
{
"id": "req-meta-ver",
"instance": "http://localhost:8192",
"success": true,
"result": {
"plugin_version": "v2.0.0", // Example plugin build version
"api_version": 2 // Ordinal API version
},
"_links": {
"self": { "href": "/plugin-version" },
"root": { "href": "/" }
}
}
```
### `GET /info`
Returns information about the current plugin instance, including details about the loaded program and project.
```json
{
"id": "req-info",
"instance": "http://localhost:8192",
"success": true,
"result": {
"isBaseInstance": true,
"file": "example.exe",
"architecture": "x86:LE:64:default",
"processor": "x86",
"addressSize": 64,
"creationDate": "2023-01-01T12:00:00Z",
"executable": "/path/to/example.exe",
"project": "MyProject",
"projectLocation": "/path/to/MyProject",
"serverPort": 8192,
"serverStartTime": 1672531200000,
"instanceCount": 1
},
"_links": {
"self": { "href": "/info" },
"root": { "href": "/" },
"instances": { "href": "/instances" },
"program": { "href": "/program" }
}
}
```
### `GET /instances`
Returns information about all active GhydraMCP plugin instances.
```json
{
"id": "req-instances",
"instance": "http://localhost:8192",
"success": true,
"result": [
{
"port": 8192,
"url": "http://localhost:8192",
"type": "base",
"project": "MyProject",
"file": "example.exe",
"_links": {
"self": { "href": "/instances/8192" },
"info": { "href": "http://localhost:8192/info" },
"connect": { "href": "http://localhost:8192" }
}
},
{
"port": 8193,
"url": "http://localhost:8193",
"type": "standard",
"project": "MyProject",
"file": "library.dll",
"_links": {
"self": { "href": "/instances/8193" },
"info": { "href": "http://localhost:8193/info" },
"connect": { "href": "http://localhost:8193" }
}
}
],
"_links": {
"self": { "href": "/instances" },
"register": { "href": "/registerInstance", "method": "POST" },
"unregister": { "href": "/unregisterInstance", "method": "POST" },
"programs": { "href": "/programs" }
}
}
```
## Resource Types
Each Ghidra plugin instance runs in the context of a single program, so all resources are relative to the current program. The program's details are available through the `GET /info` and `GET /program` endpoints.
### 1. Project
Represents the current Ghidra project, which is a container for programs.
- **`GET /project`**: Get details about the current project (e.g., location, list of open programs within it via links).
### 2. Program
Represents the current binary loaded in Ghidra.
- **`GET /program`**: Get metadata for the current program (e.g., name, architecture, memory layout, analysis status).
```json
// Example Response Fragment for GET /program
"result": {
"programId": "myproject:/path/to/mybinary.exe",
"name": "mybinary.exe",
"isOpen": true,
"languageId": "x86:LE:64:default",
"compilerSpecId": "gcc",
"imageBase": "0x400000",
"memorySize": 1048576,
"analysisComplete": true
},
"_links": {
"self": { "href": "/program" },
"project": { "href": "/project" },
"functions": { "href": "/functions" },
"symbols": { "href": "/symbols" },
"data": { "href": "/data" },
"segments": { "href": "/segments" },
"memory": { "href": "/memory" },
"xrefs": { "href": "/xrefs" },
"analysis": { "href": "/analysis" }
}
```
### 3. Current Location
Provides information about the current cursor position and function in Ghidra's CodeBrowser.
- **`GET /address`**: Get the current cursor position.
```json
// Example Response
"result": {
"address": "0x401000",
"program": "mybinary.exe"
},
"_links": {
"self": { "href": "/address" },
"program": { "href": "/program" },
"memory": { "href": "/memory/0x401000?length=16" },
"function": { "href": "/functions/0x401000" },
"decompile": { "href": "/functions/0x401000/decompile" }
}
```
- **`GET /function`**: Get information about the function at the current cursor position.
```json
// Example Response
"result": {
"name": "main",
"address": "0x401000",
"signature": "int main(int argc, char** argv)",
"size": 256
},
"_links": {
"self": { "href": "/function" },
"program": { "href": "/program" },
"function": { "href": "/functions/0x401000" },
"decompile": { "href": "/functions/0x401000/decompile" },
"disassembly": { "href": "/functions/0x401000/disassembly" },
"variables": { "href": "/functions/0x401000/variables" },
"xrefs": { "href": "/xrefs?to_addr=0x401000" }
}
```
### 4. Functions
Represents functions within the current program.
- **`GET /functions`**: List functions. Supports searching (by name/address/regex) and pagination.
```json
// Example Response Fragment
"result": [
{ "name": "FUN_08000004", "address": "08000004", "_links": { "self": { "href": "/functions/08000004" } } },
{ "name": "init_peripherals", "address": "08001cf0", "_links": { "self": { "href": "/functions/08001cf0" } } }
]
```
- **`POST /functions`**: Create a function at a specific address. Requires `address` in the request body. Returns the created function resource.
- **`GET /functions/{address}`**: Get details for a specific function (name, signature, size, stack info, etc.).
```json
// Example Response Fragment for GET /functions/0x4010a0
"result": {
"name": "process_data",
"address": "0x4010a0",
"signature": "int process_data(char * data, int size)",
"size": 128,
"stack_depth": 16,
"has_varargs": false,
"calling_convention": "__stdcall"
// ... other details
},
"_links": {
"self": { "href": "/functions/0x4010a0" },
"decompile": { "href": "/functions/0x4010a0/decompile" },
"disassembly": { "href": "/functions/0x4010a0/disassembly" },
"variables": { "href": "/functions/0x4010a0/variables" },
"xrefs_to": { "href": "/xrefs?to_addr=0x4010a0" },
"xrefs_from": { "href": "/xrefs?from_addr=0x4010a0" }
}
```
- **`PATCH /functions/{address}`**: Modify a function. Addressable only by address. Payload can contain:
- `name`: New function name.
- `signature`: Full function signature string (e.g., `void my_func(int p1, char * p2)`).
- `comment`: Set/update the function's primary comment.
```json
// Example PATCH payload
{ "name": "calculate_checksum", "signature": "uint32_t calculate_checksum(uint8_t* buffer, size_t length)" }
```
- **`DELETE /functions/{address}`**: Delete the function definition at the specified address.
#### Function Sub-Resources
- **`GET /functions/{address}/decompile`**: Get decompiled C-like code for the function.
- Query Parameters:
- `?syntax_tree=true`: Include the decompiler's internal syntax tree (JSON).
- `?style=[style_name]`: Apply a specific decompiler simplification style (e.g., `normalize`, `paramid`).
- `?timeout=[seconds]`: Set a timeout for the decompilation process.
```json
// Example Response Fragment (without syntax tree)
"result": {
"address": "0x4010a0",
"ccode": "int process_data(char *param_1, int param_2)\n{\n // ... function body ...\n return result;\n}\n"
}
```
- **`GET /functions/{address}/disassembly`**: Get assembly listing for the function. Supports pagination (`?offset=`, `?limit=`).
```json
// Example Response Fragment
"result": [
{ "address": "0x4010a0", "mnemonic": "PUSH", "operands": "RBP", "bytes": "55" },
{ "address": "0x4010a1", "mnemonic": "MOV", "operands": "RBP, RSP", "bytes": "4889E5" },
// ... more instructions
]
```
- **`GET /functions/{address}/variables`**: List local variables defined within the function. Supports searching by name.
- **`PATCH /functions/{address}/variables/{variable_name}`**: Modify a local variable (rename, change type). Requires `name` and/or `type` in the payload.
### 5. Symbols & Labels
Represents named locations (functions, data, labels).
- **`GET /symbols`**: List all symbols in the program. Supports searching (by name/address/regex) and pagination. Can filter by type (`?type=function`, `?type=data`, `?type=label`).
- **`POST /symbols`**: Create or rename a symbol at a specific address. Requires `address` and `name` in the payload. If a symbol exists, it's renamed; otherwise, a new label is created.
- **`GET /symbols/{address}`**: Get details of the symbol at the specified address.
- **`PATCH /symbols/{address}`**: Modify properties of the symbol (e.g., set as primary, change namespace). Payload specifies changes.
- **`DELETE /symbols/{address}`**: Remove the symbol at the specified address.
### 6. Data
Represents defined data items in memory.
- **`GET /data`**: List defined data items. Supports searching (by name/address/regex) and pagination. Can filter by type (`?type=string`, `?type=dword`, etc.).
- **`POST /data`**: Define a new data item. Requires `address`, `type`, and optionally `size` or `length` in the payload.
- **`GET /data/{address}`**: Get details of the data item at the specified address (type, size, value representation).
- **`PATCH /data/{address}`**: Modify a data item (e.g., change `name`, `type`, `comment`). Payload specifies changes.
- **`DELETE /data/{address}`**: Undefine the data item at the specified address.
### 6.1 Strings
Provides access to string data in the binary.
- **`GET /strings`**: List all defined strings in the binary. Supports pagination and filtering.
- Query Parameters:
- `?offset=[int]`: Number of strings to skip (default: 0).
- `?limit=[int]`: Maximum number of strings to return (default: 2000).
- `?filter=[string]`: Only include strings containing this substring (case-insensitive).
```json
// Example Response
"result": [
{
"address": "0x00401234",
"value": "Hello, world!",
"length": 14,
"type": "string",
"name": "aHelloWorld"
},
{
"address": "0x00401250",
"value": "Error: could not open file",
"length": 26,
"type": "string",
"name": "aErrorCouldNotO"
}
],
"_links": {
"self": { "href": "/strings?offset=0&limit=10" },
"next": { "href": "/strings?offset=10&limit=10" }
}
```
### 6.2 Structs
Provides functionality for creating and managing struct (composite) data types.
- **`GET /structs`**: List all struct data types in the program. Supports pagination and filtering.
- Query Parameters:
- `?offset=[int]`: Number of structs to skip (default: 0).
- `?limit=[int]`: Maximum number of structs to return (default: 100).
- `?category=[string]`: Filter by category path (e.g. "/winapi").
```json
// Example Response
"result": [
{
"name": "MyStruct",
"path": "/custom/MyStruct",
"size": 16,
"numFields": 4,
"category": "/custom",
"description": "Custom data structure"
},
{
"name": "FileHeader",
"path": "/FileHeader",
"size": 32,
"numFields": 8,
"category": "/",
"description": ""
}
],
"_links": {
"self": { "href": "/structs?offset=0&limit=100" },
"program": { "href": "/program" }
}
```
- **`GET /structs?name={struct_name}`**: Get detailed information about a specific struct including all fields.
```json
// Example Response for GET /structs?name=MyStruct
"result": {
"name": "MyStruct",
"path": "/custom/MyStruct",
"size": 16,
"category": "/custom",
"description": "Custom data structure",
"numFields": 4,
"fields": [
{
"name": "id",
"offset": 0,
"length": 4,
"type": "int",
"typePath": "/int",
"comment": "Unique identifier"
},
{
"name": "flags",
"offset": 4,
"length": 4,
"type": "dword",
"typePath": "/dword",
"comment": ""
},
{
"name": "data_ptr",
"offset": 8,
"length": 4,
"type": "pointer",
"typePath": "/pointer",
"comment": "Pointer to data"
},
{
"name": "size",
"offset": 12,
"length": 4,
"type": "uint",
"typePath": "/uint",
"comment": ""
}
]
},
"_links": {
"self": { "href": "/structs?name=MyStruct" },
"structs": { "href": "/structs" },
"program": { "href": "/program" }
}
```
- **`POST /structs/create`**: Create a new struct data type.
- Request Payload:
- `name`: Name for the new struct (required).
- `category`: Category path (optional, defaults to root).
- `description`: Description for the struct (optional).
```json
// Example Request Payload
{
"name": "NetworkPacket",
"category": "/network",
"description": "Network packet structure"
}
// Example Response
"result": {
"name": "NetworkPacket",
"path": "/network/NetworkPacket",
"category": "/network",
"size": 0,
"message": "Struct created successfully"
}
```
- **`POST /structs/addfield`**: Add a field to an existing struct.
- Request Payload:
- `struct`: Name of the struct to modify (required).
- `fieldName`: Name for the new field (required).
- `fieldType`: Data type for the field (required, e.g. "int", "char", "pointer").
- `offset`: Specific offset to insert field (optional, appends to end if not specified).
- `comment`: Comment for the field (optional).
```json
// Example Request Payload
{
"struct": "NetworkPacket",
"fieldName": "header",
"fieldType": "dword",
"comment": "Packet header"
}
// Example Response
"result": {
"struct": "NetworkPacket",
"fieldName": "header",
"fieldType": "dword",
"offset": 0,
"length": 4,
"structSize": 4,
"message": "Field added successfully"
}
```
- **`POST /structs/updatefield`**: Update an existing field in a struct (rename, change type, or modify comment).
- Request Payload:
- `struct`: Name of the struct to modify (required).
- `fieldOffset` OR `fieldName`: Identify the field to update (one required).
- `newName`: New name for the field (optional).
- `newType`: New data type for the field (optional).
- `newComment`: New comment for the field (optional).
- At least one of `newName`, `newType`, or `newComment` must be provided.
```json
// Example Request Payload - rename a field
{
"struct": "NetworkPacket",
"fieldName": "header",
"newName": "packet_header",
"newComment": "Updated packet header field"
}
// Example Request Payload - change type by offset
{
"struct": "NetworkPacket",
"fieldOffset": 0,
"newType": "qword"
}
// Example Response
"result": {
"struct": "NetworkPacket",
"offset": 0,
"originalName": "header",
"originalType": "dword",
"originalComment": "Packet header",
"newName": "packet_header",
"newType": "dword",
"newComment": "Updated packet header field",
"length": 4,
"message": "Field updated successfully"
}
```
- **`POST /structs/delete`**: Delete a struct data type.
- Request Payload:
- `name`: Name of the struct to delete (required).
```json
// Example Request Payload
{
"name": "NetworkPacket"
}
// Example Response
"result": {
"name": "NetworkPacket",
"path": "/network/NetworkPacket",
"category": "/network",
"message": "Struct deleted successfully"
}
```
### 7. Memory Segments
Represents memory blocks/sections defined in the program.
- **`GET /segments`**: List all memory segments (e.g., `.text`, `.data`, `.bss`).
- **`GET /segments/{segment_name}`**: Get details for a specific segment (address range, permissions, size).
### 8. Memory Access
Provides raw memory access.
- **`GET /memory/{address}`**: Read bytes from memory.
- Query Parameters:
- `?length=[bytes]`: Number of bytes to read (required, max limit applies).
- `?format=[hex|base64|string]`: How to encode the returned bytes (default: hex).
```json
// Example Response Fragment for GET /programs/proj%3A%2Ffile.bin/memory/0x402000?length=16&format=hex
"result": {
"address": "0x402000",
"length": 16,
"format": "hex",
"bytes": "48656C6C6F20576F726C642100000000" // "Hello World!...."
}
```
- **`PATCH /memory/{address}`**: Write bytes to memory. Requires `bytes` (in specified `format`) and `format` in the payload. Use with extreme caution.
### 9. Cross-References (XRefs)
Provides information about references to/from addresses.
- **`GET /xrefs`**: Search for cross-references. Supports pagination.
- Query Parameters (at least one required):
- `?to_addr=[address]`: Find references *to* this address.
- `?from_addr=[address]`: Find references *from* this address or within the function/data at this address.
- `?type=[CALL|READ|WRITE|DATA|POINTER|...]`: Filter by reference type.
- **`GET /functions/{address}/xrefs`**: Convenience endpoint, equivalent to `GET /xrefs?to_addr={address}` and potentially `GET /xrefs?from_addr={address}` combined or linked.
### 10. Analysis
Provides access to Ghidra's analysis results.
- **`GET /analysis`**: Get information about the analysis status and available analyzers.
```json
// Example Response
"result": {
"program": "mybinary.exe",
"analysis_enabled": true,
"available_analyzers": [
"Function Start Analyzer",
"Basic Block Model Analyzer",
"Reference Analyzer",
"Call Convention Analyzer",
"Data Reference Analyzer",
"Decompiler Parameter ID",
"Stack Analyzer"
]
},
"_links": {
"self": { "href": "/analysis" },
"program": { "href": "/program" },
"analyze": { "href": "/analysis", "method": "POST" },
"callgraph": { "href": "/analysis/callgraph" }
}
```
- **`POST /analysis`**: Trigger a full or partial re-analysis of the program.
```json
// Example Response
"result": {
"program": "mybinary.exe",
"analysis_triggered": true,
"message": "Analysis initiated on program"
}
```
- **`GET /analysis/callgraph`**: Retrieve the function call graph.
- Query Parameters:
- `?function=[function_name]`: Start the call graph from this function (default: entry point).
- `?max_depth=[int]`: Maximum depth of the call graph (default: 3).
```json
// Example Response
"result": {
"root": "main",
"root_address": "0x401000",
"max_depth": 3,
"nodes": [
{
"id": "0x401000",
"name": "main",
"address": "0x401000",
"depth": 0,
"_links": {
"self": { "href": "/functions/0x401000" }
}
},
// ... more nodes
],
"edges": [
{
"from": "0x401000",
"to": "0x401100",
"type": "call",
"call_site": "0x401050"
},
// ... more edges
]
}
```
- **`GET /analysis/dataflow`**: Perform data flow analysis starting from a specific address.
- Query Parameters:
- `?address=[address]`: Starting address for data flow analysis (required).
- `?direction=[forward|backward]`: Direction of data flow analysis (default: forward).
- `?max_steps=[int]`: Maximum number of steps to analyze (default: 50).
```json
// Example Response
"result": {
"start_address": "0x401050",
"direction": "forward",
"max_steps": 50,
"steps": [
{
"address": "0x401050",
"instruction": "MOV EAX, [RBP+0x8]",
"description": "Starting point of data flow analysis"
},
// ... more steps
]
}
```
## Design Considerations for AI Usage
- **Structured responses**: JSON format ensures predictable parsing by AI agents.
- **HATEOAS Links**: `_links` allow agents to discover available actions and related resources without hardcoding paths.
- **Address and Name Resolution**: Key elements like functions and symbols are addressable by both memory address and name where applicable.
- **Explicit Operations**: Actions like decompilation, disassembly, and analysis are distinct endpoints.
- **Pagination & Filtering**: Essential for handling potentially large datasets (symbols, functions, xrefs, disassembly).
- **Clear Error Reporting**: `success: false` and the `error` object provide actionable feedback.
- **No Injected Summaries**: The API should return raw or structured Ghidra data, leaving interpretation and summarization to the AI agent.