# GhydraMCP Ghidra Plugin HTTP API v2

## Overview

This API provides a Hypermedia-driven interface (HATEOAS) to interact with Ghidra's CodeBrowser, enabling AI-driven and automated reverse engineering workflows. It allows interaction with Ghidra projects, programs (binaries), functions, symbols, data, memory segments, cross-references, and analysis features. Each program open in Ghidra will have its own plugin instance, so all resources are specific to that program.

## General Concepts

### Request Format

- Use standard HTTP verbs:
  - `GET`: Retrieve resources or lists.
  - `POST`: Create new resources.
  - `PATCH`: Modify existing resources partially.
  - `PUT`: Replace existing resources entirely (use with caution; `PATCH` is often preferred).
  - `DELETE`: Remove resources.
- Request bodies for `POST`, `PUT`, and `PATCH` should be JSON (`Content-Type: application/json`).
- Optionally include an `X-Request-ID` header with a unique identifier for correlation.

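The request conventions above can be sketched with the Python standard library. This is a minimal illustration, not part of the plugin: `build_request` is a hypothetical helper, and the base URL assumes the `http://localhost:8192` instance used in this document's examples.

```python
import json
import urllib.request
import uuid

# Assumption: the instance URL used in this document's examples.
BASE_URL = "http://localhost:8192"


def build_request(method, path, payload=None, request_id=None):
    """Build (but do not send) a request that follows the conventions above."""
    # X-Request-ID is optional; a random UUID is used when none is supplied.
    headers = {"X-Request-ID": request_id or str(uuid.uuid4())}
    data = None
    if payload is not None:
        # POST, PUT and PATCH bodies are sent as JSON.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request(
    "PATCH", "/functions/0x401000", {"name": "main"}, request_id="req-123"
)
```

Passing `req` to `urllib.request.urlopen` would send it; that step needs a running plugin instance and is left out here.
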
### Response Format

All non-error responses are JSON (`Content-Type: application/json`) containing at least the following keys:

```json
{
  "id": "[correlation identifier]",
  "instance": "[instance url]",
  "success": true,
  "result": Object | Array<Object>,
  "_links": { // Optional: HATEOAS links
    "self": { "href": "/path/to/current/resource" },
    "related_resource": { "href": "/path/to/related" }
    // ... other relevant links
  }
}
```

- `id`: The identifier from the `X-Request-ID` header if provided, or a random opaque identifier otherwise.
- `instance`: The URL of the Ghidra plugin instance that handled the request.
- `success`: Boolean `true` for successful operations.
- `result`: The main data payload, either a single JSON object or an array of objects for lists.
- `_links`: (Optional) Contains HATEOAS-style links to related resources or actions, facilitating discovery.

#### List Responses

List results (arrays in `result`) will typically include pagination information and a total count:

```json
{
  "id": "req-123",
  "instance": "http://localhost:8192",
  "success": true,
  "result": [ ... objects ... ],
  "size": 150, // Total number of items matching the query across all pages
  "offset": 0,
  "limit": 50,
  "_links": {
    "self": { "href": "/functions?offset=0&limit=50" },
    "next": { "href": "/functions?offset=50&limit=50" }, // Present if more items exist
    "prev": { "href": "/functions?offset=0&limit=50" } // Present if not the first page
  }
}
```

### Error Responses

Errors use appropriate HTTP status codes (4xx, 5xx) and have a JSON payload with an `error` key:

```json
{
  "id": "[correlation identifier]",
  "instance": "[instance url]",
  "success": false,
  "error": {
    "code": "RESOURCE_NOT_FOUND", // Optional: Machine-readable code
    "message": "Descriptive error message"
    // Potentially other details like invalid parameters
  }
}
```

Common HTTP Status Codes:
- `200 OK`: Successful `GET`, `PATCH`, `PUT`, `DELETE`.
- `201 Created`: Successful `POST` resulting in resource creation.
- `204 No Content`: Successful `DELETE` or `PATCH`/`PUT` where no body is returned.
- `400 Bad Request`: Invalid syntax, missing required parameters, invalid data format.
- `401 Unauthorized`: Authentication required or failed (if implemented).
- `403 Forbidden`: Authenticated user lacks permission (if implemented).
- `404 Not Found`: Resource or endpoint does not exist, or query yielded no results.
- `405 Method Not Allowed`: HTTP verb not supported for this endpoint.
- `500 Internal Server Error`: Unexpected error within the Ghidra plugin.

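One way a client might turn these envelopes into return values and exceptions is sketched below. `GhidraApiError` and `unwrap` are hypothetical names, not part of the plugin; the sketch assumes the caller has already read the HTTP status and decoded the body (an empty dict for `204 No Content`).

```python
class GhidraApiError(Exception):
    """Hypothetical exception carrying the fields of an error envelope."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code or 'ERROR'}: {message}")
        self.status = status
        self.code = code  # optional machine-readable code, may be None
        self.message = message


def unwrap(status, envelope):
    """Return `result` for a successful response, raise otherwise."""
    if status == 204:
        # No Content: success without a body.
        return None
    if 200 <= status < 300 and envelope.get("success"):
        return envelope.get("result")
    error = envelope.get("error") or {}
    raise GhidraApiError(
        status, error.get("code"), error.get("message", "unknown error")
    )
```
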
### Addressing and Searching

Resources like functions, data, and symbols often exist at specific memory addresses and may have names.

- **By Address:** Use the resource's path with the address (hexadecimal, e.g., `0x401000` or `08000004`).
  - Example: `GET /functions/0x401000`
- **Querying Lists:** List endpoints (e.g., `/functions`, `/symbols`, `/data`) support filtering via query parameters:
  - `?addr=[address in hex]`: Find item at a specific address.
  - `?name=[full_name]`: Find item(s) with an exact name match (case-sensitive).
  - `?name_contains=[substring]`: Find item(s) whose name contains the substring (case-insensitive).
  - `?name_matches_regex=[regex]`: Find item(s) whose name matches the Java-compatible regular expression.

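Filter values, regular expressions in particular, need URL encoding before they go into a query string. A minimal sketch, where `list_url` is a hypothetical helper that accepts the query parameters listed above as keyword arguments:

```python
from urllib.parse import urlencode


def list_url(resource, **filters):
    """Build a list-endpoint path such as /functions?name_contains=init."""
    # Drop unset filters so they do not appear as "key=None" in the URL.
    params = {key: value for key, value in filters.items() if value is not None}
    query = urlencode(params)
    return f"/{resource}?{query}" if query else f"/{resource}"


by_name = list_url("functions", name_contains="init", limit=50)
by_regex = list_url("symbols", name_matches_regex="^FUN_[0-9a-f]+$")
```
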
### Pagination

List endpoints support pagination using query parameters:
- `?offset=[int]`: Number of items to skip (default: 0).
- `?limit=[int]`: Maximum number of items to return (default: implementation-defined, e.g., 100).

## Meta Endpoints

### `GET /plugin-version`
Returns the version of the running Ghidra plugin and its API. Essential for compatibility checks by clients like the MCP bridge.
```json
{
  "id": "req-meta-ver",
  "instance": "http://localhost:8192",
  "success": true,
  "result": {
    "plugin_version": "v2.0.0", // Example plugin build version
    "api_version": 2 // Ordinal API version
  },
  "_links": {
    "self": { "href": "/plugin-version" },
    "root": { "href": "/" }
  }
}
```

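A client-side compatibility check against this response could look like the sketch below. The exact-match policy and the `REQUIRED_API_VERSION` constant are assumptions for illustration; the document does not specify how clients should compare versions.

```python
# Assumption: the client targets the API version documented here.
REQUIRED_API_VERSION = 2


def is_compatible(envelope, required=REQUIRED_API_VERSION):
    """Return True if a /plugin-version envelope reports the required API version."""
    result = envelope.get("result") or {}
    return bool(envelope.get("success")) and result.get("api_version") == required


example = {
    "id": "req-meta-ver",
    "instance": "http://localhost:8192",
    "success": True,
    "result": {"plugin_version": "v2.0.0", "api_version": 2},
}
compatible = is_compatible(example)
```
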
### `GET /info`
Returns information about the current plugin instance, including details about the loaded program and project.
```json
{
  "id": "req-info",
  "instance": "http://localhost:8192",
  "success": true,
  "result": {
    "isBaseInstance": true,
    "file": "example.exe",
    "architecture": "x86:LE:64:default",
    "processor": "x86",
    "addressSize": 64,
    "creationDate": "2023-01-01T12:00:00Z",
    "executable": "/path/to/example.exe",
    "project": "MyProject",
    "projectLocation": "/path/to/MyProject",
    "serverPort": 8192,
    "serverStartTime": 1672531200000,
    "instanceCount": 1
  },
  "_links": {
    "self": { "href": "/info" },
    "root": { "href": "/" },
    "instances": { "href": "/instances" },
    "program": { "href": "/program" }
  }
}
```

### `GET /instances`
Returns information about all active GhydraMCP plugin instances.
```json
{
  "id": "req-instances",
  "instance": "http://localhost:8192",
  "success": true,
  "result": [
    {
      "port": 8192,
      "url": "http://localhost:8192",
      "type": "base",
      "project": "MyProject",
      "file": "example.exe",
      "_links": {
        "self": { "href": "/instances/8192" },
        "info": { "href": "http://localhost:8192/info" },
        "connect": { "href": "http://localhost:8192" }
      }
    },
    {
      "port": 8193,
      "url": "http://localhost:8193",
      "type": "standard",
      "project": "MyProject",
      "file": "library.dll",
      "_links": {
        "self": { "href": "/instances/8193" },
        "info": { "href": "http://localhost:8193/info" },
        "connect": { "href": "http://localhost:8193" }
      }
    }
  ],
  "_links": {
    "self": { "href": "/instances" },
    "register": { "href": "/registerInstance", "method": "POST" },
    "unregister": { "href": "/unregisterInstance", "method": "POST" },
    "programs": { "href": "/programs" }
  }
}
```

## Resource Types

Each Ghidra plugin instance runs in the context of a single program, so all resources are relative to the current program. The program's details are available through the `GET /info` and `GET /program` endpoints.

### 1. Project

Represents the current Ghidra project, which is a container for programs.

- **`GET /project`**: Get details about the current project (e.g., location, list of open programs within it via links).

### 2. Program

Represents the current binary loaded in Ghidra.

- **`GET /program`**: Get metadata for the current program (e.g., name, architecture, memory layout, analysis status).
  ```json
  // Example Response Fragment for GET /program
  "result": {
    "programId": "myproject:/path/to/mybinary.exe",
    "name": "mybinary.exe",
    "isOpen": true,
    "languageId": "x86:LE:64:default",
    "compilerSpecId": "gcc",
    "imageBase": "0x400000",
    "memorySize": 1048576,
    "analysisComplete": true
  },
  "_links": {
    "self": { "href": "/program" },
    "project": { "href": "/project" },
    "functions": { "href": "/functions" },
    "symbols": { "href": "/symbols" },
    "data": { "href": "/data" },
    "segments": { "href": "/segments" },
    "memory": { "href": "/memory" },
    "xrefs": { "href": "/xrefs" },
    "analysis": { "href": "/analysis" }
  }
  ```

### 3. Current Location

Provides information about the current cursor position and function in Ghidra's CodeBrowser.

- **`GET /address`**: Get the current cursor position.
  ```json
  // Example Response
  "result": {
    "address": "0x401000",
    "program": "mybinary.exe"
  },
  "_links": {
    "self": { "href": "/address" },
    "program": { "href": "/program" },
    "memory": { "href": "/memory/0x401000?length=16" },
    "function": { "href": "/functions/0x401000" },
    "decompile": { "href": "/functions/0x401000/decompile" }
  }
  ```

- **`GET /function`**: Get information about the function at the current cursor position.
  ```json
  // Example Response
  "result": {
    "name": "main",
    "address": "0x401000",
    "signature": "int main(int argc, char** argv)",
    "size": 256
  },
  "_links": {
    "self": { "href": "/function" },
    "program": { "href": "/program" },
    "function": { "href": "/functions/0x401000" },
    "decompile": { "href": "/functions/0x401000/decompile" },
    "disassembly": { "href": "/functions/0x401000/disassembly" },
    "variables": { "href": "/functions/0x401000/variables" },
    "xrefs": { "href": "/xrefs?to_addr=0x401000" }
  }
  ```

### 4. Functions

Represents functions within the current program.

- **`GET /functions`**: List functions. Supports searching (by name/address/regex) and pagination.
  ```json
  // Example Response Fragment
  "result": [
    { "name": "FUN_08000004", "address": "08000004", "_links": { "self": { "href": "/functions/08000004" } } },
    { "name": "init_peripherals", "address": "08001cf0", "_links": { "self": { "href": "/functions/08001cf0" } } }
  ]
  ```
- **`POST /functions`**: Create a function at a specific address. Requires `address` in the request body. Returns the created function resource.
- **`GET /functions/{address}`**: Get details for a specific function (name, signature, size, stack info, etc.).
  ```json
  // Example Response Fragment for GET /functions/0x4010a0
  "result": {
    "name": "process_data",
    "address": "0x4010a0",
    "signature": "int process_data(char * data, int size)",
    "size": 128,
    "stack_depth": 16,
    "has_varargs": false,
    "calling_convention": "__stdcall"
    // ... other details
  },
  "_links": {
    "self": { "href": "/functions/0x4010a0" },
    "decompile": { "href": "/functions/0x4010a0/decompile" },
    "disassembly": { "href": "/functions/0x4010a0/disassembly" },
    "variables": { "href": "/functions/0x4010a0/variables" },
    "xrefs_to": { "href": "/xrefs?to_addr=0x4010a0" },
    "xrefs_from": { "href": "/xrefs?from_addr=0x4010a0" }
  }
  ```
- **`PATCH /functions/{address}`**: Modify a function. Addressable only by address. Payload can contain:
  - `name`: New function name.
  - `signature`: Full function signature string (e.g., `void my_func(int p1, char * p2)`).
  - `comment`: Set/update the function's primary comment.
  ```json
  // Example PATCH payload
  { "name": "calculate_checksum", "signature": "uint32_t calculate_checksum(uint8_t* buffer, size_t length)" }
  ```
- **`DELETE /functions/{address}`**: Delete the function definition at the specified address.

#### Function Sub-Resources

- **`GET /functions/{address}/decompile`**: Get decompiled C-like code for the function.
  - Query Parameters:
    - `?syntax_tree=true`: Include the decompiler's internal syntax tree (JSON).
    - `?style=[style_name]`: Apply a specific decompiler simplification style (e.g., `normalize`, `paramid`).
    - `?timeout=[seconds]`: Set a timeout for the decompilation process.
  ```json
  // Example Response Fragment (without syntax tree)
  "result": {
    "address": "0x4010a0",
    "ccode": "int process_data(char *param_1, int param_2)\n{\n  // ... function body ...\n  return result;\n}\n"
  }
  ```
- **`GET /functions/{address}/disassembly`**: Get assembly listing for the function. Supports pagination (`?offset=`, `?limit=`).
  ```json
  // Example Response Fragment
  "result": [
    { "address": "0x4010a0", "mnemonic": "PUSH", "operands": "RBP", "bytes": "55" },
    { "address": "0x4010a1", "mnemonic": "MOV", "operands": "RBP, RSP", "bytes": "4889E5" },
    // ... more instructions
  ]
  ```
- **`GET /functions/{address}/variables`**: List local variables defined within the function. Supports searching by name.
- **`PATCH /functions/{address}/variables/{variable_name}`**: Modify a local variable (rename, change type). Requires `name` and/or `type` in the payload.

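The disassembly `result` array shown above is easy to render as a plain-text listing. A small sketch, assuming each entry carries the four fields from the example fragment (`format_listing` is a hypothetical helper and the column width is an arbitrary choice):

```python
def format_listing(instructions):
    """Render disassembly entries as one text line per instruction."""
    lines = []
    for ins in instructions:
        # Pad the raw bytes column so mnemonics line up.
        line = f"{ins['address']}  {ins['bytes']:<12} {ins['mnemonic']} {ins['operands']}"
        lines.append(line.rstrip())
    return "\n".join(lines)


sample = [
    {"address": "0x4010a0", "mnemonic": "PUSH", "operands": "RBP", "bytes": "55"},
    {"address": "0x4010a1", "mnemonic": "MOV", "operands": "RBP, RSP", "bytes": "4889E5"},
]
listing = format_listing(sample)
```
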
### 5. Symbols & Labels

Represents named locations (functions, data, labels).

- **`GET /symbols`**: List all symbols in the program. Supports searching (by name/address/regex) and pagination. Can filter by type (`?type=function`, `?type=data`, `?type=label`).
- **`POST /symbols`**: Create or rename a symbol at a specific address. Requires `address` and `name` in the payload. If a symbol exists, it's renamed; otherwise, a new label is created.
- **`GET /symbols/{address}`**: Get details of the symbol at the specified address.
- **`PATCH /symbols/{address}`**: Modify properties of the symbol (e.g., set as primary, change namespace). Payload specifies changes.
- **`DELETE /symbols/{address}`**: Remove the symbol at the specified address.

### 6. Data

Represents defined data items in memory.

- **`GET /data`**: List defined data items. Supports searching (by name/address/regex) and pagination. Can filter by type (`?type=string`, `?type=dword`, etc.).
- **`POST /data`**: Define a new data item. Requires `address`, `type`, and optionally `size` or `length` in the payload.
- **`GET /data/{address}`**: Get details of the data item at the specified address (type, size, value representation).
- **`PATCH /data/{address}`**: Modify a data item (e.g., change `name`, `type`, `comment`). Payload specifies changes.
- **`DELETE /data/{address}`**: Undefine the data item at the specified address.

### 6.1 Strings

Provides access to string data in the binary.

- **`GET /strings`**: List all defined strings in the binary. Supports pagination and filtering.
  - Query Parameters:
    - `?offset=[int]`: Number of strings to skip (default: 0).
    - `?limit=[int]`: Maximum number of strings to return (default: 2000).
    - `?filter=[string]`: Only include strings containing this substring (case-insensitive).
  ```json
  // Example Response
  "result": [
    {
      "address": "0x00401234",
      "value": "Hello, world!",
      "length": 14,
      "type": "string",
      "name": "aHelloWorld"
    },
    {
      "address": "0x00401250",
      "value": "Error: could not open file",
      "length": 26,
      "type": "string",
      "name": "aErrorCouldNotO"
    }
  ],
  "_links": {
    "self": { "href": "/strings?offset=0&limit=10" },
    "next": { "href": "/strings?offset=10&limit=10" }
  }
  ```

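As a usage sketch for the strings endpoint, the helpers below build a `/strings` path from the documented query parameters and index a response by address. Both function names are hypothetical, and `sample` copies only the `address` and `value` fields from the example response above.

```python
from urllib.parse import urlencode


def strings_path(filter_text=None, offset=0, limit=2000):
    """Build the path for GET /strings using the documented query parameters."""
    params = {"offset": offset, "limit": limit}
    if filter_text:
        params["filter"] = filter_text
    return "/strings?" + urlencode(params)


def index_by_address(envelope):
    """Map each string's address to its value for quick lookups."""
    return {item["address"]: item["value"] for item in envelope.get("result", [])}


sample = {
    "result": [
        {"address": "0x00401234", "value": "Hello, world!"},
        {"address": "0x00401250", "value": "Error: could not open file"},
    ]
}
path = strings_path("error", limit=10)
by_address = index_by_address(sample)
```
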
### 7. Memory Segments

Represents memory blocks/sections defined in the program.

- **`GET /segments`**: List all memory segments (e.g., `.text`, `.data`, `.bss`).
- **`GET /segments/{segment_name}`**: Get details for a specific segment (address range, permissions, size).

### 8. Memory Access

Provides raw memory access.

- **`GET /memory/{address}`**: Read bytes from memory.
  - Query Parameters:
    - `?length=[bytes]`: Number of bytes to read (required, max limit applies).
    - `?format=[hex|base64|string]`: How to encode the returned bytes (default: hex).
  ```json
  // Example Response Fragment for GET /memory/0x402000?length=16&format=hex
  "result": {
    "address": "0x402000",
    "length": 16,
    "format": "hex",
    "bytes": "48656C6C6F20576F726C642100000000" // "Hello World!...."
  }
  ```
- **`PATCH /memory/{address}`**: Write bytes to memory. Requires `bytes` (in specified `format`) and `format` in the payload. Use with extreme caution.

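Decoding the `bytes` field on the client side depends on the `format` that was requested. A minimal sketch follows; `decode_memory` is a hypothetical helper, and the handling of `format=string` is an assumption because the document does not state which character encoding the plugin uses.

```python
import base64


def decode_memory(result):
    """Convert the `bytes` field of a /memory result into a Python bytes object."""
    fmt = result.get("format", "hex")  # hex is the documented default
    raw = result["bytes"]
    if fmt == "hex":
        return bytes.fromhex(raw)
    if fmt == "base64":
        return base64.b64decode(raw)
    if fmt == "string":
        # Assumption: one character per byte; the real encoding is unspecified.
        return raw.encode("latin-1")
    raise ValueError(f"unsupported format: {fmt}")


example = {
    "address": "0x402000",
    "length": 16,
    "format": "hex",
    "bytes": "48656C6C6F20576F726C642100000000",
}
data = decode_memory(example)
```

Here `data` holds the 16 raw bytes from the example response: the ASCII text `Hello World!` followed by four zero bytes.
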
### 9. Cross-References (XRefs)

Provides information about references to/from addresses.

- **`GET /xrefs`**: Search for cross-references. Supports pagination.
  - Query Parameters (at least one required):
    - `?to_addr=[address]`: Find references *to* this address.
    - `?from_addr=[address]`: Find references *from* this address or within the function/data at this address.
    - `?type=[CALL|READ|WRITE|DATA|POINTER|...]`: Filter by reference type.
- **`GET /functions/{address}/xrefs`**: Convenience endpoint, equivalent to `GET /xrefs?to_addr={address}` and potentially `GET /xrefs?from_addr={address}` combined or linked.

### 10. Analysis

Provides access to Ghidra's analysis results.

- **`GET /analysis`**: Get information about the analysis status and available analyzers.
  ```json
  // Example Response
  "result": {
    "program": "mybinary.exe",
    "analysis_enabled": true,
    "available_analyzers": [
      "Function Start Analyzer",
      "Basic Block Model Analyzer",
      "Reference Analyzer",
      "Call Convention Analyzer",
      "Data Reference Analyzer",
      "Decompiler Parameter ID",
      "Stack Analyzer"
    ]
  },
  "_links": {
    "self": { "href": "/analysis" },
    "program": { "href": "/program" },
    "analyze": { "href": "/analysis", "method": "POST" },
    "callgraph": { "href": "/analysis/callgraph" }
  }
  ```

- **`POST /analysis`**: Trigger a full or partial re-analysis of the program.
  ```json
  // Example Response
  "result": {
    "program": "mybinary.exe",
    "analysis_triggered": true,
    "message": "Analysis initiated on program"
  }
  ```

- **`GET /analysis/callgraph`**: Retrieve the function call graph.
  - Query Parameters:
    - `?function=[function_name]`: Start the call graph from this function (default: entry point).
    - `?max_depth=[int]`: Maximum depth of the call graph (default: 3).
  ```json
  // Example Response
  "result": {
    "root": "main",
    "root_address": "0x401000",
    "max_depth": 3,
    "nodes": [
      {
        "id": "0x401000",
        "name": "main",
        "address": "0x401000",
        "depth": 0,
        "_links": {
          "self": { "href": "/functions/0x401000" }
        }
      },
      // ... more nodes
    ],
    "edges": [
      {
        "from": "0x401000",
        "to": "0x401100",
        "type": "call",
        "call_site": "0x401050"
      },
      // ... more edges
    ]
  }
  ```

- **`GET /analysis/dataflow`**: Perform data flow analysis starting from a specific address.
  - Query Parameters:
    - `?address=[address]`: Starting address for data flow analysis (required).
    - `?direction=[forward|backward]`: Direction of data flow analysis (default: forward).
    - `?max_steps=[int]`: Maximum number of steps to analyze (default: 50).
  ```json
  // Example Response
  "result": {
    "start_address": "0x401050",
    "direction": "forward",
    "max_steps": 50,
    "steps": [
      {
        "address": "0x401050",
        "instruction": "MOV EAX, [RBP+0x8]",
        "description": "Starting point of data flow analysis"
      },
      // ... more steps
    ]
  }
  ```

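The `nodes` and `edges` arrays returned by `GET /analysis/callgraph` can be folded into a caller-to-callees map. This is a sketch under the assumption that every edge's `from` and `to` match a node `id` where one exists; `callees_by_name` is a hypothetical helper, and the `helper` node in the sample is invented for illustration.

```python
from collections import defaultdict


def callees_by_name(graph):
    """Group call-graph edges into {caller name: [callee names]}."""
    names = {node["id"]: node["name"] for node in graph["nodes"]}
    calls = defaultdict(list)
    for edge in graph["edges"]:
        # Fall back to the raw address when an endpoint has no node entry.
        caller = names.get(edge["from"], edge["from"])
        callee = names.get(edge["to"], edge["to"])
        calls[caller].append(callee)
    return dict(calls)


graph = {
    "root": "main",
    "nodes": [
        {"id": "0x401000", "name": "main", "address": "0x401000", "depth": 0},
        # Hypothetical second node, not taken from the example response.
        {"id": "0x401100", "name": "helper", "address": "0x401100", "depth": 1},
    ],
    "edges": [
        {"from": "0x401000", "to": "0x401100", "type": "call", "call_site": "0x401050"},
        {"from": "0x401100", "to": "0x401200", "type": "call", "call_site": "0x401120"},
    ],
}
calls = callees_by_name(graph)
```
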
## Design Considerations for AI Usage

- **Structured responses**: JSON format ensures predictable parsing by AI agents.
- **HATEOAS Links**: `_links` allow agents to discover available actions and related resources without hardcoding paths.
- **Address and Name Resolution**: Key elements like functions and symbols are addressable by both memory address and name where applicable.
- **Explicit Operations**: Actions like decompilation, disassembly, and analysis are distinct endpoints.
- **Pagination & Filtering**: Essential for handling potentially large datasets (symbols, functions, xrefs, disassembly).
- **Clear Error Reporting**: `success: false` and the `error` object provide actionable feedback.
- **No Injected Summaries**: The API should return raw or structured Ghidra data, leaving interpretation and summarization to the AI agent.