2 Commits

Author SHA1 Message Date
dee5fdacda Hamilton review fixes: validator literal preservation, cache cluster id, CSS impact partial-failure reporting
Three findings from a margaret-hamilton-style review of the MCP server,
fixed with regression tests written first (red → green). One bonus
finding (huntpilotqueue column name) was surfaced by the third fix
itself — exactly the audit-trust failure mode that fix exists to expose.

CRITICAL #1 — sql_validator: comment-strip mutated string literals.

The cleaned query returned by validate_select() is what travels to AXL.
Previously, the comment-strip pass ran before the literal-aware pass,
so `--` or `/* */` markers inside a string literal were silently eaten:

  input:  WHERE description = 'Smith -- old line'
  to AXL: WHERE description = 'Smith    (truncated mid-literal)

The LLM saw rows that looked plausible but were not what its query
asked for. "Confidently wrong" is exactly the failure mode the review
was hunting.

Fix: only strip comments on the analysis-only copy used for keyword
detection. The cleaned output preserves the input verbatim (modulo
trailing semicolon and outer whitespace). 6 new tests covering literal
preservation across `--`, `/* */`, LIKE patterns with embedded comment
markers, and forbidden keywords inside real comments.

CRITICAL #2 — cache key omitted cluster identity.

The on-disk cache key was `method::args_json`. An operator swapping
AXL_URL between test and prod (or between two clusters) would silently
serve stale data from cluster A as if from cluster B. The audit
report would be confidently wrong with no signal anything happened.

Fix: AxlCache now takes cluster_id and prefixes all keys with it.
Server bootstrap derives cluster_id as a 12-char SHA-256 prefix of
AXL_URL. cache_stats() surfaces both the current cluster_id and a
`foreign_cluster_entries` count so an env-swap is visible. Schema
migration handles pre-fix cache files via PRAGMA table_info introspection
plus a one-shot ALTER TABLE ADD COLUMN. 5 new tests covering isolation,
shared-id sharing, stats reporting, legacy DB upgrade, and per-cluster
clear() scoping.

MAJOR #3 — find_devices_using_css summary undercounted partial failures.

The function is per-category resilient (one failed query doesn't kill
the whole impact analysis), but the resilience never propagated up to
the response. total_returned and any_truncated only reflected SUCCESSFUL
categories. An LLM consuming "47 references" had no way to know 5
categories errored and the real number was likely much higher.

Fix: response now includes complete: bool, categories_with_errors: int,
and error_categories: [list]. The LLM/auditor sees the partial-failure
state and can decide whether to act on incomplete data. 5 new tests
using a FakeAxlClient stand-in to simulate per-category failures.

BONUS finding (uncovered by Major #3 fix): huntpilotqueue join used
the wrong column. Three CSS impact categories (huntpilot_max_wait_css,
huntpilot_no_agent_css, huntpilot_queue_full_css) were silently
erroring with "Column (fknumplan) not found" because huntpilotqueue
joins via fknumplan_pilot, not fknumplan. With the Major #3 fix in
place, this surfaced immediately as `complete: False, error_categories:
[3 huntpilot_*]` against the live cluster. Fixed inline; live re-run
now reports `complete: True, total_returned: 163` for Internal-CSS.

87 unit tests passing (up from 70). Live cluster smoke test
(cucm-pub.binghammemorial.org, CUCM 15.0.1.12900-234) verifies all
three fixes plus the bonus finding work end-to-end.
2026-04-25 23:09:55 -06:00
8b3da9d729 Initial mcp-cucm-axl
Read-only MCP server for Cisco Unified CM 15 AXL — built for LLM-driven
cluster auditing, with a particular focus on the Route Plan Report:
partitions, calling search spaces, route patterns, translation patterns,
called/calling party transformations, and digit-discard instructions.

Pairs intentionally with the sibling mcp-cisco-docs server (live
cluster state + vendor docs in one LLM context).

Architecture:
  - zeep SOAP client to CUCM AXL
  - WSDL bootstrap from Cisco's axlsqltoolkit.zip (auto-extract on
    first launch; zip is gitignored, vendor-licensed)
  - SQLite response cache at ~/.cache/mcp-cucm-axl/responses/
  - Schema-grounded prompts that pull chunks from the sibling
    cisco-docs index (docs_loader.py)

Read-only by structural guarantee — never registers AXL write methods
(no executeSQLUpdate, no add*/update*/remove*/apply*/reset*/restart*
tools). SQL queries also client-side validated (sql_validator.py) to
begin with SELECT or WITH.

Tools exposed:
  Foundational: axl_version, axl_sql, axl_list_tables,
                axl_describe_table, cache_stats, cache_clear
  Route plan:   route_partitions, route_calling_search_spaces,
                route_patterns, route_inspect_pattern,
                route_lists_and_groups, route_translation_chain,
                route_digit_discard_instructions

Prompts (schema-grounded):
  route_plan_overview, investigate_pattern, audit_routing,
  cucm_sql_help

Tests cover cache, docs_loader, normalize, sql_validator, wildcard.
2026-04-25 20:29:18 -06:00