Closes the four remaining findings from the margaret-hamilton review.
13 new regression tests; all 100 pass; live cluster smoke verified.
MAJOR #4 — wildcard regex catastrophic backtracking + silently accepted malformed patterns.
Two changes to _wildcard_to_regex():
a) Bounded the `!` and `@` wildcards to \d{1,50} (was \d+). Adjacent
`!` patterns previously compiled to (\d+)(\d+)... which has
exponential backtracking on near-miss inputs. CUCM dial strings
are practically capped well below 50 digits; the bound keeps
complexity polynomial without losing real-world coverage.
Verified: 10 adjacent `!` against a 30-digit near-miss now finishes
in ~240ms (was unbounded; could have been minutes on real
pathological cases).
b) Unclosed `[` now raises ValueError instead of silently treating the
bracket as a literal. _pattern_matches_number catches the error
and returns False so a single bad pattern doesn't crash
translation_chain — but the bad pattern is no longer invisibly
producing wrong matches. The previous silent fallback meant a
pattern like `[0-9` (typo, missing `]`) would match input
containing the literal characters `[` `0` `-` `9`.
3 new tests covering: bounded-regex shape (`\d{1,N}`), pathological
input completes quickly, unclosed bracket raises explicitly,
well-formed character class still works.
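The sketch below shows both changes. It assumes a simplified pattern grammar: digits, `*`, `#`, `+`, `X`, `!`, `@`, `.`, and `[...]` classes. `wildcard_to_regex` and `pattern_matches_number` are hypothetical stand-ins for the private helpers named above, not their actual code. `@` is treated as a bounded digit run only because that is how this commit describes it.

```python
import re

MAX_WILDCARD_DIGITS = 50  # the bound described above

# Assumed class grammar: optional ^ negation, then digits and ranges.
_CLASS_BODY = re.compile(r"\^?[0-9-]+")


def wildcard_to_regex(pattern: str) -> str:
    """Translate a CUCM-style dial pattern into a regex body (hypothetical sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "X":
            out.append(r"\d")
        elif ch in "!@":
            # Bounded instead of \d+ so adjacent wildcards cannot backtrack without limit.
            out.append(rf"\d{{1,{MAX_WILDCARD_DIGITS}}}")
        elif ch == "[":
            close = pattern.find("]", i + 1)
            if close == -1:
                # Previously treated as a literal '['; now an explicit error.
                raise ValueError(f"unclosed '[' at position {i} in {pattern!r}")
            body = pattern[i + 1:close]
            if not _CLASS_BODY.fullmatch(body):
                raise ValueError(f"bad character class [{body}] in {pattern!r}")
            out.append(f"[{body}]")
            i = close
        elif ch == ".":
            pass  # '.' only marks a boundary; it matches no digit
        else:
            out.append(re.escape(ch))
        i += 1
    regex = "".join(out)
    try:
        re.compile(regex)  # e.g. a reversed range like [9-0]
    except re.error as exc:
        raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc
    return regex


def pattern_matches_number(pattern: str, number: str) -> bool:
    """A bad pattern yields False instead of crashing the caller."""
    try:
        regex = wildcard_to_regex(pattern)
    except ValueError:
        return False
    return re.fullmatch(regex, number) is not None
```

Catching `ValueError` at the matching layer is what keeps one typo from taking down a whole translation-chain walk, while still refusing to pretend the typo matched.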
MAJOR #5 — distinguish config errors from operational errors.
Pre-fix: any first-time connection failure set `_connection_error`
and pinned it forever. A transient network blip or session timeout
required restarting the MCP server. Hamilton's framing: Apollo's
software was *designed* to recover from transient faults; pinning
forever is the antithesis of "design the error path first."
Fix: split into two state fields:
_config_error — permanent until restart (missing env vars only)
_last_error — last operational failure, NOT a pin
Operational failures (zeep Client construction, network, TLS, session)
are recorded but not pinned: the next call attempts a fresh connection.
Configuration errors (missing AXL_URL etc.) stay pinned because
they don't get better on retry.
Added _ConfigError as a private exception subclass to make the
distinction explicit at the raise site, and connection_status() to
expose connected/connected_at/config_error/last_error for diagnostic
transparency.
3 new tests: config errors pin, operational errors don't pin,
connection_status() reports state.
MINOR #6 — _to_int silent coercion of bad data.
Pre-fix: a non-numeric value from the cluster (data corruption,
schema drift across CUCM versions) silently became None, which the
downstream sort logic then treated as 0, jumbling the failover order
in the displayed result with no warning.
Fix: still returns None on bad data (caller error path unchanged),
but logs the offending value to stderr so an operator notices
something is wrong at the data layer. A genuine None stays silent
(a legitimately unset column).
2 new tests: real None is silent, bad string logs to stderr with
the offending value visible.
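Here is a sketch of the new behavior, written as a hypothetical `to_int`. The `field` keyword is an assumption that lets the warning name the column.

```python
import sys
from typing import Any, Optional


def to_int(value: Any, *, field: str = "<unknown>") -> Optional[int]:
    """Coerce an AXL column value to int (hypothetical stand-in for _to_int)."""
    if value is None:
        return None  # legitimately unset column: stay silent
    try:
        return int(value)
    except (TypeError, ValueError):
        # Same None return as before, but the bad value is now visible to an operator.
        print(f"warning: non-numeric value for {field}: {value!r}", file=sys.stderr)
        return None
```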
MINOR #7 — standardize tool failure shapes; add health() tool.
Pre-fix: cache_stats and cache_clear returned `{"error": "..."}`
when _cache was None, while AXL-touching tools raised RuntimeError.
LLM consumers had to handle two shapes.
Fix: a _require_cache() helper raises RuntimeError, consistent with
_client(). All tool failures now use the same exception shape.
Added health() tool that reports cache/axl/docs initialization
status plus the AXL connection_status — gives operators a
self-diagnostic when something fails at bootstrap.
3 new tests: cache_stats raises, cache_clear raises, health()
reports each subsystem.
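The uniform failure shape and the `health()` report might look like the sketch below. `ServerState` is a hypothetical container for the subsystems created at bootstrap. The `axl` object is assumed to expose `connection_status()` as described under MAJOR #5.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ServerState:
    """Hypothetical container for subsystems initialized at bootstrap."""
    cache: Optional[Any] = None
    axl: Optional[Any] = None   # assumed to expose connection_status()
    docs: Optional[Any] = None


state = ServerState()


def _require_cache() -> Any:
    # Same failure shape as the AXL client accessor: raise, don't return {"error": ...}.
    if state.cache is None:
        raise RuntimeError("response cache is not initialized")
    return state.cache


def cache_stats() -> dict:
    return _require_cache().stats()


def cache_clear() -> dict:
    return {"cleared": _require_cache().clear()}


def health() -> dict:
    """Never raises: reports which subsystems came up, for bootstrap triage."""
    return {
        "cache": state.cache is not None,
        "axl": state.axl is not None,
        "docs": state.docs is not None,
        "axl_connection": state.axl.connection_status() if state.axl is not None else None,
    }
```

`health()` is deliberately the one tool that reports instead of raising, since its job is to describe a broken bootstrap.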
Three findings from a margaret-hamilton-style review of the MCP server,
fixed with regression tests written first (red → green). One bonus
finding (huntpilotqueue column name) was surfaced by the third fix
itself — exactly the audit-trust failure mode that fix exists to expose.
CRITICAL #1 — sql_validator: comment-strip mutated string literals.
The cleaned query returned by validate_select() is what travels to AXL.
Previously, the comment-strip pass ran before the literal-aware pass,
so `--` or `/* */` markers inside a string literal were silently eaten:
input: WHERE description = 'Smith -- old line'
to AXL: WHERE description = 'Smith (truncated mid-literal)
The LLM saw rows that looked plausible but were not what its query
asked for. "Confidently wrong" is exactly the failure mode the review
was hunting.
Fix: only strip comments on the analysis-only copy used for keyword
detection. The cleaned output preserves the input verbatim (modulo
trailing semicolon and outer whitespace). 6 new tests covering literal
preservation across `--`, `/* */`, LIKE patterns with embedded comment
markers, and forbidden keywords inside real comments.
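Below is a minimal sketch of the fixed ordering. It assumes a single left-to-right scanner that blanks literals and comments together. The real sql_validator.py may structure this differently, and the keyword list here is an illustrative subset. The point it shows is that analysis runs on a throwaway copy, while the string that would travel to AXL is the input itself.

```python
import re

# Illustrative subset; the real sql_validator keyword list may differ.
_FORBIDDEN = frozenset({
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "GRANT", "REVOKE", "TRUNCATE", "EXECUTE",
})


def _analysis_copy(sql: str) -> str:
    """Blank out single-quoted literals ('' escape) and comments in one pass.

    One pass means '--' inside a literal, and a quote inside a comment,
    are both classified correctly.
    """
    out, i, n = [], 0, len(sql)
    while i < n:
        if sql[i] == "'":
            j = i + 1
            while j < n:
                if sql[j] == "'":
                    if j + 1 < n and sql[j + 1] == "'":
                        j += 2  # doubled quote: escaped, still inside the literal
                        continue
                    break
                j += 1
            end = min(j + 1, n)  # past the closing quote, or end of an unterminated literal
            out.append(" " * (end - i))
            i = end
        elif sql.startswith("--", i):
            newline = sql.find("\n", i)
            i = n if newline == -1 else newline
            out.append(" ")
        elif sql.startswith("/*", i):
            close = sql.find("*/", i + 2)
            i = n if close == -1 else close + 2
            out.append(" ")
        else:
            out.append(sql[i])
            i += 1
    return "".join(out)


def validate_select(query: str) -> str:
    """Return the query verbatim (minus outer whitespace and a trailing ';') or raise."""
    cleaned = query.strip()
    if cleaned.endswith(";"):
        cleaned = cleaned[:-1].rstrip()
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", _analysis_copy(cleaned).upper())
    if not tokens or tokens[0] not in ("SELECT", "WITH"):
        raise ValueError("query must begin with SELECT or WITH")
    hits = _FORBIDDEN.intersection(tokens)
    if hits:
        raise ValueError(f"forbidden keyword(s): {', '.join(sorted(hits))}")
    return cleaned  # never the analysis copy
```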
CRITICAL #2 — cache key omitted cluster identity.
The on-disk cache key was `method::args_json`. An operator swapping
AXL_URL between test and prod (or between two clusters) would silently
serve stale data from cluster A as if from cluster B. The audit
report would be confidently wrong with no signal anything happened.
Fix: AxlCache now takes cluster_id and prefixes all keys with it.
Server bootstrap derives cluster_id as a 12-char SHA-256 prefix of
AXL_URL. cache_stats() surfaces both the current cluster_id and a
`foreign_cluster_entries` count so an env-swap is visible. Schema
migration handles pre-fix cache files via PRAGMA table_info introspection
plus a one-shot ALTER TABLE ADD COLUMN. 5 new tests covering isolation,
sharing between instances with the same cluster_id, stats reporting,
legacy DB upgrade, and per-cluster clear() scoping.
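This sketch shows cluster-scoped keys and the legacy upgrade, using only the standard-library `sqlite3` and `hashlib` modules. `ClusterScopedCache`, the `responses` table, and its columns are hypothetical. The real AxlCache schema is not shown in this log.

```python
import hashlib
import json
import sqlite3
from typing import Any, Optional


def derive_cluster_id(axl_url: str) -> str:
    """12-hex-char SHA-256 prefix of AXL_URL, as described above."""
    return hashlib.sha256(axl_url.encode("utf-8")).hexdigest()[:12]


class ClusterScopedCache:
    """Hypothetical, minimal cluster-aware response cache (not the real AxlCache)."""

    def __init__(self, path: str, cluster_id: str):
        self.cluster_id = cluster_id
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL, cluster_id TEXT)"
        )
        # Legacy files predate cluster_id: detect via PRAGMA and upgrade once.
        columns = {row[1] for row in self.db.execute("PRAGMA table_info(responses)")}
        if "cluster_id" not in columns:
            self.db.execute("ALTER TABLE responses ADD COLUMN cluster_id TEXT")
        self.db.commit()

    def _key(self, method: str, args: dict) -> str:
        return f"{self.cluster_id}::{method}::{json.dumps(args, sort_keys=True)}"

    def get(self, method: str, args: dict) -> Optional[Any]:
        row = self.db.execute(
            "SELECT value FROM responses WHERE key = ?", (self._key(method, args),)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def put(self, method: str, args: dict, value: Any) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO responses (key, value, cluster_id) VALUES (?, ?, ?)",
            (self._key(method, args), json.dumps(value), self.cluster_id),
        )
        self.db.commit()

    def stats(self) -> dict:
        mine = self.db.execute(
            "SELECT COUNT(*) FROM responses WHERE cluster_id = ?", (self.cluster_id,)
        ).fetchone()[0]
        # IS NOT is NULL-safe in SQLite, so legacy rows (cluster_id NULL) count as foreign.
        foreign = self.db.execute(
            "SELECT COUNT(*) FROM responses WHERE cluster_id IS NOT ?", (self.cluster_id,)
        ).fetchone()[0]
        return {"cluster_id": self.cluster_id, "entries": mine,
                "foreign_cluster_entries": foreign}

    def clear(self) -> int:
        cur = self.db.execute("DELETE FROM responses WHERE cluster_id = ?", (self.cluster_id,))
        self.db.commit()
        return cur.rowcount  # only this cluster's rows are removed
```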
MAJOR #3 — find_devices_using_css summary undercounted partial failures.
The function is per-category resilient (one failed query doesn't kill
the whole impact analysis), but the resilience never propagated up to
the response. total_returned and any_truncated only reflected SUCCESSFUL
categories. An LLM consuming "47 references" had no way to know that 5
categories had errored and that the real number was likely much higher.
Fix: response now includes complete: bool, categories_with_errors: int,
and error_categories: [list]. The LLM/auditor sees the partial-failure
state and can decide whether to act on incomplete data. 5 new tests
using a FakeAxlClient stand-in to simulate per-category failures.
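Per-category resilience with an honest summary can be sketched as below. `run_categories` is hypothetical, and `execute` stands in for whatever issues the SQL (the FakeAxlClient in the tests plays that role).

```python
from typing import Callable, Dict, List


def run_categories(queries: Dict[str, str],
                   execute: Callable[[str], List[dict]]) -> dict:
    """Run each category's query; one failure never hides the others."""
    categories: Dict[str, dict] = {}
    for name, sql in queries.items():
        try:
            categories[name] = {"rows": execute(sql)}
        except Exception as exc:  # per-category resilience, as described above
            categories[name] = {"error": f"{type(exc).__name__}: {exc}"}
    errors = sorted(name for name, result in categories.items() if "error" in result)
    return {
        "categories": categories,
        "total_returned": sum(len(r["rows"]) for r in categories.values() if "rows" in r),
        # The fields the fix adds: make partial failure visible in the summary.
        "complete": not errors,
        "categories_with_errors": len(errors),
        "error_categories": errors,
    }
```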
BONUS finding (uncovered by Major #3 fix): huntpilotqueue join used
the wrong column. Three CSS impact categories (huntpilot_max_wait_css,
huntpilot_no_agent_css, huntpilot_queue_full_css) were silently
erroring with "Column (fknumplan) not found" because huntpilotqueue
joins via fknumplan_pilot, not fknumplan. With the Major #3 fix in
place, this surfaced immediately as `complete: False, error_categories:
[3 huntpilot_*]` against the live cluster. Fixed inline; live re-run
now reports `complete: True, total_returned: 163` for Internal-CSS.
87 unit tests passing (up from 70). Live cluster smoke test
(cucm-pub.binghammemorial.org, CUCM 15.0.1.12900-234) verifies all
three fixes plus the bonus finding work end-to-end.
Two defects found during live-cluster audit shakedown.
1. SQL validator false-positives on string literals
The forbidden-keyword check tokenized the entire query, including
contents of single-quoted string literals. CSS names like
'Call Forward-CSS', DN descriptions containing 'DELETE', or partition
names with 'INSERT' all tripped the validator even though the SQL
itself was clean and read-only. Found while running impact analysis on
"Call Forward-CSS".
Fix: replace string literals (single-quoted, with '' as the escape) with
whitespace before the forbidden-keyword tokenization. The cleaned
query returned to the caller still contains the literals; they're
only invisible to the analysis pass.
7 new tests covering: words inside literals (Call/Drop/Delete/etc.),
escaped quotes, multiple literals, and the critical case where a
forbidden keyword appears immediately after a literal.
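Here is a regex-based sketch of the literal-blanking step; the helper names are hypothetical. Replacing each literal with the same number of spaces keeps a token boundary where the literal was. That boundary is what makes the keyword-right-after-a-literal case come out right.

```python
import re

# A single-quoted SQL literal where '' is an escaped quote. The alternatives
# begin with different characters, so the pattern cannot backtrack catastrophically.
_SQL_LITERAL = re.compile(r"'(?:[^']|'')*'")


def blank_literals(sql: str) -> str:
    """Replace each literal with spaces of the same length (offsets stay stable)."""
    return _SQL_LITERAL.sub(lambda m: " " * len(m.group(0)), sql)


def keyword_tokens(sql: str) -> list:
    """Upper-cased identifier/keyword tokens, ignoring literal contents."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", blank_literals(sql).upper())
```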
2. CSS impact analysis missed primary device CSS + 7 other refs
Running route_devices_using_css("E911CSS") returned total=0 even
though E911CSS is configured in the cluster. Root cause: our
enumeration covered device.fkcallingsearchspace_{reroute,restrict,
refer,rdntransform} but not the primary device.fkcallingsearchspace
itself — the column the GUI sets when assigning a CSS to a phone.
The simple unsuffixed name didn't match our earlier "%css%" schema
filter (the actual column spells out "callingsearchspace").
Added 8 new reference categories:
device_primary_css — the big one
device_cgpn_unknown_css — calling-party-unknown
line_monitoring_css — devicenumplanmap monitoring CSS
gateway_h323_called_xform_css — H.323 gateway transform
gateway_sip_called_xform_css — SIP trunk transform
huntpilot_max_wait_css — hunt pilot queue handling
huntpilot_no_agent_css — hunt pilot queue handling
huntpilot_queue_full_css — hunt pilot queue handling
Re-running on live cluster:
Internal-CSS: 146 -> 163 refs (16 new device_primary_css matches)
Call Forward-CSS: previously rejected by validator -> 150 refs
E911CSS: still 0 — high-confidence orphan finding now
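The following sketches the two queries involved. It assumes the standard Informix catalog tables (`systables`, `syscolumns`) and a join of `device.fkcallingsearchspace` to `callingsearchspace.pkid`, as implied above. The helper names are hypothetical. Because the SQL is built inline, the sketch doubles embedded quotes rather than relying on bind parameters.

```python
def sql_string_literal(value: str) -> str:
    """Quote a value for inline Informix SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


# Catalog query to enumerate CSS reference columns. Matching on the spelled-out
# name matters: 'css' is not a substring of 'fkcallingsearchspace'.
CSS_COLUMN_DISCOVERY = (
    "SELECT t.tabname, c.colname FROM systables t "
    "JOIN syscolumns c ON c.tabid = t.tabid "
    "WHERE c.colname LIKE '%callingsearchspace%' OR c.colname LIKE '%css%'"
)


def device_primary_css_query(css_name: str) -> str:
    """Devices whose primary CSS (device.fkcallingsearchspace) is css_name."""
    return (
        "SELECT d.name FROM device d "
        "JOIN callingsearchspace cs ON cs.pkid = d.fkcallingsearchspace "
        f"WHERE cs.name = {sql_string_literal(css_name)}"
    )
```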
Two MCP tools blew the per-response token cap when run against a real
medium-sized cluster (Bingham Memorial, ~1500 patterns in Internal-PT,
20 route filters with hundreds of member rules each):
route_devices_using_css("Internal-CSS") -> 103,590 chars
route_filters() -> 304,639 chars
Both responses are now compact-by-default with opt-in detail:
route_filters(include_members=False, default):
- returns name, clause, dial_plan, and member_count per filter
- 304,639 -> 17,354 chars (94% reduction)
- member_count is the audit-relevant signal anyway: filters with
100+ rules are complex; the count tells you that without paying
for the full rule listing
- include_members=True scopes detail to a single named filter
(BLK-ALWAYS-RF with 432 rules: 40K chars; tractable per-filter)
route_devices_using_css(max_per_category=50, default):
- each category returns at most max_per_category rows
- truncated: bool flag set when underlying count exceeds the cap
- 103,590 -> 13,855 chars (87% reduction)
- implementation uses SELECT FIRST (max_per_category + 1), so no extra
COUNT query is needed per category: a single round-trip still yields an accurate truncation flag
- LLM can drill in via higher max_per_category or axl_sql when
truncated=true
Both changes are backward-compatible defaults; existing callers continue
to work and just get smaller, structured responses.
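The one-extra-row trick can be sketched as follows. `capped_select` and `fetch_capped` are hypothetical helpers, and the rewrite relies on Informix's `SELECT FIRST n` syntax.

```python
from typing import Callable, List


def capped_select(base_select: str, max_rows: int) -> str:
    """Rewrite 'SELECT ...' as Informix 'SELECT FIRST n ...' with n = max_rows + 1."""
    if max_rows < 1:
        raise ValueError("max_rows must be >= 1")
    stripped = base_select.lstrip()
    if stripped[:7].upper() != "SELECT ":
        raise ValueError("expected a query starting with SELECT")
    return f"SELECT FIRST {max_rows + 1} {stripped[7:]}"


def fetch_capped(execute: Callable[[str], List[dict]], base_select: str,
                 max_per_category: int = 50) -> dict:
    # Asking for one extra row answers "is there more?" without a COUNT(*) round-trip.
    rows = execute(capped_select(base_select, max_per_category))
    return {
        "rows": rows[:max_per_category],
        "truncated": len(rows) > max_per_category,
    }
```

The trade-off is that `truncated` says only that more rows exist, not how many. That is enough for the LLM to decide whether to drill in.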
- route_plan.py: drop `NULL AS context` from voicemail_pilot_css query.
Informix rejected it as a syntax error; the column wasn't carrying any
signal anyway, so the simpler SELECT works and matches the other
reference-point queries.
- README.md: tool table now covers all 16 tools (route_device_pool_route_groups,
route_devices_using_css, route_filters were missing).
- .gitignore: explicitly ignore .env. Already covered by ~/.gitignore_global,
but worth being self-contained — anyone cloning without the global ignore
shouldn't be one stray `git add` away from leaking AXL credentials.
Read-only MCP server for Cisco Unified CM 15 AXL — built for LLM-driven
cluster auditing, with a particular focus on the Route Plan Report:
partitions, calling search spaces, route patterns, translation patterns,
called/calling party transformations, and digit-discard instructions.
Pairs intentionally with the sibling mcp-cisco-docs server (live
cluster state + vendor docs in one LLM context).
Architecture:
- zeep SOAP client to CUCM AXL
- WSDL bootstrap from Cisco's axlsqltoolkit.zip (auto-extract on
first launch; zip is gitignored, vendor-licensed)
- SQLite response cache at ~/.cache/mcp-cucm-axl/responses/
- Schema-grounded prompts that pull chunks from the sibling
cisco-docs index (docs_loader.py)
Read-only by structural guarantee — never registers AXL write methods
(no executeSQLUpdate, no add*/update*/remove*/apply*/reset*/restart*
tools). SQL queries are also validated client-side (sql_validator.py)
and must begin with SELECT or WITH.
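One way to make "never registers write methods" structural rather than a convention is to refuse write methods at registration time. `register_axl_tool`, `TOOLS`, and the camelCase-prefix rule are assumptions for illustration, not the server's actual registration code.

```python
import re
from typing import Callable, Dict

# Assumed rule: AXL write verbs are camelCase verb+Noun (addPhone, updateLine, ...).
# Requiring an uppercase letter after the verb avoids false hits like 'addressBook'.
_WRITE_METHOD = re.compile(r"(?:add|update|remove|apply|reset|restart)[A-Z]\w*")
_EXPLICIT_WRITES = frozenset({"executeSQLUpdate"})

TOOLS: Dict[str, Callable] = {}


def is_write_method(name: str) -> bool:
    return name in _EXPLICIT_WRITES or _WRITE_METHOD.match(name) is not None


def register_axl_tool(axl_method: str):
    """Decorator that refuses to expose any AXL write method as an MCP tool."""
    if is_write_method(axl_method):
        raise ValueError(f"refusing to register write method {axl_method!r}")

    def decorator(fn: Callable) -> Callable:
        TOOLS[fn.__name__] = fn
        return fn

    return decorator


@register_axl_tool("executeSQLQuery")
def axl_sql(query: str):
    # Sketch only: would validate with sql_validator, then call AXL.
    raise NotImplementedError
```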
Tools exposed:
Foundational: axl_version, axl_sql, axl_list_tables,
axl_describe_table, cache_stats, cache_clear
Route plan: route_partitions, route_calling_search_spaces,
route_patterns, route_inspect_pattern,
route_lists_and_groups, route_translation_chain,
route_digit_discard_instructions
Prompts (schema-grounded):
route_plan_overview, investigate_pattern, audit_routing,
cucm_sql_help
Tests cover cache, docs_loader, normalize, sql_validator, wildcard.