route_plan: add cti_failsafe_reachability tool

Closes the bug class cucx-docs flagged at Bingham — a CTI Route
Point's CFNA destination points at a number that is structurally
unreachable from the configured CFNA-CSS, so the failsafe forward
fires but finds no matching pattern and the call dies. Invisible
from any single-record inspection (CTI RP record looks fine,
destination pattern exists in some partition, CSS is fine — defect
lives in the relationship between CFNA-CSS and destination's
partition).

The motivating Bingham finding (life-safety severity):

  912-CTI-RP (Secondary CER) CFNA + CFUR → "10911" via 911CER-CSS
  Pattern "10.911" exists in CER911-PT
  911CER-CSS does NOT contain CER911-PT
  → failsafe is structurally broken; both CER servers down would
    produce fast-busy on 911 calls instead of routing through ELIN-10
    to the PSAP
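
The relationship defect above can be sketched as a set-membership
check. This is a minimal illustration, not the tool's implementation:
both dictionaries are hypothetical stand-ins for cluster data, the
partition listed for 911CER-CSS is a made-up placeholder, and treating
"." as a separator that consumes no digit is a simplifying assumption.

```python
# Hypothetical stand-ins for cluster data (not real AXL output).
css_partitions = {"911CER-CSS": {"Some-Other-PT"}}  # placeholder membership
pattern_partitions = {"10.911": {"CER911-PT"}}      # where each pattern lives


def matches(pattern: str, dialed: str) -> bool:
    # Simplifying assumption: "." is only a separator, so drop it and
    # compare literally. Wildcards are ignored in this sketch.
    return pattern.replace(".", "") == dialed


def reachable(dialed: str, css: str) -> bool:
    # A forward works only if some matching pattern lives in a
    # partition that the forward CSS actually contains.
    visible = css_partitions.get(css, set())
    return any(
        matches(pattern, dialed) and bool(partitions & visible)
        for pattern, partitions in pattern_partitions.items()
    )


print(reachable("10911", "911CER-CSS"))  # False: CSS cannot see CER911-PT
```

Each record looks valid in isolation; only the intersection of the two
sets shows that the forward has nowhere to land.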

Implementation per axl/agent-threads/cti-audit-prompts/002:

  - Tool, not prompt — output is structured + deterministic; same
    shape as route_patterns_targeting (Q1 confirmed as proposed)
  - Two-tier severity: HIGH for life-safety names or descriptions,
    MEDIUM for non-life-safety, no LOW (Q2 refined from cucx-docs's
    binary proposal: every broken forward is a real bug, just not
    all are 911)
  - Scope: CFNA + CFUR only for v1; CFB excluded by design (Q3
    confirmed — CTI RPs rarely go busy)
  - Lives in route_plan.py alongside route_patterns_targeting +
    device_grep + translation_chain (Q5 — defer cti.py namespace
    until adjacent prompts land)
  - Named cti_failsafe_reachability not _audit (Q4 — drops the
    _audit suffix per the established tool-vs-prompt naming split;
    tools use direct-action names, prompts use _audit)
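
Because the output is structured, downstream triage can stay simple.
The sketch below is an assumption about how a consumer might order
findings; the `result` dict is hand-written sample data in the shape
this commit describes (trimmed to three keys per finding), and
`triage` is a hypothetical helper, not part of mcaxl.

```python
# Hand-written sample in the tool's output shape; values are illustrative.
result = {
    "total_cti_route_points": 2,
    "checked": 2,
    "broken_cfna": 1,
    "broken_cfur": 1,
    "findings": [
        {"device": "Generic-RP", "forward_kind": "cfur", "severity": "MEDIUM"},
        {"device": "912-CTI-RP", "forward_kind": "cfna", "severity": "HIGH"},
    ],
}

SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1}


def triage(findings: list[dict]) -> list[dict]:
    # HIGH first, then stable ordering by device name and forward kind.
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f["severity"]], f["device"], f["forward_kind"]),
    )


for finding in triage(result["findings"]):
    print(finding["severity"], finding["device"], finding["forward_kind"])
```

One entry per broken forward keeps this kind of sort and filter flat;
no per-device unpacking is needed.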

Life-safety token list (case-insensitive substring match against
name AND description):

  ("emergency", "911", "cer", "psap", "panic", "alert")
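
A minimal sketch of the matching rule, assuming a hypothetical
`tokens` parameter for site-local extension (the helper in this commit
reads a module-level tuple instead):

```python
DEFAULT_TOKENS = ("emergency", "911", "cer", "psap", "panic", "alert")


def is_life_safety(name, description, tokens=DEFAULT_TOKENS):
    # Join both fields, lowercase once, then test each token as a substring.
    haystack = f"{name or ''} {description or ''}".lower()
    return any(tok in haystack for tok in tokens)


print(is_life_safety("912-CTI-RP", "CTI RP for Secondary CER Server"))  # True
print(is_life_safety("regular-rp", "Voicemail Pilot"))                  # False
# Substring matching is deliberately broad: "cer" also occurs in "officer".
print(is_life_safety("regular-rp", "Duty Officer Line"))                # True
# Site-local vocabulary can be appended (hypothetical token).
print(is_life_safety("CODE-GRAY-RP", None, DEFAULT_TOKENS + ("code-gray",)))  # True
```

Broad matching errs toward HIGH, which is the safer direction for a
life-safety audit, at the cost of occasional over-classification.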

Suggested-fix message names the partition where the destination's
pattern lives and proposes either "add partition X to CSS Y" or
"change CSS to a CSS containing partition X." Falls back to a
generic "manual investigation needed" message when the destination
matches no exact-literal pattern in any partition (often means a
wildcard pattern is the actual target).
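
The three message branches can be sketched as follows. This is an
illustration under stated assumptions: `suggest_fix` and its
`patterns` argument (a list of pattern/partition pairs) are
hypothetical, and the message wording is abbreviated rather than the
tool's actual text.

```python
def suggest_fix(dest: str, css: str, patterns: list[tuple[str, str]]) -> str:
    # patterns: (pattern, partition) pairs, a stand-in for numplan rows.
    # Exact-literal lookup: a wildcard such as "5XXX" never equals "5550".
    partitions = sorted({pt for pat, pt in patterns if pat == dest})
    if not partitions:
        return f"{dest}: no exact-literal pattern found; manual investigation needed"
    if len(partitions) == 1:
        part = partitions[0]
        return f"add {part} to {css}, or use a CSS containing {part}"
    return f"{dest} exists in {', '.join(partitions)}; pick the intended partition"


print(suggest_fix("5550", "BadCSS", [("5550", "Site-A-PT")]))
print(suggest_fix("5550", "BadCSS", [("5XXX", "Site-A-PT")]))
print(suggest_fix("5550", "BadCSS", [("5550", "Site-A-PT"), ("5550", "Site-B-PT")]))
```

The second call shows why the fallback exists: a destination served
only by a wildcard pattern has no exact-literal row to name.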

Tests: 26 in TestLifeSafetyDetection + TestCtiFailsafeReachability:

  - 16 token-matching cases (10 positive, 4 negative, 2 sentinel)
  - 10 tool-level cases including the canonical Bingham bug
    reproduced verbatim (assertion compares the entire finding dict
    to the expected output from cucx-docs's 001 message)

Full mcaxl suite: 238 → 264 passing (+26 from this work).

Adjacent prompts cucx-docs flagged as lower-priority follow-ups
(cti_route_point_audit, cti_port_pool_audit,
cti_application_user_audit) deferred but tracked.
Ryan Malloy 2026-05-09 03:28:49 -06:00
parent 91bd3a0705
commit d33cd7c809
3 changed files with 528 additions and 0 deletions


@@ -587,6 +587,196 @@ def _expand_charclass(spec: str) -> list[str]:
    return sorted(chars)


# Description / name substrings that signal a CTI Route Point is on a
# life-safety code path. Match is case-insensitive substring against
# `description` AND `name`, which covers naming conventions like
# `911-CTI-RP` and descriptions like "CER Primary Failover." Audit teams
# can extend this list site-locally if their deployment uses other
# vocabulary (e.g., "BLUE-ALERT", "CODE-GRAY") for the same role.
_LIFE_SAFETY_TOKENS: tuple[str, ...] = (
    "emergency", "911", "cer", "psap", "panic", "alert",
)


def _is_life_safety_cti(name: str | None, description: str | None) -> bool:
    haystack = " ".join([(name or ""), (description or "")]).lower()
    return any(tok in haystack for tok in _LIFE_SAFETY_TOKENS)


def cti_failsafe_reachability(client: "AxlClient") -> dict:
    """Find CTI Route Points whose CFNA or CFUR forward destination is
    unreachable from the configured forward CSS, a defect class
    invisible from any single-record inspection.

    The bug shape: a CTI RP has a CFNA destination string that LOOKS
    valid (and IS, in some other partition), and a CFNA-CSS that LOOKS
    valid, but the CSS doesn't reach the partition where the destination's
    matching pattern lives. The forward fires, finds nothing, and the
    call dies with fast-busy or unreachable-destination tone.

    Catching this requires cross-referencing CFNA-destination +
    CFNA-CSS + reachable-partitions + matching-pattern. This tool
    mechanizes that cross-reference for every CTI RP in the cluster.

    Scope (v1): CFNA + CFUR forwards only. CFB (Call Forward Busy) is
    excluded by design: CTI RPs rarely go busy in the operator sense,
    so the failsafe-relevant forwards are CFNA + CFUR. If CFB findings
    matter on a specific deployment, the join shape is identical and
    extending is mechanical; for now, scope discipline.

    Returns:
        ``{total_cti_route_points, checked, broken_cfna, broken_cfur,
        findings: [{device, description, forward_kind, destination, css,
        match_count: 0, severity, suggested_fix}, ...], _note}``

    One ``findings`` entry per broken forward (not per device); flatter
    output is easier to sort + filter for operator tooling. A device
    with both CFNA and CFUR broken produces two entries.

    Severity classification:
      - ``HIGH``: description or name matches a life-safety token
        (see ``_LIFE_SAFETY_TOKENS``) AND the forward is broken
      - ``MEDIUM``: non-life-safety CTI RP with broken forward; still
        a real bug, just not 911

    Working CFNAs/CFURs are not reported. Output focuses on broken
    forwards only.

    Source observation: cucx-docs found a HIGH-severity case at Bingham
    where ``912-CTI-RP`` (Secondary CER) had CFNA + CFUR pointed at
    ``10911`` with CFNA-CSS = ``911CER-CSS``. The pattern ``10.911``
    exists in ``CER911-PT``, but ``911CER-CSS`` doesn't contain that
    partition, so the failsafe was structurally broken. See
    ``axl/agent-threads/cti-audit-prompts/001`` for the full setup.
    """
    sql = """
        SELECT
            d.name,
            d.description,
            n.cfnadestination,
            n.cfurdestination,
            css1.name AS cfna_css_name,
            css2.name AS cfur_css_name
        FROM device d
        JOIN typeclass tc ON d.tkclass = tc.enum
        LEFT OUTER JOIN devicenumplanmap m ON m.fkdevice = d.pkid
        LEFT OUTER JOIN numplan n ON m.fknumplan = n.pkid
        LEFT OUTER JOIN callingsearchspace css1
            ON n.fkcallingsearchspace_cfna = css1.pkid
        LEFT OUTER JOIN callingsearchspace css2
            ON n.fkcallingsearchspace_cfur = css2.pkid
        WHERE tc.name = 'CTI Route Point'
          AND (n.cfnadestination IS NOT NULL OR n.cfurdestination IS NOT NULL)
        ORDER BY d.name
    """
    result = client.execute_sql_query(sql)
    rows = result["rows"]
    total_cti_rps = len(rows)

    findings: list[dict] = []
    broken_cfna = 0
    broken_cfur = 0

    for row in rows:
        name = row.get("name")
        description = row.get("description")
        is_life_safety = _is_life_safety_cti(name, description)

        for forward_kind in ("cfna", "cfur"):
            dest = row.get(f"{forward_kind}destination")
            css = row.get(f"{forward_kind}_css_name")
            if not dest or not css:
                # No forward configured for this kind; not a defect
                continue

            chain = translation_chain(client, number=dest, css_name=css)
            if chain["match_count"] > 0:
                continue  # working forward; no finding

            if forward_kind == "cfna":
                broken_cfna += 1
            else:
                broken_cfur += 1

            findings.append({
                "device": name,
                "description": description,
                "forward_kind": forward_kind,
                "destination": dest,
                "css": css,
                "match_count": 0,
                "severity": "HIGH" if is_life_safety else "MEDIUM",
                "suggested_fix": _suggest_failsafe_fix(client, dest, css),
            })

    return {
        "total_cti_route_points": total_cti_rps,
        "checked": total_cti_rps,
        "broken_cfna": broken_cfna,
        "broken_cfur": broken_cfur,
        "findings": findings,
        "_note": (
            "Scope: CFNA + CFUR only. CFB (busy-forward) excluded by "
            "design: CTI RPs rarely go busy. Severity HIGH when name "
            "or description contains any life-safety token "
            f"({', '.join(_LIFE_SAFETY_TOKENS)})."
        ),
    }


def _suggest_failsafe_fix(client: "AxlClient", dest: str, broken_css: str) -> str:
    """Produce a fixed-template fix suggestion for a broken CFNA/CFUR.

    Looks up which partition(s) hold an exact-literal pattern equal to
    ``dest``, then suggests either adding that partition to the broken
    CSS or switching the CSS to one that includes it.

    Falls back to a generic message if the destination matches no
    exact-literal pattern in any partition (the destination string may
    be wrong, the extension may have been deleted, or only a wildcard
    pattern matches it).
    """
    safe_dest = _esc(dest)
    sql = f"""
        SELECT DISTINCT rp.name AS partition
        FROM numplan np
        LEFT OUTER JOIN routepartition rp ON np.fkroutepartition = rp.pkid
        WHERE np.dnorpattern = '{safe_dest}'
          AND rp.name IS NOT NULL
    """
    try:
        result = client.execute_sql_query(sql)
    except Exception:
        # Lookup failed; return a generic message rather than raising.
        return (
            f"Destination {dest!r} is unreachable from CSS {broken_css!r}. "
            "Manual investigation needed to identify the correct partition."
        )

    partitions = [r["partition"] for r in result["rows"] if r.get("partition")]
    if not partitions:
        return (
            f"Destination {dest!r} matches no exact-literal pattern in any "
            f"partition. Either the destination string is wrong or it "
            f"matches a wildcard pattern (use route_translation_chain to "
            f"investigate further)."
        )
    if len(partitions) == 1:
        part = partitions[0]
        return (
            f"Pattern {dest!r} lives in partition {part!r}. Either add "
            f"{part!r} to CSS {broken_css!r}, OR change the forward CSS "
            f"to a CSS that already contains {part!r}."
        )
    return (
        f"Pattern {dest!r} exists in multiple partitions ({', '.join(partitions)}). "
        f"Identify the intended target partition, then either add it to "
        f"CSS {broken_css!r} or change the forward CSS accordingly."
    )


def list_route_lists_and_groups(client: "AxlClient", name: str | None = None) -> dict:
    """Route lists with their ordered route groups and member gateways/trunks.


@@ -239,6 +239,29 @@ def route_inspect_pattern(pattern: str, partition: str | None = None) -> dict:
    return route_plan.inspect_pattern(_client(), pattern, partition)


@mcp.tool
def cti_failsafe_reachability() -> dict:
    """Find CTI Route Points whose CFNA or CFUR forward destination is
    unreachable from the configured forward CSS, a defect class
    invisible from any single-record inspection.

    The bug shape: a CTI RP has a CFNA destination string that LOOKS
    valid (and IS, in some other partition), and a CFNA-CSS that LOOKS
    valid, but the CSS doesn't reach the partition where the
    destination's matching pattern lives. The forward fires, finds
    nothing, and the call dies with fast-busy.

    Severity is HIGH for CTI RPs whose name or description contains a
    life-safety token (911, emergency, CER, PSAP, panic, alert);
    MEDIUM otherwise. Working forwards are not reported.

    Scope (v1): CFNA + CFUR only. CFB excluded by design: CTI RPs
    rarely go busy. See `axl/agent-threads/cti-audit-prompts/` for the
    motivating bug observation and architectural decisions.
    """
    return route_plan.cti_failsafe_reachability(_client())


@mcp.tool
def device_grep(
    pattern: str,


@@ -0,0 +1,315 @@
"""Tests for cti_failsafe_reachability: find broken CFNA/CFUR forwards.

Source: cucx-docs handoff at
``axl/agent-threads/cti-audit-prompts/001-cucx-cfna-reachability-audit.md``
documenting a real life-safety bug at Bingham (912-CTI-RP CFNA
'10911' under 911CER-CSS, where '10.911' lives in CER911-PT which
911CER-CSS doesn't reach).

The tool issues three kinds of SQL query:
  1. Top-level forwards SQL (fetch CTI RPs with CFNA/CFUR set)
  2. translation_chain's SQL (per-forward reachability check)
  3. _suggest_failsafe_fix's partition-lookup SQL (one per finding)

The FakeAxlClient dispatches by query content rather than sequence
because the order of (2) and (3) interleaves across multiple findings.
"""
import pytest

from mcaxl.route_plan import (
    _LIFE_SAFETY_TOKENS,
    _is_life_safety_cti,
    cti_failsafe_reachability,
)


class FakeAxlClient:
    """Dispatching fake: returns canned responses keyed on SQL content.

    Constructor takes:
      - cti_rp_rows: rows for the top-level "find CTI RPs with forwards" query
      - reachable_destinations: set of (destination, css) pairs that have a
        matching pattern (translation_chain returns match_count > 0 for these)
      - destination_partitions: dict {destination: [partition_name, ...]}
        used by the _suggest_failsafe_fix's partition-lookup query
    """

    def __init__(
        self,
        cti_rp_rows: list[dict],
        reachable_destinations: set[tuple[str, str]] | None = None,
        destination_partitions: dict[str, list[str]] | None = None,
    ):
        self._cti_rows = cti_rp_rows
        self._reachable = reachable_destinations or set()
        self._dest_partitions = destination_partitions or {}
        self.queries: list[str] = []

    def execute_sql_query(self, sql: str) -> dict:
        self.queries.append(sql)

        # Dispatch 1: top-level "find CTI RPs with CFNA/CFUR" query
        if "tc.name = 'CTI Route Point'" in sql and "cfnadestination" in sql:
            return {"row_count": len(self._cti_rows), "rows": self._cti_rows}

        # Dispatch 2: translation_chain's reachability check
        # Recognizable by `tkpatternusage IN (3, 5, 7)` from route_plan.py
        if "tkpatternusage IN (3, 5, 7)" in sql:
            # Simplest dispatch: look for the CSS name of each known
            # reachable (dest, css) pair in the query's callingsearchspace
            # filter. Only the CSS is checked against the SQL text; the
            # destination is supplied through the returned pattern row.
            for dest, css in self._reachable:
                if f"name = '{css}'" in sql:
                    # For each reachable destination, the test fake returns
                    # a single pattern that exactly equals the destination
                    # so translation_chain's wildcard matcher resolves it.
                    return {
                        "row_count": 1,
                        "rows": [{
                            "pattern": dest,
                            "pattern_type": "Translation",
                            "partition_name": "Reachable-PT",
                            "calling_party_xform_mask": None,
                            "called_party_xform_mask": None,
                            "prefix_digits_out": None,
                            "digit_discard_instructions": None,
                            "route_filter": None,
                            "description": "fake-reachable",
                        }],
                    }
            return {"row_count": 0, "rows": []}

        # Dispatch 3: _suggest_failsafe_fix's partition-lookup query
        if "rp.name IS NOT NULL" in sql and "np.dnorpattern" in sql:
            # Match the dnorpattern literal embedded in the SQL
            for dest, parts in self._dest_partitions.items():
                if f"np.dnorpattern = '{dest}'" in sql:
                    rows = [{"partition": p} for p in parts]
                    return {"row_count": len(rows), "rows": rows}
            return {"row_count": 0, "rows": []}

        # Anything else: empty (unexpected query path; fail loud later)
        return {"row_count": 0, "rows": []}


def _cti_row(name, description, cfna=None, cfur=None, cfna_css=None, cfur_css=None):
    return {
        "name": name,
        "description": description,
        "cfnadestination": cfna,
        "cfurdestination": cfur,
        "cfna_css_name": cfna_css,
        "cfur_css_name": cfur_css,
    }


# ─── Life-safety token detection (helper in isolation) ────────────────


class TestLifeSafetyDetection:
    @pytest.mark.parametrize("description", [
        "Primary CER Server",
        "911 CTI Route Point",
        "Emergency CER",
        "PSAP gateway",
        "PANIC button receiver",
        "Code BLUE Alert",
    ])
    def test_life_safety_tokens_match(self, description):
        assert _is_life_safety_cti("some-name", description) is True

    @pytest.mark.parametrize("name", [
        "911-CTI-RP",
        "EMERGENCY-RP",
        "CER-Primary",
        "psap-gateway",
    ])
    def test_token_matched_in_name_field(self, name):
        # Tokens match against name OR description; some clusters tag
        # the role in the name field rather than the description
        assert _is_life_safety_cti(name, "Generic CTI Route Point") is True

    @pytest.mark.parametrize("description", [
        "Patient Intake CTI Route Point",
        "Voicemail Pilot",
        "Receptionist Hunt Pilot",
        "Generic application route point",
    ])
    def test_non_life_safety_descriptions(self, description):
        assert _is_life_safety_cti("regular-rp", description) is False

    def test_null_name_and_description_does_not_match(self):
        assert _is_life_safety_cti(None, None) is False
        assert _is_life_safety_cti("", "") is False

    def test_advertised_token_list_is_what_we_implement(self):
        # If the token list grows or shrinks, the docstring + agent-thread
        # reply must be updated alongside. Catches accidental drift.
        assert _LIFE_SAFETY_TOKENS == (
            "emergency", "911", "cer", "psap", "panic", "alert",
        )


# ─── Tool-level integration ──────────────────────────────────────────


class TestCtiFailsafeReachability:
    def test_no_cti_route_points_returns_empty_findings(self):
        client = FakeAxlClient(cti_rp_rows=[])
        result = cti_failsafe_reachability(client)
        assert result["total_cti_route_points"] == 0
        assert result["broken_cfna"] == 0
        assert result["broken_cfur"] == 0
        assert result["findings"] == []

    def test_working_cfna_produces_no_finding(self):
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row("Working-RP", "Patient intake", cfna="5550100", cfna_css="Internal-CSS"),
            ],
            reachable_destinations={("5550100", "Internal-CSS")},
        )
        result = cti_failsafe_reachability(client)
        assert result["broken_cfna"] == 0
        assert result["findings"] == []

    def test_broken_cfna_non_life_safety_is_medium(self):
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row("Generic-RP", "Patient intake", cfna="5550100", cfna_css="BadCSS"),
            ],
            reachable_destinations=set(),  # nothing reachable
            destination_partitions={"5550100": ["Internal-PT"]},
        )
        result = cti_failsafe_reachability(client)
        assert result["broken_cfna"] == 1
        assert len(result["findings"]) == 1
        finding = result["findings"][0]
        assert finding["device"] == "Generic-RP"
        assert finding["forward_kind"] == "cfna"
        assert finding["destination"] == "5550100"
        assert finding["css"] == "BadCSS"
        assert finding["match_count"] == 0
        assert finding["severity"] == "MEDIUM"
        assert "Internal-PT" in finding["suggested_fix"]
        assert "BadCSS" in finding["suggested_fix"]

    def test_broken_cfna_life_safety_is_high(self):
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row("911-CTI-RP", "Emergency dispatch", cfna="10911", cfna_css="911CER-CSS"),
            ],
            destination_partitions={"10911": ["CER911-PT"]},
        )
        result = cti_failsafe_reachability(client)
        assert result["findings"][0]["severity"] == "HIGH"

    def test_broken_cfna_and_cfur_produce_two_findings(self):
        # Same device with both forwards broken should produce TWO entries
        # (per-forward, not per-device, per the design decision)
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row(
                    "912-CTI-RP", "CTI RP for Secondary CER Server",
                    cfna="10911", cfna_css="911CER-CSS",
                    cfur="10911", cfur_css="911CER-CSS",
                ),
            ],
            destination_partitions={"10911": ["CER911-PT"]},
        )
        result = cti_failsafe_reachability(client)
        assert result["broken_cfna"] == 1
        assert result["broken_cfur"] == 1
        assert len(result["findings"]) == 2
        kinds = {f["forward_kind"] for f in result["findings"]}
        assert kinds == {"cfna", "cfur"}
        # Both should be HIGH (description contains "CER")
        assert all(f["severity"] == "HIGH" for f in result["findings"])

    def test_only_cfna_set_does_not_check_cfur(self):
        # CFUR null → don't check it (not a finding)
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row("Half-RP", "Generic", cfna="9999", cfna_css="BadCSS"),
            ],
            destination_partitions={"9999": ["Some-PT"]},
        )
        result = cti_failsafe_reachability(client)
        assert result["broken_cfna"] == 1
        assert result["broken_cfur"] == 0

    def test_canonical_bingham_bug_reproduced(self):
        """The canary scenario from cucx-docs's 001: verifies the tool
        produces exactly the expected output for the motivating bug."""
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row(
                    "912-CTI-RP", "CTI RP for Secondary CER Server",
                    cfna="10911", cfna_css="911CER-CSS",
                    cfur="10911", cfur_css="911CER-CSS",
                ),
            ],
            destination_partitions={"10911": ["CER911-PT"]},
        )
        result = cti_failsafe_reachability(client)
        cfna_finding = next(f for f in result["findings"] if f["forward_kind"] == "cfna")
        assert cfna_finding == {
            "device": "912-CTI-RP",
            "description": "CTI RP for Secondary CER Server",
            "forward_kind": "cfna",
            "destination": "10911",
            "css": "911CER-CSS",
            "match_count": 0,
            "severity": "HIGH",  # description contains "CER"
            "suggested_fix": (
                "Pattern '10911' lives in partition 'CER911-PT'. "
                "Either add 'CER911-PT' to CSS '911CER-CSS', "
                "OR change the forward CSS to a CSS that already "
                "contains 'CER911-PT'."
            ),
        }

    def test_suggested_fix_when_no_partition_holds_destination(self):
        # Edge case: destination doesn't match any literal pattern
        # (might match a wildcard, but not an exact-literal). Suggest_fix
        # falls back to a generic message.
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row("Wild-RP", "Generic", cfna="orphan-dest", cfna_css="BadCSS"),
            ],
            destination_partitions={},  # no partition holds 'orphan-dest'
        )
        result = cti_failsafe_reachability(client)
        fix = result["findings"][0]["suggested_fix"]
        assert "matches no exact-literal pattern" in fix
        assert "wildcard" in fix.lower()

    def test_suggested_fix_when_destination_in_multiple_partitions(self):
        # Edge case: destination matches in multiple partitions; the
        # fix message lists them and asks the operator to pick.
        client = FakeAxlClient(
            cti_rp_rows=[
                _cti_row("Multi-RP", "Generic", cfna="5555", cfna_css="BadCSS"),
            ],
            destination_partitions={"5555": ["Site-A-PT", "Site-B-PT"]},
        )
        result = cti_failsafe_reachability(client)
        fix = result["findings"][0]["suggested_fix"]
        assert "multiple partitions" in fix
        assert "Site-A-PT" in fix
        assert "Site-B-PT" in fix

    def test_response_includes_scope_note(self):
        client = FakeAxlClient(cti_rp_rows=[])
        result = cti_failsafe_reachability(client)
        assert "_note" in result
        # Scope discipline visible at the call site: CFB exclusion is
        # documented, and the life-safety token list is named.
        assert "CFB" in result["_note"]
        assert "emergency" in result["_note"]