Configuration control — Telegram bot migration, stuck loop fix, tool stats cleanup
Summary
Eight changes to the harness infrastructure: migrated the Telegram bot into the harness codebase, fixed a stuck decompose loop that ran 15 redundant sessions, corrected the post-session hook execution order, switched hook arguments to runtime template variable syntax, hardened the tool call statistics parser, added duplicate prevention to the decompose flow, added a watchdog to detect stuck loops, and disabled the legacy services, including the QC bot timer.
Changes
1. Telegram bot migration
The operator Telegram bot was migrated from /root/uht-journal/telegram-bot/bot.js (legacy) into the harness as a TypeScript module at src/telegram/. New CLI command: node dist/src/index.js telegram-bot <project.yaml>.
13 SE loop commands kept: /status, /systems, /reqs, /decompose, /switch, /cleanup, /cleanup-confirm, /qc-now, /lint, /orphans, /pause, /resume, /run-now. 8 QC bot commands dropped (unused). Service references updated from uht-loop.service to claude-harness.service.
Architecture: HarnessBot class with FactsClient (typed wrapper for Substrate facts with UUID-based operations) and four command modules (status, control, maintenance, help). Deployed as claude-harness-telegram.service with Restart=on-failure. The legacy uht-telegram-bot.service was stopped and disabled.
The old bot’s /run-now command started uht-loop.service (the retired dispatcher), which caused session 348 to run outside the harness. The migrated bot references claude-harness.service instead, so /run-now no longer starts the old dispatcher.
2. Stuck decompose loop (sessions 360–374)
The autonomous loop ran 15 consecutive decompose/QC sessions on se-surgical-robot after first-pass completion, never transitioning to the full QC flow. Claude declared “first-pass complete” in journal titles from session 360 onward but the state machine stayed in in-progress.
Root cause: The decompose flow instructed Claude to set DECOMPOSITION_STATUS (on the project subject se-surgical-robot), but the flow engine reads FLOW_STATE (on subject autonomous-loop). Since FLOW_STATE was already set to in-progress, the backwards-compatibility fallback to DECOMPOSITION_STATUS never triggered.
Fix (three-part):
- Decompose flow now instructs Claude to set FLOW_STATE directly, with a bold warning that the loop runs forever without it
- Flow engine now checks DECOMPOSITION_STATUS as a fallback signal after each session
- FLOW_STATE manually set to first-pass-complete to unstick the current project
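The fallback check in the second fix can be sketched roughly as follows. This is a minimal illustration, not the harness code: the `Facts` map, the `subject/predicate` key format, the `resolveFlowState` name, and the assumption that DECOMPOSITION_STATUS carries the value first-pass-complete are all hypothetical.

```typescript
// Hypothetical fact store: keys are "subject/predicate", values are strings.
type Facts = ReadonlyMap<string, string>;

// Assumed completion value; the entry only confirms it for FLOW_STATE.
const COMPLETE = "first-pass-complete";

// Resolve the effective flow state after a session.
// FLOW_STATE (subject autonomous-loop) is primary. DECOMPOSITION_STATUS
// (project subject) is treated as a completion signal even when
// FLOW_STATE is already set, which is what the old fallback missed.
function resolveFlowState(facts: Facts, projectSubject: string): string | undefined {
  const flowState = facts.get("autonomous-loop/FLOW_STATE");
  const decomposition = facts.get(`${projectSubject}/DECOMPOSITION_STATUS`);

  if (flowState !== COMPLETE && decomposition === COMPLETE) {
    return COMPLETE;
  }
  // Otherwise prefer FLOW_STATE, falling back to the legacy fact if unset.
  return flowState ?? decomposition;
}
```

With FLOW_STATE stuck at in-progress and the project fact marked complete, this sketch returns first-pass-complete, which is the transition that sessions 360–374 never made.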
3. Post-session hook execution order
The fix-mermaid hook was running after the Astro build, so it fixed the source .md file but the compiled HTML was already deployed from the unfixed version.
Fix: Pipeline reordered from publish → build → deploy → hooks to publish → hooks → build → deploy. The fix-mermaid script now repairs mermaid syntax before Astro compiles the HTML.
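To show why the order matters, here is a toy model of the two pipelines. Everything in it is a placeholder: the `Artifacts` shape, the step bodies, and the BROKEN/FIXED strings stand in for the real publish, fix-mermaid, Astro build, and deploy logic, which this entry does not show.

```typescript
// Hypothetical in-memory stand-ins for the journal source, compiled HTML, and live site.
interface Artifacts {
  source: string;
  html: string;
  deployed: string;
}

type Step = { name: string; run: (a: Artifacts) => void };

// Placeholder steps; each only mimics the data flow of the real one.
const publish: Step = { name: "publish", run: (a) => { a.source = "graph BROKEN"; } };
const hooks: Step = { name: "hooks", run: (a) => { a.source = a.source.replace("BROKEN", "FIXED"); } };
const build: Step = { name: "build", run: (a) => { a.html = `<pre>${a.source}</pre>`; } };
const deploy: Step = { name: "deploy", run: (a) => { a.deployed = a.html; } };

// Run the steps strictly in the order given.
function runPipeline(steps: readonly Step[]): Artifacts {
  const artifacts: Artifacts = { source: "", html: "", deployed: "" };
  for (const step of steps) {
    step.run(artifacts);
  }
  return artifacts;
}

const oldOrder = runPipeline([publish, build, deploy, hooks]);
const newOrder = runPipeline([publish, hooks, build, deploy]);
```

In the old order the source ends up repaired but the deployed HTML was compiled from the unrepaired text; in the new order the build step sees the repaired source.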
4. Hook template variable syntax
The ${JOURNAL_FILE} and ${SESSION_N} hook args were being replaced with empty strings at YAML config load time by the env var interpolator, because those aren’t environment variables — they’re runtime template variables.
Fix: Changed hook arg syntax from ${VAR} to {{VAR}} (matching the existing flow template convention). The flow engine now replaces {{SESSION_N}} and {{JOURNAL_FILE}} at hook execution time. This was the root cause of fix-mermaid and surrealdb-ingest hooks failing since session 341.
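A minimal sketch of runtime replacement for the {{VAR}} syntax, assuming hook args are a list of strings. The `renderHookArgs` name and the choice to leave unknown placeholders untouched are assumptions for illustration, not the harness implementation.

```typescript
// Replace {{NAME}} placeholders in hook args at execution time.
// ${NAME} is deliberately not matched, so this cannot collide with
// the env var interpolator that runs at config load.
function renderHookArgs(
  args: readonly string[],
  vars: Readonly<Record<string, string>>,
): string[] {
  return args.map((arg) =>
    arg.replace(/\{\{(\w+)\}\}/g, (match: string, name: string): string => {
      const value = vars[name];
      // Assumption: unknown placeholders are kept as-is rather than blanked,
      // so a missing variable stays visible in the rendered command.
      return value !== undefined ? value : match;
    }),
  );
}
```

For example, calling it with the args `["{{JOURNAL_FILE}}", "--session", "{{SESSION_N}}"]` and values for both variables yields the concrete file path and session number, while an arg such as `${HOME}` passes through unchanged.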
5. Tool call statistics parser improvements
The extractCliCommand function was misclassifying most tool calls because Claude’s Bash commands start with comment lines (# Check existing requirements) or standalone variable assignments (TENANT=uht-bot). Fixed the parser to skip # comment lines and standalone VAR=value/export VAR=value lines before identifying the actual command.
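The skipping logic might look roughly like this. Only the function name comes from the harness; the signature, the regex, and the choice to return the first word of the first real command line are assumptions made for this sketch.

```typescript
// Matches a line that is only an assignment: VAR=value or export VAR=value.
// Limitation of this sketch: quoted values containing spaces are not recognised.
const ASSIGNMENT_ONLY = /^(?:export\s+)?[A-Za-z_][A-Za-z0-9_]*=\S*$/;

// Return the first word of the first line that is an actual command,
// skipping blank lines, # comments, and standalone assignments.
function extractCliCommand(script: string): string | undefined {
  for (const rawLine of script.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) {
      continue;
    }
    if (ASSIGNMENT_ONLY.test(line)) {
      continue;
    }
    return line.split(/\s+/)[0];
  }
  // Nothing but comments and assignments.
  return undefined;
}
```

A script that opens with `# Check existing requirements` and `TENANT=uht-bot` before an `airgen` call is then attributed to `airgen` rather than to the comment or the assignment.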
6. Duplicate requirement prevention
Investigation revealed three causes of duplicate requirements: no idempotency keys used (0/291 reqs), 31 unassigned REQ-SESURGICALROBOT-* refs from missing --section/--document flags, and repeated ARC entries for the same subsystem across sessions.
Fixes to decompose flow:
- Added mandatory flags instruction: every airgen reqs create must include --document, --section, and --idempotency-key
- ARC creation now checks for existing ARC by subsystem tag before creating
- Added --idempotency-key to the VER template, which was missing it
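The mandatory-flags rule is enforced through flow instructions, but the same rule could also be checked mechanically. The helper below is a hypothetical lint, not part of the harness; only the three flag names come from the fix above, and it assumes flags appear as separate whitespace-delimited tokens or in `--flag=value` form.

```typescript
// The three flags the decompose flow now requires on every create call.
const REQUIRED_FLAGS = ["--document", "--section", "--idempotency-key"] as const;

// Return the required flags that are absent from a command string.
// An empty result means the command satisfies the rule.
function missingRequiredFlags(command: string): string[] {
  const tokens = command.trim().split(/\s+/);
  return REQUIRED_FLAGS.filter(
    (flag) => !tokens.some((token) => token === flag || token.startsWith(`${flag}=`)),
  );
}
```

A command that passes only `--document` would be reported as missing `--section` and `--idempotency-key`, which are the omissions behind the unassigned refs and the 0/291 idempotency-key count.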
7. Watchdog for stuck loops
The flow engine now counts consecutive sessions with no state transition (WATCHDOG_SAME_STATE fact). After 5 consecutive same-state sessions, a Telegram alert fires: “Watchdog: N consecutive sessions in state X with flow Y — possible stuck loop.” Counter resets on any state transition.
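The counter logic can be sketched as below. The `WatchdogFact` shape, the function name, and the choice to keep alerting on every session at or past the threshold are assumptions; the entry does not say whether the alert repeats. The alert text here is a shortened form of the message quoted above.

```typescript
// Hypothetical shape of the WATCHDOG_SAME_STATE fact.
interface WatchdogFact {
  state: string;
  count: number; // consecutive sessions that ended without a state transition
}

interface WatchdogResult {
  fact: WatchdogFact;
  alert: string | null;
}

const WATCHDOG_THRESHOLD = 5;

// Update the counter after a session and decide whether to alert.
function updateWatchdog(
  prev: WatchdogFact | undefined,
  state: string,
  flow: string,
): WatchdogResult {
  // Any transition (or a missing previous fact) resets the counter.
  const count = prev !== undefined && prev.state === state ? prev.count + 1 : 0;
  const alert =
    count >= WATCHDOG_THRESHOLD
      ? `Watchdog: ${count} consecutive sessions in state ${state} with flow ${flow}`
      : null;
  return { fact: { state, count }, alert };
}
```

Under these assumptions, the first alert fires on the fifth session that leaves the state unchanged, and a single transition clears the counter.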
8. Legacy service cleanup
| Service | Status |
|---|---|
| uht-telegram-bot.service | Stopped, disabled |
| uht-loop.timer | Disabled (was accidentally starting old dispatcher) |
| uht-qc.timer | Disabled (QC bot paused) |
| claude-harness.timer | Active (hourly) |
| claude-harness-telegram.service | Active (polling) |
Version manifest
| Component | Before | After |
|---|---|---|
| Telegram bot | Legacy JS at /opt/uht-loop/ | TypeScript in harness, claude-harness-telegram.service |
| Pipeline order | publish → build → deploy → hooks | publish → hooks → build → deploy |
| Hook templates | ${VAR} (broken by env interpolator) | {{VAR}} (runtime replacement) |
| State transition | DECOMPOSITION_STATUS only | FLOW_STATE primary + DECOMPOSITION_STATUS fallback |
| Watchdog | None | 5-session threshold with Telegram alert |
| Tool stats parser | Noisy (comments, var assignments) | Skips # lines and standalone assignments |
| Duplicate prevention | None | Mandatory flags + ARC existence check + idempotency keys |
| Test count | 128 | 136 |
| Source modules | 27 | 33 (+telegram bot: 6 files) |