Configuration control — Telegram bot migration, stuck loop fix, tool stats cleanup

Summary

Eight changes to the harness infrastructure: migrated the Telegram bot into the harness codebase, fixed a stuck decompose loop that ran 15 redundant sessions, corrected the post-session hook execution order, fixed hook template variable substitution, hardened the tool call statistics parser, added duplicate prevention to the decompose flow, added a watchdog to detect stuck loops, and cleaned up legacy services (including disabling the legacy QC bot timer).

Changes

1. Telegram bot migration

The operator Telegram bot was migrated from /root/uht-journal/telegram-bot/bot.js (legacy) into the harness as a TypeScript module at src/telegram/. New CLI command: node dist/src/index.js telegram-bot <project.yaml>.

13 SE loop commands kept: /status, /systems, /reqs, /decompose, /switch, /cleanup, /cleanup-confirm, /qc-now, /lint, /orphans, /pause, /resume, /run-now. 8 QC bot commands dropped (unused). Service references updated from uht-loop.service to claude-harness.service.

Architecture: HarnessBot class with FactsClient (typed wrapper for Substrate facts with UUID-based operations) and four command modules (status, control, maintenance, help). Deployed as claude-harness-telegram.service with Restart=on-failure. The legacy uht-telegram-bot.service was stopped and disabled.
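As a rough illustration of how incoming messages might be matched against the 13 kept commands, here is a minimal sketch. The names `KEPT_COMMANDS`, `ParsedCommand`, and `parseCommand` are hypothetical and are not taken from the harness source; the optional `@botname` suffix handling and the argument passed to /switch in the usage note are also assumptions.

```typescript
// The 13 SE loop commands kept in the migration (from the list above).
const KEPT_COMMANDS: string[] = [
  "status", "systems", "reqs", "decompose", "switch", "cleanup",
  "cleanup-confirm", "qc-now", "lint", "orphans", "pause", "resume", "run-now",
];

// Hypothetical result shape for a recognised command.
interface ParsedCommand {
  name: string;
  args: string[];
}

// Parses "/name[@bot] arg1 arg2" and returns null for anything that is not
// one of the kept commands (for example, a dropped QC bot command).
function parseCommand(text: string): ParsedCommand | null {
  const match = /^\/([a-z][a-z-]*)(?:@\w+)?(?:\s+(.*))?$/.exec(text.trim());
  if (match === null) {
    return null;
  }
  const name = match[1];
  if (name === undefined || KEPT_COMMANDS.indexOf(name) === -1) {
    return null;
  }
  const rest = match[2];
  const args = rest !== undefined && rest !== "" ? rest.split(/\s+/) : [];
  return { name: name, args: args };
}
```

Under these assumptions, `parseCommand("/switch se-surgical-robot")` yields the name `switch` with one argument, and an unrecognised command such as `/unknown` yields `null`.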

The old bot’s /run-now command was starting uht-loop.service (the retired dispatcher), which caused session 348 to run outside the harness. The migrated bot references claude-harness.service instead, so /run-now now starts sessions inside the harness.

2. Stuck decompose loop (sessions 360–374)

The autonomous loop ran 15 consecutive decompose/QC sessions on se-surgical-robot after first-pass completion, never transitioning to the full QC flow. Claude declared “first-pass complete” in journal titles from session 360 onward, but the state machine remained in the in-progress state.

Root cause: The decompose flow instructed Claude to set DECOMPOSITION_STATUS (on the project subject se-surgical-robot), but the flow engine reads FLOW_STATE (on subject autonomous-loop). Since FLOW_STATE was already set to in-progress, the backwards-compatibility fallback to DECOMPOSITION_STATUS never triggered.

Fix (three-part):

  • Decompose flow now instructs Claude to set FLOW_STATE directly, with a bold warning that the loop runs forever without it
  • Flow engine now checks DECOMPOSITION_STATUS as a fallback signal after each session
  • FLOW_STATE manually set to first-pass-complete to unstick the current project
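The state resolution described in the fix can be sketched as follows. This is a minimal illustration, not the flow engine's code: the `Facts` shape and the `resolveFlowState` function are hypothetical, and it assumes DECOMPOSITION_STATUS uses the same `first-pass-complete` value as FLOW_STATE, which this entry does not state.

```typescript
// Hypothetical fact shape: a flat map of fact name to value for one subject.
type Facts = { [key: string]: string | undefined };

const COMPLETE = "first-pass-complete";

// Returns the effective flow state after a session.
// FLOW_STATE (subject autonomous-loop) is primary. DECOMPOSITION_STATUS
// (project subject) is consulted as a fallback signal on every call, even
// when FLOW_STATE is already set, so a completion recorded only on the
// project subject still advances the loop.
function resolveFlowState(loopFacts: Facts, projectFacts: Facts): string {
  const flowState = loopFacts["FLOW_STATE"];
  const decomposition = projectFacts["DECOMPOSITION_STATUS"];
  if (flowState === COMPLETE || decomposition === COMPLETE) {
    return COMPLETE;
  }
  if (flowState !== undefined) {
    return flowState;
  }
  // Assumed default when neither fact is set.
  return decomposition !== undefined ? decomposition : "in-progress";
}
```

With this shape, the stuck case from sessions 360–374 (FLOW_STATE at in-progress, completion recorded only in DECOMPOSITION_STATUS) resolves to first-pass-complete rather than staying in-progress.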

3. Post-session hook execution order

The fix-mermaid hook was running after the Astro build, so it fixed the source .md file but the compiled HTML was already deployed from the unfixed version.

Fix: Pipeline reordered from publish → build → deploy → hooks to publish → hooks → build → deploy. The fix-mermaid script now repairs mermaid syntax before Astro compiles the HTML.
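To make the ordering concrete, the sketch below runs the four stages in the corrected order. The `Step` type, `PIPELINE` constant, and `runPipeline` function are hypothetical names for illustration; the real stage implementations are not shown in this entry.

```typescript
type Step = "publish" | "hooks" | "build" | "deploy";

// Order after the fix: hooks run on the source .md before the build reads it.
const PIPELINE: Step[] = ["publish", "hooks", "build", "deploy"];

// Runs each stage in order and returns the stages that completed.
// An exception from any handler propagates and aborts the remaining stages
// (this fail-fast behaviour is an assumption).
function runPipeline(handlers: Record<Step, () => void>): Step[] {
  const done: Step[] = [];
  for (const step of PIPELINE) {
    handlers[step]();
    done.push(step);
  }
  return done;
}
```

Because the hooks stage precedes the build stage, whatever a hook writes to the source file is what the build compiles.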

4. Hook template variable syntax

The ${JOURNAL_FILE} and ${SESSION_N} hook args were being replaced with empty strings at YAML config load time by the env var interpolator, because those aren’t environment variables — they’re runtime template variables.

Fix: Changed hook arg syntax from ${VAR} to {{VAR}} (matching the existing flow template convention). The flow engine now replaces {{SESSION_N}} and {{JOURNAL_FILE}} at hook execution time. This was the root cause of fix-mermaid and surrealdb-ingest hooks failing since session 341.
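A minimal sketch of runtime replacement for {{VAR}} placeholders follows. The `renderHookArgs` function and `HookVars` type are hypothetical; the sketch assumes unknown placeholders are left untouched, and the file name in the usage note is made up for illustration.

```typescript
// Hypothetical map of runtime template variables, e.g. SESSION_N, JOURNAL_FILE.
type HookVars = { [name: string]: string };

// Replaces {{NAME}} placeholders in each hook argument at execution time.
// ${NAME} is deliberately not matched here, so it stays distinct from the
// env var interpolation that runs at config load time.
function renderHookArgs(args: string[], vars: HookVars): string[] {
  return args.map((arg: string) =>
    arg.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) => {
      const value = vars[name];
      // Assumption: an unknown placeholder is kept as-is rather than blanked.
      return value !== undefined ? value : match;
    })
  );
}
```

For example, `renderHookArgs(["--file", "{{JOURNAL_FILE}}"], { JOURNAL_FILE: "journal.md" })` produces `["--file", "journal.md"]`.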

5. Tool call statistics parser improvements

The extractCliCommand function was misclassifying most tool calls because Claude’s Bash commands start with comment lines (# Check existing requirements) or standalone variable assignments (TENANT=uht-bot). Fixed the parser to skip # comment lines and standalone VAR=value/export VAR=value lines before identifying the actual command.
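The skipping rule can be sketched as below. This is not the harness's extractCliCommand; `extractCliCommandSketch` and `ASSIGNMENT_ONLY` are hypothetical names, and the sketch only handles unquoted, whitespace-free assignment values.

```typescript
// Matches a line that is only an assignment: VAR=value or export VAR=value.
// Limitation (assumed acceptable for a sketch): values containing spaces,
// such as VAR="a b", are not recognised as assignments.
const ASSIGNMENT_ONLY = /^(export\s+)?[A-Za-z_][A-Za-z0-9_]*=\S*$/;

// Returns the first token of the first line that is neither blank, a
// # comment, nor a standalone assignment; null if no such line exists.
function extractCliCommandSketch(script: string): string | null {
  const lines = script.split("\n");
  for (const raw of lines) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") {
      continue;
    }
    if (ASSIGNMENT_ONLY.test(line)) {
      continue;
    }
    const first = line.split(/\s+/)[0];
    return first !== undefined ? first : null;
  }
  return null;
}
```

Given a script whose first two lines are `# Check existing requirements` and `TENANT=uht-bot`, followed by an `airgen reqs create` call, the sketch returns `airgen` instead of classifying the call by its comment or assignment line.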

6. Duplicate requirement prevention

Investigation revealed three causes of duplicate requirements: no idempotency keys used (0/291 reqs), 31 unassigned REQ-SESURGICALROBOT-* refs from missing --section/--document flags, and repeated ARC entries for the same subsystem across sessions.

Fixes to decompose flow:

  • Added mandatory flags instruction: every airgen reqs create must include --document, --section, and --idempotency-key
  • ARC creation now checks for existing ARC by subsystem tag before creating
  • Added --idempotency-key to the VER template which was missing it
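The mandatory flags instruction could also be checked mechanically. The sketch below is hypothetical (the entry only describes an instruction in the flow text, not a validator): `missingFlags` reports which of the three required flags are absent from an airgen reqs create command line. The flag values in the usage note are placeholders.

```typescript
// The three flags the decompose flow now requires on every create call.
const REQUIRED_FLAGS: string[] = ["--document", "--section", "--idempotency-key"];

// Returns the required flags that do not appear in the command line.
// Accepts both "--flag value" and "--flag=value" forms (an assumption).
function missingFlags(command: string): string[] {
  const tokens = command.trim().split(/\s+/);
  const missing: string[] = [];
  for (const flag of REQUIRED_FLAGS) {
    let found = false;
    for (const token of tokens) {
      if (token === flag || token.indexOf(flag + "=") === 0) {
        found = true;
      }
    }
    if (!found) {
      missing.push(flag);
    }
  }
  return missing;
}
```

For instance, `missingFlags("airgen reqs create --document DOC --section SEC")` returns `["--idempotency-key"]`, the gap found on the VER template.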

7. Watchdog for stuck loops

The flow engine now counts consecutive sessions with no state transition (WATCHDOG_SAME_STATE fact). After 5 consecutive same-state sessions, a Telegram alert fires: “Watchdog: N consecutive sessions in state X with flow Y — possible stuck loop.” Counter resets on any state transition.
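The counting rule can be sketched as follows. `recordSession` and `WATCHDOG_THRESHOLD` are hypothetical names; the sketch assumes the alert repeats on every session at or above the threshold, which this entry does not specify, and it abbreviates the alert text.

```typescript
const WATCHDOG_THRESHOLD = 5;

interface WatchdogResult {
  count: number;        // value to store back in the WATCHDOG_SAME_STATE fact
  alert: string | null; // message to send, or null if below the threshold
}

// Updates the same-state counter after one session.
// A session that ends in the state it started in increments the counter;
// any state transition resets it to zero.
function recordSession(
  count: number,
  stateBefore: string,
  stateAfter: string,
  flow: string
): WatchdogResult {
  const next = stateBefore === stateAfter ? count + 1 : 0;
  const alert =
    next >= WATCHDOG_THRESHOLD
      ? `Watchdog: ${next} consecutive sessions in state ${stateAfter} with flow ${flow}`
      : null;
  return { count: next, alert: alert };
}
```

Under these assumptions, the sessions 360–374 loop would have raised an alert on its fifth same-state session instead of running unnoticed for 15.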

8. Legacy service cleanup

| Service | Status |
|---|---|
| uht-telegram-bot.service | Stopped, disabled |
| uht-loop.timer | Disabled (was accidentally starting old dispatcher) |
| uht-qc.timer | Disabled (QC bot paused) |
| claude-harness.timer | Active (hourly) |
| claude-harness-telegram.service | Active (polling) |

Version manifest

| Component | Before | After |
|---|---|---|
| Telegram bot | Legacy JS at /opt/uht-loop/ | TypeScript in harness, claude-harness-telegram.service |
| Pipeline order | publish → build → deploy → hooks | publish → hooks → build → deploy |
| Hook templates | ${VAR} (broken by env interpolator) | {{VAR}} (runtime replacement) |
| State transition | DECOMPOSITION_STATUS only | FLOW_STATE primary + DECOMPOSITION_STATUS fallback |
| Watchdog | None | 5-session threshold with Telegram alert |
| Tool stats parser | Noisy (comments, var assignments) | Skips # lines and standalone assignments |
| Duplicate prevention | None | Mandatory flags + ARC existence check + idempotency keys |
| Test count | 128 | 136 |
| Source modules | 27 | 33 (+telegram bot: 6 files) |