Configuration control — end-of-phase summary, sessions 1 to 481

Summary

This post marks the end of the initial development phase of the UHT autonomous systems engineering loop. From session 1 (March 9) to session 481 (March 23), the system evolved from a 307-line bash script into a standards-aligned, multi-role, config-driven harness producing ISO/IEC/IEEE 15289-compliant engineering reports. What follows summarises that journey.

The Numbers

Metric | Value
Journal entries | 465
Autonomous sessions | ~440
Operator config control posts | ~25
SE projects completed | 22
Total requirements across all projects | 4,518
Harness TypeScript code | 4,121 lines (33 modules)
Protocol + flow + role content | 6,092 lines
Git commits | 41
Days of operation | 14 (March 9–23)

Systems Decomposed

22 systems across 14 domains: Autonomous Vehicle, Hospital Patient Monitoring, Naval Combat Management, Earth Observation Satellite, Nuclear Reactor Protection, Water Treatment Plant, Automated Warehouse, Emergency Dispatch, Container Ship Cargo, Precision Agriculture Drone, Railway Signalling, Cybersecurity Operations Centre, Radiochemistry Laboratory (v1 and v2), New Tyne Crossing Transport Appraisal, Autonomous Underwater Vehicle, Offshore Oil Platform Safety, Air Traffic Control, Surgical Robot, Fusion Reactor Control, Industrial Elevator Control, and Vertical Farm Environment Controller.

Each produced a traced requirement set with stakeholder needs → system requirements → subsystem requirements → interface definitions → verification plan, architecture decisions, and internal block diagrams.

Infrastructure Evolution

Phase 1: Migration (sessions 341–342)

Replaced the 307-line dispatcher.sh with the TypeScript harness. Config-driven state machine, typed modules, Zod-validated YAML config, 115 tests. First live session: #341 (surgical robot).

Phase 2: Bug Fixes (sessions 342–380)

Fixed cascading issues exposed by the surgical robot project:

  • Post-session hooks failing (${JOURNAL_FILE} consumed by the env interpolator before the hook ran → hook variables switched to {{VAR}} syntax)
  • Mermaid syntax errors (unclosed fences → pre-publish fix in output parser + post-session hook)
  • Hook execution order (build before hooks → hooks before build)
  • Stuck decompose loop (15 wasted sessions: Claude set the wrong fact → deterministic state machine)
  • 100 homeless requirements (missing --document/--section → mandatory flags instruction)
  • Duplicate ARCs (no existence check → check-before-create)
  • Session numbering collision (operator + autonomous sharing sequence → next-sequence CLI command)
  • Telegram bot wired to old dispatcher (→ migrated into harness)
  • Stale project facts (→ auto-cleanup on idle transition)
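As an illustration of the Mermaid fix in the list above, here is a minimal sketch of a pre-publish pass that closes a dangling code fence. The function name `closeUnclosedFences` is hypothetical, and the approach (toggle on every fence line, append a closing fence if one is left open) is an assumption about how such a repair could work, not the harness's actual output parser.

```typescript
// Hypothetical helper, not the harness's real parser.
// The fence marker is built at runtime so this example does not
// itself contain a literal fence line.
const FENCE = "`".repeat(3);

function closeUnclosedFences(markdown: string): string {
  let open = false;
  for (const line of markdown.split("\n")) {
    // Any line starting with the fence marker opens or closes a block.
    if (line.trimStart().startsWith(FENCE)) {
      open = !open;
    }
  }
  if (!open) {
    return markdown; // fences are balanced, leave the entry untouched
  }
  // Drop trailing newlines, then append the missing closing fence.
  return `${markdown.replace(/\n*$/, "")}\n${FENCE}\n`;
}
```

Running the same function from both the output parser and a post-session hook would give two chances to catch a broken diagram before the site build sees it.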

Phase 3: Standards Alignment (sessions 380–425)

ISO/IEC/IEEE 15288 concept definition phase with ConOps and functional analysis. ISO/IEC/IEEE 15289 document split (ConOps, SyRS, SyDD, SVP, HRA). ISO/IEC/IEEE 29148 requirement categorisation. IEC 61508 functional safety alignment with hazard register, SIL allocation, and safety argument. GSN safety case export. V-model verification and validation with ConOps scenario tracing.

Phase 4: Quality Engineering (sessions 425–476)

CCCS quality gates (Completeness, Consistency, Correctness, Stability) with deterministic guards. Spec tree for per-subsystem completeness tracking. Red-team adversarial review agent. Multi-role agent architecture (Chief SE, Systems Engineer, Quality Engineer, Verification Engineer, Red Team Analyst). Per-flow model selection (Opus for concept/review/red-team, Sonnet for decompose/QC). Session context passing (previous session’s “Next” section + quality gate blockers injected into prompt). Requirement history churn metric for stability measurement.
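Per-flow model selection and session context passing can be pictured roughly as follows. This is a sketch under assumptions: the flow identifiers, the `SessionContext` shape, and the `buildPrompt` helper are hypothetical names, and only the five flows named above are covered (the harness has seven).

```typescript
// Hypothetical flow and model identifiers; only the mapping
// (Opus for concept/review/red-team, Sonnet for decompose/QC) comes from the post.
type Flow = "concept" | "decompose" | "qc" | "review" | "red-team";
type Model = "opus" | "sonnet";

const MODEL_BY_FLOW: Record<Flow, Model> = {
  concept: "opus",
  review: "opus",
  "red-team": "opus",
  decompose: "sonnet",
  qc: "sonnet",
};

// Assumed shape of the context carried from one session to the next.
interface SessionContext {
  previousNext: string | null; // "Next" section of the previous journal entry
  gateBlockers: string[];      // quality gate blockers still open
}

function buildPrompt(basePrompt: string, ctx: SessionContext): string {
  const parts = [basePrompt];
  if (ctx.previousNext) {
    parts.push(`Previous session's next steps:\n${ctx.previousNext}`);
  }
  if (ctx.gateBlockers.length > 0) {
    const bullets = ctx.gateBlockers.map((b) => `- ${b}`).join("\n");
    parts.push(`Quality gate blockers:\n${bullets}`);
  }
  return parts.join("\n\n");
}
```

Keeping the model choice in a lookup table rather than in each flow's logic means a flow can be moved between models by editing one line of config.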

Phase 5: Operational Hardening (sessions 476–481)

Pipeline reordering (red-team before review, not after). Lint-Substrate integration (--substrate-namespace eliminates ontological mismatch false positives). Astro build timeout increase (300s → 600s for 600+ pages). Journal backup fallback (output parser checks /tmp/uht-journal-entry.md). Report template fixes (external interface hex codes, standards lookup, stakeholder classification, spec tree display). Homeless requirement reassignment via airgen reqs reassign. Mandatory document flags in all seven flows.
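The journal backup fallback might look something like the sketch below. It assumes the parser prefers an entry found in the session output and consults /tmp/uht-journal-entry.md only when that is missing or empty; the function name and the injected `readFile` callback are hypothetical, introduced here so the logic can be exercised without touching the filesystem.

```typescript
// Path taken from the post; everything else is an illustrative assumption.
const BACKUP_PATH = "/tmp/uht-journal-entry.md";

// Returns file contents, or null if the file is missing or unreadable.
type ReadFile = (path: string) => string | null;

function resolveJournalEntry(
  parsedFromOutput: string | null,
  readFile: ReadFile,
): string | null {
  // Prefer the entry extracted from the session output.
  if (parsedFromOutput && parsedFromOutput.trim() !== "") {
    return parsedFromOutput;
  }
  // Otherwise fall back to the backup file, if it has content.
  const backup = readFile(BACKUP_PATH);
  return backup && backup.trim() !== "" ? backup : null;
}
```

In a real harness the `readFile` argument would presumably wrap a filesystem read that returns null on error.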

Key Architectural Decisions

  1. Deterministic state machine — Claude does not set state transitions. Guards evaluate project metrics and the harness computes the correct state. Eliminated stuck loops and premature completion.

  2. Spec tree — structural manifest defining expected artifacts per subsystem. Canonical diagram names prevent duplicates. Section IDs locked at scaffold time prevent homeless requirements. Per-subsystem status tracking enables meaningful completeness gates.

  3. Multi-role protocols — each flow gets a role-specific system prompt (Chief SE, Systems Engineer, Quality Engineer, Verification Engineer, Red Team Analyst). The same model produces qualitatively better output when given a focused role identity.

  4. ConOps-driven validation — the V-model right side validates against ConOps scenarios, not just requirement counts. The review asks “could a procurer contract from this?” not “are there enough requirements?”

  5. Cross-domain analog search — UHT classification + semantic search surfaces requirements from analogous systems in other domains. Found RaSTA protocol for surgical robot comms, nuclear plant operator training for fusion reactor, railway safety for elevator interlocks.
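To make the first decision concrete, here is a minimal sketch of a guard-driven state computation. The state names, the `ProjectMetrics` fields, and the guard order are all hypothetical (the real harness has nine states that this post does not enumerate); the point is only that the next state is a pure function of measured project metrics, never a value reported by the model.

```typescript
// Hypothetical subset of states, ordered as the pipeline runs
// (red-team before review, as described under Phase 5).
type State = "concept" | "decompose" | "qc" | "red-team" | "review" | "complete";

// Assumed metrics the harness could measure without asking the model.
interface ProjectMetrics {
  conopsDone: boolean;
  subsystemsExpected: number;
  subsystemsDecomposed: number;
  homelessRequirements: number;
  redTeamDone: boolean;
  reviewPassed: boolean;
}

// Guards are evaluated in order; the first unmet condition decides the state.
function computeState(m: ProjectMetrics): State {
  if (!m.conopsDone) return "concept";
  if (m.subsystemsDecomposed < m.subsystemsExpected) return "decompose";
  if (m.homelessRequirements > 0) return "qc";
  if (!m.redTeamDone) return "red-team";
  if (!m.reviewPassed) return "review";
  return "complete";
}
```

Because the same metrics always yield the same state, a session that records the wrong fact cannot trap the project in a loop or mark it complete early.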

Lessons Learned

  1. Never rely on Claude to set operational state. Every time the harness trusted Claude to update a fact correctly, it failed. Deterministic guards and harness-managed state are non-negotiable.

  2. Mandatory flags are not enough. Claude ignores template flags unless the instruction is bold, repeated, and explains what breaks. Even then, QC sessions need the same instruction as decompose sessions.

  3. Build infrastructure scales differently from engineering output. The Astro build went from 30 seconds to 5+ minutes as the journal grew. The next phase needs incremental builds.

  4. Role protocols work. The same Sonnet model produces measurably better QC output with the Quality Engineer protocol than with the generic protocol. Specialisation beats generalism.

  5. The red team finds real issues. Session 460 produced 23 genuine findings on the elevator project, including timing implausibility, missing hazard traceability, and absent failure modes. Running it before the final review (not after) is critical.

What’s Next

  • Incremental Astro builds — the 5-minute full rebuild is the primary bottleneck
  • Protocol include system — 5 × 750-line protocols with 95% shared content need factoring
  • Per-project configuration — separate projects should have independent state machines
  • Dashboard — replace grep-based log inspection with real observability
  • Cost tracking — the stream-json cost figures are API prices, not actual billing

Tool Versions

Tool | Version | Key capabilities
airgen-cli | v0.17.1 | reqs reassign, search, bulk-create, lint --substrate-namespace, --format json, --limit all, --homeless
uht-substrate | v0.6.2 | classify --force, entities merge/reclassify/history, facts store-bulk, --count-only, --subject filter
claude-harness | v0.1.0 | 33 modules, 7 flows, 5 roles, 9 states, CCCS gates, spec tree, session context, auto-cleanup