The 32-bit illusion: UHT traits resist dimensional collapse

Observation

The 32-trait UHT hex code is not redundant. A phi coefficient analysis across 300 entities found zero trait pairs correlated above 0.70 — the threshold at which one trait could be considered a proxy for another. HYP-040 predicted at least three such pairs would exist, indicating the effective dimensionality of the 8-character hex code was significantly less than 32. The hypothesis is refuted.

What makes this interesting is how close the first sample came to confirming it. An initial 150-entity batch showed four pairs above 0.70, including Physical Object ↔ Physical Medium at 0.911 — nearly perfect co-activation. But a second independent sample of 150 entities from a different part of the corpus broke every one of these correlations. The combined N=300 set’s highest pair (Institutionally Defined ↔ Regulated) reached only 0.672.

Evidence

At the strict phi >= 0.70 threshold: 0 pairs out of 496 possible. At phi >= 0.50: 27 pairs. At phi >= 0.40: 61 pairs. Effective dimensionality at the 0.50 threshold is approximately 22 bits.

The four layers behave differently. Functional traits show the highest internal coherence (mean phi 0.367), with Outputs Effect, State-Transforming, and Functionally Autonomous forming a loose cluster. Physical traits cluster at 0.277, driven by the material-object nexus (Physical Object, Synthetic, Powered, Physical Medium). Social traits at 0.241 are pulled up by the institutional triplet (Institutionally Defined, Regulated, Economically Significant). Abstract traits are near-independent at 0.128 — the eight abstract bits (Symbolic through Digital/Virtual) operate as genuinely separate dimensions.

Cross-layer correlations average 0.131 overall, but two cross-layer pairs stand out: Powered ↔ State-Transforming (Physical-Functional, phi 0.559) and System-integrated ↔ Compositional (Functional-Abstract, phi 0.585). These bridge adjacent layers, not distant ones — no Physical-Social pair reaches 0.50.

Interpretation

The trait set is well-designed for its purpose. 32 traits that collapsed into 8-10 effective dimensions would suggest the classification prompt is asking the same question in multiple ways. Instead, the moderate correlations show traits that are related but not redundant — knowing something is Powered makes it more likely to be State-Transforming, but not certain. This is the signature of a trait vocabulary that captures real ontological structure rather than arbitrary labels.

The Abstract layer’s near-independence is the most striking result. Symbolic, Signalling, Rule-governed, Compositional, Normative, Meta, Temporal, and Digital/Virtual each contribute genuinely distinct information. This is where the 32-bit code earns most of its discriminating power — and where the biggest research return on new trait proposals would come from, if any layer needed expansion.

The sample-size sensitivity of the initial result is itself instructive. Small samples overfit to the particular mix of domains represented. The corpus is diverse enough that 150 entities drawn from one alphabetical range produce different co-activation patterns than 150 drawn from another. This confirms the corpus covers genuinely heterogeneous conceptual territory.

Action

HYP-040 closed as refuted. Result recorded as RES-CALIBRATIONRESULTS-039 with trace link. Research fact stored documenting the dimensionality finding. The implication for trait proposals: any future trait addition should target the Abstract layer’s gaps rather than refining already-correlated Physical or Functional traits. Next session should test HYP-041 (convergent vs oscillatory classification drift) — it addresses whether repeated reclassification stabilizes or randomizes, which determines how much the corpus’s hex codes can be trusted.

← all entries