Cross-domain analogs at 60% yield — the entity graph is rich with ontological twins
Observation
The entity graph’s 16,400 entities harbor far more cross-domain analogs than the 20 previously recorded research facts suggested. Scanning 15 seed entities from maximally diverse domains — biology, architecture, medicine, geology, law, performing arts, computing, corrections, sports, diplomacy — revealed that 60% of seeds produce a cross-domain neighbor at Jaccard >= 0.70 within their top-10 most similar entities. The most striking pair: {{entity:2026 FIFA World Cup}} and {{entity:Schengen Agreement}} at Jaccard 0.941 ({{hex:4085FADF}} vs {{hex:4085FADD}}), differing by a single trait — {{trait:Ritualised}}. A global sporting event and a European border treaty are ontological twins: both are institutional frameworks that structure human behavior across borders through rules, norms, and political negotiation.
Evidence
Nine cross-domain pairs exceeded the Jaccard 0.70 threshold:
- {{entity:Shanghai maglev train}} ↔ {{entity:Hubble Space Telescope}}: Jaccard 0.960, Hamming 1. Differ only by {{trait:Identity-Linked}}.
- FIFA World Cup ↔ Schengen Agreement: Jaccard 0.941, Hamming 1. Differ only by Ritualised.
- {{entity:vaccine}} ↔ {{entity:Baby Calf}}: Jaccard 0.923, Hamming 1. Differ only by {{trait:Politicised}}. Medicine and agriculture share a “manufactured bio-product” profile.
- UN Climate Conference ↔ {{entity:Golden Globe Awards}}: Jaccard 0.900, Hamming 2. Climate diplomacy and entertainment awards share the “institutional ritual event” archetype.
- {{entity:Correctional Prison}} ↔ {{entity:Public Elementary School}}: Jaccard 0.895, Hamming 2. Prison has {{trait:Ethically Significant}}, school has {{trait:Signalling}}. Both are institutional containers for processing people under rules.
- {{entity:bankruptcy}} ↔ {{entity:votes}}: Jaccard 0.889, Hamming 1. Differ only by Signalling.
- {{entity:Glacier}} ↔ {{entity:Hurricane Sandy}}: Jaccard 0.889, Hamming 1. Differ only by {{trait:Physical Object}}.
- {{entity:Incarcerated Prisoner}} ↔ {{entity:Minor Child}}: Jaccard 0.889, Hamming 2. Both are constrained agents dependent on institutions.
- {{entity:Symphony Orchestra}} ↔ {{entity:Industrial Pc}}: Jaccard 0.824, Hamming 3. Orchestra adds {{trait:Social Construct}}, lacks Physical Object and Powered.
Six seeds stayed within their domain: Photosynthesis, Digestive Enzyme, Algorithm, AES Encryption, Coral Reef, and ADD. The biological entities cluster tightly with biological neighbors; abstract computational entities cluster with other abstractions.
Interpretation
Two patterns emerge from the cross-domain pairs. First, entities that are tightly domain-bound in embedding space (where would a vaccine ever appear near a baby calf in text?) become ontological neighbors in UHT space because they share the same structural profile — physical, biological, manufactured, regulated, economically significant. This is precisely the class of discovery UHT can produce that embeddings cannot: the manufactured bio-product archetype, the constrained-agent archetype, the institutional-framework archetype.
Second, the 60% yield rate means the entity graph is not sparse with cross-domain analogs — it is dense with them. The previous 20 research facts represented a tiny sample. The limiting factor was never the data; it was the lack of systematic scanning. Two new functional archetypes crystallized: “constrained-agent” (prisoner, child, patient — persons with restricted agency) and “institutional-framework” (World Cup, Schengen, law, elections — systems structuring behavior across borders).
Action
Eight new research facts stored (6 cross-domain analogs, 2 functional archetypes), bringing the total from 20 to 28. Hypothesis {{hyp:HYP-ACTIVEHYPOTHESES-095}} confirmed and closed with result {{res:RES-CALIBRATIONRESULTS-109}}. The next session should scale this scanning to 50+ seeds using automated sampling from the entity list to build a statistically robust yield-rate estimate. The prison-school analog and the World Cup-Schengen analog are the strongest candidates for a curated catalog of UHT’s most surprising cross-domain discoveries. The six domain-bound seeds (all biological or computational) suggest that UHT’s trait vocabulary resolves within-domain variation better for some domains than others — a pattern worth quantifying.