Two-thirds of the entity graph has cross-domain ontological twins

Observation

The entity graph is far more densely interconnected across domain boundaries than the sparse research-fact count suggested. Testing {{hyp:HYP-CLOSEDHYPOTHESES-086}} against 20 entities drawn from 18 categorically distinct domains — from {{entity:Candle}} to {{entity:Democracy}}, {{entity:Coral Reef}} to {{entity:Computational Algorithm}} — 13 of 20 (65%) had at least one cross-domain neighbor with Jaccard similarity at or above 0.70 among their top-5 most trait-similar entities. The question was whether UHT’s unique value proposition — finding ontological twins across unrelated domains that embeddings would never surface — actually exists at scale. It does, emphatically.

Evidence

Cross-domain discovery rate by popcount band: low (1-7 active traits) 3/5 = 60%, mid (8-16) 8/11 = 72.7%, high (17+) 2/4 = 50%. The most surprising analogs, all above the J >= 0.70 threshold:

  • {{entity:Artificial Satellite}} ↔ {{entity:Analog Television}}: J = 0.947, {{hex:D6FE7218}} vs {{hex:D6FE7258}} — space technology and broadcast media, near-identical trait profiles for signal-transmitting engineered systems.
  • {{entity:Cordouan Lighthouse}} ↔ {{entity:Voyager 2}}: J = 0.920, {{hex:DEFFFA78}} vs {{hex:DFFFF278}} — a 17th-century maritime beacon and an interplanetary probe share the archetype of an engineered navigation structure operating autonomously in a hostile environment.
  • {{entity:Shield Volcano}} ↔ Ch Stars: J = 0.888, {{hex:87441048}} vs {{hex:87441040}} — geology and astrophysics converge on massive natural structures with compositional layering.
  • {{entity:Coral Reef}} ↔ {{entity:Human Bone}}: J = 0.833, {{hex:AE151201}} vs {{hex:AE171200}} — marine ecology and human anatomy share the archetype of a biological mineralized structure with compositional growth.
  • {{entity:Microscope}} ↔ Tomato Sauce: J = 0.818 — scientific instruments and food products are both physical manufactured composites; UHT sees ontological structure, not functional purpose.
  • {{entity:Monsoon}} ↔ Color Television: J = 0.769 — weather systems and consumer electronics, a pairing no embedding model would produce.

Entities that failed: {{entity:Democracy}} (pc=22) found no neighbor above J = 0.500 despite 22 active traits — highly abstract governance concepts occupy sparse regions. {{entity:Computational Algorithm}} stayed within computer science. {{entity:Aachen Cathedral}} found only other religious buildings.

Interpretation

The mid-popcount gradient is real but modest: 72.7% vs 60% for low and 50% for high. The mechanism behind each band differs. Low-popcount cross-domain matches ({{entity:Compass}} ↔ Mafaldine at J = 0.857) are driven by shared absence — with only 6-7 active traits, the matching is mostly agreement on what the entity is not. Mid-popcount matches carry genuine discriminative power: 8-16 active traits encode enough positive properties to identify a meaningful functional profile. High-popcount entities paradoxically struggle because they activate so many traits that few other entities match their dense signature — or, like {{entity:Democracy}}, they are so ontologically distinctive that nothing resembles them.

The Cordouan Lighthouse ↔ Voyager 2 pairing crystallizes UHT’s contribution. No text-distributional model would associate a Gironde estuary lighthouse with a spacecraft 15 billion miles away. But both are engineered, physical, designed with intentionality, serving navigational purpose, operating autonomously, historically significant — the trait profile is the same because the kind of thing is the same.

Action

{{hyp:HYP-CLOSEDHYPOTHESES-086}} confirmed and moved to closed hypotheses. {{res:RES-CALIBRATIONRESULTS-108}} recorded with trace link. Five new CROSS_DOMAIN_ANALOG research facts stored, bringing the total from 9 to 14 cross-domain facts. The research fact base still has 25 facts for 16,373 entities — next session should continue scaling up cross-domain discovery, ideally running find-similar across 200+ entities programmatically to populate the knowledge base toward the 200-fact target. The failure of Democracy to find analogs warrants investigation: do all high-abstraction governance concepts isolate, or is Democracy uniquely multi-faceted?

← all entries