Notes from the Test Bench

Two custom tools, three peer reviewers, and a long list of things they caught us out on.

Building tonal detectors is one half of the job. Watching them break is the other.

Two in-house tools have been doing the heavy lifting. The A/B matrix runner fires the same prompt through multiple personas and providers in one shot, twelve legs at a time. The b_tick runner pushes whole corpora through in minutes, sliced into overt, covert, clean, deceptive, and emotion buckets. Neither tool is pretty. Both are catching us out.
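As a rough illustration of the matrix idea, the sketch below expands one prompt into one leg per persona and provider pair. The function name `build_legs`, the placeholder persona and provider names, and the 4 x 3 split that yields twelve legs are all assumptions made for the example. The post only says that the runner fires twelve legs at a time.

```python
from itertools import product


def build_legs(prompt, personas, providers):
    """Return one leg per (persona, provider) pair, all sharing the same prompt."""
    return [
        {"persona": persona, "provider": provider, "prompt": prompt}
        for persona, provider in product(personas, providers)
    ]


# Hypothetical split: 4 personas x 3 providers = 12 legs.
personas = ["persona_a", "persona_b", "persona_c", "persona_d"]
providers = ["provider_x", "provider_y", "provider_z"]
legs = build_legs("Same prompt for every leg.", personas, providers)
```

Keeping the prompt fixed across every leg is what makes the comparison an A/B test: any difference in output then comes from the persona or the provider, not from the input.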

What they have been telling us

Layer-disagreement is real. The detector and the display layer have been running on different signals.
Determinism is harder than it looks. Random number generation was hiding inside code that was meant to be deterministic.
Scope leaks travel. A render block written for one persona was inherited by thirteen.
Corpus quality decides whether a benchmark can be interpreted at all. Run the validator first, or the result is meaningless.
Benchmark instruments are not interchangeable. The wrong instrument measures the wrong thing.
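The determinism finding lends itself to a small check: call the same function several times on the same input and confirm that the results are identical. The sketch below is a minimal version of that idea. Both scoring functions are hypothetical stand-ins, not the actual detector code.

```python
import random


def score_hidden_rng(text):
    # Hypothetical detector with a hidden, unseeded jitter: not deterministic.
    return len(text) + random.random()


def score_seeded(text, seed=0):
    # The same hypothetical detector with the generator made explicit and seeded.
    rng = random.Random(seed)
    return len(text) + rng.random()


def is_deterministic(fn, sample, runs=5):
    """True if fn returns the identical result on every run for the same input."""
    results = {fn(sample) for _ in range(runs)}
    return len(results) == 1
```

A check like `is_deterministic(score_seeded, "sample text")` should pass, while the unseeded variant should fail it. Running the check before a benchmark is one way to surface random number generation hiding in code that is meant to be deterministic.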

The peer review team

The peer review team has been Claude, Rock, and the comparative LLM stack: Grok, Gemini, ChatGPT. Different models, different blind spots, different ways of being wrong. Triangulating across them is the only reason most of these defects got caught at all. One model misses the leak. Another spots the determinism failure. A third notices the corpus is dead before we waste a benchmark run on it.

No single reviewer is reliable on its own. The discipline is in the cross-check.

Where we are

One category at a time. Hard revert triggers on false-positive rate. Corpus validator running ahead of every fire. Agency signal being rebuilt from scratch.
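A minimal sketch of how the validator gate and the revert trigger could fit together follows. The validation rule, the `gate` function, and the 0.05 threshold are all assumptions for illustration. The post does not state the actual validator checks or the false-positive limit.

```python
def false_positive_rate(predictions, labels):
    """FP / (FP + TN) over the negative (clean) samples; 0.0 if there are none."""
    negatives = [p for p, y in zip(predictions, labels) if not y]
    if not negatives:
        return 0.0
    return sum(1 for p in negatives if p) / len(negatives)


def validate_corpus(corpus):
    """Hypothetical validator: non-empty, and every sample has text and a bucket."""
    return bool(corpus) and all(s.get("text") and s.get("bucket") for s in corpus)


def gate(corpus, predictions, labels, max_fpr=0.05):
    # max_fpr is an assumed threshold, not a figure from the post.
    if not validate_corpus(corpus):
        return "skip: corpus failed validation"
    if false_positive_rate(predictions, labels) > max_fpr:
        return "revert"
    return "keep"
```

The order matters in this sketch: the corpus is validated before any rate is computed, so a dead corpus never produces a number that could be mistaken for a result.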

The tools do not need to be pretty. They are working.

Hash tth_v1_87729882598d0a14

Fingerprint v1-COH-30-78-58-83-64

Tenant blog.tonethread.studio

Issued 2026-06-27

Zone Coherence (COH)

Source bureaucratick

Revised Originally published 2026-05-03, updated 2026-06-27; 10 revisions

Prior content hashes (9)

2026-05-03 tth_v1_ae8a120aa1ea2484
2026-05-03 tth_v1_ae8a120aa1ea2484
2026-05-04 tth_v1_ae8a120aa1ea2484
2026-05-05 tth_v1_a301ba282f28c792
2026-05-05 tth_v1_a301ba282f28c792
2026-05-05 tth_v1_a301ba282f28c792
2026-05-06 tth_v1_c23529b0cb1ffeb0
2026-05-07 tth_v1_e4dda1cd72b921db
2026-05-18 tth_v1_10ab63f586e0e571

Warmth 0.3

Certainty 0.78

Intensity 0.58

Coherence 0.83

Resonance 0.64