Validator metascore (design)

On Konnex, validators score miners: they rank physical work against the task instruction and subnet policy using replays, sensors, benchmarks, and, where the workload requires it, AI-assisted scoring (e.g. kinematic smoothness, grasp success, vision-language-action (VLA) behaviour). The choice of scoring function is therefore security-critical.

Smart-contract checks alone do not close the loop: AI scoring can hallucinate, and validators can collude. The protocol therefore treats “who validates the validators” as a first-class problem and implements a hybrid, three-layer metascore on its Substrate-based chain.

Layer 1 — Weight alignment (peer consensus on scores)

The primary cryptoeconomic signal for validator quality is agreement with the rest of the validator set on the same miner outcomes.

Each validator submits a weight vector (a list of scores) over a batch of miners or tasks. The chain builds a global consensus vector—the same dimensions, aggregated across validators—using a governance-tunable rule such as median per dimension or a stake-weighted blend. Call validator i’s vector W_i and the network aggregate W-bar. When W_i tracks W-bar, that validator’s standing improves; systematic drift away from the aggregate cuts validator trust (vTrust) and lowers effective APY. The consensus term in the metascore (below) is the formal similarity between W_i and W-bar.

Lazy copy-paste consensus (validators mirroring one another's scores without doing the work) is mitigated by pseudorandom hidden spot checks: because validators cannot know which submissions will be audited against ground truth rather than judged by alignment alone, blind copying remains risky.
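One way to make spot checks pseudorandom yet verifiable is to hash a per-epoch seed together with the task ID. The sketch below assumes a seed that stays secret until votes are committed and uses SHA-256 as the hash; both the seed mechanism and `is_spot_checked` are hypothetical illustrations, not the protocol's specified scheme.

```python
import hashlib


def is_spot_checked(epoch_seed: bytes, task_id: str, rate: float) -> bool:
    """Decide pseudorandomly whether task_id gets a hidden audit this epoch.

    The decision is deterministic given (epoch_seed, task_id), so anyone can
    re-derive it once the seed is revealed, but it is unpredictable before that
    as long as the seed stays secret.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    digest = hashlib.sha256(epoch_seed + task_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to an integer in [0, 2**64).
    value = int.from_bytes(digest[:8], "big")
    return value < rate * 2**64


seed = b"hypothetical-epoch-seed"
tasks = [f"task-{n}" for n in range(10_000)]
audited = [t for t in tasks if is_spot_checked(seed, t, rate=0.1)]
print(len(audited) / len(tasks))  # approximately 0.1
```

Because the audit set is only derivable after the seed is revealed, a validator that copies the consensus cannot tell in advance which of its copied scores will be checked against ground truth.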

Layer 2 — Honeypots (deterministic reference tasks)

Robotics workloads have a structural advantage: simulators (e.g. ManiSkill, Isaac Sim, subnet-specific benches) can produce deterministic ground-truth metrics.

The network (or a governed oracle module) injects honeypots into the task mix—runs whose correct grades are known before validators vote. Validators do not know which jobs are honeypots.
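A minimal sketch of honeypot injection, assuming the oracle holds a private answer key and shuffles reference runs into an ordinary batch. The `Task` type, its fields, and `inject_honeypots` are hypothetical; a real deployment would need a cryptographically secure or verifiable randomness source rather than `random.Random`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    payload: str  # what validators see (e.g. a replay reference); hypothetical field


def inject_honeypots(
    real: list[Task],
    honeypots: list[tuple[Task, float]],
    seed: int,
) -> tuple[list[Task], dict[str, float]]:
    """Mix honeypots into a batch.

    Returns the shuffled batch that validators see and a private answer key
    (task_id -> known-correct grade) that stays with the oracle.
    """
    batch = list(real) + [task for task, _ in honeypots]
    # random.Random is NOT cryptographically secure; it only illustrates the shuffle.
    random.Random(seed).shuffle(batch)
    answer_key = {task.task_id: grade for task, grade in honeypots}
    return batch, answer_key


# Honeypots must look like real jobs: same ID format, same payload shape.
real = [Task(f"job-{n:03d}", "replay") for n in (0, 1, 2, 4, 5, 7, 8, 9)]
honeypots = [
    (Task("job-003", "replay"), 0.0),  # known failure, e.g. object dropped
    (Task("job-006", "replay"), 1.0),  # known success
]
batch, answer_key = inject_honeypots(real, honeypots, seed=42)
```

Only `batch` is published to validators; `answer_key` is what later lets the protocol grade each validator's honeypot votes.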

If a validator’s AI stack awards a high score to an objectively failed reference (e.g. a dropped object marked as success), the protocol applies a hard penalty, up to and including slashing, according to severity tiers defined by governance.
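Severity tiers could be expressed as thresholds on the gap between a validator's grade and the known label. The thresholds (0.5 for slashing, 0.2 for a lesser penalty), the absolute-error metric, and the names below are hypothetical placeholders for whatever governance actually defines.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    PENALTY = "penalty"
    SLASH = "slash"


# Hypothetical governance parameters: (minimum grading error, action).
DEFAULT_TIERS: tuple[tuple[float, Action], ...] = (
    (0.5, Action.SLASH),
    (0.2, Action.PENALTY),
)


def honeypot_action(vote: float, reference: float, tiers=DEFAULT_TIERS) -> Action:
    """Map the gap between a validator's grade and the known label to an action."""
    error = abs(vote - reference)
    # Check the most severe tier first so the harshest applicable action wins.
    for min_error, action in sorted(tiers, key=lambda tier: tier[0], reverse=True):
        if error >= min_error:
            return action
    return Action.NONE


# A dropped object (reference grade 0.0) scored as a near-certain success.
print(honeypot_action(vote=0.95, reference=0.0))  # Action.SLASH
```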

Layer 3 — Two-tier scoring inside the validator node

Each validator’s local scoring pipeline is designed as two independent channels that must agree before a vote is safe to publish:

  1. VLA / LLM layer — High-level task understanding (did the miner follow the instruction semantically?).

  2. Deterministic layer — Lightweight heuristics or closed-form checks: torque limits, timing windows, joint limits, replay deltas against sim logs, etc.
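The deterministic layer can be as simple as closed-form range checks over a recorded trace. This sketch covers joint limits, torque limits, and a timing window; the `Sample` and `Limits` structures, units, and field names are assumptions for illustration, and replay deltas against sim logs are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t: float                    # seconds since episode start
    joints: tuple[float, ...]   # joint positions (rad)
    torques: tuple[float, ...]  # joint torques (N*m)


@dataclass(frozen=True)
class Limits:
    joint_min: tuple[float, ...]
    joint_max: tuple[float, ...]
    torque_abs_max: tuple[float, ...]
    max_duration: float  # timing window for the whole episode (s)


def deterministic_check(trace: list[Sample], limits: Limits) -> list[str]:
    """Return human-readable violations; an empty list means the trace passes."""
    if not trace:
        return ["empty trace"]
    violations = []
    duration = trace[-1].t - trace[0].t
    if duration > limits.max_duration:
        violations.append(f"timing: {duration:.2f}s exceeds {limits.max_duration:.2f}s")
    for s in trace:
        for j, (q, tau) in enumerate(zip(s.joints, s.torques)):
            if not limits.joint_min[j] <= q <= limits.joint_max[j]:
                violations.append(f"joint {j} out of range at t={s.t:.2f}")
            if abs(tau) > limits.torque_abs_max[j]:
                violations.append(f"torque {j} over limit at t={s.t:.2f}")
    return violations


limits = Limits(joint_min=(-1.0, -2.0), joint_max=(1.0, 2.0),
                torque_abs_max=(5.0, 5.0), max_duration=10.0)
ok = [Sample(0.0, (0.0, 0.0), (1.0, 1.0)), Sample(2.0, (0.5, 1.0), (2.0, -3.0))]
bad = [Sample(0.0, (0.0, 0.0), (1.0, 1.0)), Sample(2.0, (0.5, 1.0), (7.5, -3.0))]
print(deterministic_check(ok, limits))   # []
print(deterministic_check(bad, limits))  # ['torque 0 over limit at t=2.00']
```

Checks like these are cheap and fully reproducible, which is what makes them a useful counterweight to a model-based judgement that can hallucinate.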

If the two layers conflict, the validator abstains or requests an extended check (subnet-defined escalation path). Publishing a confident vote despite internal disagreement counts against the validator under Layer 1 and Layer 2.
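The publish/abstain/escalate rule reduces to a small gate. This sketch assumes the VLA / LLM layer emits a success score in [0, 1] that is thresholded (0.5 here, a hypothetical default) and the deterministic layer emits a pass/fail verdict; `gate_vote` and `Decision` are illustrative names.

```python
from enum import Enum


class Decision(Enum):
    PUBLISH = "publish"
    ABSTAIN = "abstain"
    ESCALATE = "escalate"


def gate_vote(
    semantic_score: float,
    deterministic_ok: bool,
    pass_threshold: float = 0.5,
    escalation_available: bool = True,
) -> Decision:
    """Publish only when the VLA/LLM verdict and the deterministic verdict agree."""
    semantic_ok = semantic_score >= pass_threshold
    if semantic_ok == deterministic_ok:
        # Agreement, whether on success or on failure, makes the vote safe to publish.
        return Decision.PUBLISH
    # Conflict: use the subnet's escalation path if it has one, otherwise abstain.
    return Decision.ESCALATE if escalation_available else Decision.ABSTAIN


print(gate_vote(0.92, deterministic_ok=False))  # Decision.ESCALATE
```

Note that agreement on failure also publishes: the gate guards against internal contradiction, not against low scores.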

Onchain metascore (base layer)

At the base layer the protocol combines alignment, honeypots, and penalties into a single validator metascore for each validator (written S(V_i) in the figure):

Validator metascore:

$$S(V_i) = \alpha \, C(W_i, \bar{W}) + \beta \, H(V_i) - \gamma \, P_i$$

Read the formula left to right:

  • S(V_i): the scalar metascore for validator i; higher is better after governance normalization.

  • C(W_i, W-bar): consensus term, measuring how close validator i’s score vector W_i is to the network’s consensus vector W-bar (for example cosine similarity, which already lies in [0, 1] for non-negative weight vectors, or another distance mapped to a score in [0, 1]).

  • H(V_i): honeypot accuracy, the share (or count) of hidden reference tasks where i’s grades match the known-good labels.

  • P_i: penalties for operational faults such as high latency, missed voting epochs, broken commitments, and similar slashing-adjacent behaviour.

  • α, β, γ (alpha, beta, gamma): positive governance weights that set how much consensus, honeypots, and penalties move S(V_i). They are tuned per subnet phase and during mainnet rollout.
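Putting the terms together, a sketch of the metascore might look like this. It assumes cosine similarity for C (the example the text gives), an honeypot-match tolerance of 0.1, ungraded honeypots counting as misses, and P_i already aggregated into a single number; all of these, and the function names, are hypothetical choices.

```python
import math


def cosine_consensus(w_i: list[float], w_bar: list[float]) -> float:
    """C(W_i, W-bar): cosine similarity, clamped to [0, 1]."""
    if len(w_i) != len(w_bar):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(w_i, w_bar))
    norm = math.sqrt(sum(a * a for a in w_i)) * math.sqrt(sum(b * b for b in w_bar))
    if norm == 0.0:
        return 0.0  # an all-zero vector carries no agreement signal
    return max(0.0, min(1.0, dot / norm))


def honeypot_accuracy(grades: dict[str, float], answer_key: dict[str, float],
                      tol: float = 0.1) -> float:
    """H(V_i): share of honeypots graded within tol of the known label.

    Honeypots the validator did not grade count as misses (an assumption).
    """
    if not answer_key:
        return 0.0
    hits = sum(
        1 for tid, label in answer_key.items()
        if tid in grades and abs(grades[tid] - label) <= tol
    )
    return hits / len(answer_key)


def metascore(w_i, w_bar, grades, answer_key, penalty, alpha, beta, gamma) -> float:
    """S(V_i) = alpha * C(W_i, W-bar) + beta * H(V_i) - gamma * P_i."""
    if min(alpha, beta, gamma) <= 0:
        raise ValueError("alpha, beta and gamma must be positive")
    return (alpha * cosine_consensus(w_i, w_bar)
            + beta * honeypot_accuracy(grades, answer_key)
            - gamma * penalty)


s = metascore(
    w_i=[1.0, 0.0], w_bar=[1.0, 0.0],   # perfect alignment: C = 1
    grades={"h1": 0.0, "h2": 0.9},      # h2 is a failed reference graded 0.9
    answer_key={"h1": 0.0, "h2": 0.0},  # so H = 0.5
    penalty=0.2, alpha=0.5, beta=0.4, gamma=0.5,
)
print(round(s, 6))  # 0.6
```

In the worked example, perfect alignment contributes 0.5, one missed honeypot out of two halves the honeypot contribution to 0.2, and the penalty subtracts 0.1: a validator that copies consensus but fails references cannot reach the top score.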

Relation to mining and PoPW

Miner rewards and PoPW acceptance still flow from subnet verifiers and task semantics. The metascore does not replace workload-specific scoring; it governs who is trusted to emit those scores and at what economic weight.
