Core component · 01-02
Router & gating
The router scores how well each expert fits a task; the gate turns those scores into a concrete plan: who runs, how many, and with what weight. Together they keep execution sparse instead of broadcasting every task to every agent.
Router design
A router maps a TaskSignature to a list[Score]: one affinity logit per registered expert, plus an eligibility flag from the capability mask. The router only scores; it never decides who actually runs. That separation lets you tune scoring and selection independently.
The default learned-softmax strategy scores by cosine affinity between the task embedding and each expert's capability vector, nudged by the expert's rolling success rate, with ineligible experts forced to -inf before normalization.
```python
# Router: TaskSignature -> sparse routing distribution.
from dataclasses import dataclass
from typing import Protocol
import math


@dataclass(frozen=True)
class Score:
    expert_id: str
    logit: float    # raw affinity before normalization
    eligible: bool  # passes the capability mask?


class RoutingStrategy(Protocol):
    def score(self, sig: "TaskSignature", experts: list["Expert"]) -> list[Score]: ...


@dataclass
class LearnedSoftmaxRouter:
    """Scores experts by cosine affinity and masks the ineligible with -inf.

    score() returns raw logits; normalizing the survivors with a
    temperature-scaled softmax is not done in this method.
    """
    temperature: float = 0.7  # softmax temperature; unused by score() itself

    def score(self, sig, experts) -> list[Score]:
        scores = []
        for e in experts:
            # Eligible only if the expert covers every requested capability tag.
            eligible = set(sig.capabilities) <= set(e.capabilities)
            affinity = _affinity(sig, e) if eligible else -math.inf
            scores.append(Score(e.id, affinity, eligible))
        return scores


def _affinity(sig, expert) -> float:
    # cosine of task embedding against the expert's capability vector,
    # nudged by the expert's rolling success rate.
    base = _cosine(sig.embedding, expert.vector)
    return base + 0.15 * expert.success_rate


def _cosine(a, b) -> float:
    # Cosine similarity; defined as 0.0 when either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

| Field | Type | Description |
|---|---|---|
| expert_id | str | Identifier of the scored expert. |
| logit | float | Raw affinity before softmax; -inf if ineligible. |
| eligible | bool | Whether the expert's capabilities cover the task's requested tags. |
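To make the masking step concrete, here is a minimal sketch of how logits carrying -inf for ineligible experts could be normalized with a temperature-scaled softmax. The helper name `masked_softmax` and the sample expert ids and logits are illustrative assumptions, not part of the documented API; the default of 0.7 simply mirrors the router's `temperature` field.

```python
import math


def masked_softmax(logits: dict[str, float], temperature: float = 0.7) -> dict[str, float]:
    """Temperature-scaled softmax that gives -inf logits exactly zero weight.

    Hypothetical helper for illustration; not part of the documented API.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only finite logits: an ineligible expert carries -inf.
    live = {k: v / temperature for k, v in logits.items() if v != -math.inf}
    if not live:
        # Nothing eligible: every expert gets zero weight.
        return {k: 0.0 for k in logits}
    m = max(live.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - m) for k, v in live.items()}
    z = sum(exp.values())
    return {k: exp.get(k, 0.0) / z for k in logits}


# Illustrative logits: "vision" failed the capability mask.
probs = masked_softmax({"sql": 0.9, "search": 0.4, "vision": -math.inf})
```

A lower temperature sharpens the distribution toward the best-scoring expert, while a higher one flattens it; the masked expert stays at zero weight either way.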
Gating mechanism
The gate is where sparsity happens. It ranks eligible experts, applies a load-balancing penalty so a busy expert yields to a comparably-scored idle peer, takes the top-k under a global capacity cap, and softmax-normalizes the survivors into routing weights that the arbiter later uses.
Figure legend: solid = engaged; faint = eligible but cut by top-k.
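The effect of the load-balancing penalty is easiest to see in a small worked example. The sketch below assumes load is counted as in-flight tasks per expert and uses a penalty of 0.10 per unit; the expert ids, logits, and the helper `rank_with_load` are hypothetical.

```python
def rank_with_load(logits: dict[str, float], load: dict[str, float],
                   load_penalty: float = 0.10) -> list[str]:
    """Order expert ids by logit minus load_penalty * current load (descending)."""
    adjusted = {eid: logit - load_penalty * load.get(eid, 0.0)
                for eid, logit in logits.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)


logits = {"sql-a": 0.82, "sql-b": 0.80, "search": 0.35}

# Idle pool: raw affinity decides, so sql-a leads.
idle = rank_with_load(logits, load={})

# sql-a has 3 tasks in flight: 0.82 - 0.30 = 0.52 < 0.80, so sql-b overtakes it,
# while search (0.35) stays last because the gap is too large to close.
busy = rank_with_load(logits, load={"sql-a": 3})
```

The penalty only reorders experts whose affinities are close; a clearly better expert still wins unless it is heavily loaded.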
```python
# Gate: scores -> an executable plan (who runs, with what weight).
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    engaged: list[str]         # expert ids that will run
    weights: dict[str, float]  # normalized routing weight per engaged expert
    dropped: list[str]         # eligible but cut by top-k / capacity


@dataclass
class TopKGate:
    top_k: int = 2
    capacity: int = 8           # max in-flight experts across the pool
    load_penalty: float = 0.10  # subtracted per unit of current load

    def select(self, scores, load: dict[str, float]) -> Plan:
        # 1. drop ineligible, then 2. apply a load-balancing penalty so a hot
        # expert yields to an idle peer of comparable affinity.
        ranked = sorted(
            (s for s in scores if s.eligible),
            key=lambda s: s.logit - self.load_penalty * load.get(s.expert_id, 0),
            reverse=True,
        )
        # 3. keep at most top_k, and never more than the capacity cap.
        k = min(self.top_k, self.capacity)
        chosen = ranked[:k]
        # 4. weights come from the raw logits; the load penalty only affects rank.
        weights = _softmax({s.expert_id: s.logit for s in chosen})
        return Plan(
            engaged=[s.expert_id for s in chosen],
            weights=weights,
            # every eligible expert that was not chosen, including capacity cuts
            dropped=[s.expert_id for s in ranked[k:]],
        )


def _softmax(logits: dict[str, float]) -> dict[str, float]:
    if not logits:  # no eligible experts: nothing to normalize
        return {}
    m = max(logits.values())  # subtract the max for numerical stability
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}
```

Go data-plane gate
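On the hot path the same selection runs in Go, where the gate is called once per dispatch under the worker pool's lock. The selection logic mirrors the Python reference, so both planes engage the same experts with the same weights; the Go Plan carries only the engaged experts and their weights, not the dropped list.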
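```go
// Go data-plane gate: the same selection, built for the hot path.
package gate

import (
	"math"
	"sort"
)

type Score struct {
	ExpertID string
	Logit    float64
	Eligible bool
}

type Plan struct {
	Engaged []string
	Weights map[string]float64
}

type TopK struct {
	K           int
	Capacity    int
	LoadPenalty float64
}

func (g TopK) Select(scores []Score, load map[string]float64) Plan {
	// In-place filter: reuses the backing array of scores to avoid an
	// allocation, so the caller's slice contents are overwritten.
	eligible := scores[:0]
	for _, s := range scores {
		if s.Eligible {
			eligible = append(eligible, s)
		}
	}
	sort.SliceStable(eligible, func(i, j int) bool {
		return g.adj(eligible[i], load) > g.adj(eligible[j], load)
	})
	k := min(g.K, g.Capacity) // built-in min requires Go 1.21+
	if k > len(eligible) {
		k = len(eligible)
	}
	chosen := eligible[:k]
	return Plan{Engaged: ids(chosen), Weights: softmax(chosen)}
}

// adj is the load-adjusted score used for ranking only.
func (g TopK) adj(s Score, load map[string]float64) float64 {
	return s.Logit - g.LoadPenalty*load[s.ExpertID]
}

// ids returns the expert ids of the chosen scores, in rank order.
func ids(chosen []Score) []string {
	out := make([]string, len(chosen))
	for i, s := range chosen {
		out[i] = s.ExpertID
	}
	return out
}

// softmax normalizes the raw logits of the chosen experts into weights.
func softmax(chosen []Score) map[string]float64 {
	w := make(map[string]float64, len(chosen))
	if len(chosen) == 0 {
		return w
	}
	m := chosen[0].Logit
	for _, s := range chosen {
		m = math.Max(m, s.Logit)
	}
	var z float64
	for _, s := range chosen {
		w[s.ExpertID] = math.Exp(s.Logit - m)
		z += w[s.ExpertID]
	}
	for id := range w {
		w[id] /= z
	}
	return w
}
```

Capacity is global, not per-dispatch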
capacity bounds total in-flight experts across all concurrent dispatches. Under load the gate can return fewer than top_k experts; downstream arbitration must tolerate a short plan rather than assume exactly k proposals.
Custom strategies
Both RoutingStrategy and the gate are plain protocols. Implement score() / select() to ship a hand-tuned router (keyword rules, a cost-aware gate, a sticky-session gate for stateful experts) without touching the rest of the pipeline. The hub validates that every engaged expert is eligible before fan-out, so a buggy gate fails fast instead of running the wrong expert.
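As an illustration of a hand-tuned strategy, the sketch below implements a keyword-rule router whose score() follows the RoutingStrategy shape. Everything beyond the Score fields is an assumption: the `Sig` and `Expert` stand-ins, the `text` field, the `KeywordRouter` rules, and the `validate_plan` check are hypothetical and only approximate the eligibility validation the hub performs before fan-out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    expert_id: str
    logit: float
    eligible: bool


# Hypothetical stand-ins for TaskSignature and Expert; the real types may differ.
@dataclass(frozen=True)
class Sig:
    text: str
    capabilities: frozenset[str]


@dataclass(frozen=True)
class Expert:
    id: str
    capabilities: frozenset[str]


@dataclass
class KeywordRouter:
    """Scores an expert by how many of its keywords appear in the task text."""
    rules: dict[str, list[str]]  # expert id -> trigger keywords

    def score(self, sig: Sig, experts: list[Expert]) -> list[Score]:
        words = set(sig.text.lower().split())
        out = []
        for e in experts:
            eligible = sig.capabilities <= e.capabilities
            hits = sum(1 for kw in self.rules.get(e.id, []) if kw in words)
            out.append(Score(e.id, float(hits) if eligible else -math.inf, eligible))
        return out


def validate_plan(engaged: list[str], scores: list[Score]) -> None:
    """Fail fast if a gate engaged an expert the router marked ineligible."""
    eligible = {s.expert_id for s in scores if s.eligible}
    bad = [eid for eid in engaged if eid not in eligible]
    if bad:
        raise ValueError(f"ineligible experts engaged: {bad}")


router = KeywordRouter(rules={"sql": ["query", "table"], "vision": ["image"]})
experts = [Expert("sql", frozenset({"db"})), Expert("vision", frozenset({"img"}))]
scores = router.score(Sig("query the orders table", frozenset({"db"})), experts)
validate_plan(["sql"], scores)  # passes: sql is eligible
```

Because the router still emits one Score per expert with the same fields, a gate can consume its output unchanged, and a plan that names an ineligible expert is rejected before any expert runs.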