rfml-moe-hub

Core component · 01-02

Router & gating

The router scores how well each expert fits a task; the gate turns those scores into a concrete plan: who runs, how many, and with what weight. Together they keep execution sparse instead of broadcasting every task to every agent.

Router design

A router maps a TaskSignature to a list[Score]: one affinity logit per registered expert, plus an eligibility flag from the capability mask. The router only scores; it never decides who actually runs. That separation lets you tune scoring and selection independently.

The default learned-softmax strategy scores by cosine affinity between the task embedding and each expert's capability vector, nudged by the expert's rolling success rate, with ineligible experts forced to -inf before normalization.

router.py · python

# Router: TaskSignature -> one Score per registered expert
# (the gate turns these scores into a sparse plan).
import math
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Score:
    expert_id: str
    logit: float          # raw affinity before normalization
    eligible: bool        # passes the capability mask?


class RoutingStrategy(Protocol):
    def score(self, sig: "TaskSignature", experts: list["Expert"]) -> list[Score]:
        ...


@dataclass
class LearnedSoftmaxRouter:
    """Scores experts by cosine affinity and masks the ineligible with -inf.

    score() returns raw logits only; softmax normalization of the
    survivors happens downstream in the gate.
    """
    temperature: float = 0.7   # not applied inside score() in this listing

    def score(self, sig, experts) -> list[Score]:
        scores = []
        for e in experts:
            # Eligible only if the expert covers every requested capability tag.
            eligible = set(sig.capabilities) <= set(e.capabilities)
            affinity = _affinity(sig, e) if eligible else -math.inf
            scores.append(Score(e.id, affinity, eligible))
        return scores


def _affinity(sig, expert) -> float:
    # Cosine of the task embedding against the expert's capability vector,
    # nudged by the expert's rolling success rate.
    # _cosine is assumed to be a plain cosine-similarity helper defined elsewhere.
    base = _cosine(sig.embedding, expert.vector)
    return base + 0.15 * expert.success_rate
Score

Field       Type    Description
expert_id   str     Identifier of the scored expert.
logit       float   Raw affinity before softmax; -inf if ineligible.
eligible    bool    Whether the expert's capabilities cover the task's requested tags.
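
To make the scoring rule concrete, here is a minimal, self-contained sketch of cosine affinity with a success-rate nudge and a capability mask. The `Expert` shape, the `cosine` and `affinity` helpers, and the sample vectors are illustrative assumptions, not the hub's actual types; only the 0.15 nudge and the subset-based mask come from the listing above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    # Hypothetical minimal expert record, for illustration only.
    id: str
    capabilities: frozenset
    vector: tuple
    success_rate: float


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def affinity(task_embedding, required, expert, nudge=0.15) -> float:
    """Raw logit: -inf when the expert lacks a required capability tag."""
    if not set(required) <= expert.capabilities:
        return -math.inf
    return cosine(task_embedding, expert.vector) + nudge * expert.success_rate


coder = Expert("code.synthesis", frozenset({"code"}), (1.0, 0.0), 0.8)
rag = Expert("retrieval.rag", frozenset({"search"}), (1.0, 0.0), 0.9)

a = affinity((1.0, 0.0), {"code"}, coder)  # cosine 1.0 plus 0.15 * 0.8
b = affinity((1.0, 0.0), {"code"}, rag)    # masked: lacks the "code" tag
```

Note that the mask wins over any score: `rag` has the same vector and a higher success rate, yet it is forced to -inf because it does not cover the requested tag.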

Gating mechanism

The gate is where sparsity happens. It ranks eligible experts, applies a load-balancing penalty so a busy expert yields to a comparably-scored idle peer, takes the top-k under a global capacity cap, and softmax-normalizes the survivors into routing weights that the arbiter later uses.

Figure: routing weights · top-k = 2 · softmax(logits)

code.synthesis    0.52
retrieval.rag     0.31
planning.graph    0.12
review.static     0.05

Legend: solid = engaged · faint = eligible but cut by top-k
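
The four values in the figure sum to 1, which suggests they are a softmax over all eligible experts before the cut. Because softmax is a ratio of exponentials, normalizing only the survivors is equivalent to dividing each survivor's share by the survivors' total. The sketch below works that arithmetic through, assuming the two highest-weight experts are the ones engaged (that is, no load penalty reorders them).

```python
# Values taken from the figure; the renormalization step is the illustration.
full = {
    "code.synthesis": 0.52,
    "retrieval.rag": 0.31,
    "planning.graph": 0.12,
    "review.static": 0.05,
}
top_k = 2

# Keep the k largest shares, then rescale them so they sum to 1 again.
engaged = sorted(full, key=full.get, reverse=True)[:top_k]
mass = sum(full[e] for e in engaged)              # 0.52 + 0.31 = 0.83
weights = {e: full[e] / mass for e in engaged}
# code.synthesis ≈ 0.627, retrieval.rag ≈ 0.373
```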

gate.py · python

# Gate: scores -> an executable plan (who runs, with what weight).
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    engaged: list[str]          # expert ids that will run
    weights: dict[str, float]   # normalized routing weight per engaged expert
    dropped: list[str]          # eligible but cut by top-k / capacity


@dataclass
class TopKGate:
    top_k: int = 2
    capacity: int = 8           # max in-flight experts across the pool
    load_penalty: float = 0.10  # subtracted per unit of current load

    def select(self, scores, load: dict[str, float]) -> Plan:
        # 1. drop ineligible, then 2. apply a load-balancing penalty so a hot
        #    expert yields to an idle peer of comparable affinity.
        ranked = sorted(
            (s for s in scores if s.eligible),
            key=lambda s: s.logit - self.load_penalty * load.get(s.expert_id, 0),
            reverse=True,
        )
        # 3. keep the top-k, never more than the capacity cap allows.
        k = max(0, min(self.top_k, self.capacity))
        chosen = ranked[:k]
        # 4. weights use the raw logits; the load penalty affects ranking only.
        weights = _softmax({s.expert_id: s.logit for s in chosen})
        return Plan(
            engaged=[s.expert_id for s in chosen],
            weights=weights,
            # Slice at k (not top_k) so experts cut by capacity are reported too.
            dropped=[s.expert_id for s in ranked[k:]],
        )


def _softmax(logits: dict[str, float]) -> dict[str, float]:
    if not logits:              # no eligible experts: empty plan, not a ValueError
        return {}
    m = max(logits.values())    # subtract the max for numerical stability
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}
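
The effect of the load penalty is easiest to see on two experts with comparable affinity. The following sketch isolates just the ranking step; `rank_with_load` and the sample logits are hypothetical, and the 0.10 penalty matches the default shown in the listing.

```python
def rank_with_load(logits, load, penalty=0.10):
    """Order expert ids by logit minus penalty * current load (highest first)."""
    return sorted(
        logits,
        key=lambda e: logits[e] - penalty * load.get(e, 0.0),
        reverse=True,
    )


logits = {"code.synthesis": 1.00, "retrieval.rag": 0.95, "planning.graph": 0.40}

# With no load, ranking follows raw affinity.
idle = rank_with_load(logits, load={})

# One unit of load costs code.synthesis 0.10, dropping it to 0.90,
# so the idle peer at 0.95 moves ahead of it.
busy = rank_with_load(logits, load={"code.synthesis": 1.0})
```

The penalty only reorders near-ties: `planning.graph` at 0.40 stays last in both cases, so a busy expert still beats a clearly worse-fitting idle one.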

Go data-plane gate

On the hot path the same selection runs in Go, where the gate is called once per dispatch under the worker pool's lock. The ranking and weighting mirror the Python reference, so both planes engage the same experts with the same weights; the Go Plan omits the dropped list.

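gate.go · go

// Go data-plane gate: the same selection, built for the hot path.
// Uses the built-in min and max, which require Go 1.21 or newer.
package gate

import (
	"math"
	"sort"
)

type Score struct {
	ExpertID string
	Logit    float64
	Eligible bool
}

type Plan struct {
	Engaged []string
	Weights map[string]float64
}

type TopK struct {
	K           int
	Capacity    int
	LoadPenalty float64
}

func (g TopK) Select(scores []Score, load map[string]float64) Plan {
	// Filter into a fresh slice: reusing scores[:0] would overwrite and
	// reorder the caller's backing array.
	eligible := make([]Score, 0, len(scores))
	for _, s := range scores {
		if s.Eligible {
			eligible = append(eligible, s)
		}
	}
	// Stable sort keeps input order for ties, matching Python's sorted().
	sort.SliceStable(eligible, func(i, j int) bool {
		return g.adj(eligible[i], load) > g.adj(eligible[j], load)
	})
	k := max(0, min(g.K, g.Capacity, len(eligible)))
	chosen := eligible[:k]
	return Plan{Engaged: ids(chosen), Weights: softmax(chosen)}
}

func (g TopK) adj(s Score, load map[string]float64) float64 {
	return s.Logit - g.LoadPenalty*load[s.ExpertID]
}

// ids and softmax below are minimal helpers assumed to mirror the Python
// reference; adapt them to the real data plane.
func ids(chosen []Score) []string {
	out := make([]string, len(chosen))
	for i, s := range chosen {
		out[i] = s.ExpertID
	}
	return out
}

func softmax(chosen []Score) map[string]float64 {
	out := make(map[string]float64, len(chosen))
	if len(chosen) == 0 {
		return out
	}
	m := chosen[0].Logit
	for _, s := range chosen {
		m = math.Max(m, s.Logit)
	}
	var z float64
	for _, s := range chosen {
		out[s.ExpertID] = math.Exp(s.Logit - m) // shift by max for stability
		z += out[s.ExpertID]
	}
	for id := range out {
		out[id] /= z
	}
	return out
}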

Capacity is global, not per-dispatch

capacity bounds total in-flight experts across all concurrent dispatches. Under load the gate can return fewer than top_k experts; downstream arbitration must tolerate a short plan rather than assume exactly k proposals.
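
A short sketch of what a global cap implies for plan size, and of an arbiter that tolerates a short or empty plan. How the hub actually tracks in-flight experts is not shown in this section, so `effective_k`, `in_flight`, and `arbitrate` are hypothetical names used only to illustrate the idea.

```python
from typing import Optional


def effective_k(top_k: int, capacity: int, in_flight: int, n_eligible: int) -> int:
    """How many experts a dispatch may engage given pool-wide usage."""
    free = max(0, capacity - in_flight)   # slots left across all dispatches
    return min(top_k, free, n_eligible)


def arbitrate(weights: dict, proposals: dict) -> Optional[str]:
    """Return the proposal from the highest-weight expert that answered.

    Works for plans of any length, including an empty one (returns None).
    """
    answered = {e: w for e, w in weights.items() if e in proposals}
    if not answered:
        return None
    return proposals[max(answered, key=answered.get)]


roomy = effective_k(top_k=2, capacity=8, in_flight=3, n_eligible=4)   # 2
tight = effective_k(top_k=2, capacity=8, in_flight=7, n_eligible=4)   # 1
full = effective_k(top_k=2, capacity=8, in_flight=8, n_eligible=4)    # 0
```

The point of the second function is the contract, not the policy: whatever arbitration rule is used, it should iterate over the proposals it actually received rather than index into an assumed list of k.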

Custom strategies

Both RoutingStrategy and the gate are plain protocols. Implement score() / select() to ship a hand-tuned router (keyword rules, a cost-aware gate, a sticky-session gate for stateful experts) without touching the rest of the pipeline. The hub validates that every engaged expert is eligible before fan-out, so a buggy gate fails fast instead of running the wrong expert.
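
As an illustration of a hand-tuned strategy, the sketch below implements score() with keyword rules and adds a pre-fan-out check in the spirit of the hub's validation. `KeywordRouter`, `validate_engaged`, the `rules` config shape, and the `sig.text` field are assumptions made for this example; the hub's real validation hook is not shown in this section.

```python
import math
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Score:
    expert_id: str
    logit: float
    eligible: bool


@dataclass
class KeywordRouter:
    """Hand-tuned router: +1.0 logit per rule keyword found in the task text."""
    rules: dict  # expert id -> list of keywords (hypothetical config shape)

    def score(self, sig, experts) -> list:
        text = sig.text.lower()  # `text` is an assumed TaskSignature field
        out = []
        for e in experts:
            eligible = set(sig.capabilities) <= set(e.capabilities)
            hits = sum(kw in text for kw in self.rules.get(e.id, []))
            out.append(Score(e.id, float(hits) if eligible else -math.inf, eligible))
        return out


def validate_engaged(engaged, scores) -> None:
    """Fail fast if a gate engaged an expert the router marked ineligible."""
    eligible = {s.expert_id for s in scores if s.eligible}
    bad = [e for e in engaged if e not in eligible]
    if bad:
        raise ValueError(f"gate engaged ineligible experts: {bad}")


experts = [
    SimpleNamespace(id="code.synthesis", capabilities={"code"}),
    SimpleNamespace(id="review.static", capabilities={"review"}),
]
sig = SimpleNamespace(text="Refactor the parser", capabilities={"code"})

router = KeywordRouter(rules={"code.synthesis": ["refactor", "implement"]})
scores = router.score(sig, experts)
validate_engaged(["code.synthesis"], scores)  # passes silently
```

Calling `validate_engaged(["review.static"], scores)` raises ValueError, which is the fail-fast behavior a buggy custom gate should trigger.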