Skip to content
rfml-moe-hub

Reference · Architecture

System architecture

rfml-moe-hub splits responsibilities across two planes: a Python control plane that decides policy and a Go data plane that executes it. The two communicate over a single protobuf-backed envelope, so a dispatch looks identical whether it runs in-process or across a cluster.

Two planes, one envelope

The control plane owns slow, stateful decisions: which routing strategy is active, how the gate is configured, which arbitration protocol applies, and what lives in the expert registry. The data plane owns fast, stateless execution: running engaged experts concurrently, streaming their proposals back, and writing the audit log.

Control planepython
Hublifecycle · config
Routerscoring policy
Gateselection policy
Arbiterconsensus protocol
Registryexpert catalog
gRPC · protobuf v3 · streaming proposals
Data planego
Dispatcherwork queue
Expert poolworker goroutines
Proposal busfan-in channel
Audit logappend-only

You can run the control plane alone (experts execute in-process for local development) or attach the Go data plane for production throughput. The contract between them never changes.

The dispatch envelope

Everything the system needs to route, run, and arbitrate a task is carried in one Dispatch. It is the single source of truth that crosses the plane boundary.

dispatch.gogo
// Dispatch is the wire envelope shared by both planes (protobuf-backed).type Dispatch struct {    ID        string            // task#4f1c    Signature TaskSignature     // features the router scores against    Context   map[string]any    // opaque payload forwarded to experts    Budget    Budget            // token / latency / cost ceilings    Trace     string            // propagated from TaskFlow / Peer-Consult}type TaskSignature struct {    Capabilities []string  // requested capability tags    Embedding    []float32 // optional dense task embedding    Priority     uint8     // 0..9, feeds the gate's load penalty}
Dispatch
FieldTypeDescription
IDstringStable task identifier, used as the audit-log key.
SignatureTaskSignatureCapability tags and optional embedding the router scores against.
Contextmap[string]anyOpaque payload forwarded verbatim to engaged experts.
BudgetBudgetToken, latency, and cost ceilings enforced by the gate.
TracestringTrace id propagated from TaskFlow or a Peer-Consult session.

The dispatch lifecycle

A dispatch advances through five observable stages. Each stage emits a decision record into the session trace, so any committed result can be replayed and explained after the fact.

1
route

Router produces a score per registered expert from the task signature.

2
gate

Gate converts scores to a plan: top-k, capacity caps, capability mask, load penalty.

3
fan_out

Data plane dispatches the plan to worker goroutines; unselected experts never wake.

4
collect

Proposals stream back over the proposal bus with per-expert confidence.

5
arbitrate

Arbiter folds proposals into one committed result under the declared protocol.

lifecycle.pypython
from moe_hub import Hub# A dispatch walks five stages; each stage is observable and replayable.with hub.dispatch_session(task) as session:    scores   = session.route()       # 1. router scores every expert    plan     = session.gate(scores)  # 2. gate selects top-k under capacity    requests = session.fan_out(plan) # 3. data plane runs engaged experts    props    = session.collect()     # 4. proposals stream back in    result   = session.arbitrate()   # 5. arbiter commits one consensus# session.trace is an append-only record of every stage decision.audit = session.trace.as_json()

Why a session, not a single call

Exposing the stages individually lets you inspect router scores before paying for execution, inject a fixed plan for tests, or cache proposals across retries. hub.dispatch() is just this loop run end-to-end.

Concurrency & failure model

  • Bounded fan-out. The gate caps concurrency at k; the data plane never spawns more than k worker goroutines per dispatch.
  • Partial results. If an expert times out against its budget, its proposal is dropped and arbitration proceeds with the remaining quorum - it does not block the dispatch.
  • No silent majorities. If too few proposals survive to meet the arbiter's quorum, the dispatch fails loudly with an UnderQuorum error rather than committing a weak result.
  • Replayable. Given the same dispatch, seed, and proposals, the pipeline is deterministic end to end.