Reference · Architecture
System architecture
rfml-moe-hub splits responsibilities across two planes: a Python control plane that decides policy and a Go data plane that executes it. The two communicate over a single protobuf-backed envelope, so a dispatch looks identical whether it runs in-process or across a cluster.
Two planes, one envelope
The control plane owns slow, stateful decisions: which routing strategy is active, how the gate is configured, which arbitration protocol applies, and what lives in the expert registry. The data plane owns fast, stateless execution: running engaged experts concurrently, streaming their proposals back, and writing the audit log.
You can run the control plane alone (experts execute in-process for local development) or attach the Go data plane for production throughput. The contract between them never changes.
The dispatch envelope
Everything the system needs to route, run, and arbitrate a task is carried in one Dispatch. It is the single source of truth that crosses the plane boundary.
// Dispatch is the wire envelope shared by both planes (protobuf-backed).type Dispatch struct { ID string // task#4f1c Signature TaskSignature // features the router scores against Context map[string]any // opaque payload forwarded to experts Budget Budget // token / latency / cost ceilings Trace string // propagated from TaskFlow / Peer-Consult}type TaskSignature struct { Capabilities []string // requested capability tags Embedding []float32 // optional dense task embedding Priority uint8 // 0..9, feeds the gate's load penalty}| Field | Type | Description |
|---|---|---|
| ID | string | Stable task identifier, used as the audit-log key. |
| Signature | TaskSignature | Capability tags and optional embedding the router scores against. |
| Context | map[string]any | Opaque payload forwarded verbatim to engaged experts. |
| Budget | Budget | Token, latency, and cost ceilings enforced by the gate. |
| Trace | string | Trace id propagated from TaskFlow or a Peer-Consult session. |
The dispatch lifecycle
A dispatch advances through five observable stages. Each stage emits a decision record into the session trace, so any committed result can be replayed and explained after the fact.
Router produces a score per registered expert from the task signature.
Gate converts scores to a plan: top-k, capacity caps, capability mask, load penalty.
Data plane dispatches the plan to worker goroutines; unselected experts never wake.
Proposals stream back over the proposal bus with per-expert confidence.
Arbiter folds proposals into one committed result under the declared protocol.
from moe_hub import Hub# A dispatch walks five stages; each stage is observable and replayable.with hub.dispatch_session(task) as session: scores = session.route() # 1. router scores every expert plan = session.gate(scores) # 2. gate selects top-k under capacity requests = session.fan_out(plan) # 3. data plane runs engaged experts props = session.collect() # 4. proposals stream back in result = session.arbitrate() # 5. arbiter commits one consensus# session.trace is an append-only record of every stage decision.audit = session.trace.as_json()Why a session, not a single call
hub.dispatch() is just this loop run end-to-end.Concurrency & failure model
- Bounded fan-out. The gate caps concurrency at k; the data plane never spawns more than k worker goroutines per dispatch.
- Partial results. If an expert times out against its budget, its proposal is dropped and arbitration proceeds with the remaining quorum - it does not block the dispatch.
- No silent majorities. If too few proposals survive to meet the arbiter's quorum, the dispatch fails loudly with an UnderQuorum error rather than committing a weak result.
- Replayable. Given the same dispatch, seed, and proposals, the pipeline is deterministic end to end.