03 — Reference Architecture#
- Project: Agentic-Native SDLC for Regulated Medical Device Engineering
- Document: Reference Architecture
- Status: Controlled (Engineering/Quality Reference)
- Revision date: May 2026
- Audience: Platform, MLOps, Security, and Quality engineering leads (1000+ developer org)
Sibling documents: 01-requirements.md · 02-maturity-model.md · 04-model-strategy-and-finetuning.md · 05-evaluation-and-validation.md · 06-agentic-workflows.md · 07-security-and-compliance.md · 08-token-and-gpu-economics.md · 09-adoption-roadmap.md
1. Architecture goals, constraints, and the layered view#
1.1 Goals#
This reference architecture is the buildable expression of the seven principles defined in 01-requirements.md. It must:
- Deliver ≥99.9% release-gate correctness as a system property (P1) by composing `Generate → Verify → Repair → Gate`, not by trusting any single model output.
- Wrap probabilistic model behavior in deterministic control (P2): every gate, policy decision, and promotion is reproducible from recorded inputs.
- Enforce risk-proportional autonomy (P3) keyed to IEC 62304 software safety class A/B/C, with Class C always under dual human control.
- Treat every agent action as Part 11-grade evidence (P4): immutable, attributable, time-stamped, reconstructable.
- Treat the harness as the product (P5): `Agent = Model + Harness`. The architecture invests in the harness/control plane, not just the model.
- Optimize cost per verified task / cost-per-green-PR (P6), making GPU and token spend a first-class, observable quantity (see 08-token-and-gpu-economics.md).
- Remain self-hosted, sovereign, and reproducible (P7): open-weight, fine-tuned models only; no external SaaS LLM APIs.
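The Generate → Verify → Repair → Gate composition named in P1 can be sketched as a small control loop. This is a minimal illustration, not the runtime's actual interface: `run_gated_task`, `Verdict`, the callable signatures, and the `max_repairs` bound are all hypothetical, and treating a gate failure as final (no further repair) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    passed: bool
    detail: str = ""


def run_gated_task(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Verdict],
    gate: Callable[[str], Verdict],
    task: str,
    max_repairs: int = 2,
) -> Optional[str]:
    """Return an artifact only if it passed both verification and the gate."""
    feedback: Optional[str] = None
    for _attempt in range(max_repairs + 1):
        # Generate on the first pass; Repair on later passes (feedback is set).
        artifact = generate(task, feedback)
        result = verify(artifact)  # Verify, e.g. hermetic build + test
        if not result.passed:
            feedback = result.detail  # feed the failure context into the repair
            continue
        # Gate: deterministic decision; assumed final, so no repair after it.
        return artifact if gate(artifact).passed else None
    return None  # repair budget exhausted; nothing is released
```

The point of the shape is that no code path returns a model output that has not passed both checks; the model is only ever a proposer.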
1.2 Hard constraints (non-negotiable)#
| # | Constraint | Architectural consequence |
|---|---|---|
| C1 | Open-weight, self-hosted, fine-tuned models only — no Claude/OpenAI/Gemini SaaS | All inference runs inside the cluster on the Model Fleet; no egress to LLM providers. Network policy default-deny to public LLM endpoints. |
| C2 | ≥99.9% release-gate correctness | Multi-stage verification (sandbox + policy + eval) gates every artifact; no model-only merges. |
| C3 | GPU/token cost is first-class | Tiered model fleet + routing gateway + scale-to-zero + FinOps telemetry on every span. |
| C4 | Deterministic evaluation | Pinned model digests, fixed seeds, hermetic test environments, content-addressed eval datasets. |
| C5 | Sovereign / air-gap capable | Every dependency mirrorable; no runtime dependency on internet reachability. |
1.3 The seven-plane layered view#
```mermaid
flowchart TB
  subgraph DI["1 · Developer Interface Plane"]
    IDE["IDE plugins (VS Code / JetBrains)"]
    CLI["Agent CLI"]
    WEB["Review & Approval Web UI"]
    CICD["CI/CD triggers (Argo Events)"]
  end
  subgraph AO["2 · Agent / Orchestration Plane"]
    RUNTIME["Agent Runtime (planner/executor)"]
    A2A["A2A multi-agent bus"]
    MCP["MCP Tool Plane"]
  end
  subgraph HC["3 · Harness / Control Plane"]
    ROUTER["Model-Routing Gateway"]
    POLICY["Policy Server (OPA/Gatekeeper)"]
    AUTHZ["Autonomy Authorization Service"]
    EVAL["Eval / Verification Service"]
    SANDBOX["Sandbox Execution (gVisor/Kata)"]
  end
  subgraph MS["4 · Model-Serving Plane"]
    VLLM["vLLM"]
    TRITON["Triton + TensorRT-LLM"]
    KSERVE["KServe + multi-LoRA"]
  end
  subgraph DK["5 · Data / Knowledge Plane"]
    KG["Code Knowledge Graph"]
    VEC["Vector / Embedding Store"]
    REG["Regulatory Corpus"]
    REGISTRY["MLflow Model Registry & Lineage"]
  end
  subgraph PI["6 · Platform / Infra Plane"]
    K8S["Kubernetes + GPU Operator"]
    RAY["Ray + Kueue"]
    MESH["Istio + SPIFFE/SPIRE"]
    VAULT["Vault · KEDA · Argo"]
  end
  subgraph GA["7 · Governance / Audit Plane"]
    WORM["WORM Evidence Store (Part 11)"]
    OTEL["OpenTelemetry pipeline"]
    FINOPS["FinOps / cost ledger"]
    SUPPLY["Sigstore/cosign · SLSA · SBOM"]
  end
  DI --> AO --> HC
  HC --> MS
  HC --> DK
  AO --> DK
  MS --> PI
  HC --> PI
  AO -.evidence.-> GA
  HC -.evidence.-> GA
  MS -.telemetry.-> GA
```
Plane responsibilities at a glance:
| Plane | Owns | Does not own |
|---|---|---|
| Developer Interface | Intent capture, review, approval surfaces | Model selection, policy |
| Agent / Orchestration | Trajectory planning, tool calls, multi-agent coordination | Inference, gating verdicts |
| Harness / Control | Routing, policy, autonomy authz, verification, gating | Model weights, business logic |
| Model-Serving | Inference, batching, LoRA hot-swap | Trajectory, gating |
| Data / Knowledge | Retrieval, lineage, registry | Generation |
| Platform / Infra | Scheduling, identity, secrets, scaling | Domain semantics |
| Governance / Audit | Immutable evidence, telemetry, supply chain | Live request handling |
2. Logical components per plane#
2.1 Developer Interface Plane#
| Component | Tech | Responsibility |
|---|---|---|
| IDE integration | VS Code / JetBrains extensions, LSP bridge | Inline intent capture, diff preview, approval prompts, trajectory visualization |
| Agent CLI | Self-hosted CLI binary (mTLS to mesh) | Headless agent invocation, batch tasks, CI usage |
| Review/Approval Web UI | Internal SPA behind Istio + OIDC | HITL review, autonomy-class approvals, evidence inspection |
| CI/CD triggers | Argo Events + Argo Workflows | Event-driven agent runs (PR opened, requirement changed) |
All interface clients are thin: they hold no model credentials and reach the cluster only through the mesh ingress with SPIFFE-issued identity.
2.2 Agent / Orchestration Plane#
| Component | Tech | Responsibility |
|---|---|---|
| Agent Runtime | Custom planner/executor on K8s Job/Pod, Ray actors for fan-out | Owns the trajectory: plan → act → observe → repair loop |
| MCP Tool Plane | Model Context Protocol servers (one per capability) | Typed, permissioned tool surface: repo.read, repo.write, test.run, kg.query, eval.submit, vault.lease |
| A2A bus | Agent-to-Agent protocol over the Istio mesh | Specialist agents (coder, reviewer, test-author, requirements-tracer) coordinate |
Tool calls never hit infrastructure directly; they are mediated by MCP servers that enforce per-tool scopes and emit evidence (P4).
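The mediation described above can be sketched as a single choke point that checks the caller's scopes and records evidence for every attempt. This is an illustrative stand-in, not the MCP protocol itself: `ToolGateway`, its `call` signature, and the evidence fields are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass
class ToolGateway:
    """Hypothetical mediator: checks scopes, records evidence, then dispatches."""
    tools: Dict[str, Callable[..., Any]]
    evidence: List[dict] = field(default_factory=list)

    def call(self, agent_id: str, scopes: FrozenSet[str], tool: str, **kwargs: Any) -> Any:
        allowed = tool in scopes and tool in self.tools
        # Denied calls are recorded too, so refusals stay auditable (P4).
        self.evidence.append(
            {"agent": agent_id, "tool": tool, "args": kwargs, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{agent_id} is not scoped for {tool}")
        return self.tools[tool](**kwargs)
```

An agent holding only the `repo.read` scope can read through the gateway, while an attempted `repo.write` raises `PermissionError` and still leaves an evidence entry.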
2.3 Harness / Control Plane (the product, P5)#
| Component | Tech | Responsibility |
|---|---|---|
| Model-Routing Gateway | Custom gateway + classifier (Tier-S model) in front of serving | Classify request → select tier/LoRA → enforce cost budget |
| Policy Server | OPA/Gatekeeper + dedicated policy server (Rego bundles) | Admission of tool calls, autonomy decisions, write permissions |
| Autonomy Authorization Service | Custom service keyed to IEC 62304 class | Decides allowed autonomy level per task (P3); Class C → dual human |
| Eval / Verification Service | Deterministic eval harness (see 05) | Runs gate suites; emits pass/fail with evidence |
| Sandbox Execution | gVisor / Kata Containers, ephemeral namespaces | Hermetic build/test/exec of generated artifacts |
2.3.1 Model-routing gateway — classification and tier selection#
```mermaid
flowchart LR
  REQ["Agent request<br/>(task + context budget)"] --> CLS{"Classifier<br/>(Tier-S Reflex)"}
  CLS -->|"trivial / lint / format"| S["Tier-S Reflex<br/>1-8B"]
  CLS -->|"bounded code edit / unit test"| M["Tier-M Worker<br/>14-34B"]
  CLS -->|"design / multi-file / reasoning"| L["Tier-L Reasoner<br/>70B+/MoE"]
  CLS -->|"diagram / DICOM / UI screenshot"| V["Tier-V Multimodal"]
  CLS -->|"retrieval / rerank"| E["Tier-E Embed/Rerank"]
  S & M & L & V & E --> BUDGET{"Cost-budget check<br/>(P6)"}
  BUDGET -->|"within budget"| SERVE["Serving plane"]
  BUDGET -->|"over budget"| DEGRADE["Downshift tier or queue"]
```
Routing inputs: task type, IEC 62304 class, context length, required latency SLO, remaining task cost budget, and the active fine-tuned LoRA adapter. The classifier itself is a cheap Tier-S model; its decision is logged as evidence so routing is auditable and reproducible.
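A minimal sketch of the classify, select, and budget-check sequence follows. A lookup table stands in for the Tier-S classifier model, and the cost figures, downshift order, and task labels are assumed values chosen only for illustration. The returned `reason` string is the kind of decision record that would be logged as evidence.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed per-request cost estimates per tier, in arbitrary budget units.
TIER_COST = {"S": 1.0, "M": 5.0, "L": 25.0}
# Assumed downshift order; Tier-S has no cheaper fallback.
DOWNSHIFT = {"L": "M", "M": "S"}
# Stand-in for the Tier-S classifier: a static task-type lookup.
TASK_TO_TIER = {"format": "S", "lint": "S", "unit_test": "M", "code_edit": "M", "design": "L"}


@dataclass(frozen=True)
class RoutingDecision:
    tier: Optional[str]  # None means the request is queued
    reason: str          # logged so the decision is auditable


def route(task_type: str, remaining_budget: float) -> RoutingDecision:
    # Unknown task types go to the most capable tier.
    tier = TASK_TO_TIER.get(task_type, "L")
    reason = f"classified {task_type} as Tier-{tier}"
    while TIER_COST[tier] > remaining_budget:
        if tier not in DOWNSHIFT:
            return RoutingDecision(None, reason + "; over budget, queued")
        tier = DOWNSHIFT[tier]
        reason += f"; downshifted to Tier-{tier}"
    return RoutingDecision(tier, reason)
```

Because the function is pure, replaying the recorded inputs reproduces the same routing decision.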
2.4 Model-Serving Plane#
| Component | Tech | Responsibility |
|---|---|---|
| vLLM | PagedAttention, continuous batching | High-throughput text generation, Tier-S/M/L |
| Triton + TensorRT-LLM | Compiled engines, in-flight batching | Latency-critical / quantized serving |
| KServe | InferenceService CRDs, multi-LoRA hot-swap | Standardized serving surface, canary/shadow, autoscale |
2.5 Data / Knowledge Plane#
| Component | Tech | Responsibility |
|---|---|---|
| Code Knowledge Graph | Self-hosted property graph (Neo4j / JanusGraph / NebulaGraph) | Symbols, call graph, requirement→code→test traceability |
| Vector store | Self-hosted (Qdrant / Milvus / Weaviate) | ANN retrieval over code & docs using Tier-E embeddings |
| Regulatory corpus | Versioned doc store + full-text (OpenSearch) | IEC 62304 / ISO 13485 / 14971 / Part 11 reference text |
| Model Registry & Lineage | MLflow | Model versions, fine-tune lineage, signed digests, stage |
2.6 Platform / Infra Plane#
| Component | Tech | Responsibility |
|---|---|---|
| Cluster + GPU | Kubernetes + NVIDIA GPU Operator (MIG) | Scheduling, GPU lifecycle, driver/DCGM |
| Distributed compute | Ray + Kueue | Training, batch eval, fan-out inference jobs |
| Mesh & identity | Istio + SPIFFE/SPIRE | mTLS, workload identity, zero-trust east-west |
| Secrets | HashiCorp Vault | Dynamic short-lived secrets, transit, PKI |
| Autoscale | KEDA | Queue-driven scaling, scale-to-zero for idle tiers |
| Delivery | Argo CD + Argo Workflows | GitOps, pipeline orchestration |
2.7 Governance / Audit Plane#
| Component | Tech | Responsibility |
|---|---|---|
| WORM evidence store | Object store with object-lock (immutable), append-only ledger | Part 11 records, trajectory dumps, gate verdicts |
| Telemetry | OpenTelemetry collectors → metrics/traces/logs backends | End-to-end spans, cost attribution |
| FinOps ledger | Cost attribution service | Per-task GPU-seconds, tokens, cost-per-green-PR |
| Supply chain | Sigstore/cosign + SLSA provenance + SBOM | Signed images, models, and artifacts |
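The cost-per-green-PR figure kept by the FinOps ledger can be sketched as follows. The record shape and the GPU-second rate are assumptions; the exact definition is developed in 08-token-and-gpu-economics.md. Cost is derived from GPU-seconds only, on the assumption that token cost in a self-hosted fleet is already contained in GPU time.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TaskUsage:
    task_id: str
    gpu_seconds: float
    tokens: int
    merged_green: bool  # True only if the task ended in a PR that passed every gate


def cost_per_green_pr(usages: Iterable[TaskUsage], gpu_second_rate: float) -> Tuple[float, float]:
    """Return (cost per green PR, tokens per green PR) for a reporting window."""
    records = list(usages)
    green = sum(1 for u in records if u.merged_green)
    if green == 0:
        raise ValueError("no green PRs in this window")
    # Spend on failed or abandoned tasks stays in the numerator on purpose:
    # it is the cost of producing the green ones.
    total_cost = sum(u.gpu_seconds * gpu_second_rate for u in records)
    total_tokens = sum(u.tokens for u in records)
    return total_cost / green, total_tokens / green
```

Dividing total spend, including failed attempts, by verified outcomes is what distinguishes this metric from a plain cost-per-request figure.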
3. End-to-end request / trajectory flow#
```mermaid
sequenceDiagram
  autonumber
  participant Dev as Developer (IDE/CLI)
  participant RT as Agent Runtime
  participant KG as Knowledge/RAG
  participant GW as Routing Gateway
  participant MS as Model Serving
  participant SB as Sandbox
  participant POL as Policy Server
  participant EV as Eval Service
  participant AU as Autonomy Authz
  participant AUD as WORM Audit
  Dev->>RT: Intent (task, repo, requirement ID)
  RT->>AU: Request autonomy level (IEC 62304 class)
  AU-->>RT: Allowed level (Class C → dual-human required)
  RT->>KG: Assemble context (graph + ANN + full-text)
  KG-->>RT: Ranked context bundle (+provenance)
  RT->>GW: Generation request (task + context)
  GW->>GW: Classify → select tier/LoRA → budget check (P6)
  GW->>MS: Route to tier
  MS-->>RT: Candidate artifact (diff/code/tests)
  RT->>SB: Hermetic build + test (Verify)
  SB-->>RT: Build/test results
  alt verification fails
    RT->>GW: Repair request (failure context)
    GW->>MS: Re-generate (Repair)
    MS-->>RT: Revised artifact
    RT->>SB: Re-verify
  end
  RT->>POL: Policy gate (writes, licenses, secrets)
  POL-->>RT: Allow / Deny + rationale
  RT->>EV: Eval gate (deterministic suite)
  EV-->>RT: Pass/Fail (≥99.9% threshold)
  RT->>Dev: HITL review (class-proportional)
  Dev-->>RT: Approve / reject (dual for Class C)
  RT->>Dev: Open PR (signed)
  RT->>AUD: Write immutable evidence (P4, Part 11)
```
Each step emits a span and a content-addressed evidence record. The trajectory is fully reconstructable from the audit store, satisfying P4 and 21 CFR Part 11.
4. Kubernetes deployment topology#
```mermaid
flowchart TB
  subgraph CTRL["Control / CPU node pool"]
    NS_AGENT["ns: agent-runtime"]
    NS_HARNESS["ns: harness-control"]
    NS_KNOW["ns: knowledge"]
    NS_GOV["ns: governance-audit"]
    NS_PLAT["ns: platform (Vault, Istio, Argo)"]
  end
  subgraph GPU["GPU node pools"]
    POOL_S["pool: reflex (MIG 1g.10gb / L4)"]
    POOL_M["pool: worker (A10/L40S)"]
    POOL_L["pool: reasoner (H100/H200, NVLink)"]
    POOL_TRAIN["pool: train/batch (Kueue-managed)"]
  end
  subgraph SANDBOX["Sandbox node pool (CPU, isolated)"]
    NS_SB["ns: sandbox-exec (gVisor/Kata, no egress)"]
  end
  NS_HARNESS -->|route| POOL_S & POOL_M & POOL_L
  NS_AGENT --> NS_SB
  POOL_TRAIN --- RAY["Ray + Kueue queues"]
```
4.1 Namespaces#
| Namespace | Contents | Network policy |
|---|---|---|
| `agent-runtime` | Agent pods, A2A bus | Egress only to MCP, gateway, knowledge |
| `harness-control` | Gateway, policy server, autonomy authz, eval | Egress to serving + knowledge; ingress from agents |
| `model-serving-{s,m,l,v,e}` | vLLM/Triton/KServe per tier | Ingress only from gateway |
| `knowledge` | KG, vector store, OpenSearch, MLflow | Ingress from agents/harness |
| `sandbox-exec` | gVisor/Kata pods | Default-deny all egress; ephemeral |
| `governance-audit` | WORM store, OTel, FinOps | Append-only ingest |
| `platform` | Vault, Istio control plane, Argo, SPIRE | Cluster-internal |
4.2 Node pools, MIG, and scheduling#
| Pool | Hardware (example) | MIG | Scaling |
|---|---|---|---|
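| reflex (Tier-S) | L4 / A10 (whole GPU), or MIG slices of A100/H100 80GB | 1g.10gb partitions on MIG-capable GPUs for high pod density; L4 and A10 do not support MIG | KEDA, scale-to-zero off-hours |
| worker (Tier-M) | L40S / A10 | none (L40S and A10 do not support MIG); time-slicing if sharing is needed | KEDA queue-driven |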
| reasoner (Tier-L) | H100/H200, NVLink + GPUDirect | full GPU, tensor/pipeline parallel | conservative; warm pool ≥1 |
| train/batch | H100 multi-node | full GPU | Kueue gang-scheduling, preemptible |
- GPU Operator manages drivers, DCGM exporters, MIG geometry, and time-slicing where MIG is too coarse.
- Kueue provides quota-managed queues for training and batch eval, with `ClusterQueue`/`LocalQueue` and gang scheduling for multi-node Tier-L fine-tunes.
- KEDA scales serving deployments off MCP/gateway queue depth and supports scale-to-zero for idle Tier-V/Tier-L adapters, which is central to P6.
- NetworkPolicies enforce default-deny; sandbox namespace is fully air-gapped from cluster services and the internet.
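The queue-driven scaling in the bullets above reduces to simple arithmetic: replicas follow queue depth divided by a per-replica target, clamped to a floor and ceiling. The sketch below approximates that behaviour in plain Python; it is not KEDA's implementation, and `desired_replicas`, its parameters, and the default limits are hypothetical.

```python
import math


def desired_replicas(
    queue_depth: int,
    target_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 8,
) -> int:
    """Approximate a queue-depth autoscaler with optional scale-to-zero."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if queue_depth <= 0:
        # An idle tier drops to its floor: 0 for scale-to-zero tiers,
        # 1 or more where a warm pool is required.
        return min_replicas
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

Setting `min_replicas=1` models the warm pool kept for the reasoner tier, while `min_replicas=0` models tiers that are allowed to scale to zero.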
4.3 Multi-cluster, sovereign-VPC, and air-gap#
```mermaid
flowchart LR
  subgraph SOV["Sovereign region cluster"]
    direction TB
    PROD["prod (serving + audit)"]
    VAL["validation"]
  end
  subgraph DEV["Dev cluster"]
    DEVNS["dev / experimentation"]
  end
  MIRROR["Artifact mirror<br/>(images · models · pkgs)"]
  DEV -. promote (signed) .-> VAL
  VAL -. promote (signed) .-> PROD
  MIRROR --> SOV
  MIRROR --> DEV
```
- Sovereign-VPC: prod and validation run in a customer-controlled region/VPC; no cross-border data flow.
- Multi-cluster: dev separated from validation/prod clusters; promotion is signed-artifact-only (cosign verified at admission).
- Air-gap option: every dependency (base images, model weights, OS packages, eval datasets) is mirrored into an internal registry. No runtime reaches the public internet. The architecture has no hard internet dependency at request time (C5/P7).
5. Model-serving subsystem in depth#
5.1 Tier → hardware mapping#
| Tier | Models (open-weight, fine-tuned) | Hardware | Serving | Quant |
|---|---|---|---|---|
| Tier-S "Reflex" 1-8B | Qwen2.5-Coder-1.5B/7B, Llama-3.2-3B | L4 / MIG slice | Triton+TRT-LLM | FP8 / INT8 |
| Tier-M "Worker" 14-34B | Qwen2.5-Coder-32B, StarCoder2-15B, DeepSeek-Coder-V2-Lite | L40S / A10 | vLLM | FP8 / AWQ-INT4 |
| Tier-L "Reasoner" 70B+/MoE | Llama-3.3-70B, Qwen2.5-72B, DeepSeek-V3/R1-distill, Mixtral | H100/H200 NVLink | vLLM / TRT-LLM | FP8 / GPTQ |
| Tier-V "Multimodal" | Qwen2.5-VL, Llama-3.2-Vision, InternVL, Pixtral | L40S/H100 | vLLM | FP8 |
| Tier-E "Embed/Rerank" | bge, gte, jina-code, nomic | L4 / CPU | Triton / TEI | INT8 |
5.2 Serving techniques#
| Technique | Applied where | Purpose |
|---|---|---|
| Continuous / in-flight batching | vLLM, Triton | Throughput; amortize GPU (P6) |
| PagedAttention KV-cache | vLLM | Memory efficiency, longer context |
| Speculative decoding | Tier-L with Tier-S drafter | Lower latency on reasoner |
| Quantization (FP8/AWQ/GPTQ) | all tiers | Fit larger models, more density |
| Multi-LoRA hot-swap | KServe/vLLM | Many fine-tuned adapters per base; per-task adapter without reload |
| Tensor/pipeline parallel | Tier-L | Serve 70B+/MoE across GPUs |
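The memory effect of quantization and KV-cache sizing in the table above can be estimated with back-of-envelope arithmetic. The helper names are hypothetical. The example shape (80 layers, 8 KV heads, head dimension 128) is the commonly published Llama 3 70B configuration and is assumed here. Real deployments also need headroom for activations and runtime overhead, which this sketch ignores.

```python
def weight_bytes(params: int, bits_per_param: int) -> int:
    """Approximate weight memory for a dense model at a given precision."""
    return params * bits_per_param // 8


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """KV-cache bytes per token: one K and one V vector per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_cached_tokens(free_bytes: int, per_token_bytes: int) -> int:
    """How many tokens of KV cache fit into the memory left after weights."""
    return free_bytes // per_token_bytes


# Assumed 70B-class example: FP16 weights vs FP8 weights, FP16 KV cache.
fp16_weights = weight_bytes(70_000_000_000, 16)  # 140 GB: does not fit one 80 GB GPU
fp8_weights = weight_bytes(70_000_000_000, 8)    # 70 GB: fits, with little room to spare
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)  # 327,680 bytes
tokens_in_10gb = max_cached_tokens(10_000_000_000, per_token)
```

Under these assumptions a single 80 GB GPU holding FP8 weights has roughly 10 GB left, which is about 30 thousand cached tokens across all concurrent requests. That is why Tier-L is served with tensor/pipeline parallelism rather than on one device.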
5.3 Lifecycle: autoscale, hot-swap, canary/shadow#
```mermaid
flowchart LR
  REGY["MLflow registry<br/>(signed digest)"] -->|promote| KS["KServe InferenceService"]
  KS --> CANARY["Canary 5%"]
  KS --> STABLE["Stable 95%"]
  SHADOW["Shadow (mirror, no user impact)"] -.eval.-> EVAL["Eval Service"]
  EVAL -->|pass ≥99.9%| ROLL["Promote canary→stable"]
  EVAL -->|fail| HALT["Halt + rollback"]
```
- New adapters/models enter as shadow (traffic mirrored, outputs eval'd offline), then canary (small %), then stable — each gated by the deterministic eval suite.
- KEDA autoscales each tier on queue depth; idle adapters scale to zero, base engines retain a warm minimum.
- Every promotion verifies a cosign signature and a pinned model digest so prod inference is reproducible (P2, P7).
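The promotion checks in the bullets above can be sketched as one deterministic decision function. It is a simplified illustration: `Candidate` and `promotion_decision` are hypothetical names, the cosign verification is represented by a boolean supplied by the caller rather than performed here, and the 99.9% threshold is compared with integer arithmetic to avoid floating-point edge cases.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    weights: bytes            # stand-in for the model/adapter artifact
    pinned_digest: str        # SHA-256 hex digest recorded in the registry
    signature_verified: bool  # result of an external signature check (assumed)
    eval_passed: int
    eval_total: int


def promotion_decision(c: Candidate) -> str:
    """Return 'promote' or a 'halt: ...' reason; same inputs give the same answer."""
    if not c.signature_verified:
        return "halt: unsigned"
    if hashlib.sha256(c.weights).hexdigest() != c.pinned_digest:
        return "halt: digest mismatch"
    # passed / total >= 0.999, written without floating point.
    if c.eval_total == 0 or c.eval_passed * 1000 < c.eval_total * 999:
        return "halt: eval below threshold"
    return "promote"
```

The same function could gate each stage transition (shadow to canary, canary to stable), with the eval counts taken from that stage's suite run.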
6. Data & knowledge subsystem#
```mermaid
flowchart TB
  SRC["Sources: repos · requirements · DHF · regs"] --> ING["Ingestion + sanitization<br/>(PII/secret scrub, license tag)"]
  ING --> KG["Code Knowledge Graph"]
  ING --> EMB["Embeddings (Tier-E)"]
  ING --> FTS["Full-text index (OpenSearch)"]
  EMB --> VEC["Vector store"]
  QRY["Retrieval orchestrator"] --> KG
  QRY --> VEC
  QRY --> FTS
  KG & VEC & FTS --> FUSE["Fusion + rerank (Tier-E)"]
  FUSE --> CTX["Context bundle + provenance"]
```
6.1 Ingestion & sanitization#
Sources (repos, requirements/DHF, regulatory corpus) pass through ingestion that scrubs secrets/PII, tags license and IEC 62304 class, and content-addresses each chunk so retrieval is reproducible and auditable.
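Content-addressing a chunk can be sketched as hashing a canonical serialization of its text and tags. The field names and the `sha256:` prefix are assumptions for illustration; the key property is that identical inputs always yield the same address, so a retrieval result can be checked against what was ingested.

```python
import hashlib
import json


def chunk_address(text: str, source: str, version: str, safety_class: str, license_tag: str) -> str:
    """Derive a stable address from chunk content plus its ingestion tags."""
    record = {
        "text": text,
        "source": source,
        "version": version,
        "safety_class": safety_class,  # IEC 62304 class tag applied at ingestion
        "license": license_tag,
    }
    # Sorted keys and fixed separators make the serialization canonical,
    # so the digest does not depend on dict ordering or whitespace.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Whether tags belong inside the hashed record or alongside it is a design choice. Including them means a re-tagged chunk gets a new address, which keeps audit records unambiguous.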
6.2 Retrieval modes#
| Mode | Backend | Use |
|---|---|---|
| Graph traversal | Code Knowledge Graph (self-hosted property graph) | Call graph, requirement→code→test traceability, blast-radius |
| ANN (semantic) | Vector store (Qdrant/Milvus) | Similar code, prior solutions, doc semantics |
| Full-text (lexical) | OpenSearch | Exact symbols, error strings, reg clauses |
Results are fused and reranked by a Tier-E model. The provenance of every retrieved chunk (source, version, digest) travels with the context bundle into the trajectory and the audit record (P4).
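One simple way to fuse the three ranked lists before the Tier-E reranker runs is Reciprocal Rank Fusion (RRF). The document does not specify the fusion method, so RRF is an assumption here, as are the function name and the conventional constant `k = 60`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk IDs; higher fused score comes first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every chunk it returned.
            scores[chunk_id] += 1.0 / (k + rank)
    # Ties are broken by chunk ID so the output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


graph_hits = ["chunk-a", "chunk-b"]
ann_hits = ["chunk-b", "chunk-c"]
fts_hits = ["chunk-b", "chunk-a"]
fused = reciprocal_rank_fusion([graph_hits, ann_hits, fts_hits])
```

RRF uses only ranks, so it needs no score calibration across the graph, ANN, and full-text backends. The fused list would then be passed to the reranker with each chunk's provenance attached.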
The code knowledge graph is the self-hosted analogue of a managed graph-of-code service (Spanner-graph-class); it must be operable inside the sovereign/air-gapped boundary.
7. Control / governance plane#
7.1 Evidence production (P4 / Part 11)#
Every plane emits structured evidence to the governance plane:
| Evidence | Producer | Stored |
|---|---|---|
| Intent + autonomy decision | Agent runtime + autonomy authz | WORM |
| Context bundle + provenance | Knowledge plane | WORM |
| Routing/classification decision | Gateway | WORM |
| Model digest + LoRA used + tokens/GPU-s | Serving + FinOps | WORM + ledger |
| Sandbox build/test results | Sandbox | WORM |
| Policy verdict + rationale | Policy server | WORM |
| Eval verdict + dataset digest | Eval service | WORM |
| HITL approver identity + signature | Review UI | WORM |
Records are written to an immutable, append-only, object-locked (WORM) store, time-stamped and attributable, satisfying 21 CFR Part 11 electronic-records/signatures and GAMP 5 traceability.
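Object-lock storage prevents overwrites; a complementary, commonly used way to make tampering detectable is to hash-chain the ledger entries. The sketch below is an assumption about how such a ledger could be structured, not a description of the actual store; the record fields and function names are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed sentinel for the first entry's predecessor


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(ledger: List[dict], actor: str, action: str, timestamp: str) -> dict:
    """Append an attributable, time-stamped entry linked to the previous one."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"actor": actor, "action": action, "timestamp": timestamp, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    ledger.append(entry)
    return entry


def verify_chain(ledger: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in ledger:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Verification needs only the ledger itself, so an auditor can re-check integrity independently of the system that wrote the records.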
7.2 Runtime policy & autonomy enforcement (P3)#
```mermaid
flowchart LR
  ACT["Agent action / tool call"] --> PEP["MCP enforcement point"]
  PEP --> OPA["Policy server (Rego)"]
  PEP --> AZ["Autonomy authz (IEC 62304 class)"]
  OPA -->|deny| BLOCK["Block + evidence"]
  AZ -->|Class C| DUAL["Require dual human control"]
  AZ -->|Class A/B| LEVEL["Apply allowed autonomy"]
  OPA -->|allow| EXEC["Execute"]
  DUAL --> EXEC
  LEVEL --> EXEC
```
- Policy is deterministic (Rego bundles, versioned, signed) wrapping the probabilistic agent (P2).
- Autonomy is risk-proportional: Class A/B may allow higher automation; Class C always requires dual human control before any write/promote.
- Identity for every actor (human or workload) is SPIFFE/SPIRE-issued; secrets are short-lived Vault leases.
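The combined policy and autonomy decision above can be sketched as a pure function. It is illustrative only: in the architecture the policy verdict comes from Rego bundles, which a boolean stands in for here, and `ActionRequest`, `authorize`, and the return labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ActionRequest:
    tool: str
    safety_class: str                 # IEC 62304 class: "A", "B", or "C"
    policy_allows: bool               # stand-in for the policy server verdict
    human_approvers: FrozenSet[str]   # distinct approver identities


def authorize(req: ActionRequest) -> str:
    """Return 'block', 'await-dual-approval', or 'execute'."""
    if req.safety_class not in {"A", "B", "C"}:
        raise ValueError(f"unknown safety class: {req.safety_class}")
    if not req.policy_allows:
        return "block"  # a policy deny always wins
    # Class C needs two distinct humans; a set cannot count one person twice.
    if req.safety_class == "C" and len(req.human_approvers) < 2:
        return "await-dual-approval"
    return "execute"
```

Modelling approvers as a set of identities means the same person approving twice cannot satisfy dual control.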
8. Reference environments & promotion#
| Environment | Purpose | Models | Data | Gate to next |
|---|---|---|---|---|
| dev | Experimentation, adapter dev | latest candidate LoRAs | synthetic/masked | unit + shadow eval pass |
| validation | Formal V&V (CSA/GAMP 5) | release-candidate, pinned digests | masked production-like | full deterministic suite ≥99.9%, signed |
| prod | Live agentic SDLC | only validated, signed models | real (sovereign) | n/a |
```mermaid
flowchart LR
  DEV["dev"] -->|signed artifact + eval pass| VAL["validation"]
  VAL -->|full V&V + cosign verify| PROD["prod"]
  PROD -. rollback (pinned prior digest) .-> PROD
```
Promotion is GitOps + signed-artifact only: Argo CD reconciles only cosign-verified images and MLflow-registered model digests; admission control (Gatekeeper) rejects anything unsigned or off-registry. In air-gapped/regulated networks, promotion crosses the boundary as a signed, mirrored bundle — never a live pull.
9. Build-vs-buy and self-hosting rationale#
| Concern | Decision | Rationale (ties to 08) |
|---|---|---|
| LLM inference | Build/host (open-weight) | C1/P7: sovereignty, reproducibility, no PHI/IP egress; predictable cost-per-green-PR |
| Model fine-tuning | Build (Ray+Kueue) | Domain/device-specific quality; full lineage in MLflow |
| Orchestration & harness | Build | P5: the harness is the differentiator and the 99.9% lever |
| Serving runtime | Buy/adopt OSS (vLLM/Triton/KServe) | Mature, self-hostable, no SaaS lock-in |
| Knowledge graph / vector / search | Buy/adopt OSS, self-host | Operable in air-gap; avoids managed-SaaS data residency issues |
| Identity/secrets/mesh | Adopt OSS (SPIRE/Vault/Istio) | Zero-trust standard, self-hostable |
| Supply chain | Adopt OSS (Sigstore/SLSA) | Supports FD&C Act §524B (SBOM and cybersecurity) obligations and reproducibility |
The economic case (GPU amortization via batching, MIG density, scale-to-zero, tiered routing) is developed in 08-token-and-gpu-economics.md. The architecture is intentionally biased toward owning the harness and hosting the models, and adopting mature OSS for undifferentiated platform layers.
10. Architecture-to-maturity mapping (ASMM-Med)#
Which components must be operational at each level (see 02-maturity-model.md):
| Component / capability | L1 Governed Assistance | L2 Spec-Driven Bounded | L3 Orchestrated Agentic | L4 Validated Autonomous | L5 Self-Optimizing |
|---|---|---|---|---|---|
| Self-hosted serving (vLLM/Triton) | ● | ● | ● | ● | ● |
| Tiered fleet + routing gateway | ◐ | ● | ● | ● | ● |
| Multi-LoRA hot-swap | — | ◐ | ● | ● | ● |
| Knowledge plane (KG+vector+FTS) | ◐ | ● | ● | ● | ● |
| Sandbox verify (Generate→Verify) | — | ● | ● | ● | ● |
| Deterministic eval gate (≥99.9%) | — | ◐ | ● | ● | ● |
| Policy server + autonomy authz (P3) | ◐ | ● | ● | ● | ● |
| Agent runtime + MCP tool plane | — | ◐ | ● | ● | ● |
| A2A multi-agent | — | — | ● | ● | ● |
| WORM evidence / Part 11 (P4) | ◐ | ● | ● | ● | ● |
| FinOps cost-per-green-PR (P6) | — | ◐ | ● | ● | ● |
| Canary/shadow + auto-promotion | — | — | ◐ | ● | ● |
| Closed-loop self-optimization | — | — | — | ◐ | ● |
Legend: ● required · ◐ partial/emerging · — not yet.
Adoption sequencing for these capabilities is detailed in 09-adoption-roadmap.md.
Appendix A — Component-to-principle traceability#
| Component | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
|---|---|---|---|---|---|---|---|
| Routing gateway | ● | ● | ● | ● | ● | ● | |
| Sandbox verify | ● | ● | ● | ● | ● | ||
| Eval gate | ● | ● | ● | ● | ● | ||
| Policy + autonomy authz | ● | ● | ● | ● | ● | ||
| WORM evidence store | ● | ● | ● | ||||
| Model serving (self-hosted) | ● | ● | ● | ● | ● | ● | |
| FinOps ledger | ● | ● |
End of document — 03-reference-architecture.md