03 — Reference Architecture

Figure A — Reference Architecture (seven planes)

Project: Agentic-Native SDLC for Regulated Medical Device Engineering
Document: Reference Architecture
Status: Controlled — Engineering/Quality Reference
Revision date: May 2026
Audience: Platform, MLOps, Security, and Quality engineering leads (1000+ developer org)

Sibling documents: 01-requirements.md · 02-maturity-model.md · 04-model-strategy-and-finetuning.md · 05-evaluation-and-validation.md · 06-agentic-workflows.md · 07-security-and-compliance.md · 08-token-and-gpu-economics.md · 09-adoption-roadmap.md

1. Architecture goals, constraints, and the layered view

1.1 Goals

This reference architecture is the buildable expression of the seven principles defined in 01-requirements.md. It must:

Deliver ≥99.9% release-gate correctness as a system property (P1) by composing Generate → Verify → Repair → Gate, not by trusting any single model output.
Wrap probabilistic model behavior in deterministic control (P2): every gate, policy decision, and promotion is reproducible from recorded inputs.
Enforce risk-proportional autonomy (P3) keyed to IEC 62304 software safety class A/B/C, with Class C always under dual human control.
Treat every agent action as Part 11-grade evidence (P4): immutable, attributable, time-stamped, reconstructable.
Treat the harness as the product (P5): Agent = Model + Harness. The architecture invests in the harness/control plane, not just the model.
Optimize cost per verified task / cost-per-green-PR (P6), making GPU and token spend a first-class, observable quantity (see 08-token-and-gpu-economics.md).
Remain self-hosted, sovereign, and reproducible (P7): open-weight, fine-tuned models only; no external SaaS LLM APIs.

1.2 Hard constraints (non-negotiable)

#	Constraint	Architectural consequence
C1	Open-weight, self-hosted, fine-tuned models only — no Claude/OpenAI/Gemini SaaS	All inference runs inside the cluster on the Model Fleet; no egress to LLM providers. Network policy default-deny to public LLM endpoints.
C2	≥99.9% release-gate correctness	Multi-stage verification (sandbox + policy + eval) gates every artifact; no model-only merges.
C3	GPU/token cost is first-class	Tiered model fleet + routing gateway + scale-to-zero + FinOps telemetry on every span.
C4	Deterministic evaluation	Pinned model digests, fixed seeds, hermetic test environments, content-addressed eval datasets.
C5	Sovereign / air-gap capable	Every dependency mirrorable; no runtime dependency on internet reachability.

1.3 The seven-plane layered view

flowchart TB
    subgraph DI["1 · Developer Interface Plane"]
        IDE["IDE plugins (VS Code / JetBrains)"]
        CLI["Agent CLI"]
        WEB["Review & Approval Web UI"]
        CICD["CI/CD triggers (Argo Events)"]
    end
    subgraph AO["2 · Agent / Orchestration Plane"]
        RUNTIME["Agent Runtime (planner/executor)"]
        A2A["A2A multi-agent bus"]
        MCP["MCP Tool Plane"]
    end
    subgraph HC["3 · Harness / Control Plane"]
        ROUTER["Model-Routing Gateway"]
        POLICY["Policy Server (OPA/Gatekeeper)"]
        AUTHZ["Autonomy Authorization Service"]
        EVAL["Eval / Verification Service"]
        SANDBOX["Sandbox Execution (gVisor/Kata)"]
    end
    subgraph MS["4 · Model-Serving Plane"]
        VLLM["vLLM"]
        TRITON["Triton + TensorRT-LLM"]
        KSERVE["KServe + multi-LoRA"]
    end
    subgraph DK["5 · Data / Knowledge Plane"]
        KG["Code Knowledge Graph"]
        VEC["Vector / Embedding Store"]
        REG["Regulatory Corpus"]
        REGISTRY["MLflow Model Registry & Lineage"]
    end
    subgraph PI["6 · Platform / Infra Plane"]
        K8S["Kubernetes + GPU Operator"]
        RAY["Ray + Kueue"]
        MESH["Istio + SPIFFE/SPIRE"]
        VAULT["Vault · KEDA · Argo"]
    end
    subgraph GA["7 · Governance / Audit Plane"]
        WORM["WORM Evidence Store (Part 11)"]
        OTEL["OpenTelemetry pipeline"]
        FINOPS["FinOps / cost ledger"]
        SUPPLY["Sigstore/cosign · SLSA · SBOM"]
    end

    DI --> AO --> HC
    HC --> MS
    HC --> DK
    AO --> DK
    MS --> PI
    HC --> PI
    AO -.evidence.-> GA
    HC -.evidence.-> GA
    MS -.telemetry.-> GA

Plane responsibilities at a glance:

Plane	Owns	Does not own
Developer Interface	Intent capture, review, approval surfaces	Model selection, policy
Agent / Orchestration	Trajectory planning, tool calls, multi-agent coordination	Inference, gating verdicts
Harness / Control	Routing, policy, autonomy authz, verification, gating	Model weights, business logic
Model-Serving	Inference, batching, LoRA hot-swap	Trajectory, gating
Data / Knowledge	Retrieval, lineage, registry	Generation
Platform / Infra	Scheduling, identity, secrets, scaling	Domain semantics
Governance / Audit	Immutable evidence, telemetry, supply chain	Live request handling

2. Logical components per plane

2.1 Developer Interface Plane

Component	Tech	Responsibility
IDE integration	VS Code / JetBrains extensions, LSP bridge	Inline intent capture, diff preview, approval prompts, trajectory visualization
Agent CLI	Self-hosted CLI binary (mTLS to mesh)	Headless agent invocation, batch tasks, CI usage
Review/Approval Web UI	Internal SPA behind Istio + OIDC	HITL review, autonomy-class approvals, evidence inspection
CI/CD triggers	Argo Events + Argo Workflows	Event-driven agent runs (PR opened, requirement changed)

All interface clients are thin: they hold no model credentials and reach the cluster only through the mesh ingress with SPIFFE-issued identity.

2.2 Agent / Orchestration Plane

Component	Tech	Responsibility
Agent Runtime	Custom planner/executor on K8s `Job`/`Pod`, Ray actors for fan-out	Owns the trajectory: plan → act → observe → repair loop
MCP Tool Plane	Model Context Protocol servers (one per capability)	Typed, permissioned tool surface: `repo.read`, `repo.write`, `test.run`, `kg.query`, `eval.submit`, `vault.lease`
A2A bus	Agent-to-Agent protocol over the Istio mesh	Specialist agents (coder, reviewer, test-author, requirements-tracer) coordinate

Tool calls never hit infrastructure directly; they are mediated by MCP servers that enforce per-tool scopes and emit evidence (P4).
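As an illustration of the mediation described above, the following is a minimal sketch of a per-tool scope check that records evidence for every attempt, allowed or not. The `ToolGate` class, the role names, and the grant table are hypothetical and are not part of the Model Context Protocol itself; a real MCP server would also authenticate the caller and write evidence to the governance plane rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Hypothetical enforcement point: checks scope, then records evidence."""
    # agent role -> tool names that role may call (illustrative grants only)
    grants: dict[str, set[str]]
    evidence: list[dict] = field(default_factory=list)

    def call(self, role, tool, handler, **args):
        allowed = tool in self.grants.get(role, set())
        # Evidence is recorded for denied calls as well as allowed ones.
        self.evidence.append(
            {"role": role, "tool": tool, "args": args, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"role {role!r} lacks scope {tool!r}")
        return handler(**args)


gate = ToolGate(grants={"reviewer": {"repo.read"}, "coder": {"repo.read", "repo.write"}})
content = gate.call("reviewer", "repo.read", lambda path: f"contents of {path}", path="pump.c")
```

The point of the design is that the agent never holds a handle to the underlying infrastructure: it can only ask the gate, and the gate's record exists whether or not the call succeeds.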

2.3 Harness / Control Plane (the product, P5)

Component	Tech	Responsibility
Model-Routing Gateway	Custom gateway + classifier (Tier-S model) in front of serving	Classify request → select tier/LoRA → enforce cost budget
Policy Server	OPA/Gatekeeper + dedicated policy server (Rego bundles)	Admission of tool calls, autonomy decisions, write permissions
Autonomy Authorization Service	Custom service keyed to IEC 62304 class	Decides allowed autonomy level per task (P3); Class C → dual human
Eval / Verification Service	Deterministic eval harness (see 05)	Runs gate suites; emits pass/fail with evidence
Sandbox Execution	gVisor / Kata Containers, ephemeral namespaces	Hermetic build/test/exec of generated artifacts

2.3.1 Model-routing gateway — classification and tier selection

flowchart LR
    REQ["Agent request<br/>(task + context budget)"] --> CLS{"Classifier<br/>(Tier-S Reflex)"}
    CLS -->|"trivial / lint / format"| S["Tier-S Reflex<br/>1-8B"]
    CLS -->|"bounded code edit / unit test"| M["Tier-M Worker<br/>14-34B"]
    CLS -->|"design / multi-file / reasoning"| L["Tier-L Reasoner<br/>70B+/MoE"]
    CLS -->|"diagram / DICOM / UI screenshot"| V["Tier-V Multimodal"]
    CLS -->|"retrieval / rerank"| E["Tier-E Embed/Rerank"]
    S & M & L & V & E --> BUDGET{"Cost-budget check<br/>(P6)"}
    BUDGET -->|"within budget"| SERVE["Serving plane"]
    BUDGET -->|"over budget"| DEGRADE["Downshift tier or queue"]

Routing inputs: task type, IEC 62304 class, context length, required latency SLO, remaining task cost budget, and the active fine-tuned LoRA adapter. The classifier itself is a cheap Tier-S model; its decision is logged as evidence so routing is auditable and reproducible.
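The classify, select, and budget-check steps can be sketched as follows. This is a simplified illustration, not the gateway's implementation: a static lookup table stands in for the Tier-S classifier model, and the cost figures, task-type names, and downshift order are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical relative cost per request for each tier (arbitrary units).
TIER_COST = {"S": 1.0, "M": 6.0, "L": 40.0, "V": 25.0, "E": 0.2}
# Assumed downshift order; tiers with no entry are queued when over budget.
DOWNSHIFT = {"L": "M", "M": "S"}
# Stand-in for the Tier-S classifier: a fixed task-type lookup.
TASK_TO_TIER = {
    "lint": "S", "format": "S",
    "code_edit": "M", "unit_test": "M",
    "design": "L", "multi_file": "L",
    "diagram": "V", "screenshot": "V",
    "retrieval": "E", "rerank": "E",
}


@dataclass(frozen=True)
class RoutingDecision:
    tier: Optional[str]  # None when the request is queued
    action: str          # "serve", "downshift", or "queue"
    reason: str


def route(task_type: str, remaining_budget: float) -> RoutingDecision:
    # Unknown task types default to the reasoner tier (an assumption).
    tier = TASK_TO_TIER.get(task_type, "L")
    if TIER_COST[tier] <= remaining_budget:
        return RoutingDecision(tier, "serve", "within budget")
    cheaper = DOWNSHIFT.get(tier)
    while cheaper is not None:
        if TIER_COST[cheaper] <= remaining_budget:
            return RoutingDecision(cheaper, "downshift", f"tier {tier} over budget")
        cheaper = DOWNSHIFT.get(cheaper)
    return RoutingDecision(None, "queue", "no affordable tier")
```

Because `route` is a pure function of its inputs, logging the inputs alongside the returned `RoutingDecision` is enough to replay the decision later, which is the property the evidence requirement depends on.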

2.4 Model-Serving Plane

Component	Tech	Responsibility
vLLM	PagedAttention, continuous batching	High-throughput text generation, Tier-S/M/L
Triton + TensorRT-LLM	Compiled engines, in-flight batching	Latency-critical / quantized serving
KServe	InferenceService CRDs, multi-LoRA hot-swap	Standardized serving surface, canary/shadow, autoscale

2.5 Data / Knowledge Plane

Component	Tech	Responsibility
Code Knowledge Graph	Self-hosted property graph (Neo4j / JanusGraph / NebulaGraph)	Symbols, call graph, requirement→code→test traceability
Vector store	Self-hosted (Qdrant / Milvus / Weaviate)	ANN retrieval over code & docs using Tier-E embeddings
Regulatory corpus	Versioned doc store + full-text (OpenSearch)	IEC 62304 / ISO 13485 / ISO 14971 / 21 CFR Part 11 reference text
Model Registry & Lineage	MLflow	Model versions, fine-tune lineage, signed digests, stage

2.6 Platform / Infra Plane

Component	Tech	Responsibility
Cluster + GPU	Kubernetes + NVIDIA GPU Operator (MIG)	Scheduling, GPU lifecycle, driver/DCGM
Distributed compute	Ray + Kueue	Training, batch eval, fan-out inference jobs
Mesh & identity	Istio + SPIFFE/SPIRE	mTLS, workload identity, zero-trust east-west
Secrets	HashiCorp Vault	Dynamic short-lived secrets, transit, PKI
Autoscale	KEDA	Queue-driven scaling, scale-to-zero for idle tiers
Delivery	Argo CD + Argo Workflows	GitOps, pipeline orchestration

2.7 Governance / Audit Plane

Component	Tech	Responsibility
WORM evidence store	Object store with object-lock (immutable), append-only ledger	Part 11 records, trajectory dumps, gate verdicts
Telemetry	OpenTelemetry collectors → metrics/traces/logs backends	End-to-end spans, cost attribution
FinOps ledger	Cost attribution service	Per-task GPU-seconds, tokens, cost-per-green-PR
Supply chain	Sigstore/cosign + SLSA provenance + SBOM	Signed images, models, and artifacts
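To make the FinOps ledger's headline metric concrete, here is a minimal sketch of a cost-per-green-PR calculation. The authoritative definition lives in 08-token-and-gpu-economics.md; this example assumes that the spend of failed or abandoned tasks is charged to the PRs that did pass, and the `TaskCost` fields and the GPU price are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaskCost:
    task_id: str
    gpu_seconds: float
    tokens: int
    green: bool  # True if the task ended in a PR that passed every gate


def cost_per_green_pr(tasks: Iterable[TaskCost], usd_per_gpu_second: float) -> Optional[float]:
    tasks = list(tasks)
    # All GPU time counts, including tasks that never produced a green PR.
    total_cost = sum(t.gpu_seconds for t in tasks) * usd_per_gpu_second
    green_count = sum(1 for t in tasks if t.green)
    if green_count == 0:
        return None  # undefined when nothing passed
    return total_cost / green_count


ledger = [
    TaskCost("t1", gpu_seconds=100.0, tokens=12_000, green=True),
    TaskCost("t2", gpu_seconds=50.0, tokens=9_000, green=False),
    TaskCost("t3", gpu_seconds=50.0, tokens=4_000, green=True),
]
metric = cost_per_green_pr(ledger, usd_per_gpu_second=0.01)
```

Charging failed attempts to the successful ones is what makes the metric sensitive to wasted repair loops, not just to raw inference price.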

3. End-to-end request / trajectory flow

sequenceDiagram
    autonumber
    participant Dev as Developer (IDE/CLI)
    participant RT as Agent Runtime
    participant KG as Knowledge/RAG
    participant GW as Routing Gateway
    participant MS as Model Serving
    participant SB as Sandbox
    participant POL as Policy Server
    participant EV as Eval Service
    participant AU as Autonomy Authz
    participant AUD as WORM Audit

    Dev->>RT: Intent (task, repo, requirement ID)
    RT->>AU: Request autonomy level (IEC 62304 class)
    AU-->>RT: Allowed level (Class C → dual-human required)
    RT->>KG: Assemble context (graph + ANN + full-text)
    KG-->>RT: Ranked context bundle (+provenance)
    RT->>GW: Generation request (task + context)
    GW->>GW: Classify → select tier/LoRA → budget check (P6)
    GW->>MS: Route to tier
    MS-->>RT: Candidate artifact (diff/code/tests)
    RT->>SB: Hermetic build + test (Verify)
    SB-->>RT: Build/test results
    alt verification fails
        RT->>GW: Repair request (failure context)
        GW->>MS: Re-generate (Repair)
        MS-->>RT: Revised artifact
        RT->>SB: Re-verify
    end
    RT->>POL: Policy gate (writes, licenses, secrets)
    POL-->>RT: Allow / Deny + rationale
    RT->>EV: Eval gate (deterministic suite)
    EV-->>RT: Pass/Fail (≥99.9% threshold)
    RT->>Dev: HITL review (class-proportional)
    Dev-->>RT: Approve / reject (dual for Class C)
    RT->>Dev: Open PR (signed)
    RT->>AUD: Write immutable evidence (P4, Part 11)

Each step emits a span and a content-addressed evidence record. The trajectory is fully reconstructable from the audit store, satisfying P4 and 21 CFR Part 11.
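The Generate → Verify → Repair → Gate composition in the sequence above can be sketched as a small control loop. This is an illustrative skeleton under stated assumptions: `generate`, `verify`, and the gate callables are placeholders for the serving, sandbox, policy, and eval services, and the repair limit of two is an arbitrary choice.

```python
def run_trajectory(generate, verify, gates, max_repairs=2):
    """Drive one task through generate, verify/repair, then the ordered gates.

    generate(feedback) -> artifact
    verify(artifact)   -> (ok, feedback)
    gates              -> list of (name, fn(artifact) -> bool)
    Returns (artifact or None, trail), where trail lists every step taken.
    """
    trail = []
    feedback = None
    artifact = None
    for attempt in range(max_repairs + 1):
        artifact = generate(feedback)          # first pass, or a repair
        ok, feedback = verify(artifact)        # hermetic build and test
        trail.append(("verify", attempt, ok))
        if ok:
            break
    else:
        return None, trail                     # repair budget exhausted
    for name, gate in gates:
        passed = gate(artifact)
        trail.append((name, passed))
        if not passed:
            return None, trail                 # a single failed gate blocks
    return artifact, trail


def fake_generate(feedback):
    # Stand-in model: produces a good artifact only once it sees failure context.
    return "good" if feedback else "bad"


def fake_verify(artifact):
    ok = artifact == "good"
    return ok, None if ok else "test_x failed"


artifact, trail = run_trajectory(
    fake_generate, fake_verify,
    gates=[("policy", lambda a: True), ("eval", lambda a: True)],
)
```

No artifact is returned unless verification and every gate pass, so release correctness depends on the loop and its gates rather than on any single model output. The `trail` list plays the role of the per-step evidence records.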

4. Kubernetes deployment topology

flowchart TB
    subgraph CTRL["Control / CPU node pool"]
        NS_AGENT["ns: agent-runtime"]
        NS_HARNESS["ns: harness-control"]
        NS_KNOW["ns: knowledge"]
        NS_GOV["ns: governance-audit"]
        NS_PLAT["ns: platform (Vault, Istio, Argo)"]
    end
    subgraph GPU["GPU node pools"]
        POOL_S["pool: reflex (MIG 1g.10gb / L4)"]
        POOL_M["pool: worker (A10/L40S)"]
        POOL_L["pool: reasoner (H100/H200, NVLink)"]
        POOL_TRAIN["pool: train/batch (Kueue-managed)"]
    end
    subgraph SANDBOX["Sandbox node pool (CPU, isolated)"]
        NS_SB["ns: sandbox-exec (gVisor/Kata, no egress)"]
    end

    NS_HARNESS -->|route| POOL_S & POOL_M & POOL_L
    NS_AGENT --> NS_SB
    POOL_TRAIN --- RAY["Ray + Kueue queues"]

4.1 Namespaces

Namespace	Contents	Network policy
`agent-runtime`	Agent pods, A2A bus	Egress only to MCP, gateway, knowledge
`harness-control`	Gateway, policy server, autonomy authz, eval	Egress to serving + knowledge; ingress from agents
`model-serving-{s,m,l,v,e}`	vLLM/Triton/KServe per tier	Ingress only from gateway
`knowledge`	KG, vector store, OpenSearch, MLflow	Ingress from agents/harness
`sandbox-exec`	gVisor/Kata pods	Default-deny all egress; ephemeral
`governance-audit`	WORM store, OTel, FinOps	Append-only ingest
`platform`	Vault, Istio control plane, Argo, SPIRE	Cluster-internal

4.2 Node pools, MIG, and scheduling

Pool	Hardware (example)	MIG	Scaling
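reflex (Tier-S)	L4 / A10, or MIG slices of A100/H100	`1g.10gb` partitions on MIG-capable GPUs (for example A100/H100 80GB); L4 and A10 do not support MIG and rely on time-slicing	KEDA, scale-to-zero off-hours
worker (Tier-M)	L40S / A10	none (L40S and A10 do not support MIG)	KEDA queue-driven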
reasoner (Tier-L)	H100/H200, NVLink + GPUDirect	full GPU, tensor/pipeline parallel	conservative; warm pool ≥1
train/batch	H100 multi-node	full GPU	Kueue gang-scheduling, preemptible

GPU Operator manages drivers, DCGM exporters, MIG geometry, and time-slicing where MIG is too coarse or not supported by the GPU.
Kueue provides quota-managed queues for training and batch eval, with ClusterQueue/LocalQueue and gang scheduling for multi-node Tier-L fine-tunes.
KEDA scales serving deployments off MCP/gateway queue depth and supports scale-to-zero for idle Tier-V/Tier-L adapters — central to P6.
NetworkPolicies enforce default-deny everywhere; the sandbox namespace has no egress to other cluster services or to the internet.
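The queue-driven scaling behavior described above, including scale-to-zero, can be approximated with a simple calculation. This is a deliberately simplified model for capacity reasoning, not KEDA's actual algorithm; the function name and parameters are hypothetical.

```python
import math


def desired_replicas(queue_depth: int, target_per_replica: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replica count needed to keep per-replica queue depth at or below target."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_depth / target_per_replica)
    # min_replicas=0 permits scale-to-zero; min_replicas>=1 models a warm pool.
    return max(min_replicas, min(max_replicas, wanted))


idle_multimodal = desired_replicas(0, 10, min_replicas=0, max_replicas=8)
idle_reasoner = desired_replicas(0, 10, min_replicas=1, max_replicas=4)
busy_worker = desired_replicas(25, 10, min_replicas=0, max_replicas=8)
```

The only difference between a tier that scales to zero and one that keeps a warm pool is the lower bound, which makes the cost trade-off an explicit, reviewable configuration value.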

4.3 Multi-cluster, sovereign-VPC, and air-gap

flowchart LR
    subgraph SOV["Sovereign region cluster"]
        direction TB
        PROD["prod (serving + audit)"]
        VAL["validation"]
    end
    subgraph DEV["Dev cluster"]
        DEVNS["dev / experimentation"]
    end
    MIRROR["Artifact mirror<br/>(images · models · pkgs)"]
    DEV -. promote (signed) .-> VAL
    VAL -. promote (signed) .-> PROD
    MIRROR --> SOV
    MIRROR --> DEV

Sovereign-VPC: prod and validation run in a customer-controlled region/VPC; no cross-border data flow.
Multi-cluster: dev separated from validation/prod clusters; promotion is signed-artifact-only (cosign verified at admission).
Air-gap option: every dependency (base images, model weights, OS packages, eval datasets) is mirrored into an internal registry, so no component needs to reach the public internet at request time (C5/P7).

5. Model-serving subsystem in depth

5.1 Tier → hardware mapping

Tier	Models (open-weight, fine-tuned)	Hardware	Serving	Quant
Tier-S "Reflex" 1-8B	Qwen2.5-Coder-1.5B/7B, Llama-3.2-3B	L4 / MIG slice	Triton+TRT-LLM	FP8 / INT8
Tier-M "Worker" 14-34B	Qwen2.5-Coder-32B, StarCoder2-15B, DeepSeek-Coder-V2-Lite	L40S / A10	vLLM	FP8 / AWQ-INT4
Tier-L "Reasoner" 70B+/MoE	Llama-3.3-70B, Qwen2.5-72B, DeepSeek-V3/R1-distill, Mixtral	H100/H200 NVLink	vLLM / TRT-LLM	FP8 / GPTQ
Tier-V "Multimodal"	Qwen2.5-VL, Llama-3.2-Vision, InternVL, Pixtral	L40S/H100	vLLM	FP8
Tier-E "Embed/Rerank"	bge, gte, jina-code, nomic	L4 / CPU	Triton / TEI	INT8

5.2 Serving techniques

Technique	Applied where	Purpose
Continuous / in-flight batching	vLLM, Triton	Throughput; amortize GPU (P6)
PagedAttention KV-cache	vLLM	Memory efficiency, longer context
Speculative decoding	Tier-L with Tier-S drafter	Lower latency on reasoner
Quantization (FP8/AWQ/GPTQ)	all tiers	Fit larger models, more density
Multi-LoRA hot-swap	KServe/vLLM	Many fine-tuned adapters per base; per-task adapter without reload
Tensor/pipeline parallel	Tier-L	Serve 70B+/MoE across GPUs

5.3 Lifecycle: autoscale, hot-swap, canary/shadow

flowchart LR
    REGY["MLflow registry<br/>(signed digest)"] -->|promote| KS["KServe InferenceService"]
    KS --> CANARY["Canary 5%"]
    KS --> STABLE["Stable 95%"]
    SHADOW["Shadow (mirror, no user impact)"] -.eval.-> EVAL["Eval Service"]
    EVAL -->|pass ≥99.9%| ROLL["Promote canary→stable"]
    EVAL -->|fail| HALT["Halt + rollback"]

New adapters/models enter as shadow (traffic mirrored, outputs evaluated offline), then canary (small percentage of live traffic), then stable. Each transition is gated by the deterministic eval suite.
KEDA autoscales each tier on queue depth; idle adapters scale to zero, base engines retain a warm minimum.
Every promotion verifies a cosign signature and a pinned model digest so prod inference is reproducible (P2, P7).
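The shadow, canary, stable progression can be expressed as a small state transition. This is a sketch only: it assumes the eval gate reports a single pass rate compared against the 99.9% threshold, and the stage names and the `next_stage` helper are illustrative, not KServe or MLflow APIs.

```python
STAGES = ["shadow", "canary", "stable"]


def next_stage(current: str, eval_pass_rate: float, threshold: float = 0.999) -> str:
    """Return the stage a model or adapter moves to after an eval run."""
    if current not in STAGES:
        raise ValueError(f"unknown stage: {current}")
    if eval_pass_rate < threshold:
        return "rollback"  # halt and return to the pinned prior digest
    # Advance one stage at a time; "stable" is terminal.
    return STAGES[min(STAGES.index(current) + 1, len(STAGES) - 1)]
```

For example, `next_stage("shadow", 0.9995)` yields `"canary"`, while `next_stage("canary", 0.998)` yields `"rollback"`. Stages cannot be skipped, so every model receives offline evaluation before it serves any live traffic.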

6. Data & knowledge subsystem

flowchart TB
    SRC["Sources: repos · requirements · DHF · regs"] --> ING["Ingestion + sanitization<br/>(PII/secret scrub, license tag)"]
    ING --> KG["Code Knowledge Graph"]
    ING --> EMB["Embeddings (Tier-E)"]
    ING --> FTS["Full-text index (OpenSearch)"]
    EMB --> VEC["Vector store"]
    QRY["Retrieval orchestrator"] --> KG
    QRY --> VEC
    QRY --> FTS
    KG & VEC & FTS --> FUSE["Fusion + rerank (Tier-E)"]
    FUSE --> CTX["Context bundle + provenance"]

6.1 Ingestion & sanitization

Sources (repos, requirements/DHF, regulatory corpus) pass through ingestion that scrubs secrets/PII, tags license and IEC 62304 class, and content-addresses each chunk so retrieval is reproducible and auditable.
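Content addressing means a chunk's identifier is derived from its bytes, so the same text always maps to the same ID. A minimal sketch, assuming SHA-256 as the digest and a hypothetical record layout (the field names below are illustrative, and scrubbing is assumed to have already happened):

```python
import hashlib


def chunk_record(text: str, source: str, version: str,
                 license_tag: str, safety_class: str) -> dict:
    """Build a content-addressed record for one already-scrubbed chunk."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "id": f"sha256:{digest}",      # identical text always yields this ID
        "source": source,
        "version": version,
        "license": license_tag,
        "safety_class": safety_class,  # IEC 62304 class tag: "A", "B", or "C"
        "text": text,
    }


rec = chunk_record("abc", source="repo/pump.c", version="v1",
                   license_tag="proprietary", safety_class="C")
```

Because the ID depends only on the text, an auditor can recompute it from the stored chunk and confirm that what was retrieved is what was ingested.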

6.2 Retrieval modes

Mode	Backend	Use
Graph traversal	Code Knowledge Graph (self-hosted property graph)	Call graph, requirement→code→test traceability, blast-radius
ANN (semantic)	Vector store (Qdrant/Milvus)	Similar code, prior solutions, doc semantics
Full-text (lexical)	OpenSearch	Exact symbols, error strings, reg clauses

Results are fused and reranked by a Tier-E model. The provenance of every retrieved chunk (source, version, digest) travels with the context bundle into the trajectory and the audit record (P4).
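The document does not specify the fusion algorithm. One common, model-free option is reciprocal rank fusion (RRF), shown here as an assumption about how the three ranked lists could be merged before the Tier-E reranker runs. The constant `k=60` is the value conventionally used with RRF, and the chunk IDs are placeholders.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict, k: int = 60) -> list:
    """Merge ranked lists of chunk IDs using reciprocal rank fusion.

    rankings maps a retrieval mode name to its ranked list (best first).
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


fused = rrf_fuse({
    "graph": ["a", "b"],
    "ann": ["b", "c"],
    "fulltext": ["b", "a"],
})
```

RRF uses only rank positions, so it needs no score calibration across the graph, vector, and lexical backends, and the deterministic tie-break keeps the fused order reproducible for the audit record.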

The code knowledge graph is the self-hosted analogue of a managed graph-of-code service (Spanner-graph-class); it must be operable inside the sovereign/air-gapped boundary.

7. Control / governance plane

7.1 Evidence production (P4 / Part 11)

Every plane emits structured evidence to the governance plane:

Evidence	Producer	Stored
Intent + autonomy decision	Agent runtime + autonomy authz	WORM
Context bundle + provenance	Knowledge plane	WORM
Routing/classification decision	Gateway	WORM
Model digest + LoRA used + tokens/GPU-s	Serving + FinOps	WORM + ledger
Sandbox build/test results	Sandbox	WORM
Policy verdict + rationale	Policy server	WORM
Eval verdict + dataset digest	Eval service	WORM
HITL approver identity + signature	Review UI	WORM

Records are written to an immutable, append-only, object-locked (WORM) store. Each record is time-stamped and attributable, which supports the 21 CFR Part 11 requirements for electronic records and signatures and GAMP 5 traceability.
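One way to make an append-only ledger tamper-evident is to chain records by hash, so that altering any earlier record invalidates everything after it. The sketch below illustrates that idea under stated assumptions: hash chaining is not prescribed by this architecture (object-lock provides the immutability), and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(ledger: list, actor: str, action: str,
                  payload: dict, timestamp: str) -> str:
    """Append an attributable, time-stamped record linked to its predecessor."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"actor": actor, "action": action, "payload": payload,
            "timestamp": timestamp, "prev": prev}
    record_hash = _digest(body)
    ledger.append({**body, "hash": record_hash})
    return record_hash


def verify_chain(ledger: list) -> bool:
    """Recompute every hash and link; False if any record was altered."""
    prev = GENESIS
    for record in ledger:
        body = {key: value for key, value in record.items() if key != "hash"}
        if record["prev"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


ledger = []
append_record(ledger, "spiffe://example.org/eval", "eval.verdict",
              {"verdict": "pass"}, "2026-05-01T00:00:00Z")
append_record(ledger, "alice", "hitl.approve", {"pr": 1}, "2026-05-01T00:05:00Z")
chain_ok = verify_chain(ledger)
```

Verification needs only the ledger itself, so an auditor can check integrity without trusting the service that wrote the records.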

7.2 Runtime policy & autonomy enforcement (P3)

flowchart LR
    ACT["Agent action / tool call"] --> PEP["MCP enforcement point"]
    PEP --> OPA["Policy server (Rego)"]
    PEP --> AZ["Autonomy authz (IEC 62304 class)"]
    OPA -->|deny| BLOCK["Block + evidence"]
    AZ -->|Class C| DUAL["Require dual human control"]
    AZ -->|Class A/B| LEVEL["Apply allowed autonomy"]
    OPA -->|allow| EXEC["Execute"]
    DUAL --> EXEC
    LEVEL --> EXEC

Policy is deterministic (Rego bundles, versioned, signed) wrapping the probabilistic agent (P2).
Autonomy is risk-proportional: Class A/B may allow higher automation; Class C always requires dual human control before any write/promote.
Identity for every actor (human or workload) is SPIFFE/SPIRE-issued; secrets are short-lived Vault leases.
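The combined policy and autonomy decision can be sketched as a single deterministic function. This is a simplification for illustration: Class A and B are collapsed into one "execute" outcome, whereas the real service would return a specific autonomy level, and the `authorize` name and its return values are hypothetical.

```python
def authorize(safety_class: str, policy_allows: bool, approvers: list) -> str:
    """Decide whether a write/promote action may proceed.

    safety_class  -- IEC 62304 class: "A", "B", or "C"
    policy_allows -- verdict already returned by the policy server
    approvers     -- identities of humans who have approved so far
    """
    if safety_class not in ("A", "B", "C"):
        raise ValueError(f"unknown safety class: {safety_class}")
    if not policy_allows:
        return "block"  # a policy denial always wins
    if safety_class == "C":
        # Dual control: two distinct identities, not one person approving twice.
        if len(set(approvers)) >= 2:
            return "execute"
        return "await_dual_approval"
    return "execute"
```

For example, `authorize("C", True, ["alice", "alice"])` returns `"await_dual_approval"`, because duplicate identities do not count as dual control, while `authorize("C", True, ["alice", "bob"])` returns `"execute"`.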

8. Reference environments & promotion

Environment	Purpose	Models	Data	Gate to next
dev	Experimentation, adapter dev	latest candidate LoRAs	synthetic/masked	unit + shadow eval pass
validation	Formal V&V (CSA/GAMP 5)	release-candidate, pinned digests	masked production-like	full deterministic suite ≥99.9%, signed
prod	Live agentic SDLC	only validated, signed models	real (sovereign)	n/a

flowchart LR
    DEV["dev"] -->|signed artifact + eval pass| VAL["validation"]
    VAL -->|full V&V + cosign verify| PROD["prod"]
    PROD -. rollback (pinned prior digest) .-> PROD

Promotion is GitOps + signed-artifact only: Argo CD reconciles only cosign-verified images and MLflow-registered model digests; admission control (Gatekeeper) rejects anything unsigned or off-registry. In air-gapped/regulated networks, promotion crosses the boundary as a signed, mirrored bundle — never a live pull.

9. Build-vs-buy and self-hosting rationale

Concern	Decision	Rationale (ties to 08)
LLM inference	Build/host (open-weight)	C1/P7: sovereignty, reproducibility, no PHI/IP egress; predictable cost-per-green-PR
Model fine-tuning	Build (Ray+Kueue)	Domain/device-specific quality; full lineage in MLflow
Orchestration & harness	Build	P5: the harness is the differentiator and the 99.9% lever
Serving runtime	Buy/adopt OSS (vLLM/Triton/KServe)	Mature, self-hostable, no SaaS lock-in
Knowledge graph / vector / search	Buy/adopt OSS, self-host	Operable in air-gap; avoids managed-SaaS data residency issues
Identity/secrets/mesh	Adopt OSS (SPIRE/Vault/Istio)	Zero-trust standard, self-hostable
Supply chain	Adopt OSS (Sigstore/SLSA)	Required for FD&C Act §524B (device cybersecurity, including SBOM) and for reproducibility

The economic case (GPU amortization via batching, MIG density, scale-to-zero, tiered routing) is developed in 08-token-and-gpu-economics.md. The architecture is intentionally biased toward owning the harness and hosting the models, and adopting mature OSS for undifferentiated platform layers.

10. Architecture-to-maturity mapping (ASMM-Med)

Which components must be operational at each level (see 02-maturity-model.md):

Component / capability	L1 Governed Assistance	L2 Spec-Driven Bounded	L3 Orchestrated Agentic	L4 Validated Autonomous	L5 Self-Optimizing
Self-hosted serving (vLLM/Triton)	●	●	●	●	●
Tiered fleet + routing gateway	◐	●	●	●	●
Multi-LoRA hot-swap	—	◐	●	●	●
Knowledge plane (KG+vector+FTS)	◐	●	●	●	●
Sandbox verify (Generate→Verify)	—	●	●	●	●
Deterministic eval gate (≥99.9%)	—	◐	●	●	●
Policy server + autonomy authz (P3)	◐	●	●	●	●
Agent runtime + MCP tool plane	—	◐	●	●	●
A2A multi-agent	—	—	●	●	●
WORM evidence / Part 11 (P4)	◐	●	●	●	●
FinOps cost-per-green-PR (P6)	—	◐	●	●	●
Canary/shadow + auto-promotion	—	—	◐	●	●
Closed-loop self-optimization	—	—	—	◐	●

Legend: ● required · ◐ partial/emerging · — not yet.

Adoption sequencing for these capabilities is detailed in 09-adoption-roadmap.md.

Appendix A — Component-to-principle traceability

Component	P1	P2	P3	P4	P5	P6	P7
Routing gateway		●	●	●	●	●	●
Sandbox verify	●	●		●	●		●
Eval gate	●	●		●	●		●
Policy + autonomy authz	●	●	●	●	●
WORM evidence store		●		●			●
Model serving (self-hosted)	●	●		●	●	●	●
FinOps ledger				●		●

End of document — 03-reference-architecture.md