
03 · Reference Architecture

Figure A: Reference Architecture, seven planes (Kubernetes-native, self-hosted). Planes shown: Developer Interface (IDE plugin, CLI / terminal agent, MCP client, Review UI); Agent & Orchestration (Planner, Coder, Test, Review, Integrator, A2A, Argo Workflows); Harness & Control (Routing Gateway, Policy Server, Sandbox Mgr, Hooks, Sessions / Memory); Model Serving, self-hosted (Tier-S, Tier-M, Tier-L, Tier-V, Tier-E; vLLM, Triton, KServe); Data & Knowledge (Code Graph, Vector Store, Reg. Corpus, MLflow Registry, Context Store); Platform & Infrastructure (Kubernetes, GPU Operator, Istio + SPIRE, Vault, KEDA, Kueue); Governance & Audit (21 CFR Part 11, WORM evidence).

Project: Agentic-Native SDLC for Regulated Medical Device Engineering
Document: Reference Architecture
Status: Controlled (Engineering/Quality Reference)
Revision date: May 2026
Audience: Platform, MLOps, Security, and Quality engineering leads (1000+ developer org)

Sibling documents: 01-requirements.md · 02-maturity-model.md · 04-model-strategy-and-finetuning.md · 05-evaluation-and-validation.md · 06-agentic-workflows.md · 07-security-and-compliance.md · 08-token-and-gpu-economics.md · 09-adoption-roadmap.md


1. Architecture goals, constraints, and the layered view

1.1 Goals

This reference architecture is the buildable expression of the seven principles defined in 01-requirements.md. It must:

  • Deliver ≥99.9% release-gate correctness as a system property (P1) by composing Generate → Verify → Repair → Gate, not by trusting any single model output.
  • Wrap probabilistic model behavior in deterministic control (P2): every gate, policy decision, and promotion is reproducible from recorded inputs.
  • Enforce risk-proportional autonomy (P3) keyed to IEC 62304 software safety class A/B/C, with Class C always under dual human control.
  • Treat every agent action as Part 11-grade evidence (P4): immutable, attributable, time-stamped, reconstructable.
  • Treat the harness as the product (P5): Agent = Model + Harness. The architecture invests in the harness/control plane, not just the model.
  • Optimize cost per verified task / cost-per-green-PR (P6), making GPU and token spend a first-class, observable quantity (see 08-token-and-gpu-economics.md).
  • Remain self-hosted, sovereign, and reproducible (P7): open-weight, fine-tuned models only; no external SaaS LLM APIs.

1.2 Hard constraints (non-negotiable)

| # | Constraint | Architectural consequence |
| --- | --- | --- |
| C1 | Open-weight, self-hosted, fine-tuned models only; no Claude/OpenAI/Gemini SaaS | All inference runs inside the cluster on the Model Fleet; no egress to LLM providers. Network policy default-deny to public LLM endpoints. |
| C2 | ≥99.9% release-gate correctness | Multi-stage verification (sandbox + policy + eval) gates every artifact; no model-only merges. |
| C3 | GPU/token cost is first-class | Tiered model fleet + routing gateway + scale-to-zero + FinOps telemetry on every span. |
| C4 | Deterministic evaluation | Pinned model digests, fixed seeds, hermetic test environments, content-addressed eval datasets. |
| C5 | Sovereign / air-gap capable | Every dependency mirrorable; no runtime dependency on internet reachability. |
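
Constraint C4 can be illustrated with a small sketch. It assumes an eval run is fully defined by a model digest, a dataset digest, and a seed; the function names `run_fingerprint` and `sample_cases` are hypothetical and not taken from the eval service described in 05-evaluation-and-validation.md.

```python
import hashlib
import json
import random


def run_fingerprint(model_digest: str, dataset_digest: str, seed: int) -> str:
    """Hash the inputs that define an eval run; equal inputs give an equal fingerprint."""
    manifest = {"model": model_digest, "dataset": dataset_digest, "seed": seed}
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sample_cases(case_ids, n: int, seed: int):
    """Draw n eval cases reproducibly; sorting first removes input-order effects."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(sorted(case_ids), n)
```

Two runs with the same fingerprint should select the same cases and, given hermetic environments, produce the same verdict; a changed seed or digest yields a different fingerprint and is recorded as a different run.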

1.3 The seven-plane layered view

flowchart TB
    subgraph DI["1 · Developer Interface Plane"]
        IDE["IDE plugins (VS Code / JetBrains)"]
        CLI["Agent CLI"]
        WEB["Review & Approval Web UI"]
        CICD["CI/CD triggers (Argo Events)"]
    end
    subgraph AO["2 · Agent / Orchestration Plane"]
        RUNTIME["Agent Runtime (planner/executor)"]
        A2A["A2A multi-agent bus"]
        MCP["MCP Tool Plane"]
    end
    subgraph HC["3 · Harness / Control Plane"]
        ROUTER["Model-Routing Gateway"]
        POLICY["Policy Server (OPA/Gatekeeper)"]
        AUTHZ["Autonomy Authorization Service"]
        EVAL["Eval / Verification Service"]
        SANDBOX["Sandbox Execution (gVisor/Kata)"]
    end
    subgraph MS["4 · Model-Serving Plane"]
        VLLM["vLLM"]
        TRITON["Triton + TensorRT-LLM"]
        KSERVE["KServe + multi-LoRA"]
    end
    subgraph DK["5 · Data / Knowledge Plane"]
        KG["Code Knowledge Graph"]
        VEC["Vector / Embedding Store"]
        REG["Regulatory Corpus"]
        REGISTRY["MLflow Model Registry & Lineage"]
    end
    subgraph PI["6 · Platform / Infra Plane"]
        K8S["Kubernetes + GPU Operator"]
        RAY["Ray + Kueue"]
        MESH["Istio + SPIFFE/SPIRE"]
        VAULT["Vault · KEDA · Argo"]
    end
    subgraph GA["7 · Governance / Audit Plane"]
        WORM["WORM Evidence Store (Part 11)"]
        OTEL["OpenTelemetry pipeline"]
        FINOPS["FinOps / cost ledger"]
        SUPPLY["Sigstore/cosign · SLSA · SBOM"]
    end

    DI --> AO --> HC
    HC --> MS
    HC --> DK
    AO --> DK
    MS --> PI
    HC --> PI
    AO -.evidence.-> GA
    HC -.evidence.-> GA
    MS -.telemetry.-> GA

Plane responsibilities at a glance:

| Plane | Owns | Does not own |
| --- | --- | --- |
| Developer Interface | Intent capture, review, approval surfaces | Model selection, policy |
| Agent / Orchestration | Trajectory planning, tool calls, multi-agent coordination | Inference, gating verdicts |
| Harness / Control | Routing, policy, autonomy authz, verification, gating | Model weights, business logic |
| Model-Serving | Inference, batching, LoRA hot-swap | Trajectory, gating |
| Data / Knowledge | Retrieval, lineage, registry | Generation |
| Platform / Infra | Scheduling, identity, secrets, scaling | Domain semantics |
| Governance / Audit | Immutable evidence, telemetry, supply chain | Live request handling |

2. Logical components per plane

2.1 Developer Interface Plane

| Component | Tech | Responsibility |
| --- | --- | --- |
| IDE integration | VS Code / JetBrains extensions, LSP bridge | Inline intent capture, diff preview, approval prompts, trajectory visualization |
| Agent CLI | Self-hosted CLI binary (mTLS to mesh) | Headless agent invocation, batch tasks, CI usage |
| Review/Approval Web UI | Internal SPA behind Istio + OIDC | HITL review, autonomy-class approvals, evidence inspection |
| CI/CD triggers | Argo Events + Argo Workflows | Event-driven agent runs (PR opened, requirement changed) |

All interface clients are thin: they hold no model credentials and reach the cluster only through the mesh ingress with SPIFFE-issued identity.

2.2 Agent / Orchestration Plane

| Component | Tech | Responsibility |
| --- | --- | --- |
| Agent Runtime | Custom planner/executor on K8s Job/Pod, Ray actors for fan-out | Owns the trajectory: plan → act → observe → repair loop |
| MCP Tool Plane | Model Context Protocol servers (one per capability) | Typed, permissioned tool surface: repo.read, repo.write, test.run, kg.query, eval.submit, vault.lease |
| A2A bus | Agent-to-Agent protocol over the Istio mesh | Specialist agents (coder, reviewer, test-author, requirements-tracer) coordinate |

Tool calls never hit infrastructure directly; they are mediated by MCP servers that enforce per-tool scopes and emit evidence (P4).
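
A minimal sketch of this mediation, assuming a hypothetical in-process enforcement point. The tool names come from the table above; `TOOL_SCOPES`, the scope strings, `ToolCall`, `EvidenceLog`, and `mediate` are illustrative and are not part of the Model Context Protocol itself.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from tool name to the scope a caller must hold.
TOOL_SCOPES = {
    "repo.read": "repo:read",
    "repo.write": "repo:write",
    "test.run": "sandbox:exec",
    "kg.query": "knowledge:read",
    "eval.submit": "eval:submit",
    "vault.lease": "secrets:lease",
}


@dataclass(frozen=True)
class ToolCall:
    agent_id: str            # workload identity, e.g. a SPIFFE ID (illustrative)
    tool: str
    granted_scopes: frozenset


@dataclass
class EvidenceLog:
    records: list = field(default_factory=list)


def mediate(call: ToolCall, log: EvidenceLog) -> bool:
    """Allow a call only if the tool is known and its scope is granted.

    Both allow and deny decisions are recorded, so denials are evidence too.
    """
    required = TOOL_SCOPES.get(call.tool)
    allowed = required is not None and required in call.granted_scopes
    log.records.append(
        {"agent": call.agent_id, "tool": call.tool,
         "required_scope": required, "allowed": allowed}
    )
    return allowed
```

Unknown tools are denied by default, which mirrors the default-deny posture used elsewhere in the architecture.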

2.3 Harness / Control Plane (the product, P5)

| Component | Tech | Responsibility |
| --- | --- | --- |
| Model-Routing Gateway | Custom gateway + classifier (Tier-S model) in front of serving | Classify request → select tier/LoRA → enforce cost budget |
| Policy Server | OPA/Gatekeeper + dedicated policy server (Rego bundles) | Admission of tool calls, autonomy decisions, write permissions |
| Autonomy Authorization Service | Custom service keyed to IEC 62304 class | Decides allowed autonomy level per task (P3); Class C → dual human |
| Eval / Verification Service | Deterministic eval harness (see 05) | Runs gate suites; emits pass/fail with evidence |
| Sandbox Execution | gVisor / Kata Containers, ephemeral namespaces | Hermetic build/test/exec of generated artifacts |

2.3.1 Model-routing gateway: classification and tier selection

flowchart LR
    REQ["Agent request<br/>(task + context budget)"] --> CLS{"Classifier<br/>(Tier-S Reflex)"}
    CLS -->|"trivial / lint / format"| S["Tier-S Reflex<br/>1-8B"]
    CLS -->|"bounded code edit / unit test"| M["Tier-M Worker<br/>14-34B"]
    CLS -->|"design / multi-file / reasoning"| L["Tier-L Reasoner<br/>70B+/MoE"]
    CLS -->|"diagram / DICOM / UI screenshot"| V["Tier-V Multimodal"]
    CLS -->|"retrieval / rerank"| E["Tier-E Embed/Rerank"]
    S & M & L & V & E --> BUDGET{"Cost-budget check<br/>(P6)"}
    BUDGET -->|"within budget"| SERVE["Serving plane"]
    BUDGET -->|"over budget"| DEGRADE["Downshift tier or queue"]

Routing inputs: task type, IEC 62304 class, context length, required latency SLO, remaining task cost budget, and the active fine-tuned LoRA adapter. The classifier itself is a cheap Tier-S model; its decision is logged as evidence so routing is auditable and reproducible.
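
The tier selection and budget check can be sketched as follows. This is a simplification under stated assumptions: a lookup table stands in for the Tier-S classifier model, only task type and remaining budget are considered (the other routing inputs are omitted), and the cost figures and downshift order are placeholders rather than values from this document.

```python
from dataclasses import dataclass
from typing import Optional

# Task categories follow the routing diagram; a real classifier is a model.
TASK_TO_TIER = {
    "lint": "S", "format": "S",
    "code_edit": "M", "unit_test": "M",
    "design": "L", "multi_file": "L",
    "diagram": "V", "screenshot": "V",
    "retrieval": "E", "rerank": "E",
}
# Hypothetical relative cost per request for each tier.
TIER_COST = {"S": 1.0, "M": 4.0, "L": 20.0, "V": 6.0, "E": 0.1}
# Hypothetical downshift order; tiers without an entry can only queue.
DOWNSHIFT = {"L": "M", "M": "S"}


@dataclass(frozen=True)
class RoutingDecision:
    requested_tier: str
    served_tier: Optional[str]  # None means the request is queued
    action: str                 # "serve", "downshift", or "queue"


def route(task_type: str, remaining_budget: float) -> RoutingDecision:
    """Pick the tier for a task, downshifting until the budget is respected."""
    requested = TASK_TO_TIER[task_type]
    tier: Optional[str] = requested
    while tier is not None and TIER_COST[tier] > remaining_budget:
        tier = DOWNSHIFT.get(tier)
    if tier is None:
        return RoutingDecision(requested, None, "queue")
    action = "serve" if tier == requested else "downshift"
    return RoutingDecision(requested, tier, action)
```

Because the decision is a frozen value object, it can be serialized unchanged into the evidence record, which is what makes routing auditable after the fact.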

2.4 Model-Serving Plane

| Component | Tech | Responsibility |
| --- | --- | --- |
| vLLM | PagedAttention, continuous batching | High-throughput text generation, Tier-S/M/L |
| Triton + TensorRT-LLM | Compiled engines, in-flight batching | Latency-critical / quantized serving |
| KServe | InferenceService CRDs, multi-LoRA hot-swap | Standardized serving surface, canary/shadow, autoscale |

2.5 Data / Knowledge Plane

| Component | Tech | Responsibility |
| --- | --- | --- |
| Code Knowledge Graph | Self-hosted property graph (Neo4j / JanusGraph / NebulaGraph) | Symbols, call graph, requirement→code→test traceability |
| Vector store | Self-hosted (Qdrant / Milvus / Weaviate) | ANN retrieval over code & docs using Tier-E embeddings |
| Regulatory corpus | Versioned doc store + full-text (OpenSearch) | IEC 62304 / ISO 13485 / 14971 / Part 11 reference text |
| Model Registry & Lineage | MLflow | Model versions, fine-tune lineage, signed digests, stage |

2.6 Platform / Infra Plane

| Component | Tech | Responsibility |
| --- | --- | --- |
| Cluster + GPU | Kubernetes + NVIDIA GPU Operator (MIG) | Scheduling, GPU lifecycle, driver/DCGM |
| Distributed compute | Ray + Kueue | Training, batch eval, fan-out inference jobs |
| Mesh & identity | Istio + SPIFFE/SPIRE | mTLS, workload identity, zero-trust east-west |
| Secrets | HashiCorp Vault | Dynamic short-lived secrets, transit, PKI |
| Autoscale | KEDA | Queue-driven scaling, scale-to-zero for idle tiers |
| Delivery | Argo CD + Argo Workflows | GitOps, pipeline orchestration |

2.7 Governance / Audit Plane

| Component | Tech | Responsibility |
| --- | --- | --- |
| WORM evidence store | Object store with object-lock (immutable), append-only ledger | Part 11 records, trajectory dumps, gate verdicts |
| Telemetry | OpenTelemetry collectors → metrics/traces/logs backends | End-to-end spans, cost attribution |
| FinOps ledger | Cost attribution service | Per-task GPU-seconds, tokens, cost-per-green-PR |
| Supply chain | Sigstore/cosign + SLSA provenance + SBOM | Signed images, models, and artifacts |
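
As an illustration of the FinOps ledger metric, the sketch below computes cost-per-green-PR from per-task GPU-seconds. It assumes cost is attributed purely by GPU time at a flat rate, and that a "green PR" is a task whose PR passed every gate; `TaskUsage`, `cost_per_green_pr`, and the rate are hypothetical. The detailed cost model lives in 08-token-and-gpu-economics.md.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaskUsage:
    task_id: str
    gpu_seconds: float
    tokens: int       # kept for attribution; not priced in this sketch
    green_pr: bool    # True when the task ended in a PR that passed all gates


def cost_per_green_pr(tasks: Iterable[TaskUsage],
                      usd_per_gpu_second: float) -> Optional[float]:
    """Total spend across ALL tasks divided by the number of green PRs.

    Failed and repaired attempts stay in the numerator, so wasted
    generation raises the metric instead of disappearing from it.
    """
    tasks = list(tasks)
    total_cost = sum(t.gpu_seconds for t in tasks) * usd_per_gpu_second
    green = sum(1 for t in tasks if t.green_pr)
    return total_cost / green if green else None
```

Returning `None` when no task produced a green PR avoids reporting a misleading zero or dividing by zero.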

3. End-to-end request / trajectory flow

sequenceDiagram
    autonumber
    participant Dev as Developer (IDE/CLI)
    participant RT as Agent Runtime
    participant KG as Knowledge/RAG
    participant GW as Routing Gateway
    participant MS as Model Serving
    participant SB as Sandbox
    participant POL as Policy Server
    participant EV as Eval Service
    participant AU as Autonomy Authz
    participant AUD as WORM Audit

    Dev->>RT: Intent (task, repo, requirement ID)
    RT->>AU: Request autonomy level (IEC 62304 class)
    AU-->>RT: Allowed level (Class C → dual-human required)
    RT->>KG: Assemble context (graph + ANN + full-text)
    KG-->>RT: Ranked context bundle (+provenance)
    RT->>GW: Generation request (task + context)
    GW->>GW: Classify → select tier/LoRA → budget check (P6)
    GW->>MS: Route to tier
    MS-->>RT: Candidate artifact (diff/code/tests)
    RT->>SB: Hermetic build + test (Verify)
    SB-->>RT: Build/test results
    alt verification fails
        RT->>GW: Repair request (failure context)
        GW->>MS: Re-generate (Repair)
        MS-->>RT: Revised artifact
        RT->>SB: Re-verify
    end
    RT->>POL: Policy gate (writes, licenses, secrets)
    POL-->>RT: Allow / Deny + rationale
    RT->>EV: Eval gate (deterministic suite)
    EV-->>RT: Pass/Fail (≥99.9% threshold)
    RT->>Dev: HITL review (class-proportional)
    Dev-->>RT: Approve / reject (dual for Class C)
    RT->>Dev: Open PR (signed)
    RT->>AUD: Write immutable evidence (P4, Part 11)

Each step emits a span and a content-addressed evidence record. The trajectory is fully reconstructable from the audit store, satisfying P4 and 21 CFR Part 11.
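
The Generate → Verify → Repair → Gate core of this flow can be sketched as a bounded loop. The sketch assumes three injected callables standing in for the serving plane, the sandbox, and the combined policy/eval gate; `run_trajectory`, its parameters, and the evidence record shape are illustrative, and the HITL and PR steps are omitted.

```python
def run_trajectory(generate, verify, gate, max_repairs=2):
    """Run Generate -> Verify -> Repair -> Gate with a bounded repair budget.

    generate(failure) -> artifact         (failure is None on the first attempt)
    verify(artifact)  -> (passed, failure_context)
    gate(artifact)    -> bool             (policy + eval verdict)
    Returns (artifact or None, evidence list).
    """
    evidence = []
    failure = None
    for attempt in range(max_repairs + 1):
        artifact = generate(failure)
        passed, failure = verify(artifact)
        evidence.append({"step": "verify", "attempt": attempt, "passed": passed})
        if passed:
            gated = gate(artifact)
            evidence.append({"step": "gate", "attempt": attempt, "passed": gated})
            # A verified artifact that fails the gate is never released.
            return (artifact if gated else None), evidence
    # Repair budget exhausted: nothing is released, but evidence is kept.
    return None, evidence
```

Bounding the repair attempts keeps a failing task from consuming unbounded GPU time, and the evidence list records failed attempts as well as the successful one.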


4. Kubernetes deployment topology

flowchart TB
    subgraph CTRL["Control / CPU node pool"]
        NS_AGENT["ns: agent-runtime"]
        NS_HARNESS["ns: harness-control"]
        NS_KNOW["ns: knowledge"]
        NS_GOV["ns: governance-audit"]
        NS_PLAT["ns: platform (Vault, Istio, Argo)"]
    end
    subgraph GPU["GPU node pools"]
        POOL_S["pool: reflex (MIG 1g.10gb / L4)"]
        POOL_M["pool: worker (A10/L40S)"]
        POOL_L["pool: reasoner (H100/H200, NVLink)"]
        POOL_TRAIN["pool: train/batch (Kueue-managed)"]
    end
    subgraph SANDBOX["Sandbox node pool (CPU, isolated)"]
        NS_SB["ns: sandbox-exec (gVisor/Kata, no egress)"]
    end

    NS_HARNESS -->|route| POOL_S & POOL_M & POOL_L
    NS_AGENT --> NS_SB
    POOL_TRAIN --- RAY["Ray + Kueue queues"]

4.1 Namespaces

| Namespace | Contents | Network policy |
| --- | --- | --- |
| agent-runtime | Agent pods, A2A bus | Egress only to MCP, gateway, knowledge |
| harness-control | Gateway, policy server, autonomy authz, eval | Egress to serving + knowledge; ingress from agents |
| model-serving-{s,m,l,v,e} | vLLM/Triton/KServe per tier | Ingress only from gateway |
| knowledge | KG, vector store, OpenSearch, MLflow | Ingress from agents/harness |
| sandbox-exec | gVisor/Kata pods | Default-deny all egress; ephemeral |
| governance-audit | WORM store, OTel, FinOps | Append-only ingest |
| platform | Vault, Istio control plane, Argo, SPIRE | Cluster-internal |

4.2 Node pools, MIG, and scheduling

| Pool | Hardware (example) | MIG | Scaling |
| --- | --- | --- | --- |
| reflex (Tier-S) | L4 / A10 | 1g.10gb partitions for high pod density on MIG-capable GPUs; L4 and A10 do not support MIG, so time-slicing applies there | KEDA, scale-to-zero off-hours |
| worker (Tier-M) | L40S / A10 | optional on MIG-capable GPUs; L40S and A10 do not support MIG | KEDA queue-driven |
| reasoner (Tier-L) | H100/H200, NVLink + GPUDirect | full GPU, tensor/pipeline parallel | conservative; warm pool ≥1 |
| train/batch | H100 multi-node | full GPU | Kueue gang-scheduling, preemptible |

  • GPU Operator manages drivers, DCGM exporters, MIG geometry, and time-slicing where MIG is unavailable or too coarse.
  • Kueue provides quota-managed queues for training and batch eval, with ClusterQueue/LocalQueue and gang scheduling for multi-node Tier-L fine-tunes.
  • KEDA scales serving deployments on MCP/gateway queue depth and supports scale-to-zero for idle Tier-V/Tier-L adapters, which is central to P6.
  • NetworkPolicies enforce default-deny; the sandbox namespace is fully isolated from cluster services and the internet.

4.3 Multi-cluster, sovereign-VPC, and air-gap

flowchart LR
    subgraph SOV["Sovereign region cluster"]
        direction TB
        PROD["prod (serving + audit)"]
        VAL["validation"]
    end
    subgraph DEV["Dev cluster"]
        DEVNS["dev / experimentation"]
    end
    MIRROR["Artifact mirror<br/>(images · models · pkgs)"]
    DEV -. promote (signed) .-> VAL
    VAL -. promote (signed) .-> PROD
    MIRROR --> SOV
    MIRROR --> DEV

  • Sovereign-VPC: prod and validation run in a customer-controlled region/VPC; no cross-border data flow.
  • Multi-cluster: dev separated from validation/prod clusters; promotion is signed-artifact-only (cosign verified at admission).
  • Air-gap option: every dependency (base images, model weights, OS packages, eval datasets) is mirrored into an internal registry. No runtime reaches the public internet. The architecture has no hard internet dependency at request time (C5/P7).

5. Model-serving subsystem in depth

5.1 Tier → hardware mapping

| Tier | Models (open-weight, fine-tuned) | Hardware | Serving | Quant |
| --- | --- | --- | --- | --- |
| Tier-S "Reflex" 1-8B | Qwen2.5-Coder-1.5B/7B, Llama-3.2-3B | L4 / MIG slice | Triton+TRT-LLM | FP8 / INT8 |
| Tier-M "Worker" 14-34B | Qwen2.5-Coder-32B, StarCoder2-15B, DeepSeek-Coder-V2-Lite | L40S / A10 | vLLM | FP8 / AWQ-INT4 |
| Tier-L "Reasoner" 70B+/MoE | Llama-3.3-70B, Qwen2.5-72B, DeepSeek-V3/R1-distill, Mixtral | H100/H200 NVLink | vLLM / TRT-LLM | FP8 / GPTQ |
| Tier-V "Multimodal" | Qwen2.5-VL, Llama-3.2-Vision, InternVL, Pixtral | L40S/H100 | vLLM | FP8 |
| Tier-E "Embed/Rerank" | bge, gte, jina-code, nomic | L4 / CPU | Triton / TEI | INT8 |

5.2 Serving techniques

| Technique | Applied where | Purpose |
| --- | --- | --- |
| Continuous / in-flight batching | vLLM, Triton | Throughput; amortize GPU (P6) |
| PagedAttention KV-cache | vLLM | Memory efficiency, longer context |
| Speculative decoding | Tier-L with Tier-S drafter | Lower latency on reasoner |
| Quantization (FP8/AWQ/GPTQ) | all tiers | Fit larger models, more density |
| Multi-LoRA hot-swap | KServe/vLLM | Many fine-tuned adapters per base; per-task adapter without reload |
| Tensor/pipeline parallel | Tier-L | Serve 70B+/MoE across GPUs |

5.3 Lifecycle: autoscale, hot-swap, canary/shadow

flowchart LR
    REGY["MLflow registry<br/>(signed digest)"] -->|promote| KS["KServe InferenceService"]
    KS --> CANARY["Canary 5%"]
    KS --> STABLE["Stable 95%"]
    SHADOW["Shadow (mirror, no user impact)"] -.eval.-> EVAL["Eval Service"]
    EVAL -->|pass ≥99.9%| ROLL["Promote canary→stable"]
    EVAL -->|fail| HALT["Halt + rollback"]

  • New adapters/models enter as shadow (traffic mirrored, outputs evaluated offline), then canary (small %), then stable; each stage is gated by the deterministic eval suite.
  • KEDA autoscales each tier on queue depth; idle adapters scale to zero, base engines retain a warm minimum.
  • Every promotion verifies a cosign signature and a pinned model digest so prod inference is reproducible (P2, P7).
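
The promotion gate can be sketched as a pure function. This is a simplification under stated assumptions: the ≥99.9% figure is read as a pass rate over eval cases, the digest is a plain SHA-256 of the artifact bytes, and cosign signature verification is left out entirely. The names `sha256_digest` and `promotion_decision` are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Content digest in the common 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def promotion_decision(artifact: bytes, pinned_digest: str,
                       passed: int, total: int) -> str:
    """Return 'promote' only if the digest matches and the pass rate is >= 99.9%.

    Signature verification (cosign) is assumed to happen elsewhere.
    """
    if sha256_digest(artifact) != pinned_digest:
        return "halt: digest mismatch"
    # Integer comparison avoids floating-point edge cases at the threshold:
    # passed / total >= 0.999  <=>  passed * 1000 >= total * 999
    if total == 0 or passed * 1000 < total * 999:
        return "halt: eval below threshold"
    return "promote"
```

Checking the digest before the eval result means an artifact that differs from what was evaluated can never be promoted on the strength of someone else's passing run.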

6. Data & knowledge subsystem

flowchart TB
    SRC["Sources: repos · requirements · DHF · regs"] --> ING["Ingestion + sanitization<br/>(PII/secret scrub, license tag)"]
    ING --> KG["Code Knowledge Graph"]
    ING --> EMB["Embeddings (Tier-E)"]
    ING --> FTS["Full-text index (OpenSearch)"]
    EMB --> VEC["Vector store"]
    QRY["Retrieval orchestrator"] --> KG
    QRY --> VEC
    QRY --> FTS
    KG & VEC & FTS --> FUSE["Fusion + rerank (Tier-E)"]
    FUSE --> CTX["Context bundle + provenance"]

6.1 Ingestion & sanitization

Sources (repos, requirements/DHF, regulatory corpus) pass through ingestion that scrubs secrets/PII, tags license and IEC 62304 class, and content-addresses each chunk so retrieval is reproducible and auditable.
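
A minimal sketch of scrub-then-address ingestion, assuming a single illustrative regex for secrets (a real scrubber would use a far broader rule set plus PII detection). The names `SECRET_PATTERN`, `sanitize`, and `ingest_chunk`, and the record fields, are hypothetical.

```python
import hashlib
import re

# Illustrative pattern only: matches "password = value", "api_key: value", etc.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|password|token)\s*[:=]\s*\S+")


def sanitize(text: str) -> str:
    """Replace the value of anything that looks like a credential assignment."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def ingest_chunk(text: str, source: str, license_tag: str, safety_class: str) -> dict:
    """Scrub a chunk, then address it by the hash of the scrubbed content."""
    clean = sanitize(text)
    digest = hashlib.sha256(clean.encode("utf-8")).hexdigest()
    return {
        "id": f"sha256:{digest}",
        "text": clean,
        "source": source,
        "license": license_tag,
        "iec62304_class": safety_class,
    }
```

Hashing after scrubbing matters: the chunk ID then identifies exactly what retrieval can return, and the raw secret never influences any stored identifier.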

6.2 Retrieval modes

| Mode | Backend | Use |
| --- | --- | --- |
| Graph traversal | Code Knowledge Graph (self-hosted property graph) | Call graph, requirement→code→test traceability, blast-radius |
| ANN (semantic) | Vector store (Qdrant/Milvus) | Similar code, prior solutions, doc semantics |
| Full-text (lexical) | OpenSearch | Exact symbols, error strings, reg clauses |

Results are fused and reranked by a Tier-E model. The provenance of every retrieved chunk (source, version, digest) travels with the context bundle into the trajectory and the audit record (P4).
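
To show how three ranked lists can become one context bundle that carries provenance, the sketch below uses reciprocal rank fusion (RRF). RRF is a substitution made here for illustration: the document specifies a Tier-E reranker model, not RRF. The function names, the `k=60` constant, and the provenance fields are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: dict, k: int = 60) -> list:
    """Fuse rankings from several retrievers; higher fused score ranks first.

    ranked_lists maps a retriever name to chunk IDs ordered best-first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Tie-break on chunk ID so the ordering is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


def build_context_bundle(ranked_lists: dict, provenance: dict, top_n: int = 3) -> list:
    """Attach source/version/digest provenance to each fused chunk."""
    fused = reciprocal_rank_fusion(ranked_lists)[:top_n]
    return [{"chunk": c, **provenance[c]} for c in fused]
```

The deterministic tie-break is the part that carries over to any fusion method: with identical inputs the bundle order must be identical, or the trajectory is not reproducible.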

The code knowledge graph is the self-hosted analogue of a managed graph-of-code service (Spanner-graph-class); it must be operable inside the sovereign/air-gapped boundary.


7. Control / governance plane

7.1 Evidence production (P4 / Part 11)

Every plane emits structured evidence to the governance plane:

| Evidence | Producer | Stored |
| --- | --- | --- |
| Intent + autonomy decision | Agent runtime + autonomy authz | WORM |
| Context bundle + provenance | Knowledge plane | WORM |
| Routing/classification decision | Gateway | WORM |
| Model digest + LoRA used + tokens/GPU-s | Serving + FinOps | WORM + ledger |
| Sandbox build/test results | Sandbox | WORM |
| Policy verdict + rationale | Policy server | WORM |
| Eval verdict + dataset digest | Eval service | WORM |
| HITL approver identity + signature | Review UI | WORM |

Records are written to an immutable, append-only, object-locked (WORM) store, time-stamped and attributable, satisfying 21 CFR Part 11 electronic-records/signatures and GAMP 5 traceability.
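
One way to make an append-only ledger tamper-evident is hash chaining, sketched below. This is an assumption about how such a ledger could work, not a description of the evidence store's implementation, and it complements object-lock rather than replacing it. `append_record`, `verify_chain`, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder 'previous hash' for the first record
FIELDS = ("actor", "event", "timestamp", "prev")


def _record_hash(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(chain: list, actor: str, event: str, timestamp: str) -> None:
    """Append an attributable, time-stamped record linked to its predecessor."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"actor": actor, "event": event, "timestamp": timestamp, "prev": prev}
    chain.append({**body, "hash": _record_hash(body)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any edit or reordering returns False."""
    prev = GENESIS
    for rec in chain:
        body = {k: rec[k] for k in FIELDS}
        if rec["prev"] != prev or rec["hash"] != _record_hash(body):
            return False
        prev = rec["hash"]
    return True
```

Because each record commits to the hash of the one before it, altering any earlier record invalidates every later link, which is what lets an auditor check integrity without trusting the writer.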

7.2 Runtime policy & autonomy enforcement (P3)

flowchart LR
    ACT["Agent action / tool call"] --> PEP["MCP enforcement point"]
    PEP --> OPA["Policy server (Rego)"]
    PEP --> AZ["Autonomy authz (IEC 62304 class)"]
    OPA -->|deny| BLOCK["Block + evidence"]
    AZ -->|Class C| DUAL["Require dual human control"]
    AZ -->|Class A/B| LEVEL["Apply allowed autonomy"]
    OPA -->|allow| EXEC["Execute"]
    DUAL --> EXEC
    LEVEL --> EXEC

  • Policy is deterministic (Rego bundles, versioned, signed) wrapping the probabilistic agent (P2).
  • Autonomy is risk-proportional: Class A/B may allow higher automation; Class C always requires dual human control before any write/promote.
  • Identity for every actor (human or workload) is SPIFFE/SPIRE-issued; secrets are short-lived Vault leases.
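
The enforcement decision can be sketched as a small function combining the policy verdict with class-keyed approval counts. Only the Class C rule (two humans) comes from this document; the approval counts for Class A and B, and the names `REQUIRED_APPROVALS` and `authorize_write`, are assumptions for illustration.

```python
# Approvals required before a write/promote. Class C = 2 is from the source;
# the values for A and B are hypothetical placeholders.
REQUIRED_APPROVALS = {"A": 0, "B": 1, "C": 2}


def authorize_write(safety_class: str, policy_allows: bool, approvers: list) -> str:
    """Combine the policy verdict with risk-proportional human control.

    Returns 'block', 'await-approval', or 'execute'.
    """
    if not policy_allows:
        return "block"  # a policy deny is final regardless of approvals
    # Count distinct identities: one person approving twice is not dual control.
    distinct = len(set(approvers))
    if distinct < REQUIRED_APPROVALS[safety_class]:
        return "await-approval"
    return "execute"
```

Deduplicating approver identities is the detail that makes "dual" meaningful; it relies on each approver having a unique, verified identity as described in the last bullet above.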

8. Reference environments & promotion

| Environment | Purpose | Models | Data | Gate to next |
| --- | --- | --- | --- | --- |
| dev | Experimentation, adapter dev | latest candidate LoRAs | synthetic/masked | unit + shadow eval pass |
| validation | Formal V&V (CSA/GAMP 5) | release-candidate, pinned digests | masked production-like | full deterministic suite ≥99.9%, signed |
| prod | Live agentic SDLC | only validated, signed models | real (sovereign) | n/a |

flowchart LR
    DEV["dev"] -->|signed artifact + eval pass| VAL["validation"]
    VAL -->|full V&V + cosign verify| PROD["prod"]
    PROD -. rollback (pinned prior digest) .-> PROD

Promotion is GitOps + signed-artifact only: Argo CD reconciles only cosign-verified images and MLflow-registered model digests; admission control (Gatekeeper) rejects anything unsigned or off-registry. In air-gapped/regulated networks, promotion crosses the boundary as a signed, mirrored bundle — never a live pull.


9. Build-vs-buy and self-hosting rationale

| Concern | Decision | Rationale (ties to 08) |
| --- | --- | --- |
| LLM inference | Build/host (open-weight) | C1/P7: sovereignty, reproducibility, no PHI/IP egress; predictable cost-per-green-PR |
| Model fine-tuning | Build (Ray+Kueue) | Domain/device-specific quality; full lineage in MLflow |
| Orchestration & harness | Build | P5: the harness is the differentiator and the 99.9% lever |
| Serving runtime | Buy/adopt OSS (vLLM/Triton/KServe) | Mature, self-hostable, no SaaS lock-in |
| Knowledge graph / vector / search | Buy/adopt OSS, self-host | Operable in air-gap; avoids managed-SaaS data residency issues |
| Identity/secrets/mesh | Adopt OSS (SPIRE/Vault/Istio) | Zero-trust standard, self-hostable |
| Supply chain | Adopt OSS (Sigstore/SLSA) | Required for §524B / reproducibility |

The economic case (GPU amortization via batching, MIG density, scale-to-zero, tiered routing) is developed in 08-token-and-gpu-economics.md. The architecture is intentionally biased toward owning the harness and hosting the models, and adopting mature OSS for undifferentiated platform layers.


10. Architecture-to-maturity mapping (ASMM-Med)

Which components must be operational at each level (see 02-maturity-model.md):

| Component / capability | L1 Governed Assistance | L2 Spec-Driven Bounded | L3 Orchestrated Agentic | L4 Validated Autonomous | L5 Self-Optimizing |
| --- | --- | --- | --- | --- | --- |
| Self-hosted serving (vLLM/Triton) | | | | | |
| Tiered fleet + routing gateway | | | | | |
| Multi-LoRA hot-swap | | | | | |
| Knowledge plane (KG+vector+FTS) | | | | | |
| Sandbox verify (Generate→Verify) | | | | | |
| Deterministic eval gate (≥99.9%) | | | | | |
| Policy server + autonomy authz (P3) | | | | | |
| Agent runtime + MCP tool plane | | | | | |
| A2A multi-agent | | | | | |
| WORM evidence / Part 11 (P4) | | | | | |
| FinOps cost-per-green-PR (P6) | | | | | |
| Canary/shadow + auto-promotion | | | | | |
| Closed-loop self-optimization | | | | | |

Legend: ● required · ◐ partial/emerging · — not yet.

Adoption sequencing for these capabilities is detailed in 09-adoption-roadmap.md.


Appendix A: Component-to-principle traceability

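| Component | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Routing gateway | | | | | | | |
| Sandbox verify | | | | | | | |
| Eval gate | | | | | | | |
| Policy + autonomy authz | | | | | | | |
| WORM evidence store | | | | | | | |
| Model serving (self-hosted) | | | | | | | |
| FinOps ledger | | | | | | | |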

End of document — 03-reference-architecture.md