Security & Compliance: Agentic SDLC for Regulated Medical Device Engineering

Audience: CISO and security architecture, Quality/Regulatory (QA/RA), MLOps/Platform, and an FDA / Notified Body auditor.

Scope: The security and compliance control set for AI agents that build, test, document, and maintain regulated medical-device software, running Kubernetes-native with self-hosted, fine-tuned open-weight models only (no Claude/OpenAI/Gemini SaaS APIs; these are hard-blocked at the network layer).

Date: May 2026. All numeric thresholds in this document are placeholders pending organizational calibration.

Companion docs: 01-requirements · 02-maturity-model · 03-reference-architecture · 04-model-strategy-and-finetuning · 05-evaluation-and-validation · 06-agentic-workflows · 08-token-and-gpu-economics · 09-adoption-roadmap

0. Purpose and framing

This document is the security and compliance register for the agentic SDLC. It assumes the seven principles defined in 02-maturity-model — in particular P3 (risk-proportional autonomy by IEC 62304 class), P4 (everything an agent does is evidence, 21 CFR Part 11-grade), P5 (the harness is the product), and P7 (self-hosted, sovereign, reproducible).

The governing security thesis: a probabilistic agent is an untrusted actor inside a zero-trust system. We do not trust the model to behave; we constrain what any model can do via deterministic structural controls, sandboxing, and signed supply-chain assurance, and we make every action attributable and replayable so it survives an audit or a recall investigation.

Two regulatory and security framings run in parallel and must never be conflated (see §9):

Track A — AI that BUILDS the device: production/Quality-System tooling. Validated under FDA Computer Software Assurance (CSA), ISO 13485/QMSR, 21 CFR Part 11. This framework's primary focus.
Track B — AI shipped INSIDE the device: SaMD / AI-enabled device function, submission-bearing, governed by EU AI Act high-risk obligations, FDA premarket cybersecurity, and a Predetermined Change Control Plan (PCCP).

Threat frameworks of record: OWASP Top 10 for LLM Applications (2025) and MITRE ATLAS. Standards anchors: 21 CFR Part 11, IEC 62304, ISO 13485/QMSR, ISO 14971, ISO/IEC 42001 (AI management system), IEC 62443 (cyber for medical/industrial), FDA premarket cybersecurity guidance, FD&C Act §524B (SBOM + vulnerability management), EU MDR + EU AI Act.

1. Threat model for agentic dev in a regulated org

1.1 Assets under protection

Asset	Why it matters	Loss class
Source IP (regulated product source, DHF, algorithms)	Core competitive + regulated artifact	Confidentiality, Integrity
PHI/PII	May appear in test fixtures, bug repros, logs, support data	Confidentiality (HIPAA/GDPR), regulatory breach
Model weights + adapters (fleet S/M/L/V/E)	Fine-tuned on proprietary corpora; theft = IP loss + cloneable behavior	Confidentiality, Integrity
Signing keys (Sigstore/cosign, Vault transit, SPIRE CA)	Root of all provenance trust; compromise forges everything downstream	Integrity (catastrophic)
The regulated product itself	A malicious or erroneous agent commit can injure patients	Safety, Integrity
Audit/evidence store (WORM logs)	The Part 11 record of truth; if tamperable, nothing is defensible	Integrity, Non-repudiation
Eval gold sets (see 05)	Poisoning the eval = silently lowering the release gate	Integrity

1.2 Adversaries

Adversary	Capability	Primary objective
External attacker	Network probing, supply-chain injection, poisoned public repos/docs	Exfiltration, foothold, weight theft
Malicious insider	Authenticated dev/operator access	IP theft, sabotage, gate bypass
Compromised dependency / model	Trojaned open-weight base model, poisoned dataset, malicious package	Backdoor in shipped product
The agent itself (untrusted-by-design)	Whatever tools it is granted; subject to injection/poisoning	Unintended/rogue action, data exfil
Negligent user	Over-broad prompts, pasting PHI, approving without review	Accidental leak, gate erosion

1.3 Agent-specific attack surface → framework mapping

Threat	Description in our context	OWASP LLM Top 10	MITRE ATLAS
Prompt injection	Malicious instructions in an issue, code comment, PR body, or doc the agent reads	LLM01	AML.T0051 (LLM Prompt Injection)
Context poisoning	Tainted retrieval source / repo seeds the agent's working context	LLM01, LLM08	AML.T0070 (RAG poisoning)
Tool misuse	Agent invokes a granted tool with harmful args (e.g., mass email, prod write)	LLM06 (Excessive Agency)	AML.T0053 (LLM Plugin Compromise)
Data exfiltration	Source/PHI/weights leak via tool output, egress, or model channel	LLM02 (Sensitive Info Disclosure)	AML.T0024, AML.T0057
Model supply-chain	Trojaned base weights / poisoned adapter / malicious dataset	LLM03 (Supply Chain)	AML.T0010 (ML Supply Chain Compromise)
Rogue autonomous action	High-autonomy agent takes an unbounded irreversible action	LLM06	AML.T0048 (External Harms)
Training-data poisoning	Corrupted fine-tune corpus embeds a backdoor or skews behavior	LLM04 (Data/Model Poisoning)	AML.T0020 (Poison Training Data)
Insecure output handling	Unsanitized agent output executed downstream (e.g., shell, SQL)	LLM05	AML.T0050
System prompt / config leak	Harness policy/secrets exposed via model	LLM07	AML.T0056

1.4 Trust-boundary diagram

flowchart TB
  subgraph EXT["UNTRUSTED — external"]
    PUB[Public repos / docs / packages]
    SAAS[(External LLM APIs — HARD BLOCKED)]
    ADV([External attacker])
  end

  subgraph EDGE["TB1: Org perimeter — egress deny-by-default"]
    EG{{Egress allow-list proxy}}
    SAAS -. "BLOCKED at L3/L7" .-x EG
  end

  subgraph MESH["TB2: K8s + Istio mTLS mesh — SPIFFE/SPIRE identity"]
    direction TB
    subgraph CTRL["Control plane (trusted)"]
      POL[Policy Server\nstructural + semantic gating]
      OPA[OPA / Gatekeeper admission]
      VAULT[(HashiCorp Vault)]
      AUD[(WORM audit / evidence store)]
      REG[(Signed model + artifact registry)]
    end
    subgraph SBX["TB3: Agent sandbox — ephemeral, low-priv"]
      AGENT([Dev agent\ngVisor/Kata])
      TOOLS[Tools: git, build, test, browser, term]
    end
    INF[[Self-hosted inference\nfleet S/M/L/V/E]]
  end

  subgraph REG_ASSETS["TB4: Regulated assets (highest trust)"]
    SRC[(Source IP / DHF)]
    PROD[(Regulated product / release branch)]
    KEYS[(Signing keys / SPIRE CA)]
  end

  PUB --> EG --> AGENT
  AGENT -- "every tool call" --> POL
  POL -- allow/deny/sanitize --> TOOLS
  POL --> AUD
  AGENT <-->|mTLS| INF
  TOOLS -->|Vault-brokered, short-TTL| VAULT
  AGENT -. "no default write" .-x PROD
  POL -- "Class C: dual human control" --> PROD
  REG --> OPA --> SBX
  ADV -.-x EDGE

Trust boundaries: TB1 perimeter (egress control), TB2 mesh (identity + mTLS), TB3 sandbox (blast-radius containment), TB4 regulated assets (signed, dual-controlled). An agent never crosses from TB3 to TB4 except through the Policy Server and (for Class B/C) human authorization.

2. Zero-trust architecture

Zero trust here means: no implicit trust by network location; every workload authenticates; every call is authorized; deny by default.

Control	Implementation	What it enforces
Workload identity	SPIFFE/SPIRE: every agent, tool, and model server gets a SPIFFE ID, issued as an SVID that is attested at startup, short-TTL, and auto-rotated	No shared service accounts; every action attributable to a cryptographic identity (feeds P4 attribution)
mTLS everywhere	Istio service mesh; `PeerAuthentication: STRICT` mesh-wide	No cleartext intra-mesh traffic; no spoofed peers
Least-privilege RBAC	K8s RBAC + AuthorizationPolicy keyed on SPIFFE ID; agents get only the namespaces/tools their role requires	An agent role cannot reach services outside its task scope
Network deny-by-default	Default-deny NetworkPolicy (Cilium); explicit allow per workload pair	Lateral movement blocked; sandbox cannot reach the audit store directly
Egress allow-list	L3/L7 egress proxy; allow-list of approved internal endpoints only	Exfiltration channel closed
Hard block on external LLM endpoints	Egress proxy + DNS sinkhole deny `api.openai.com`, `.anthropic.com`, `generativelanguage.googleapis.com`, etc.; alert + auto-quarantine on attempt	Enforces P7 sovereignty; an injected agent cannot call out to an external model

The external-LLM block is both a control and a detector: any attempt is treated as a potential prompt-injection/exfil indicator and raises a security incident (§11).
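The dual role of the block, as control and as detector, can be sketched as a small classifier. This is a minimal illustration only: the host lists, the `classify_egress` name, and the `EgressVerdict` shape are assumptions made for the example, not the egress proxy's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical lists for illustration; real values live in the egress proxy config.
BLOCKED_LLM_SUFFIXES = ("openai.com", "anthropic.com", "generativelanguage.googleapis.com")
ALLOWED_INTERNAL_HOSTS = {"git.internal.example", "docs.internal.example"}


@dataclass(frozen=True)
class EgressVerdict:
    allowed: bool
    raise_incident: bool  # True when the attempt itself is a security signal
    reason: str


def _matches_suffix(host: str, suffix: str) -> bool:
    # Match the domain itself or any subdomain, but not look-alikes
    # such as "notopenai.com".
    return host == suffix or host.endswith("." + suffix)


def classify_egress(host: str) -> EgressVerdict:
    """Deny by default; flag attempts to reach an external LLM endpoint."""
    host = host.strip().lower().rstrip(".")
    if any(_matches_suffix(host, s) for s in BLOCKED_LLM_SUFFIXES):
        return EgressVerdict(False, True, "external LLM endpoint")
    if host in ALLOWED_INTERNAL_HOSTS:
        return EgressVerdict(True, False, "allow-listed")
    return EgressVerdict(False, False, "not on allow-list")
```

The design point is that a blocked LLM host and an unknown host are both denied, but only the former opens an incident, which keeps the detector signal specific.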

3. The Policy Server in depth

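The Policy Server is the deterministic chokepoint through which every tool call passes before execution. It realizes principle P2 (determinism wraps probabilism) at the action boundary. It applies two gating stages, structural then semantic, followed by a check of the human-control requirements from the autonomy matrix (§10).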

3.1 Structural gating (deterministic, policy-as-code)

Rules are pure functions of (role, environment, tool, args, safety_class) — no model in the loop, fully testable, version-controlled, signed. This is the authoritative, non-bypassable layer.

yaml

# policies.yaml  (illustrative — versioned, cosign-signed, loaded read-only)
version: "2026.05"        # quoted so YAML loads a string rather than the float 2026.05
defaults:
  effect: deny            # deny-by-default
roles:
  dev-agent-classA:
    allow:
      - tool: git.read
      - tool: build.run
      - tool: test.run
      - tool: pr.propose          # propose only, never merge
      - tool: browser.fetch
        constraints: { url_allowlist: ["internal-docs", "approved-mirror"] }
    deny:
      - tool: email.send           # structurally forbidden for any dev agent
      - tool: prod.write
  dev-agent-classC:
    allow:
      - tool: git.read
      - tool: test.run
      - tool: pr.propose
    require:
      - control: dual_human_authorization   # see §10 matrix
    deny:
      - tool: "*.write"
      - tool: email.send
      - tool: browser.fetch        # no external context for Class C work
environments:
  prod:
    deny:
      - tool: "*.write"
        unless_role: ["release-bot-signed"]
        and_control: dual_human_authorization
guards:
  - id: secret-egress
    match: { args_contains_secret: true }
    effect: deny
  - id: pii-in-args
    match: { semantic.pii_detected: true }   # result from semantic stage
    effect: sanitize                          # mask then allow, or escalate

Representative invariants (canonical examples reused across docs): the email.send tool is never granted to a dev agent; prod.write requires a signed release role and dual human authorization; Class C work forbids external browser context.
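A minimal sketch of how such rules could be evaluated as a pure function follows. It covers only the role and tool dimensions (environment, args, and safety class are omitted for brevity), and the `POLICY` dictionary, the `structural_eval` signature, and the "explicit deny wins over allow" precedence are assumptions for illustration rather than the Policy Server's actual implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of two roles from the illustrative policy.
POLICY = {
    "dev-agent-classA": {
        "allow": ["git.read", "build.run", "test.run", "pr.propose", "browser.fetch"],
        "deny": ["email.send", "prod.write"],
    },
    "dev-agent-classC": {
        "allow": ["git.read", "test.run", "pr.propose"],
        "deny": ["*.write", "email.send", "browser.fetch"],
    },
}


def structural_eval(role: str, tool: str) -> str:
    """Return "ALLOW" or "DENY"; unknown roles and unlisted tools are denied."""
    rules = POLICY.get(role)
    if rules is None:
        return "DENY"
    # Assumed precedence: an explicit deny always wins over an allow.
    if any(fnmatchcase(tool, pattern) for pattern in rules["deny"]):
        return "DENY"
    if any(fnmatchcase(tool, pattern) for pattern in rules["allow"]):
        return "ALLOW"
    return "DENY"  # deny-by-default
```

Because the function has no hidden state and no model call, every invariant above can be pinned by an ordinary unit test and re-run on each policy change.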

3.2 Semantic gating (intent / PII check before tool execution)

A secondary self-hosted LLM (a small fleet-tier model, see 04) inspects the intent and content of a tool call that passed structural rules but warrants judgment — e.g., does this pr.propose body or browser.fetch argument carry PHI/PII, exfiltration intent, or signs of injection? Semantic gating is advisory and escalating, never a sole gate (P2): it can downgrade (sanitize) or escalate (require human), but it can never grant something structural rules denied.
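The "can tighten, can never grant" property can be made mechanical by ordering verdicts from least to most restrictive and taking the maximum. The sketch below assumes a four-value `Verdict` enum and a `combine` helper; both names are hypothetical, as is the choice to clamp a semantic "deny" down to an escalation.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Ordered from least to most restrictive.
    ALLOW = 0
    SANITIZE = 1
    ESCALATE = 2
    DENY = 3


def combine(structural: Verdict, semantic: Verdict) -> Verdict:
    """Merge the two stages so the semantic stage can only tighten the result."""
    if semantic is Verdict.DENY:
        # Assumption: the advisory stage may not deny outright; its strongest
        # outcome is to hand the call to a human.
        semantic = Verdict.ESCALATE
    return max(structural, semantic)
```

With this ordering, `combine(Verdict.DENY, Verdict.ALLOW)` stays `DENY`: no semantic finding can reopen something the structural rules closed.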

3.3 Interception pseudo-flow

text

on tool_call(agent_id, role, env, tool, args, safety_class, retrieval_provenance):
    record = open_evidence_span(agent_id, role, tool, args_hash)   # P4

    # STAGE 1 — structural (deterministic, authoritative)
    s = structural_eval(role, env, tool, args, safety_class)
    if s == DENY:
        emit_evidence(record, decision=DENY, stage=structural); return BLOCKED

    # STAGE 2 — semantic (judgment; PII/intent/injection)
    sem = semantic_model.assess(tool, args, retrieval_provenance)
    if sem.pii or sem.exfil_intent or sem.injection_signal:
        if policy.allows_sanitize(tool):
            args = mask_placeholders(args)        # [[VAR]] injection, §5
            emit_evidence(record, decision=SANITIZE, findings=sem)
        else:
            emit_evidence(record, decision=ESCALATE, findings=sem)
            return REQUIRE_HUMAN(record)

    # STAGE 3 — control requirements (autonomy matrix, §10)
    if requires_dual_control(role, env, safety_class):
        emit_evidence(record, decision=PENDING_DUAL_CONTROL)
        return REQUIRE_HUMAN(record, control=DUAL)

    emit_evidence(record, decision=ALLOW)
    return EXECUTE(tool, args)

Every branch emits evidence: input hash, structural verdict, semantic findings, sanitization diff, human-decision pointer, model+policy versions. This record is the Part 11 artifact (§7).
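As a rough illustration of what one emitted record might contain, the sketch below hashes the call arguments over a canonical JSON form and stores the verdict fields in an immutable structure. The `EvidenceRecord` field names, the `args_hash` helper, and the example SPIFFE ID are assumptions for the example; the authoritative field list is the one in §7.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Optional


def args_hash(args: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that equal arguments
    # always produce the same digest regardless of key order.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EvidenceRecord:
    agent_id: str
    role: str
    tool: str
    input_hash: str
    decision: str                     # e.g. "DENY", "SANITIZE", "ALLOW"
    stage: str                        # stage that produced the decision
    policy_version: str
    model_version: str
    semantic_findings: tuple[str, ...] = ()
    human_decision_ref: Optional[str] = None  # pointer to the review record


record = EvidenceRecord(
    agent_id="spiffe://example.org/agent/run-123",  # hypothetical SPIFFE ID
    role="dev-agent-classA",
    tool="email.send",
    input_hash=args_hash({"to": ["[[NAME_1]]"]}),
    decision="DENY",
    stage="structural",
    policy_version="2026.05",
    model_version="n/a",
)
```

Hashing a canonical form matters for replay: an investigator recomputing the digest from the stored arguments should get the same value even if a serializer reordered the keys.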

4. Agent sandboxing & blast-radius control

The agent runs untrusted-by-design; the sandbox guarantees that even a fully-compromised agent has a small, recoverable blast radius.

Control	Implementation
Ephemeral runtime	One agent run = one fresh ephemeral namespace + pod, torn down on completion; no persistence across runs
Kernel-isolated sandbox	gVisor (default) or Kata Containers (stronger isolation for V/E tiers or external-content tasks) — syscall surface contained
No prod write by default	Sandbox SVID has zero write capability to release branches / prod; writes only via Policy Server + signed release role + human control
Egress control	Per-sandbox egress allow-list (§2); browser/term tools route through the inspecting proxy
Terminal & browser isolation	`term` and `browser.fetch` tools run in a separate isolation domain; fetched content is untrusted input subject to context hygiene (§5) and cannot self-execute
Secrets via Vault	No long-lived secrets in env/image; HashiCorp Vault brokers short-TTL, narrowly-scoped, dynamic credentials; Vault audit log feeds the WORM store
Kill-switches	Per-agent and fleet-wide kill-switch: revoke SVID (SPIRE) → mesh denies all calls instantly; circuit-breakers on anomalous tool-call rate; "freeze on novel egress" tripwire
Resource bounds	CPU/GPU/wall-clock/tool-call quotas — caps runaway loops (cost + blast radius; see 08)
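The circuit-breaker on anomalous tool-call rate could look like the sliding-window sketch below. The `ToolCallBreaker` class, its parameters, and the latch-until-reset behavior are illustrative assumptions; the real thresholds are placeholders pending calibration, and in the harness a trip would also trigger SVID revocation.

```python
from collections import deque


class ToolCallBreaker:
    """Trips when more than `max_calls` occur within `window_s` seconds."""

    def __init__(self, max_calls: int, window_s: float) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._calls = deque()  # timestamps of recent calls, oldest first
        self.tripped = False

    def record_call(self, now: float) -> bool:
        """Record one tool call at time `now`; return True if it may proceed."""
        if self.tripped:
            return False  # latched: stays tripped until an operator resets it
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        self._calls.append(now)
        if len(self._calls) > self.max_calls:
            self.tripped = True  # in the harness this would also revoke the SVID
            return False
        return True
```

Passing the clock in as `now` keeps the breaker deterministic and testable; a caller would typically supply `time.monotonic()`.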

5. Prompt-injection, context hygiene & PII/PHI protection

Treat all model-facing content not authored by the harness as hostile input.

Input sanitization & provenance. Retrieved context is wrapped with provenance and trust labels; instructions embedded in data (issues, comments, fetched docs) are demarcated and not treated as commands. Retrieval is restricted to a source allow-list (approved internal repos/doc stores); arbitrary web/repos are off the path for regulated work, eliminating most context-poisoning vectors.

Placeholder injection — the [[VAR]] pattern (context hygiene middleware). Before any PHI/PII/secret-bearing content enters a prompt, a deterministic middleware masks sensitive spans into typed placeholders and keeps the mapping in a secure side-table the model never sees:

text

RAW:    Patient John Doe (MRN 55512) reports error E13 at 10.0.4.7
MASKED: Patient [[NAME_1]] (MRN [[ID_1]]) reports error E13 at [[IP_1]]
        side-table (Vault-sealed): NAME_1→"John Doe", ID_1→"55512", IP_1→"10.0.4.7"

The model reasons over placeholders; on output, only authorized placeholders are re-hydrated, and only into allow-listed sinks. PHI that the middleware masks never reaches the model, never lands in logs/eval sets in cleartext, and cannot leak through the model channel; the guarantee is bounded by the middleware's detection coverage.
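A minimal sketch of the mask and re-hydrate steps, reproducing the example above, might look as follows. The two regex detectors, the `known_names` input, and the function names are assumptions made for illustration; real detection needs far broader, validated coverage, and the side-table here is a plain dictionary rather than a Vault-sealed store.

```python
import re

# Illustrative detectors only; names in particular cannot be found by regex
# alone, so they are passed in from a hypothetical upstream source of truth.
PATTERNS = [
    ("ID", re.compile(r"(?<=MRN )\d+")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]
PLACEHOLDER = re.compile(r"\[\[([A-Z]+_\d+)\]\]")


def mask(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace sensitive spans with typed placeholders; return text and side-table."""
    table: dict[str, str] = {}
    counters: dict[str, int] = {}

    def substitute(kind: str, value: str) -> str:
        for key, existing in table.items():
            if existing == value and key.startswith(kind + "_"):
                return f"[[{key}]]"  # same value maps to the same placeholder
        counters[kind] = counters.get(kind, 0) + 1
        key = f"{kind}_{counters[kind]}"
        table[key] = value
        return f"[[{key}]]"

    for name in known_names:
        if name in text:
            text = text.replace(name, substitute("NAME", name))
    for kind, pattern in PATTERNS:
        text = pattern.sub(lambda m, k=kind: substitute(k, m.group(0)), text)
    return text, table


def rehydrate(text: str, table: dict[str, str], authorized: set[str]) -> str:
    """Restore only the placeholders the sink is authorized to see."""
    def restore(m: re.Match) -> str:
        key = m.group(1)
        return table[key] if key in authorized and key in table else m.group(0)
    return PLACEHOLDER.sub(restore, text)
```

Calling `rehydrate(masked, table, {"IP_1"})` restores the IP address for an operations sink while leaving `[[NAME_1]]` and `[[ID_1]]` masked.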

Output sanitization & insecure-output-handling defense. Agent output destined for a downstream interpreter (shell, SQL, code) is schema-validated and never auto-executed without passing the Policy Server; outputs are scanned for residual PII and for re-injection patterns.
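One way to picture the schema-validation step: the agent's raw output is parsed into a fixed shape and rejected outright if it deviates, so free-form text never reaches an interpreter. The JSON shape (`tool` plus `argv`), the `ALLOWED_TOOLS` set, and the function names below are hypothetical, and a validated request would still go to the Policy Server rather than being executed directly.

```python
import json

ALLOWED_TOOLS = {"test.run", "build.run"}  # hypothetical allow-list


class OutputRejected(ValueError):
    """Raised when agent output does not match the expected request shape."""


def parse_tool_request(raw_output: str) -> dict:
    """Validate model output against a fixed shape before it goes anywhere."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise OutputRejected("output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "argv"}:
        raise OutputRejected("unexpected fields")
    if not isinstance(data["tool"], str) or data["tool"] not in ALLOWED_TOOLS:
        raise OutputRejected("tool not allow-listed")
    argv = data["argv"]
    if not isinstance(argv, list) or not all(isinstance(a, str) for a in argv):
        raise OutputRejected("argv must be a list of strings")
    # argv stays a list; it is never joined into a shell command string.
    return {"tool": data["tool"], "argv": list(argv)}


def is_acceptable(raw_output: str) -> bool:
    """Convenience wrapper: True only if the output validates."""
    try:
        parse_tool_request(raw_output)
    except OutputRejected:
        return False
    return True
```

Keeping arguments as a list of strings, rather than a single command line, removes the shell-metacharacter injection path by construction.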

Defense against poisoned repos/docs. Source allow-listing + signed dependencies (§6) + semantic injection detection (§3.2) + the rule that data is never instruction. A poisoned README cannot redirect the agent's authority because authority lives in structural policy, not in text.

The "rogue agent emails 50 colleagues" failure class. Worked example of defense-in-depth:

Structural deny: email.send is not in any dev-agent role (§3.1) — the tool literally cannot be invoked.
Even if a privileged role had it: semantic gate flags bulk-recipient/exfil intent → escalate.
Egress allow-list: the SMTP endpoint is not reachable from the sandbox.
Kill-switch: anomalous tool-call burst trips the circuit-breaker and revokes the SVID.
Evidence: the attempt is recorded as a security incident (§11).

No single control is trusted; the action requires all of them to fail simultaneously.

6. Supply-chain assurance

Provenance is required for code, models, AND datasets — models and data are first-class regulated supply-chain artifacts. Aligns to FD&C Act §524B and FDA premarket cybersecurity.

Artifact	Signing	Provenance	SBOM	Admission check
Code / container images	Sigstore/cosign	SLSA build provenance (L3 target)	CycloneDX SBOM	Gatekeeper verifies signature + provenance
Model weights + adapters	cosign-signed digest	Build/train provenance (base model lineage, fine-tune run ID)	Model SBOM (base model, datasets, hyperparams, eval hash)	Unsigned/unknown model rejected at admission
Datasets	cosign-signed manifest + hash	Source lineage, consent/PHI-handling attestation	Dataset card / data SBOM	Untrusted dataset cannot enter a training run
Eval gold sets	signed, version-pinned	provenance to authoring QA	included	tamper = gate integrity incident (05)

Admission enforcement. OPA/Gatekeeper admission policy: no pod runs a container or loads a model whose cosign signature and SLSA provenance do not verify against the trusted key set (Vault/SPIRE-rooted). Reproducible builds (P7) mean any shipped artifact — code or model — can be regenerated bit-for-bit and defended in an audit or recall. Vulnerability management (continuous SBOM scanning, KEV/CVE feeds) satisfies the §524B postmarket obligation for Track B artifacts and the QS obligation for Track A tooling.
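The logic of the admission check (content must match its digest, and the digest must carry a valid signature from a trusted key) can be sketched with the standard library. HMAC is used here purely as a stand-in so the example runs without external tooling; the system described above relies on cosign signatures and SLSA provenance verified by Gatekeeper, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_digest(digest: str, key: bytes) -> str:
    # Stand-in for a real signature; NOT how cosign works internally.
    return hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()


def admit_artifact(blob: bytes, claimed_digest: str, signature: str, key: bytes) -> bool:
    """Admit only if the content matches its digest and the digest is signed."""
    actual = hashlib.sha256(blob).hexdigest()
    if not hmac.compare_digest(actual, claimed_digest):
        return False  # content was altered after the digest was recorded
    expected_sig = sign_digest(claimed_digest, key)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected_sig, signature)
```

The same two-step check applies unchanged to a container image, a weights file, or a dataset manifest, which is what lets one admission policy cover all three artifact types.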

7. Records, audit & 21 CFR Part 11

Per P4, every agent action is evidence. The evidence record is the regulatory product of the agent, not a byproduct.

What is recorded for every step (immutable, attributable, replayable):

Field	Source	Part 11 role
Prompt + full context bundle (hashed; PHI masked)	harness	reconstructs what the agent saw
Model tier + weights digest + adapter version	registry (§6)	"which software produced this"
Tool call + args (sanitized) + Policy Server verdict	Policy Server (§3)	authorization record
Verifier/eval results	gates (05)	objective evidence of correctness
Human decision + e-signature (who, when, meaning)	review system	21 CFR Part 11 §11.50/11.70
SPIFFE identity of every actor	SPIRE	attribution / non-repudiation
Policy version + semantic-model version	Policy Server	change-control linkage

Storage: WORM / immutable store, hash-chained (append-only, tamper-evident), time-synced. Replayability: because weights, adapters, prompts, and policy are all versioned and signed, any decision can be deterministically re-derived for an investigator, provided the decoding parameters and random seeds of each inference call are recorded with them. E-signatures bind a human's identity, timestamp, and the meaning of their action (reviewed / approved / authorized) to the record.
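The hash-chaining idea can be shown in a few lines: each entry's hash covers the previous entry's hash plus its own payload, so altering any earlier record invalidates everything after it. This is a sketch under simplifying assumptions (an in-memory list, hypothetical `append` and `verify` helpers); on its own a chain does not stop someone who can rewrite the whole log, which is why the store is also WORM.

```python
import hashlib
import json

GENESIS = "0" * 64  # conventional starting value for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Append a record whose hash commits to the entire history before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": _entry_hash(prev_hash, payload),
    })


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edit, reorder, or deletion breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

An auditor only needs the final hash, anchored somewhere outside the store, to confirm that the full history presented to them is the one that was written.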

Audit & recall use: in a recall or FDA inspection, the WORM store answers "show me everything the agent did to this Class C module, who authorized it, what it saw, and prove it wasn't tampered with", backed by cryptographic non-repudiation. This is the evidentiary backbone the CSA validation (§8) certifies.

8. Validating the agent as regulated software (CSA)

Under FDA Computer Software Assurance, the agentic harness is production/Quality-System software and is validated risk-proportionately — not exhaustively, but where it matters.

CSA element	Application here
Intended use	Defined per agent role (e.g., "propose unit tests for Class A modules"); autonomy bounded by §10 matrix
Risk-based assurance	Test effort scales with the impact of the agent's failure; Class C-touching agents get the deepest scrutiny (P3)
Security testing	Threat-led: each §1.3 threat has corresponding adversarial tests and red-team coverage (§11)
Threat-led validation	Validation cases derived from the threat model + OWASP LLM / ATLAS mappings, not just happy-path
Objective evidence	The §7 WORM record + the 05 eval evidence constitute validation evidence
Change control	Harness, policies, and model versions are controlled items; ISO/IEC 42001 governs the AI management system

The assurance argument ties directly to 05-evaluation-and-validation: deterministic eval + ≥99.9% release-gate correctness is the functional assurance; this document supplies the security assurance. Together they form the CSA validation package.

9. Two regulated tracks

This is the distinction most frequently muddled — and the one an auditor will test.

Dimension	Track A — AI that BUILDS the device (this framework)	Track B — AI shipped INSIDE the device (SaMD / AI function)
What it is	Dev/test/doc agents = production & QS tooling	The model is part of the medical device / its output is a device function
Submission-bearing?	No — not in the 510(k)/PMA submission as a function	Yes — part of premarket submission
Primary regime	CSA, ISO 13485/QMSR, 21 CFR Part 11	IEC 62304, ISO 14971, FDA premarket cyber, EU AI Act high-risk, PCCP
Change control	QS change control; ISO/IEC 42001	Predetermined Change Control Plan (PCCP) — pre-authorized model-update envelope
Clinical evidence	Not required	Required (clinical validation of the AI function)
Failure consequence	Bad tooling → defective product (caught by gates)	Bad model → direct patient harm in the field

Shared assurance muscles (build once, apply to both): self-hosted signed model supply chain (§6), immutable evidence + Part 11 records (§7), threat-led validation (§8), drift/anomaly monitoring (§11), ISO/IEC 42001 AI governance. Where obligations diverge: Track B additionally owns clinical validation, a PCCP, premarket cybersecurity documentation, and EU AI Act high-risk conformity. This document governs Track A; it deliberately reuses controls that a Track B program will also need, but Track B's submission obligations are out of scope here.

10. Autonomy Authorization Matrix (canonical)

This is the canonical autonomy matrix. It is referenced by 02-maturity-model and 06-agentic-workflows. It maps (ASMM-Med governing level × IEC 62304 safety class) → permitted agent action and required human control. Per P3, Class C is ALWAYS dual human control regardless of maturity level.

Action legend: Suggest (advisory only) · Propose-PR (opens a PR, no merge authority) · Auto-bounded (autonomous within signed, pre-authorized bounds) · Forbidden. Human control legend: None · Single review · Dual control (two qualified humans; author ≠ approver).

ASMM-Med level ↓ / IEC 62304 class →	Class A (no injury)	Class B (non-serious injury)	Class C (death / serious injury)
L0 Ad-hoc	Suggest / None	Suggest / Single review	Suggest / Dual control
L1 Governed Assistance	Suggest / None	Propose-PR / Single review	Propose-PR / Dual control
L2 Spec-Driven Bounded	Propose-PR / Single review	Propose-PR / Single review	Propose-PR / Dual control
L3 Orchestrated Agentic	Auto-bounded / Single review (post-hoc)	Propose-PR / Single review	Propose-PR / Dual control
L4 Validated Autonomous	Auto-bounded / None within bounds	Auto-bounded / Single review	Propose-PR / Dual control
L5 Self-Optimizing	Auto-bounded / None within bounds; sampled audit	Auto-bounded / Single review	Propose-PR / Dual control

Reading the matrix:

The leash lengthens with maturity (rows) but is capped by safety class (columns).
Class C never reaches Auto-bounded or "None." The highest Class C autonomy is Propose-PR under dual control: the agent proposes and evidences; two qualified humans make the merge decision (P3).
"Auto-bounded" requires the bounds to be signed, version-controlled policy enforced by the Policy Server (§3); outside the bounds, the agent escalates.
Every cell's enforcement is mechanical: the Policy Server reads (role→level, target→safety_class) and applies the corresponding require: control (§3.1).
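The mechanical reading of the matrix can be sketched as a plain lookup table with the Class C invariant checked in code. The cell values are transcribed from the matrix above; the `MATRIX` structure, the `required_control` name, and the fail-closed default for unknown inputs are assumptions for illustration.

```python
# (action, human control) per ASMM-Med level and IEC 62304 class,
# transcribed from the matrix in this section.
MATRIX = {
    "L0": {"A": ("Suggest", "None"),
           "B": ("Suggest", "Single review"),
           "C": ("Suggest", "Dual control")},
    "L1": {"A": ("Suggest", "None"),
           "B": ("Propose-PR", "Single review"),
           "C": ("Propose-PR", "Dual control")},
    "L2": {"A": ("Propose-PR", "Single review"),
           "B": ("Propose-PR", "Single review"),
           "C": ("Propose-PR", "Dual control")},
    "L3": {"A": ("Auto-bounded", "Single review (post-hoc)"),
           "B": ("Propose-PR", "Single review"),
           "C": ("Propose-PR", "Dual control")},
    "L4": {"A": ("Auto-bounded", "None within bounds"),
           "B": ("Auto-bounded", "Single review"),
           "C": ("Propose-PR", "Dual control")},
    "L5": {"A": ("Auto-bounded", "None within bounds; sampled audit"),
           "B": ("Auto-bounded", "Single review"),
           "C": ("Propose-PR", "Dual control")},
}


def required_control(level: str, safety_class: str) -> tuple[str, str]:
    """Look up the permitted action and human control for one matrix cell."""
    try:
        return MATRIX[level][safety_class]
    except KeyError:
        # Assumed fail-closed default for an unknown level or class.
        return ("Forbidden", "Dual control")


# Invariant from P3: Class C is always dual control and never Auto-bounded.
assert all(row["C"][1] == "Dual control" for row in MATRIX.values())
assert all(row["C"][0] != "Auto-bounded" for row in MATRIX.values())
```

Keeping the invariant as an assertion next to the data means a policy edit that weakens Class C fails at load time rather than at audit time.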

11. Continuous security

Security is a steady-state operation, not a one-time gate.

Capability	Implementation
Red-team agents	Standing adversarial agents continuously attempt prompt injection, context poisoning, tool misuse, and exfil against the live harness; findings feed §8 validation and §3 policy
Adversarial eval	OWASP-LLM / ATLAS-derived adversarial suites run in the deterministic eval pipeline (05); regressions block model/harness promotion
Drift & anomaly response	Monitor tool-call distributions, egress patterns, semantic-gate hit rates, and model-output drift; anomalies trip circuit-breakers (§4) and open incidents
Incident handling	Defined runbooks: SVID revocation, namespace freeze, fleet kill-switch, WORM-log forensic replay; incidents link to QMS CAPA
Secure model-update path	New weights/adapters: signed → SBOM'd → SLSA-provenanced (§6) → adversarial + functional eval gates (05) → Gatekeeper admission → staged rollout with rollback. For Track B models, this path executes within the PCCP envelope; for Track A, under QS change control + ISO/IEC 42001
Vulnerability management	Continuous SBOM/CVE scanning of code + model dependencies; §524B-aligned triage and disclosure
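For the drift and anomaly row, one simple way to compare a live tool-call distribution against a baseline is total variation distance. This is a sketch of one possible metric, not the monitoring stack's actual method; the function names are hypothetical and `DRIFT_THRESHOLD` is a placeholder pending calibration, like every numeric threshold in this document.

```python
from collections import Counter


def total_variation(baseline: Counter, observed: Counter) -> float:
    """Total variation distance between two empirical tool-call distributions."""
    b_total = sum(baseline.values())
    o_total = sum(observed.values())
    if b_total == 0 or o_total == 0:
        raise ValueError("both windows need at least one call")
    tools = set(baseline) | set(observed)
    # Counter returns 0 for tools missing from one side.
    return 0.5 * sum(
        abs(baseline[t] / b_total - observed[t] / o_total) for t in tools
    )


DRIFT_THRESHOLD = 0.3  # placeholder pending organizational calibration


def is_anomalous(baseline: Counter, observed: Counter,
                 threshold: float = DRIFT_THRESHOLD) -> bool:
    """Flag the window when its distribution has moved past the threshold."""
    return total_variation(baseline, observed) > threshold
```

The distance ranges from 0 (identical mix of tools) to 1 (no overlap), so an agent that suddenly issues only `browser.fetch` calls against a baseline of reads and test runs scores at the maximum.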

Appendix A: Control-to-standard traceability

Control (this doc)	Standard / framework anchor
Policy Server, autonomy matrix (§3, §10)	IEC 62304 §5–§9, P3; CSA
Evidence / WORM records, e-signature (§7)	21 CFR Part 11, ISO 13485/QMSR
Zero-trust, mTLS, egress, sandboxing (§2, §4)	IEC 62443, FDA premarket cybersecurity
Supply chain: signing/SBOM/SLSA (§6)	FD&C Act §524B, SLSA, Sigstore
Threat model, red-team, adversarial eval (§1, §11)	OWASP LLM Top 10, MITRE ATLAS
AI management system, change control (§8, §11)	ISO/IEC 42001
PII/PHI masking, context hygiene (§5)	HIPAA, GDPR, ISO 14971 (risk)
Track A vs B, PCCP, high-risk (§9)	EU AI Act, EU MDR, FDA PCCP guidance

Cross-references: autonomy bounds and maturity levels — 02-maturity-model; harness/sandbox architecture — 03-reference-architecture; model/adapter signing and fleet — 04-model-strategy-and-finetuning; deterministic eval and validation evidence — 05-evaluation-and-validation; workflow-level human controls — 06-agentic-workflows; cost of controls — 08-token-and-gpu-economics.