01 — Requirements (Normative "Shall" Document)
Project: Agentic-Native SDLC for Regulated Medical Device Engineering
Status: Baseline v1.0 · Date context: May 2026
Classification: Internal Engineering / Quality Reference
Related docs: 02-maturity-model.md · 03-reference-architecture.md · 04-model-strategy-and-finetuning.md · 05-evaluation-and-validation.md · 06-agentic-workflows.md · 07-security-and-compliance.md · 08-token-and-gpu-economics.md · 09-adoption-roadmap.md
1. Purpose, Scope, and How to Read Requirement IDs
1.1 Purpose
This document is the normative requirements baseline for an agentic-native software development lifecycle (SDLC) serving a 1000+ developer medical-device engineering organization. It defines, in testable "shall" form, what the platform, its model fleet, and its agents must do. It is the contract against which the architecture (03-reference-architecture.md), evaluation system (05-evaluation-and-validation.md), and compliance posture (07-security-and-compliance.md) are judged.
1.2 Scope
| In scope | Out of scope |
|---|---|
| AI-assisted and AI-autonomous activities across the IEC 62304 software lifecycle (requirements → design → code → test → docs → review → maintenance) | Hardware design, electrical/mechanical CAD outside software-controlled subsystems |
| Self-hosted open-weight model fleet, serving, fine-tuning, evaluation, and orchestration | Procurement of SaaS LLM services (explicitly prohibited — see §11) |
| Governance, traceability, audit, and validation of the agentic tooling itself (CSA / GAMP 5) | Clinical trial design, regulatory submission authoring beyond software evidence |
| Security, observability, and FinOps for the agent platform | General IT, HR, or non-engineering enterprise systems |
1.3 Requirement ID scheme
Each requirement has an ID of the form PREFIX-NNN, a MoSCoW priority, an ASMM-Med maturity level (L0–L5, or a range such as L2→L4), and one or more of the eight dimensions (D1–D8). Hard constraints in §11.1 use a separate CON-NNN series.
| Prefix | Domain | Primary owner |
|---|---|---|
| FR | Functional — what agents do across the SDLC | Eng + QA/RA |
| NFR | Non-functional — performance, scale, determinism | MLOps/Platform |
| REG | Regulatory & quality | QA/RA |
| DATA | Data & knowledge governance | MLOps + Security |
| MODEL | Model fleet requirements | MLOps |
| EVAL | Evaluation & assurance | QA/RA + Eng |
| SEC | Security & zero-trust | Security |
| COST | FinOps & cost guardrails | Finance + Platform |
| OPS | Observability & operations | Platform |
MoSCoW priority: M = Must (release-blocking), S = Should, C = Could, W = Won't (this baseline).
Conventions: "shall" = mandatory; "should" = recommended; numeric thresholds are org-set placeholders, not vendor claims, and are owned by the named accountable function. Every requirement is verifiable by inspection, demonstration, test, or analysis (method noted in §12).
Governing principles (referenced throughout): P1: 99.9% is a system property (Generate→Verify→Repair→Gate); P2: determinism wraps probabilism; P3: risk-proportional autonomy by IEC 62304 class A/B/C; P4: everything an agent does is evidence; P5: the harness is the product; P6: cost per verified task; P7: self-hosted, sovereign, reproducible.
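The ID scheme in §1.3 lends itself to a mechanical check. The following is a minimal sketch, assuming requirements are held as simple records; the `Requirement` class and `validate` function are hypothetical names for illustration and are not part of the platform. It covers only the prefixes listed in the §1.3 table, so CON-NNN constraint IDs are deliberately out of scope.

```python
import re
from dataclasses import dataclass

# Prefixes and priorities taken from the §1.3 tables.
PREFIXES = {"FR", "NFR", "REG", "DATA", "MODEL", "EVAL", "SEC", "COST", "OPS"}
PRIORITIES = {"M", "S", "C", "W"}

ID_PATTERN = re.compile(r"^([A-Z]+)-(\d{3})$")
# A single level (L3) or a range (L2→L4), as used by FR-003.
LEVEL_PATTERN = re.compile(r"^L[0-5](→L[0-5])?$")
DIMENSION_PATTERN = re.compile(r"^D[1-8]$")


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record shape for one row of a requirements table."""
    req_id: str
    priority: str
    level: str
    dimensions: tuple


def validate(req: Requirement) -> list:
    """Return a list of human-readable problems; an empty list means well-formed."""
    errors = []
    match = ID_PATTERN.match(req.req_id)
    if not match or match.group(1) not in PREFIXES:
        errors.append(f"bad ID: {req.req_id!r}")
    if req.priority not in PRIORITIES:
        errors.append(f"bad MoSCoW priority: {req.priority!r}")
    if not LEVEL_PATTERN.match(req.level):
        errors.append(f"bad ASMM-Med level: {req.level!r}")
    if not req.dimensions or not all(DIMENSION_PATTERN.match(d) for d in req.dimensions):
        errors.append(f"bad dimensions: {req.dimensions!r}")
    return errors
```

For example, `validate(Requirement("FR-003", "M", "L2→L4", ("D5",)))` returns an empty list, while a record with an unknown prefix or no dimensions returns one message per problem.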
2. Stakeholders & Concerns
| Stakeholder | Primary concerns | Key requirement families | Veto authority |
|---|---|---|---|
| Engineering (Dev) | Productivity, low latency, correct suggestions, low friction, not babysitting bad output | FR, NFR, OPS | No |
| QA / Regulatory Affairs (RA) | IEC 62304 conformance, traceability, validated tools, audit-ready records, escape rate | REG, EVAL, FR | Yes (release gate) |
| Security | Zero-trust, no data egress, supply-chain integrity, prompt-injection defense, secrets | SEC, DATA, MODEL | Yes (deploy gate) |
| MLOps / Platform | Reproducibility, model lifecycle, serving SLOs, multi-LoRA, GPU efficiency | MODEL, NFR, OPS, DATA | No |
| Clinical / Product | Safety-class correctness, requirement intent fidelity, time-to-market | FR, REG, EVAL | Yes (intent) |
| Finance | Cost-per-green-PR, GPU capex/opex, budget predictability | COST, OPS | Yes (budget) |
A requirement that any veto-holding stakeholder rejects cannot be marked "Accepted" in §12.
3. Functional Requirements (FR)
All FRs are bounded by IEC 62304 safety class (A/B/C) and a defined review posture per P3. "Dual human control" = two qualified humans (author-reviewer separation) for Class C.
| ID | Requirement (shall) | Safety-class bounding | Review posture | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|---|---|
| FR-001 | Agents shall assist requirements analysis: decompose, classify, detect ambiguity/conflict, and propose acceptance criteria from source specs (incl. PDF/diagram via Tier-V). | A/B/C: proposal only | Human approves all generated/edited requirements | M | L2 | D3,D5 |
| FR-002 | Agents shall generate design support artifacts (interface specs, sequence/architecture sketches, design-decision rationale) traceable to requirements. | A/B: draft; C: draft + dual review | Human-of-record signs design | M | L3 | D3,D5 |
| FR-003 | Agents shall perform code generation scoped to a spec/work item, emitting diffs, not silent edits. | A: auto-PR allowed; B: PR + 1 review; C: PR + dual review, no autonomous merge | Per class | M | L2→L4 | D5 |
| FR-004 | Agents shall perform test generation (unit/integration/property/boundary) mapped to requirements and risk controls (ISO 14971). | All classes: tests are evidence, human-confirmed coverage intent | Reviewer confirms adequacy | M | L2 | D4,D5 |
| FR-005 | Agents shall generate documentation (design history, SDS, API docs, traceability narratives) from code+spec, marked AI-authored. | A/B/C: draft | QA/RA approves controlled docs | M | L2 | D3 |
| FR-006 | Agents shall perform code review producing findings with severity, location, and rationale; review output is advisory, not a gate by itself (P1). | All classes | Augments, never replaces, human reviewer | M | L3 | D4,D5 |
| FR-007 | Agents shall perform refactoring with behavior-preservation evidence (test pass, diff semantics) attached. | A: auto; B: review; C: dual review | Per class | S | L3 | D5 |
| FR-008 | Agents shall perform code/dependency/platform migration with before/after equivalence evidence and rollback plan. | B/C: human-gated cutover | Migration plan signed by lead | S | L3 | D5 |
| FR-009 | Agents shall generate and maintain traceability links (requirement↔design↔code↔test↔risk) and flag gaps. | All classes | QA/RA owns final trace matrix | M | L3 | D1,D3,D4 |
| FR-010 | Every agent action shall produce a Generate→Verify→Repair→Gate record; an action with no verifier shall not pass the gate (P1). | All classes | System-enforced | M | L2 | D4 |
| FR-011 | Agents shall abstain ("I cannot safely complete this") and escalate when confidence/coverage thresholds are unmet, rather than emit low-assurance output. | All classes | Escalation routed to human | M | L1 | D4,D5 |
| FR-012 | Agents shall be orchestrated via the MCP tool plane and A2A for multi-agent workflows with declared, least-privilege tool scopes. | All classes | Policy-bounded | M | L3 | D5 |
| FR-013 | Agents shall produce risk-analysis support (hazard identification candidates, traceable to ISO 14971), human-confirmed. | A/B/C: proposal only | Risk owner confirms | S | L3 | D1 |
| FR-014 | The system shall support human-in-the-loop interrupt/override at any step, with reason captured. | All classes | Always available | M | L1 | D5,D8 |
| FR-015 | Agents shall route tasks across the tiered fleet (Reflex/Worker/Reasoner/Multimodal/Embedding) by task class and cost (P6). | All classes | System-enforced | S | L3 | D2,D5 |
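The safety-class bounding in FR-003 can be expressed as a small policy table. This is a sketch under stated assumptions: `ReviewPosture`, `POSTURE`, and `merge_allowed` are hypothetical names, Class A is modelled as needing no human approval (one reading of "auto-PR allowed"), and agent-initiated merge for Class A/B is assumed permitted once the review count is met. Only the Class C rule (dual review, no autonomous merge) is stated outright in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPosture:
    min_human_reviews: int   # distinct human approvals required before merge
    agent_may_merge: bool    # whether an agent may perform the merge itself


# Class C values follow FR-003 / SEC-010; Class A and B values are assumptions.
POSTURE = {
    "A": ReviewPosture(min_human_reviews=0, agent_may_merge=True),
    "B": ReviewPosture(min_human_reviews=1, agent_may_merge=True),
    "C": ReviewPosture(min_human_reviews=2, agent_may_merge=False),
}


def merge_allowed(safety_class: str, human_approvers: set, merged_by_agent: bool) -> bool:
    """Decide whether a merge may proceed for the given IEC 62304 class."""
    posture = POSTURE[safety_class]
    if merged_by_agent and not posture.agent_may_merge:
        return False
    # A set is used so the same reviewer approving twice counts once.
    return len(human_approvers) >= posture.min_human_reviews
```

With this table, `merge_allowed("C", {"alice", "bob"}, merged_by_agent=True)` is rejected even though both reviews are present, which is the behaviour the Class C negative test in §12 looks for.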
4. Non-Functional Requirements (NFR)
| ID | Requirement (shall) | Target (org-set placeholder) | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|---|
| NFR-001 | Inline/autocomplete (Tier-S) latency shall meet p95 budget. | p95 ≤ 300 ms | M | L1 | D2,D7 |
| NFR-002 | Interactive agent step (Tier-M) first-token latency shall meet p95 budget. | p95 ≤ 2 s | M | L2 | D2 |
| NFR-003 | Reasoning/planning task (Tier-L) end-to-end latency shall meet budget for batch-acceptable workloads. | p95 ≤ 60 s | S | L3 | D2 |
| NFR-004 | Serving plane shall sustain org-wide concurrent throughput at peak. | ≥ 1000 concurrent dev sessions | M | L2 | D2,D7 |
| NFR-005 | Control/gate-path availability shall meet SLO. | ≥ 99.9% monthly | M | L2 | D7 |
| NFR-006 | Gate evaluation shall be deterministic and reproducible: identical inputs + pinned model/LoRA/seed/config → identical gate verdict (P2). | 100% verdict reproducibility | M | L4 | D4 |
| NFR-007 | Any generated artifact shall be reproducible from recorded {model digest, LoRA, prompt, context snapshot, params, seed} (P7). | 100% replayable | M | L4 | D2,D4 |
| NFR-008 | Platform shall scale to 1000+ developers via K8s horizontal scaling and KEDA autoscale; idle model pools shall scale to zero. | Linear cost-to-load to defined ceiling | M | L2 | D2,D7 |
| NFR-009 | Multi-LoRA hot-swap shall serve N task-specialized adapters per base without per-adapter cold redeploy. | ≥ defined adapters/base online | S | L3 | D2 |
| NFR-010 | Probabilistic model calls shall be wrapped by deterministic harness logic (validators, parsers, policy) so non-determinism cannot reach a gate verdict (P2). | No stochastic path to verdict | M | L2 | D4,D5 |
| NFR-011 | Recovery: on serving node/GPU failure, in-flight tasks shall be re-queued without evidence loss. | RTO ≤ defined; zero record loss | S | L2 | D2,D7 |
| NFR-012 | The harness (not just the model) shall be versioned and treated as the product unit (P5); harness changes shall be release-controlled. | 100% harness versioned | M | L3 | D5 |
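NFR-007 lists the inputs that must be recorded for replay. One way to make that record comparable across runs is a canonical fingerprint, sketched below. The `GenerationRecord` class and `fingerprint` function are hypothetical; the field set mirrors the tuple in NFR-007, and the choice of canonical JSON plus SHA-256 is an assumption rather than a mandated mechanism.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationRecord:
    """Replay inputs named in NFR-007."""
    model_digest: str
    lora: str
    prompt: str
    context_snapshot: str   # e.g. an identifier of a versioned knowledge-base snapshot
    params: dict            # sampling parameters such as temperature
    seed: int


def fingerprint(record: GenerationRecord) -> str:
    """Hash a canonical serialization so key order cannot change the result."""
    canonical = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two records that differ only in the insertion order of `params` keys produce the same fingerprint, while changing the seed or any other field produces a different one. A replay harness could compare fingerprints before comparing artifacts.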
5. Regulatory & Quality Requirements (REG)
| ID | Requirement (shall) | Regulatory anchor | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|---|
| REG-001 | The platform shall enforce the IEC 62304 software safety classification (A/B/C) as a first-class attribute gating autonomy (P3). | IEC 62304 | M | L2 | D1 |
| REG-002 | All AI-assisted lifecycle activities shall be validated under a risk-based Computer Software Assurance (CSA) approach proportional to intended use and risk. | FDA CSA, GAMP 5 (2nd ed) | M | L2 | D1,D4 |
| REG-003 | Every agent action shall be recorded as 21 CFR Part 11-grade evidence: attributable, immutable, time-stamped, and replayable (P4). | 21 CFR Part 11 | M | L2 | D1,D6 |
| REG-004 | The platform shall maintain end-to-end traceability (user need → requirement → design → code → test → risk control) and surface coverage gaps. | IEC 62304, ISO 13485/QMSR | M | L3 | D1,D3 |
| REG-005 | Each AI tool used in the lifecycle shall be subject to tool validation / qualification with documented intended use, acceptance, and re-validation triggers. | CSA, GAMP 5, 21 CFR 820 (QMSR, eff. Feb 2026) | M | L2 | D1,D4 |
| REG-006 | Risk management activities shall integrate ISO 14971; AI-proposed hazards/controls shall be human-confirmed before becoming controlled records. | ISO 14971 | M | L3 | D1 |
| REG-007 | The AI management system governing the fleet shall conform to an AI management system standard. | ISO/IEC 42001 | S | L3 | D1 |
| REG-008 | AI-enabled-device change management shall support a Predetermined Change Control Plan (PCCP) where models influence device behavior. | FDA AI-enabled device guidance + PCCP | S | L4 | D1 |
| REG-009 | The platform shall meet applicable EU MDR / EU AI Act obligations for high-risk AI used in device engineering. | EU MDR, EU AI Act | S | L3 | D1 |
| REG-010 | All controlled records shall have defined retention, version, and signature controls under the QMS. | ISO 13485 / FDA QMSR | M | L2 | D1 |
| REG-011 | Human accountability shall be preserved: a named qualified human of record shall sign every controlled output; AI is never the signer. | 21 CFR Part 11, IEC 62304 | M | L1 | D1,D8 |
6. Data & Knowledge Requirements (DATA)
| ID | Requirement (shall) | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|
| DATA-001 | Training, fine-tuning, and RAG corpora shall be governed: cataloged, licensed, owned, and approved before use. | M | L2 | D3 |
| DATA-002 | The platform shall enforce no external egress of source, specs, or model traffic; all inference, training, and storage shall be self-hosted (P7). | M | L1 | D3,D6 |
| DATA-003 | PII/PHI shall be detected (Tier-S redactor) and excluded/masked from corpora, prompts, logs, and evidence stores unless explicitly authorized and controlled. | M | L1 | D3,D6 |
| DATA-004 | Every datum used by an agent shall carry provenance (source, version, hash, retrieval timestamp) recorded in the action evidence. | M | L2 | D3,D4 |
| DATA-005 | Knowledge bases shall be versioned and snapshot-able so a retrieval context is reproducible for replay (links NFR-007). | M | L3 | D3 |
| DATA-006 | Corpus and evidence retention shall follow QMS record-retention policy; deletion shall be controlled and logged. | M | L2 | D1,D3 |
| DATA-007 | Corpus quality shall be monitored for drift, staleness, and poisoning; suspect sources shall be quarantined. | S | L4 | D3,D6 |
| DATA-008 | Embeddings/reranking (Tier-E) indices shall be access-controlled per project and safety class. | S | L3 | D3,D6 |
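DATA-004 names four provenance fields: source, version, hash, and retrieval timestamp. The sketch below shows one possible record shape; `Provenance`, `record_provenance`, and `matches` are hypothetical names, and SHA-256 and ISO 8601 UTC timestamps are assumptions, since the requirement does not fix a hash algorithm or time format.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    source: str
    version: str
    sha256: str
    retrieved_at: str   # ISO 8601 timestamp in UTC


def record_provenance(source: str, version: str, content: bytes) -> Provenance:
    """Capture provenance at retrieval time for inclusion in the action evidence."""
    return Provenance(
        source=source,
        version=version,
        sha256=hashlib.sha256(content).hexdigest(),
        retrieved_at=datetime.now(timezone.utc).isoformat(),
    )


def matches(content: bytes, provenance: Provenance) -> bool:
    """Check on replay that the content is byte-identical to what was retrieved."""
    return hashlib.sha256(content).hexdigest() == provenance.sha256
```

On replay (NFR-007, DATA-005), `matches` gives a quick way to detect that a retrieved document has changed since the original action was recorded.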
7. Model Requirements (MODEL)
| ID | Requirement (shall) | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|
| MODEL-001 | Only self-hosted open-weight models shall be used; no SaaS LLM API (Claude/OpenAI/Gemini or equivalent) in any lifecycle path (P7, §11). | M | L1 | D2 |
| MODEL-002 | All fleet models shall be fine-tunable in-house (full or PEFT/LoRA) on governed corpora. | M | L2 | D2 |
| MODEL-003 | Every model and adapter shall be signed and version-pinned (cosign), registered in MLflow with an immutable digest. | M | L2 | D2,D6 |
| MODEL-004 | Serving shall support multi-LoRA hot-swap across task-specialized adapters on shared base weights (vLLM/Triton+TensorRT-LLM/KServe). | M | L3 | D2 |
| MODEL-005 | Models shall support calibrated abstention — emitting a refusal/low-confidence signal the harness can act on (links FR-011). | M | L2 | D2,D4 |
| MODEL-006 | The fleet shall be tiered (Tier-S Reflex 1–8B, Tier-M Worker 14–34B, Tier-L Reasoner 70B+/MoE, Tier-V Multimodal, Tier-E Embedding/Rerank) with documented task→tier routing. | M | L3 | D2,D5 |
| MODEL-007 | Multimodal capability (Tier-V) shall ingest diagrams, imaging, PDF specs, and UI for FR-001/FR-002. | S | L3 | D2,D3 |
| MODEL-008 | Each model version shall pass acceptance evaluation before promotion to a serving channel (links EVAL-001). | M | L4 | D2,D4 |
| MODEL-009 | Model lineage (base → fine-tune dataset → adapter → deployed digest) shall be fully reproducible and recorded (P7). | M | L4 | D2 |
| MODEL-010 | Quantization/optimization (TensorRT-LLM) shall not degrade a model below its gated acceptance thresholds without re-validation. | S | L4 | D2,D4 |
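MODEL-003 and MODEL-008 together define three preconditions for putting a model or adapter into a serving channel: a verified signature, a registry digest, and a passed acceptance evaluation. A minimal sketch of that promotion check follows. `ModelCandidate` and `promotion_blockers` are hypothetical, and the booleans stand in for results that real signature-verification and evaluation tooling would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCandidate:
    name: str
    digest: str               # immutable registry digest; empty if not registered
    signature_verified: bool  # result of an external signature check
    acceptance_passed: bool   # result of the acceptance evaluation


def promotion_blockers(candidate: ModelCandidate) -> list:
    """Return the reasons a candidate may not be promoted; empty means eligible."""
    blockers = []
    if not candidate.signature_verified:
        blockers.append("MODEL-003: signature not verified")
    if not candidate.digest:
        blockers.append("MODEL-003: no registry digest")
    if not candidate.acceptance_passed:
        blockers.append("MODEL-008: acceptance evaluation not passed")
    return blockers
```

Returning the full list of blockers, rather than stopping at the first, keeps the refusal itself usable as evidence of which controls fired.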
8. Evaluation & Assurance Requirements (EVAL)
| ID | Requirement (shall) | Target (org-set placeholder) | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|---|
| EVAL-001 | Release gates shall be deterministic and produce a binary, reproducible verdict (P1, P2). | 100% reproducible | M | L4 | D4 |
| EVAL-002 | Release-gate acceptance correctness shall meet the org threshold as a system property, achieved through Generate→Verify→Repair→Gate rather than by any single model (P1). | ≥ 99.9% | M | L4 | D4 |
| EVAL-003 | Escape rate (defects passing the gate into controlled artifacts) shall be measured and bounded. | ≤ org-set ceiling | M | L4 | D4 |
| EVAL-004 | Golden datasets per task/safety-class shall exist, be versioned, and gate model/harness promotion. | 100% coverage of gated tasks | M | L4 | D4 |
| EVAL-005 | An LLM shall never be the sole gate; gates shall combine deterministic verifiers (build/test/static analysis/policy) with optional model judgment as advisory only (§11). | Enforced | M | L2 | D4 |
| EVAL-006 | Gate verifiers shall include compile/build, test execution, static analysis, and policy (OPA/Gatekeeper) checks. | All present | M | L3 | D1,D4 |
| EVAL-007 | Evaluation results shall be evidence (P4): stored immutable, attributable to model/harness digests, replayable. | 100% | M | L4 | D4 |
| EVAL-008 | Continuous evaluation shall detect model/behavioral drift post-deployment and trigger re-validation. | Monitored | S | L5 | D4 |
| EVAL-009 | Repair loops shall be bounded (max iterations/budget); on exhaustion the task shall escalate to human (links FR-011, COST). | Bounded | M | L2 | D4,D7 |
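The Generate→Verify→Repair→Gate loop, with the constraints from FR-010, EVAL-005, and EVAL-009, can be sketched in a few lines. This is an illustration only: `run_gate`, `GateResult`, and the callable parameters are hypothetical names, the verifiers are assumed to be deterministic functions of the artifact, and the optional `advise` hook stands in for model judgment that is recorded but never consulted for the verdict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GateResult:
    verdict: str                       # "pass" or "escalate"
    attempts: int                      # number of verification rounds run
    advisory: List[str] = field(default_factory=list)


def run_gate(
    generate: Callable[[], str],
    repair: Callable[[str, List[str]], str],
    verifiers: Dict[str, Callable[[str], bool]],
    max_repairs: int,
    advise: Optional[Callable[[str], List[str]]] = None,
) -> GateResult:
    # FR-010: an action with no verifier shall not pass the gate.
    if not verifiers:
        return GateResult("escalate", 0)
    artifact = generate()
    for attempt in range(max_repairs + 1):
        failed = [name for name, check in verifiers.items() if not check(artifact)]
        if not failed:
            # EVAL-005: model judgment is attached as advisory notes only.
            notes = advise(artifact) if advise else []
            return GateResult("pass", attempt + 1, notes)
        if attempt < max_repairs:
            artifact = repair(artifact, failed)
    # EVAL-009: repair budget exhausted, so hand the task to a human.
    return GateResult("escalate", max_repairs + 1)
```

A usage example: with a single verifier that requires a trailing newline and a repair function that appends one, `run_gate` passes on the second round when `max_repairs` is at least 1 and escalates when it is 0. The verdict depends only on the verifier results, so the advisory hook can be removed or swapped without changing any outcome.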
9. Security Requirements (SEC)
| ID | Requirement (shall) | Mechanism | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|---|
| SEC-001 | The platform shall be zero-trust: every workload identity authenticated and authorized per call. | Istio mesh + SPIFFE/SPIRE | M | L2 | D6 |
| SEC-002 | Agent code execution shall run in isolated sandboxes with no ambient credentials or network. | gVisor/Kata | M | L2 | D6 |
| SEC-003 | Supply chain shall be signed and attested: artifacts, models, containers via Sigstore/cosign + SLSA provenance + SBOM. | cosign/SLSA/SBOM | M | L2 | D6 |
| SEC-004 | The platform shall implement prompt-injection and tool-abuse defenses (input sanitization, tool allow-lists, output schema validation, least privilege). | MCP scopes + validators | M | L3 | D5,D6 |
| SEC-005 | Secrets shall be managed centrally and never appear in prompts, logs, or evidence. | HashiCorp Vault | M | L1 | D6 |
| SEC-006 | The audit/evidence store shall be immutable and tamper-evident (append-only, hash-chained) (P4). | Append-only + cosign | M | L2 | D1,D6 |
| SEC-007 | Policy shall be enforced at admission and runtime via a policy server (OPA/Gatekeeper); no policy bypass path. | OPA/Gatekeeper | M | L2 | D6 |
| SEC-008 | The platform shall meet medical-device cybersecurity obligations and enforce network segmentation. | IEC 62443, FD&C Act §524B | S | L3 | D6 |
| SEC-009 | Tool plane (MCP) and multi-agent (A2A) calls shall enforce least-privilege, declared scopes, audited per invocation. | MCP/A2A policy | M | L3 | D5,D6 |
| SEC-010 | Agents shall never autonomously merge or release Class C changes (links FR-003, §11). | Branch policy + gate | M | L3 | D1,D6 |
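SEC-006 asks for an append-only, hash-chained evidence store. The sketch below shows the chaining idea in memory; `EvidenceLog` is a hypothetical class, SHA-256 is an assumed choice, and a production store would also need durable storage, signing, and access control, none of which are shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, body: str) -> str:
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class EvidenceLog:
    """In-memory illustration of a hash-chained, append-only log."""

    def __init__(self):
        self._entries = []

    def append(self, payload: dict) -> str:
        """Add an entry whose hash covers both its payload and the previous hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        entry_hash = _digest(prev_hash, body)
        self._entries.append({"prev": prev_hash, "payload": body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash covers the previous one, altering an early entry invalidates every later entry, which is what makes tampering evident rather than merely discouraged.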
10. Observability & Cost Requirements (OPS / COST)
| ID | Requirement (shall) | Target (org-set placeholder) | MoSCoW | ASMM-Med | Dim |
|---|---|---|---|---|---|
| OPS-001 | Every agent action and model call shall emit OpenTelemetry traces correlatable end-to-end (request→tools→model→gate→evidence). | 100% traced | M | L2 | D7 |
| OPS-002 | The platform shall expose GPU utilization, queue depth, and tokens/sec per tier and per tenant. | Dashboards live | M | L2 | D7 |
| OPS-003 | Drift, error-rate, abstention-rate, and escape-rate shall be observable in near-real-time. | SLO dashboards | S | L4 | D4,D7 |
| COST-001 | The platform shall compute cost-per-green-PR (cost per verified task) as the primary efficiency KPI (P6). | Reported per team | M | L3 | D7 |
| COST-002 | Budget guardrails shall enforce per-team/per-task token & GPU ceilings; overruns throttle or escalate, never silently spend. | Hard ceilings | M | L2 | D7 |
| COST-003 | Routing shall prefer the cheapest tier that meets the quality gate (links FR-015, MODEL-006). | Enforced | S | L3 | D2,D7 |
| COST-004 | Idle GPU pools shall scale to zero; cost attribution shall be tenant-accurate. | KEDA + chargeback | S | L2 | D7 |
| COST-005 | Repair/retry loops shall be cost-bounded (links EVAL-009); runaway loops shall halt and escalate. | Bounded | M | L2 | D4,D7 |
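COST-001 and COST-002 reduce to simple arithmetic. The sketch below assumes that cost-per-green-PR divides all spend, including runs that never passed the gate, by the number of runs that did; that reading of "cost per verified task" is an assumption, as are the names `TaskRun`, `cost_per_green_pr`, and `check_budget`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskRun:
    cost: float        # total spend for the run, repairs included, in any currency unit
    gate_passed: bool  # True if the run ended in a green (gate-passing) PR


def cost_per_green_pr(runs: List[TaskRun]) -> Optional[float]:
    """Total spend divided by green PRs; None when nothing passed the gate."""
    green = sum(1 for run in runs if run.gate_passed)
    if green == 0:
        return None
    return sum(run.cost for run in runs) / green


def check_budget(spent: int, requested: int, ceiling: int) -> str:
    """Hard ceiling per COST-002: a request that would overrun is throttled."""
    return "allow" if spent + requested <= ceiling else "throttle"
```

For three runs costing 2.0, 1.0, and 3.0 where the second failed the gate, the KPI is 6.0 / 2 = 3.0. Charging failed runs to the numerator is what separates this measure from plain cost per request.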
11. Constraints & Explicit Non-Goals
11.1 Hard constraints (shall)
| ID | Constraint |
|---|---|
| CON-001 | No SaaS/hosted LLM APIs (Claude, OpenAI, Gemini, or equivalent) in any lifecycle path. Open-weight, self-hosted only. |
| CON-002 | No external network egress of source, specs, PHI/PII, or model traffic. |
| CON-003 | No LLM as a sole gate: a deterministic verifier set must back every release decision (EVAL-005). |
| CON-004 | No autonomous merge or release of Class C software by an agent; Class C requires dual human control (SEC-010, FR-003). |
| CON-005 | No agent action without replayable Part 11-grade evidence (REG-003). |
| CON-006 | No model/adapter deployment without signing, registry entry, and acceptance evaluation (MODEL-003, MODEL-008). |
| CON-007 | No non-deterministic path may reach a gate verdict (NFR-010, P2). |
11.2 Explicit non-goals (this baseline)
| Non-goal | Rationale |
|---|---|
| Fully unattended Class C autonomy | Prohibited by P3; revisit only with regulatory precedent. |
| "Vibe coding" / unbounded freeform generation | Counter to spec-driven, evidence-bound philosophy. |
| General-purpose chatbot assistant outside the SDLC | Out of scope; no validation basis. |
| Vendor-managed model hosting | Conflicts with P7 sovereignty. |
| Replacing human reviewers/signers | AI augments; humans remain accountable (REG-011). |
12. Acceptance Criteria Summary
Verification methods: I = Inspection, D = Demonstration, T = Test, A = Analysis.
| Requirement set | Acceptance criterion (must pass for baseline sign-off) | Method | Priority |
|---|---|---|---|
| FR-001…015 | Each SDLC activity demonstrated with safety-class bounding and correct review posture; abstention and human override exercised. | D, T | M |
| NFR-001…012 | Latency/throughput/availability SLOs met under load test; gate verdict + artifact reproducibility shown bit-stable on replay. | T, A | M |
| REG-001…011 | Traceability matrix complete with no orphan links; CSA/tool-validation dossiers present; Part 11 evidence replayed; human-of-record signatures verified. | I, A | M |
| DATA-001…008 | No-egress proven by network policy test; PHI/PII redaction validated; provenance present on sampled actions; corpus versioning replayable. | T, I | M |
| MODEL-001…010 | Fleet confirmed open-weight/self-hosted; signatures and registry digests verified; multi-LoRA hot-swap demonstrated; abstention signal observed; lineage reproducible. | I, D, T | M |
| EVAL-001…009 | Gate determinism proven (identical inputs → identical verdict); system acceptance ≥ 99.9% on golden sets; escape rate within ceiling; no LLM-sole-gate path exists. | T, A | M |
| SEC-001…010 | Zero-trust identity enforced; sandbox isolation tested; SLSA/SBOM/cosign present; prompt-injection suite passed; Class C autonomous-merge attempt blocked. | T, I | M |
| OPS-001…003, COST-001…005 | End-to-end traces present; cost-per-green-PR reported; budget guardrail throttle demonstrated; scale-to-zero and chargeback verified. | D, T | M |
| CON-001…007 | Each hard constraint shown enforced (negative tests: egress blocked, SaaS call blocked, LLM-sole-gate rejected, Class C auto-merge rejected). | T | M |
Baseline sign-off requires all Must requirements Accepted with no open veto from any §2 veto-holder (QA/RA, Security, Finance, Clinical/Product), and traceability of every requirement to at least one verification record per P4.
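The sign-off rule above combines three conditions, which can be checked mechanically. The following is a sketch with hypothetical names (`signoff_blockers`, the input shapes, and the "Accepted" status string as the only passing status); it assumes requirement status, open vetoes, and verification records are available as plain dictionaries and sets.

```python
# Veto-holding stakeholders from the §2 table.
VETO_HOLDERS = {"QA/RA", "Security", "Clinical/Product", "Finance"}


def signoff_blockers(requirements: dict, open_vetoes: set, verification_records: dict) -> list:
    """Return reasons the baseline cannot be signed off; empty means eligible.

    requirements:          requirement ID -> (MoSCoW priority, status)
    open_vetoes:           names of stakeholders with an unresolved veto
    verification_records:  requirement ID -> list of verification record IDs
    """
    blockers = []
    for req_id, (priority, status) in sorted(requirements.items()):
        if priority == "M" and status != "Accepted":
            blockers.append(f"{req_id}: Must requirement not Accepted")
        if not verification_records.get(req_id):
            blockers.append(f"{req_id}: no verification record")
    for holder in sorted(VETO_HOLDERS & open_vetoes):
        blockers.append(f"open veto: {holder}")
    return blockers
```

Note that the verification-record check applies to every requirement, not only Must items, matching the wording "traceability of every requirement to at least one verification record".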
End of 01-requirements.md — proceed to 02-maturity-model.md for the ASMM-Med level definitions that scope phased rollout in 09-adoption-roadmap.md.