Executive Summary: AWS AI Infrastructure

AWS is the first vendor in this assessment series that makes a credible claim across every layer of the 4+1 model — including Layer 2C. The structural difference between AWS and every on-prem vendor (Dell, HPE, VAST) is the direction of authority. On-prem vendors build upward from hardware, attempting to extend authority into orchestration and runtime layers. AWS builds downward from managed services, extending authority into custom silicon (Trainium, Inferentia, Graviton), custom networking (EFA/SRD, Nitro), and now on-prem infrastructure (AWS AI Factories).

The DAPM classification for AWS is structurally inverted compared to on-prem vendors. The enterprise architect using AWS retains less direct authority at every layer — but gains operational leverage that on-prem vendors cannot match. The question is not whether AWS has the capabilities. The question is whether the enterprise architect has made the authority delegation explicit, and whether they understand what borrowed judgment they inherit when they adopt AWS’s Reasoning Plane as their own.

AWS already has the control plane everyone else is trying to build. The problem is that customers do not always see where AWS’s control plane ends and their own authority begins. When Bedrock routes inference, SageMaker auto-scales, or Karpenter provisions nodes — those are Layer 2C functions operating invisibly inside managed services. This is the ‘DGX Realization’ that birthed the 4+1 model: the cloud operates an invisible Reasoning Plane that becomes visible only when you try to replicate it on bare metal.

The OpenAI partnership (2GW of Trainium capacity, Stateful Runtime on Bedrock) and the deepened NVIDIA collaboration (1M+ GPUs including Blackwell and Rubin) show AWS positioning itself as the substrate on which multiple AI ecosystems converge. The result is one of the broadest Layer 3 ecosystems and one of the most complex borrowed judgment landscapes.

Layer-by-layer status: Layer 0 (Ceded to AWS), Layer 1A (Delegated), Layer 1B (Delegated), Layer 1C (Delegated), Layer 2A (Delegated / Retained), Layer 2B (Delegated / Retained), Layer 2C (Intelligence 2C: Delegated | Infra 2C: Implicit), Layer 3 (+1) (Broadest Ecosystem).

Assessment framework: 4+1 Layer AI Infrastructure Model. Scoring model: Decision Authority Placement Model (DAPM) — Retained, Delegated, Ceded, or Absent. Published by The CTO Advisor LLC. Author: Keith Townsend. Date assessed: May 21, 2026. Version: v1.0 — Draft, Editorial Review Pending.

AWS AI Infrastructure

Mapped to the 4+1 Layer AI Infrastructure Model

v1.0 (Draft, Editorial Review Pending) · Assessed May 21, 2026 · Source: re:Invent 2025, GTC 2026, Bedrock AgentCore GA, AgentCore Policy GA (Mar 2026), SageMaker Unified Studio, AWS/NVIDIA collaboration, OpenAI/AWS partnership, analyst coverage

Status: Active Assessment

Legend: Strength · Delegated · Gap · Absent · Partner

Layer 0 · Compute & Network Fabric · Ceded to AWS

Raw compute, networking, and acceleration fabric

Vendor-Provided

AWS Custom Silicon (Annapurna Labs) · Ceded

Trainium3 GA: EC2 Trn3 UltraServers, 144 chips/UltraServer, 4.4x compute vs Trn2, 4x energy efficiency, 4x memory bandwidth. Sub-10µs chip-to-chip latency. Designed for agentic AI, MoE models, large-scale RL. Trainium4 expected 2027. Inferentia2 for inference. Graviton for ARM CPU. AWS-owned silicon IP.

NVIDIA GPUs on AWS · Ceded

Broadest NVIDIA GPU collection of any cloud. P5 (H100), P5e (H200), P6 (B200), P6e (GB200). 1M+ GPUs added in 2026, including Blackwell and Rubin.

Nitro System + EFA + SRD · Ceded

Custom hardware/firmware for I/O offload. Hardware-enforced security isolation. EFA: OS-bypass with 3,200 Gbps bandwidth. SRD: AWS custom multi-path, fault-tolerant transport. EC2 UltraClusters: petabit-scale, 20,000 GPUs, 16% latency reduction (v2.0).

AWS AI Factories (On-Prem) · Ceded

Dedicated on-prem environments as private AWS Region. Customer provides space/power; AWS deploys and manages Trainium, NVIDIA GPUs, networking, storage, and full managed services (Bedrock, SageMaker). Inverted vs Dell/HPE: AWS operates infrastructure the customer houses.

NVIDIA-Provided

NVIDIA GPU Silicon + NIXL

1M+ GPUs including Blackwell and Rubin. NIXL support with EFA for disaggregated LLM inference.

◆ Gap Analysis

The enterprise has no authority over Layer 0 hardware beyond choosing instance types. The multi-accelerator marketplace (Trainium, NVIDIA, AMD, Intel) creates a workload-to-silicon matching problem that is itself a Layer 2C function. The problem does not arise on-prem, where accelerator choice is made once at procurement, but it recurs with every cloud workload placement decision. Among the vendors compared here, AWS is the only one that owns accelerator silicon IP (Annapurna Labs): Dell and HPE brand third-party silicon, and VAST has no Layer 0 silicon. AWS AI Factories invert the on-prem model: the infrastructure is Ceded even when it sits physically in the customer’s facility.

◆ Borrowed Judgment

Inverted. The enterprise Cedes Layer 0 entirely — AWS makes all silicon, networking, and infrastructure decisions. The enterprise selects from AWS’s menu but does not influence underlying hardware design, networking topology, or physical infrastructure. The trade-off: loss of direct hardware authority in exchange for operational leverage (no procurement lead time, per-workload silicon selection, managed scaling).

◆ Working Notes

The switching cost / decision frequency distinction between cloud and on-prem at Layer 0 is structural. On-prem: capital decision at procurement. Cloud: per-workload decision at runtime. That fluidity itself requires Layer 2C.
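The per-workload silicon decision noted above can be sketched as a small matching function. This is a minimal illustration, not an AWS service or API: the accelerator names, memory sizes, and hourly costs are hypothetical placeholders rather than vendor specifications or prices, and a real Layer 2C function would weigh many more objectives than cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str                 # hypothetical label, not a real instance type
    memory_gb: int            # placeholder capacity, not a vendor spec
    cost_per_hour: float      # placeholder price, not AWS pricing
    supports_training: bool


@dataclass(frozen=True)
class Workload:
    name: str
    min_memory_gb: int
    is_training: bool


def choose_accelerator(workload, catalog):
    """Return the cheapest accelerator meeting the hard constraints, or None."""
    eligible = [
        a for a in catalog
        if a.memory_gb >= workload.min_memory_gb
        and (a.supports_training or not workload.is_training)
    ]
    return min(eligible, key=lambda a: a.cost_per_hour, default=None)


# Illustrative catalog only; every value here is made up for the example.
CATALOG = [
    Accelerator("accel-a", 96, 4.0, True),
    Accelerator("accel-b", 32, 1.0, False),
    Accelerator("accel-c", 192, 9.0, True),
]

inference_choice = choose_accelerator(Workload("chat-inference", 24, False), CATALOG)
training_choice = choose_accelerator(Workload("fine-tune", 80, True), CATALOG)
```

On-prem, this function would run once at procurement. In the cloud it runs for every workload, which is why the document treats the matching problem as a recurring Layer 2C decision.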

Layer 1A · Data Storage & Governance · Delegated

Durable, governed data foundation — the Governance Catalog that Layer 2C queries

Vendor-Provided

Amazon S3 + S3 Tables · Ceded

De facto object storage standard. S3 Tables (re:Invent 2025): Iceberg-native table storage. S3 Express One Zone: single-digit ms latency.

AWS Glue Data Catalog + Lake Formation · Delegated

Centralized metadata with catalog federation to remote Iceberg catalogs. Fine-grained access control, cross-account sharing, column/row-level security. SageMaker and Bedrock inherit IAM/Lake Formation context. The primitives for 1A→2C exist; the composition is customer-built.

SageMaker Catalog · Delegated

Discovery, subscription, governed sharing of data assets within SageMaker Unified Studio.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Most mature governance catalog in this assessment. Glue + Lake Formation metadata is API-accessible to higher layers. The 1A→2C connection (Reasoning Plane querying governance metadata for placement decisions) is not a product today — the primitives exist, composition is customer-built. Catalog federation to remote Iceberg catalogs is unmatched within this series. Hybrid gap: federated catalog covers S3 and Iceberg-compatible catalogs but not proprietary on-prem storage metadata.

◆ Borrowed Judgment

Delegated with customer-retained policy. Lake Formation policies are customer-defined; enforcement is AWS-managed. Cleaner DAPM than Dell (MetadataIQ indexes Dell-only) or VAST (governance catalog is proprietary).

◆ Working Notes

The ‘Governance Enables Autonomy’ principle from the 4+1 model is achievable on AWS but requires the enterprise architect to build the governance-to-placement linkage.
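The governance-to-placement linkage that the architect has to build can be sketched as a filter that narrows candidate regions using dataset governance tags. This is a hedged illustration only: the tag names and the rule table are hypothetical, and in practice the tags would be read from governance metadata such as the Glue Data Catalog or Lake Formation, a lookup that is omitted here.

```python
def allowed_regions(dataset_tags, candidate_regions, residency_rules):
    """Keep only the regions permitted by every residency tag on the dataset."""
    permitted = set(candidate_regions)
    for tag in dataset_tags:
        # Tags without a residency rule place no restriction on placement.
        if tag in residency_rules:
            permitted &= residency_rules[tag]
    return sorted(permitted)


# Hypothetical rule table mapping governance tags to permitted regions.
RESIDENCY_RULES = {
    "eu-personal-data": {"eu-west-1", "eu-central-1"},
    "us-only": {"us-east-1", "us-west-2"},
}

CANDIDATES = ["us-east-1", "eu-west-1"]
eu_placement = allowed_regions(["eu-personal-data"], CANDIDATES, RESIDENCY_RULES)
```

An empty result is a meaningful answer: it says that no candidate region satisfies the governance policy, so the workload should not be placed at all.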

Layer 1B · Context Management & Retrieval · Delegated

Low-latency retrieval for RAG — vector/hybrid search, context windows

Vendor-Provided

Amazon Bedrock Knowledge Bases · Delegated

Managed RAG: ingest → chunk → embed → index → retrieve. Supports OpenSearch, Aurora/pgvector, Pinecone, Redis. Vertically integrates 1B+1C within managed boundary.

Amazon OpenSearch Serverless · Delegated

Vector search with HNSW/FAISS. Default Bedrock Knowledge Bases backend. Serverless scaling.

Amazon Neptune · Delegated

Graph database for relationship-aware retrieval. Multi-hop reasoning for agentic workloads.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Bedrock Knowledge Bases erases the 1B/1C boundary within its managed surface — borrowed judgment, not a gap. AWS makes chunking, embedding, retrieval strategy decisions on the customer’s behalf. The enterprise should ask whether defaults suit their domain. Interoperability gap: no unified retrieval abstraction across Bedrock Knowledge Bases + self-hosted Weaviate + Neptune. Routing logic between backends is a Layer 2C function living in application code.
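The backend routing logic that ends up in application code might look like the following sketch. The class, the query kinds, and the stub backends are all hypothetical; real backends would wrap the respective service clients, which are deliberately left out so the example stays self-contained.

```python
from typing import Callable, Dict, List

# A retriever takes a query string and returns matching passages.
Retriever = Callable[[str], List[str]]


class RetrievalRouter:
    """Routes a query to one named backend, falling back to a default."""

    def __init__(self, backends: Dict[str, Retriever], default: str):
        if default not in backends:
            raise ValueError(f"unknown default backend: {default}")
        self._backends = backends
        self._default = default

    def route(self, query_kind: str) -> str:
        # Unknown query kinds fall through to the default backend.
        return query_kind if query_kind in self._backends else self._default

    def retrieve(self, query_kind: str, query: str) -> List[str]:
        return self._backends[self.route(query_kind)](query)


# Stub backends standing in for a vector store and a graph store.
router = RetrievalRouter(
    backends={
        "vector": lambda q: [f"vector:{q}"],
        "graph": lambda q: [f"graph:{q}"],
    },
    default="vector",
)
```

Even this small router embeds a placement decision (which backend answers which kind of question), which is why the document classifies it as a Layer 2C function rather than plain application plumbing.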

◆ Borrowed Judgment

Moderate. Bedrock Knowledge Bases makes retrieval quality decisions the enterprise inherits without explicit governance. Compare to VAST (InsightEngine — tighter but VAST-controlled) or Dell (Elastic — separate ISV).

◆ Working Notes

Neptune graph-based retrieval is increasingly relevant for agentic workloads needing relationship-aware context — a pattern neither Dell’s Elastic nor VAST’s InsightEngine natively provides.

Layer 1C · Data Movement & Pipelines · Delegated

Move/transform data — ETL/ELT, lineage, governed data preparation

Vendor-Provided

AWS Glue + SageMaker Unified Studio · Delegated

Glue: serverless ETL with Spark 3.5.6, Iceberg 1.10. SageMaker Unified Studio: horizontal integration across 1A/1C/2B with one-click onboarding. Single governed environment collapsing organizational boundaries across data engineering, data science, ML engineering.

Amazon MWAA + Step Functions · Delegated

Managed Airflow for complex DAGs. Step Functions for serverless multi-step workflows in ML pipeline reference architectures.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

AWS horizontal integration vs VAST vertical integration: same functional coverage, different authority models. VAST = one authority boundary, fewer choices, fewer seams. AWS = many services sharing governance via Lake Formation/IAM, more policy control, more operational complexity. SageMaker Unified Studio addresses fragmentation but underlying services remain distinct.

◆ Borrowed Judgment

Delegated with customer-retained configuration. AWS provides pipeline services; customer defines transformations and flows. Operational complexity of maintaining consistent governance across many AWS accounts is the trade-off for flexibility.

Layer 2A · Infrastructure Orchestration · Delegated / Retained

GPU scheduling, capacity management, autoscaling

Vendor-Provided

Amazon EKS Auto Mode + Karpenter · Delegated

Managed K8s with GPU-aware scheduling. EKS Auto Mode automates cluster/compute management. Karpenter: open-source autoscaler provisioning exact instance types. Mixed compute (NVIDIA, Trainium, Inferentia, Graviton). Note: no DRA support, Capacity Blocks negate scale-to-zero.

Capacity Management · Ceded

Capacity Block Reservations, Flex Start, Savings Plans, Spot. All AWS-controlled allocation. These are capacity acquisition mechanisms, not workload placement reasoning.

NVIDIA-Provided

NVIDIA GPU Operator (on EKS)

Available but optional. AWS controls GPU scheduling through EKS Auto Mode and Karpenter. Run:ai available but not required.

◆ Gap Analysis

For Dell/HPE, Layer 2A is where authority slips to NVIDIA. For AWS, Layer 2A is where AWS retains authority through managed services while integrating NVIDIA optionally. GPU scheduling primitives are AWS-controlled. Governance choice: Retain 2A by running self-managed EKS, or Cede 2A by consuming Bedrock (no EKS, no Karpenter — AWS handles 2A invisibly). Both legitimate; the choice is a governance decision with DAPM implications.

◆ Borrowed Judgment

Low to moderate depending on path. Self-managed EKS: Retained. Bedrock consumption: Ceded. NVIDIA dependency is optional, structurally different from Dell/HPE where Run:ai is the primary GPU scheduler.

◆ Working Notes

Capacity Blocks are sometimes described as proto-2C, but cost-optimized capacity acquisition is not multi-objective placement reasoning.

Layer 2B · Application Runtime & Execution · Delegated / Retained

Model serving, agent execution, inference APIs, distributed inference

Vendor-Provided

Amazon Bedrock · Delegated

Foundation model access: Anthropic Claude, Amazon Nova, Meta Llama, OpenAI (Stateful Runtime), Mistral, Cohere, NVIDIA Nemotron. Unified API. Fine-tuning including RFT.

Bedrock AgentCore Runtime · Delegated

Serverless agent runtime. Framework-agnostic (Strands, LangChain, CrewAI). Protocol-agnostic (MCP, A2A). Model-agnostic. MicroVM session isolation. 2M+ SDK downloads in 5 months.

SageMaker AI + Self-Hosted · Delegated / Retained

Training, fine-tuning, inference endpoints. LMI containers with vLLM. Multi-LoRA. Supports Trainium + NVIDIA. vLLM on EKS and Ray on EKS for fully self-hosted (Retained).

Strands Agents SDK · Retained

AWS open-source agentic framework. Model-first, native AgentCore/Guardrails/OpenTelemetry integration. Multi-agent patterns with A2A.

NVIDIA-Provided

NVIDIA GPU Instances + NIM

P5/P6 instances. NVIDIA Nemotron via Bedrock. NVIDIA dependency optional — Trainium-only inference is architecturally possible.

◆ Gap Analysis

AWS owns multiple 2B surfaces (Bedrock, SageMaker, AgentCore, EKS). NVIDIA dependency is optional in a way it’s not for Dell/HPE. Agent frameworks blur 2B/2C/3 boundaries — AgentCore bundles Runtime (2B) + Policy (2C) + agent logic (3). Product boundary ≠ architectural boundary. Borrowed judgment: using Bedrock to access Anthropic Claude or Meta Llama means the model provider’s alignment decisions become part of the enterprise’s AI system. Guardrails constrain output but reasoning in model weights is not customer-configurable.

◆ Borrowed Judgment

Varies by path. Bedrock: Delegated + model provider borrowed judgment. SageMaker self-hosted: Retained. AgentCore: Delegated. Self-hosted EKS: fully Retained. Runtime proliferation is itself a 2C decision AWS doesn’t automate.

◆ Working Notes

Product boundary (AgentCore = Runtime + Policy + Evaluations + Memory + Registry) doesn’t align with 4+1 architectural boundary (Runtime = 2B, Policy = 2C, agent logic = 3). Same cross-layer bundling seen in Google’s and VAST’s products.

Layer 2C · Agentic Infrastructure: The Reasoning Plane · Intelligence 2C: Delegated | Infra 2C: Implicit

Policy-driven placement and resource coordination — the Autonomy Layer

Vendor-Provided

Bedrock Guardrails · Delegated

Content filtering, PII protection, topic blocking. ApplyGuardrail API works with any model. Cross-account organizational safeguards (GA Apr 2026).

AgentCore Policy (GA Mar 2026) · Delegated

Centralized governance outside agent code. Natural language → Cedar policy. Intercepts every tool call before execution. 13 AWS regions.

AgentCore Evaluations + Memory · Delegated

Built-in evaluators for correctness, safety, adherence, consistency. Episodic Memory for stateful reasoning across sessions.

AWS Agent Registry (Preview) · Delegated

Governed catalog for agents, tools, skills, MCP servers. Works across AWS, other cloud, on-prem.

NVIDIA-Provided

No NVIDIA Layer 2C Dependency

All Layer 2C components are AWS IP. NVIDIA does not control governance, policy, or reasoning in the AWS stack.
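The AgentCore Policy pattern listed above, governance that sits outside agent code and intercepts every tool call before execution, can be illustrated with a generic wrapper. This sketch is plain Python and is not the AgentCore Policy API or Cedar syntax; the policy function, principal name, and tool names are hypothetical.

```python
class PolicyDenied(Exception):
    """Raised when the policy refuses a tool call."""


def guarded_call(policy, principal, tool_name, tool_fn, **arguments):
    """Check the policy first; run the tool only if the call is permitted."""
    if not policy(principal, tool_name, arguments):
        raise PolicyDenied(f"{principal} may not call {tool_name}")
    return tool_fn(**arguments)


# Hypothetical allowlist kept outside the agent's own code.
ALLOWED_TOOLS = {"support-agent": {"lookup_order"}}


def allowlist_policy(principal, tool_name, arguments):
    # Deny by default: unknown principals get an empty allowlist.
    return tool_name in ALLOWED_TOOLS.get(principal, set())


def lookup_order(order_id):
    return f"order {order_id}: shipped"


def refund_order(order_id):
    return f"order {order_id}: refunded"


permitted = guarded_call(
    allowlist_policy, "support-agent", "lookup_order", lookup_order, order_id="A1"
)
```

The design point is that the agent never calls a tool directly. Changing what an agent may do becomes a policy edit rather than a code change, which is the property that makes this authority Delegated rather than Ceded.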

◆ Gap Analysis

Intelligence Layer 2C (partially present): AgentCore Policy + Guardrails + Evaluations govern agent behavior. Real and productized. Natural-language-to-Cedar conversion is the most accessible policy authoring in this assessment.

Infrastructure Layer 2C (not built): No service answers ‘given data residency, cost, latency, and compliance, should this run on Trainium in us-east-1 or NVIDIA in eu-west-1?’ Capacity primitives are building blocks, not a policy-driven placement engine querying 1A governance metadata.

The structural insight: AWS already has the control plane everyone else is trying to build, but it’s implicit. Dell’s 2C gap is product absence. AWS’s 2C gap is visibility and authority: the capability exists but is implicit, managed, and Ceded.

Layer 2C comparison across vendors:
• Dell: Absent.
• HPE: Retained (IT ops) + Delegated (Kamiwaza).
• VAST: Retained/Emerging (PolicyEngine + Polaris, GA end 2026).
• AWS: Intelligence 2C Delegated (productized). Infrastructure 2C implicit (inside managed services).

The question is not ‘Does AWS have Layer 2C?’ but ‘How much can the enterprise configure, audit, and control, and how much has been Ceded without explicit classification?’

◆ Borrowed Judgment

Intelligence 2C: Low — AgentCore Policy, Guardrails, Evaluations are AWS IP. Customer defines policies; AWS enforces. Infrastructure 2C: Ceded (implicit) — placement decisions inside managed services without explicit customer policy input. When SageMaker auto-scales or Bedrock routes, those are 2C functions the enterprise has Ceded without classification. DAPM discipline demands: for every managed service placement decision, classify as Delegated (customer sets policy) or Ceded (AWS decides).
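The classification discipline described above can be kept as a simple register. The following is a minimal sketch assuming a hypothetical record layout: the DAPM categories come from the assessment's scoring model, while the field names, helper functions, and register entries are illustrative.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Authority(Enum):
    # The four DAPM categories; only two apply to managed-service decisions here.
    RETAINED = "Retained"
    DELEGATED = "Delegated"
    CEDED = "Ceded"
    ABSENT = "Absent"


class Decision(NamedTuple):
    service: str
    decision: str
    customer_sets_policy: Optional[bool]  # None means nobody has checked yet


def classify(entry: Decision) -> Optional[Authority]:
    """Delegated if the customer sets policy, Ceded if the provider decides."""
    if entry.customer_sets_policy is None:
        return None
    return Authority.DELEGATED if entry.customer_sets_policy else Authority.CEDED


def unclassified(register):
    """Return the decisions that have been handed over without classification."""
    return [entry for entry in register if classify(entry) is None]


# Illustrative register; entries mirror the examples named in the text.
REGISTER = [
    Decision("SageMaker", "endpoint auto-scaling", None),
    Decision("Bedrock", "inference routing", False),
    Decision("AgentCore Policy", "tool-call authorization", True),
]
```

The useful output is the unclassified list: each entry there is a placement decision the enterprise has handed over without ever recording whether it meant to.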

◆ Working Notes

The re:Invent 2025 and AgentCore announcements are the strongest vendor validation of the 4+1 model’s Layer 2C thesis. The ‘invisible Reasoning Plane’ observation is the conceptual foundation of the 4+1 model itself.

Layer 3 (+1) · AI Application Layer: The Value Plane · Broadest Ecosystem

AI-powered business capabilities — business logic, workflow automation

Vendor-Provided

Bedrock Agents + Strands SDK · Delegated / Retained

No-code (Bedrock Agents) and full-code (Strands) agent construction. Bedrock Agents: Delegated behavior. Strands: Retained authority. Both span 2B/2C/3.

Amazon Q · Delegated

AWS AI assistant for business and development. Enterprise Delegates application behavior to AWS.

Model + ISV Ecosystem · Delegated

Anthropic Claude, Amazon Nova, Meta Llama, OpenAI, Mistral, Cohere, NVIDIA Nemotron. Thousands of ISV applications. 11,000+ government agencies.

AWS Kiro (Agentic IDE Platform) · Delegated

Spec-driven agentic development platform replacing Amazon Q Developer (new signups ended May 2026). Three surfaces: VS Code-compatible IDE, CLI, and autonomous cloud agent. Spec-driven development generates requirements.md, design.md, and tasks.md before code — specs are source-of-truth, code is build artifact. Hooks system: 17 automated quality gates (security, linting, testing, validation) firing on file save and PR events. Multi-model routing: Claude Sonnet for reasoning-heavy specs, Amazon Nova for high-throughput code generation, Bedrock as unified model plane. 50+ Powers (MCP integrations: Figma, Terraform, Stripe, Datadog). Autonomous agent executes backlog tasks and opens PRs without developer in the loop. Deep AWS context: native Powers for AWS pricing, docs, Well-Architected, cost analysis. The most opinionated developer AI surface from any cloud vendor — enforces structured development discipline rather than freeform 'vibe coding.'

NVIDIA-Provided

NVIDIA NIM on Bedrock

NVIDIA models via Bedrock API alongside all other providers.

◆ Gap Analysis

Broadest Layer 3 in this assessment, and a different category from Dell ISV partnerships, HPE Unleash AI, or VAST Cosmos. Each Layer 3 application brings its own governance domain. AgentCore Policy and Guardrails provide cross-agent governance primitives; whether they compose into enterprise-wide agent governance remains an implementation question.

The Retained/Delegated boundary is not uniform. Custom Strands on self-hosted EKS: fully Retained. Bedrock Agents / Q / partner apps: substantially Delegated. The same enterprise may have both patterns simultaneously.

Kiro represents AWS's strongest Layer 3 opinion: spec-driven development enforces structured requirements before code generation. This is an opinionated development methodology embedded in tooling. The enterprise Delegates development workflow decisions to AWS's architectural opinions about how AI-assisted software should be built. The autonomous agent (a cloud agent executing tasks and opening PRs without a human in the loop) creates a new DAPM question: when Kiro's agent writes and ships code autonomously, who owns the judgment embedded in that code? The developer who assigned the task, or Kiro's multi-model routing logic that chose which model to apply?

Compare to Google Antigravity 2.0 (agent orchestration platform, multi-agent parallel execution, Gemini-native) and GitHub Copilot (IDE-embedded coding agent with cloud agent for autonomous PR creation). All three clouds now have agentic developer surfaces that span Layer 2B (execution) and Layer 3 (application). The competitive dynamics are shifting from 'which cloud has the best models' to 'which cloud has the most productive developer surface.'

◆ Borrowed Judgment

Distributed and complex. Model providers bring training data, alignment, safety decisions as inherited borrowed judgment. AWS platform defaults shape application behavior. DAPM Action 3 applies with force: when you move off AWS, what judgment doesn’t move with you? Answer: almost everything above Layer 0.

◆ Working Notes

OpenAI partnership (Stateful Runtime on Bedrock) + NVIDIA (1M+ GPUs) position AWS as convergence substrate for multiple AI ecosystems. Broadest possibilities, most complex borrowed judgment landscape.

The Q Developer → Kiro transition (new Q Developer signups ended May 2026) is a significant strategic signal. AWS is consolidating its developer AI surface into a single opinionated platform rather than maintaining parallel tools. Kiro's spec-driven approach is the inverse of 'vibe coding': it imposes engineering discipline through AI tooling. Whether enterprise development teams accept this opinionated workflow or prefer the freeform approach of Cursor/Copilot is the adoption question.

Red Hat Summit 2026 announced OpenShift Dev Spaces support for Kiro alongside Microsoft Copilot, Claude CLI, Cline, Continue, and Roo, meaning Kiro can run inside IBM's governed platform. This cross-vendor interoperability matters for the 4+1 model: the developer tool (Layer 3) can be decoupled from the infrastructure platform (Layer 2A). An enterprise could run Kiro on OpenShift on Dell hardware, with three vendors' authority at three different layers.

✦ Summary Finding

4+1 Layer AI Infrastructure Model · Vendor Assessment Series · The CTO Advisor LLC · thectoadvisor.com