Executive Summary: NVIDIA AI Platform

NVIDIA is the only vendor in this assessment series that appears inside every other vendor's assessment. Dell's Layer 2A GPU orchestration is NVIDIA Run:ai. HPE's Layer 2B runtime is NVIDIA AI Enterprise. VMware's GPU integration is NVIDIA vGPU Manager. AWS, Google Cloud, and Azure all run NVIDIA GPUs alongside their own silicon. VAST embeds NVIDIA GPUs, NICs, DPUs, and switches into its data platform. Mapping NVIDIA as a standalone vendor inverts the perspective: instead of asking 'where does Dell cede authority to NVIDIA,' the question becomes 'where does NVIDIA claim authority, and from whom?'

The 4+1 mapping reveals NVIDIA as a vendor with deep authority at Layers 0, 2A, and 2B — and emerging ambitions at Layer 2C. NVIDIA designs the accelerator silicon that every other vendor depends on (Layer 0), provides the GPU orchestration platform that on-prem vendors brand as their own (Layer 2A via Run:ai), and controls the inference runtime and model lifecycle stack that sits between the enterprise's infrastructure and its AI applications (Layer 2B via NIM, NeMo, Dynamo, TensorRT-LLM). At Layers 1A through 1C, NVIDIA is an accelerator — it makes other vendors' storage and data pipelines faster without providing those capabilities directly.

The structural tension is between NVIDIA as a silicon supplier and NVIDIA as a platform vendor. When NVIDIA was only selling GPUs, its interests aligned with every OEM and hyperscaler: more GPU adoption meant more revenue for everyone. As NVIDIA extends into GPU orchestration (Run:ai), inference runtime (NIM/Dynamo), agent governance (OpenShell/NemoClaw), and cloud infrastructure (DGX Cloud), it competes with the same customers who buy its silicon. Dell's AI Factory runs NVIDIA software. But DGX Cloud is NVIDIA competing with Dell for the same enterprise workload.

The DAPM classification for NVIDIA is inverted from every other assessment: the enterprise consuming NVIDIA through an OEM (Dell, HPE, VMware) has already Ceded authority to the OEM, which has Ceded authority to NVIDIA. The enterprise consuming NVIDIA directly (DGX Cloud, NIM API) Cedes authority to NVIDIA without an intermediary. The enterprise self-hosting NVIDIA open-source software (Dynamo, OpenShell, Nemotron) Retains authority — but still runs on NVIDIA silicon. NVIDIA is the only vendor where every deployment path, at every layer, eventually depends on NVIDIA hardware.

More than half of NVIDIA's engineers work on software. That statistic from NVIDIA's FY2026 annual report is the key to understanding the 4+1 mapping: NVIDIA is a software company that happens to sell the hardware its software requires. The assessment series has been documenting where NVIDIA's software authority appears inside other vendors' stacks. This assessment makes that authority explicit.

Layer-by-layer status: Layer 0 (NVIDIA Strength — Silicon Authority), Layer 1A (Accelerator Only), Layer 1B (Acceleration + Model Enablement), Layer 1C (Inference Data Movement), Layer 2A (NVIDIA Authority via Run:ai), Layer 2B (NVIDIA Authority — Inference + Agent Runtime), Layer 2C (Runtime Governance Only — Not a Reasoning Plane), Layer 3 (+1) (Model + Blueprint Enablement).

Assessment framework: 4+1 Layer AI Infrastructure Model. Scoring model: Decision Authority Placement Model (DAPM) — Retained, Delegated, Ceded, or Absent. Published by The CTO Advisor LLC. Author: Keith Townsend. Date assessed: May 22, 2026. Version: v1.0 — Draft, Editorial Review Pending.

NVIDIA AI Platform

A Components Company Becoming a Platform Vendor — Mapped to the 4+1 Layer AI Infrastructure Model

v1.0 — Draft, Editorial Review Pending·Assessed May 22, 2026·Source: GTC 2025, GTC 2026, Dynamo 1.0 GA, NemoClaw/OpenShell announcement, Run:ai acquisition, DGX Cloud, NVIDIA AI Enterprise, NIM GA, analyst coverage, SEC FY2026 annual report

ACTIVE ASSESSMENT

Strength

Delegated

Gap

Absent

Partner

Layer 0Compute & Network FabricNVIDIA Strength — Silicon Authority▼

Raw compute, networking, and acceleration fabric

Vendor-Provided

GPU Accelerator Silicon (Blackwell, Vera Rubin)Ceded

Blackwell B200/B300/GB200 (current generation). Vera Rubin NVL72 (next generation, deploying at hyperscalers). The accelerator silicon that every other vendor in this assessment depends on. Dell builds PowerEdge around it. HPE builds ProLiant and Cray around it. AWS offers it as P5/P6 instances. Azure offers it as ND-series. Google offers it alongside TPUs. VAST embeds it in CNode-X. No enterprise AI infrastructure exists without NVIDIA GPU silicon — or a deliberate decision to use an alternative (AWS Trainium, Google TPU, AMD Instinct).

Networking Silicon + InterconnectCeded

NVLink/NVSwitch (intra-node GPU interconnect). Spectrum-X Ethernet switches. ConnectX-7/8 SmartNICs. BlueField-3 DPUs. InfiniBand for GPU cluster fabric. NIXL for disaggregated inference data movement. Dell brands Spectrum switches as PowerSwitch. HPE integrates ConnectX into ProLiant. VAST uses ConnectX/BlueField for NVMe-over-Fabrics. The networking silicon is as structurally embedded as the GPU silicon.

DGX Platform (On-Prem Systems)Ceded

DGX SuperPOD: leadership-class AI infrastructure for on-prem and hybrid. DGX Station: workgroup-scale AI compute. DGX Spark: desktop AI workstation. Pre-configured systems with NVIDIA software stack pre-installed. Competes directly with Dell PowerEdge, HPE ProLiant, and OEM AI server configurations — NVIDIA sells the assembled system, not just the components.

DGX Cloud (Hosted Infrastructure)Ceded

GPU supercomputing as a service, hosted on AWS, Azure, GCP, and OCI. Includes NVIDIA AI Enterprise software and Base Command Platform. The enterprise accesses NVIDIA infrastructure through a hyperscaler substrate — Ceding to both NVIDIA (software/GPU) and the hyperscaler (facility/network). DGX Cloud competes with the hyperscalers' own GPU instance offerings while running on their infrastructure.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Layer 0 is NVIDIA's foundational authority. Every other vendor in this assessment depends on NVIDIA silicon at this layer — the only exceptions are AWS (Trainium/Inferentia), Google (TPU), Azure (Maia), and AMD Instinct instances on hyperscalers. The DGX Platform creates a structural tension with OEM partners. When NVIDIA sells DGX SuperPOD directly to an enterprise, that enterprise is NOT buying Dell PowerEdge or HPE ProLiant. NVIDIA is simultaneously its OEM partners' most critical supplier and their direct competitor. Dell's 'AI Factory with NVIDIA' branding and HPE's 'NVIDIA AI Computing by HPE' branding are attempts to keep the enterprise buying through the OEM rather than going to NVIDIA directly. DGX Cloud adds a second tension: NVIDIA competes with hyperscalers while running on their infrastructure. AWS, Azure, and GCP host DGX Cloud while simultaneously offering their own GPU instances. The enterprise choosing between Azure ND-series VMs and DGX Cloud on Azure is choosing between Microsoft-managed and NVIDIA-managed access to the same GPU hardware. The NVIDIA dependency at Layer 0 is the one dependency shared by every on-prem vendor assessed. Dell, HPE, VAST, and VMware all depend on NVIDIA GPU silicon. The difference is the scope of that dependency: at Layer 0, NVIDIA provides silicon. At Layer 2A, NVIDIA provides orchestration. At Layer 2B, NVIDIA provides runtime. The silicon dependency is structural and shared. The software dependency is where NVIDIA's authority claims create tension.

◆ Borrowed Judgment

The enterprise consuming NVIDIA silicon inherits NVIDIA's GPU architecture decisions (memory bandwidth, interconnect topology, power/thermal profile), NVIDIA's driver and CUDA runtime decisions, and NVIDIA's product lifecycle and pricing decisions. This borrowed judgment is structural — it exists for every vendor in the assessment and for every enterprise running AI workloads on NVIDIA hardware. The DGX Platform adds system-level borrowed judgment: NVIDIA's hardware integration, thermal design, and rack architecture decisions. The DGX Cloud adds hyperscaler-layered borrowed judgment: NVIDIA software decisions on top of hyperscaler infrastructure decisions.

◆ Working Notes

NVIDIA's FY2026 annual report segments its business into Compute & Networking (data center accelerated computing, networking, AI solutions, software, automotive) and Graphics (GeForce, Quadro/RTX). The Data Center platform — GPUs, DPUs, networking, DGX, software — is the revenue engine. The company's strategic direction is to expand from silicon supplier to platform vendor without alienating the OEM and hyperscaler customers who drive GPU volume. The NVIDIA dependency column in every other vendor's assessment can now be read as 'authority NVIDIA claims at this layer.' The standalone assessment makes that authority explicit and measurable.

Layer 1AData Storage & GovernanceAccelerator Only▼

Durable, governed data foundation — the Governance Catalog that Layer 2C queries

Vendor-Provided

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

NVIDIA provides no storage, no data governance, and no data platform. Zero components at this layer. NVIDIA accelerates other vendors' storage with GPU libraries (cuVS for vector search, RAPIDS for data processing) and networking hardware (BlueField DPUs, ConnectX NICs) — but those are acceleration functions assessed at their functional layers (1B for retrieval, 1C for data processing, Layer 0 for networking silicon). The storage platforms, governance catalogs, and data architectures are entirely owned by other vendors: Dell (PowerScale, ObjectScale, MetadataIQ), HPE (Alletra, Data Fabric), VAST (DataStore, DataBase, Catalog), AWS (S3, Glue, Lake Formation), Google (BigQuery, Knowledge Catalog), Azure (Blob, Fabric, Purview). The absence of NVIDIA-owned storage or governance is structurally significant for Layer 2C: a Reasoning Plane needs governance metadata — which data is sensitive, which models are approved, which compliance requirements apply. NVIDIA has no Layer 1A metadata to feed into a Layer 2C reasoning plane. Every other vendor's 2C ambition is anchored in governance metadata from 1A. NVIDIA's emerging Layer 2C (OpenShell/NemoClaw) operates without governance context because NVIDIA doesn't own the data layer.

◆ Borrowed Judgment

None. NVIDIA has no data layer authority to lend or borrow. The enterprise's storage and governance judgment comes entirely from the storage vendor.

◆ Working Notes

The STX Architecture observation from the Dell assessment applies here: STX is available to every storage vendor. It does not differentiate any OEM's storage offering — it raises the floor for all of them. NVIDIA's Layer 1A role is to make the data layer faster, not to provide or govern it.

Layer 1BContext Management & RetrievalAcceleration + Model Enablement▼

Low-latency retrieval for RAG — vector/hybrid search, context windows

Vendor-Provided

cuVS (GPU-Accelerated Vector Search)Delegated

GPU-accelerated vector similarity search library. 12x faster vector indexing. Used by Dell (MetadataIQ integration), VAST (CNode-X vector search), and storage vendors for retrieval acceleration. NVIDIA provides the search acceleration; the platform vendor provides the retrieval infrastructure and index.

NeMo RetrieverDelegated

GPU-accelerated retrieval pipeline for RAG. Embedding models, reranking, and retrieval optimization. Integrated into Dell's Data Search Engine (PowerScale connector), HPE's retrieval stack, and VMware's AI Enterprise RAG Stack. Provides the retrieval intelligence that OEMs brand as part of their platforms.

NIM Embedding ModelsDelegated

Pre-built inference microservices for text and multimodal embedding. Used by VAST's InsightEngine, Dell's retrieval pipeline, and hyperscaler RAG services. NVIDIA provides the embedding models; the platform vendor provides the retrieval infrastructure.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

NVIDIA provides retrieval acceleration and embedding models but not retrieval infrastructure. The retrieval engines — Azure AI Search, OpenSearch, Elasticsearch, VAST InsightEngine, Google Vertex AI Search — are owned by other vendors. NVIDIA makes retrieval faster and provides the embedding models that make vector search work, but the enterprise's retrieval architecture is determined by the platform vendor. NeMo Retriever is a meaningful capability: it provides the GPU-accelerated RAG pipeline that multiple OEMs brand as part of their offerings. When Dell advertises 'GPU-accelerated hybrid search,' the GPU acceleration is NVIDIA's. The enterprise's retrieval quality depends on NVIDIA's embedding model quality — a borrowed judgment that is rarely made explicit.

◆ Borrowed Judgment

Moderate. NeMo Retriever embedding models determine retrieval quality. The enterprise inherits NVIDIA's embedding model training decisions, architecture choices, and optimization priorities. This borrowed judgment is invisible — the enterprise interacts with Dell's search engine or VAST's InsightEngine, not with NVIDIA's embeddings directly.

◆ Working Notes

The embedding model dependency is worth tracking across the assessment series. Multiple vendors (Dell, HPE, VAST, VMware) use NVIDIA NIM embedding models for their RAG pipelines. If NVIDIA changes embedding model architecture, quality, or licensing, it affects every vendor's Layer 1B simultaneously — a shared dependency that no single vendor controls.

Layer 1CData Movement & PipelinesInference Data Movement▼

Move/transform data — ETL/ELT, lineage, cost-aware movement, KV cache tiering

Vendor-Provided

NVIDIA CMX (KV Cache Management)Delegated

Context Memory Extension for KV cache offload from GPU to CPU/SSD. Validated with Dell PowerScale (19x TTFT improvement). Enables inference scaling by treating KV cache as a data movement problem. Future integration expected with HPE Alletra and VAST CNode-X. The most architecturally significant Layer 1C capability NVIDIA provides — it solves a data movement problem that emerges specifically from inference workloads.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

NVIDIA does not provide data pipeline orchestration (Data Factory, Dataloop, DataEngine, Airflow). Its Layer 1C presence is a single capability: KV cache management via CMX. CMX is architecturally significant because it addresses a data movement problem unique to AI inference — KV cache growing beyond GPU memory. This is a Layer 1C function (data movement) that directly affects Layer 2B performance (inference latency). Dell has validated it (19x TTFT improvement on PowerScale); HPE and VAST are expected integrations. The enterprise's KV cache strategy becomes a borrowed judgment from NVIDIA's CMX design decisions once adopted. NVIDIA also provides GPU-accelerated compute libraries (RAPIDS, cuDF) used within other vendors' data pipelines, but these are computation acceleration, not data movement — they make processing faster without providing pipeline orchestration, data lineage, or movement logic. They are assessed at the layers where they functionally operate (compute acceleration at Layer 0, retrieval acceleration at Layer 1B) rather than at Layer 1C.

◆ Borrowed Judgment

Moderate for CMX — an architectural decision about KV cache management that affects inference performance and is harder to substitute once adopted. The enterprise inherits NVIDIA's decisions about cache eviction policy, offload thresholds, and storage tier targeting.

◆ Working Notes

The KV cache tiering gap identified in the Azure assessment is relevant here: Azure has no CMX integration. Dell has validated it. HPE is expected. VAST's CNode-X architecture collocates cache and compute, potentially eliminating the need for CMX-style offload. The KV cache management approach varies by vendor — NVIDIA's CMX is one solution, not the only one.

Layer 2AInfrastructure OrchestrationNVIDIA Authority via Run:ai▼

GPU scheduling, quotas, RBAC, fair-share scheduling, utilization optimization

Vendor-Provided

NVIDIA Run:ai (Acquired 2024)Ceded

GPU orchestration and workload management platform. Kubernetes-native. Dynamic GPU pooling across hybrid environments. Fractional GPU sharing (no open-source equivalent). Fair-share scheduling with team-level quotas. Multi-cluster management from a unified control plane. Now part of NVIDIA AI Enterprise ($4,500/GPU/year standalone, included with DGX). Run:ai is the Layer 2A authority that Dell brands as part of AI Factory, HPE brands as part of Private Cloud AI, and VMware integrates through NVIDIA AI Enterprise. The OEM sells the relationship; NVIDIA controls the scheduling intelligence. GPU-level infrastructure (GPU Operator for driver/plugin lifecycle, MIG for hardware partitioning) provides the substrate Run:ai orchestrates — invisible plumbing, not standalone orchestration tools.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Layer 2A is where NVIDIA's platform ambition creates the most direct tension with its OEM partners. Run:ai is the most capable GPU-specific orchestration platform available — fractional GPU sharing, multi-cluster management, and fair-share scheduling are capabilities that no open-source alternative matches. But Run:ai is NVIDIA's product, not the OEM's. When Dell markets 'AI Factory with NVIDIA,' the GPU scheduling intelligence is Run:ai — NVIDIA's IP, NVIDIA's roadmap, NVIDIA's pricing. Dell provides the hardware, the rack integration, and the customer relationship. NVIDIA provides the scheduling brain. If NVIDIA changes Run:ai's architecture, licensing, or feature set, Dell's AI Factory Layer 2A changes with it — without Dell's input. The same dynamic applies to HPE (Private Cloud AI includes NVIDIA AI Enterprise with Run:ai) and VMware (VCF integrates NVIDIA AI Enterprise). Three OEMs, one scheduling authority. The hyperscalers avoid this dependency: AWS built Karpenter, Google built GKE Autopilot + Fluid Compute, Azure contributed DRA to upstream Kubernetes. Each hyperscaler owns its GPU scheduling intelligence. On-prem vendors do not — they consume NVIDIA's. The open-source alternatives (Kueue, KAI Scheduler, DRA) are catching up but lack Run:ai's fractional GPU sharing. The enterprise evaluating GPU orchestration choices is evaluating a NVIDIA proprietary vs. open-source trade-off — better capability (Run:ai) vs. more authority (open-source).

◆ Borrowed Judgment

High. Run:ai's scheduling decisions — which team gets which GPU, how fractional sharing is allocated, when over-quota borrowing is permitted — are NVIDIA's judgment. The enterprise configures policies; NVIDIA's scheduler executes them. If Run:ai makes a scheduling decision that impacts training job completion time or inference latency, that's NVIDIA's borrowed judgment affecting business outcomes. NVIDIA AI Enterprise licensing adds commercial borrowed judgment: the enterprise's production deployment timeline depends on NVIDIA's licensing terms, pricing changes, and certification cycles.

◆ Working Notes

The Run:ai acquisition (2024) is the most significant NVIDIA software acquisition for the 4+1 model. Before Run:ai, NVIDIA provided silicon and libraries. After Run:ai, NVIDIA provides the orchestration plane that sits between the enterprise and its own GPUs. The enterprise doesn't interact with GPUs directly — it interacts through Run:ai's scheduling layer. The open-source Kubernetes GPU scheduling landscape (DRA, Kueue, KAI Scheduler) is evolving rapidly. Microsoft contributed DRA to upstream Kubernetes at KubeCon 2026. If open-source GPU scheduling reaches feature parity with Run:ai's fractional GPU sharing, the enterprise case for Run:ai's licensing cost weakens. NVIDIA's response: integrate Run:ai deeper into AI Enterprise, making it harder to substitute.

Layer 2BApplication Runtime & ExecutionNVIDIA Authority — Inference + Agent Runtime▼

Model serving, inference optimization, agent runtime — the Execution Plane

Vendor-Provided

NVIDIA NIM (Inference Microservices)Ceded

Pre-built, optimized inference containers for 100+ models. OpenAI-compatible API. Free for prototyping on DGX Cloud (build.nvidia.com). Production requires AI Enterprise license. Includes Nemotron, Llama, Mistral, and partner models. NIM is the inference runtime that multiple OEMs and hyperscalers brand as part of their platforms — AWS Bedrock offers NIM, Azure Foundry offers NIM, Dell deploys NIM on PowerEdge.

Dynamo 1.0 (Inference Operating System)Retained

Open-source (Apache 2.0) distributed inference serving framework. GA March 2026. Disaggregated prefill and decode. KV-aware routing to GPUs with best cache match. KVBM for memory management. NIXL for GPU-to-GPU data movement. Grove for scaling. 7x performance boost on Blackwell. Adopted by AWS, Azure, GCP, OCI, CoreWeave, and dozens of inference providers. NVIDIA positions Dynamo as 'the operating system of AI factories.' Open-source but NVIDIA-optimized — runs best on NVIDIA hardware.

NeMo (Model Lifecycle)Ceded

End-to-end model lifecycle management: data curation, model customization and evaluation, guardrailing and observability. NeMo Guardrails for content safety. NeMo Evaluator for model assessment. NeMo Data Designer for training data preparation (integrated into VAST's TuningEngine). The model lifecycle stack that operates above inference and below applications.

NeMo Guardrails (Runtime Content Safety)Ceded

Programmable content safety framework inline with inference. Controls model output, topic boundaries, and factual grounding during model serving. Deployed as part of NIM containers or standalone. At Layer 2B, Guardrails functions as runtime content filtering — it controls what the model says during inference. The same capability serves a Layer 2C governance function when applied as policy enforcement for agent behavior.

NemoClaw + OpenShell (Agent Execution Runtime)Retained

NemoClaw: open-source stack (Apache 2.0) bundling OpenShell runtime with Nemotron models. At Layer 2B, NemoClaw is the agent execution environment — the runtime that agents run inside. OpenShell provides kernel-level sandboxing (deny-by-default) and privacy router for on-device vs. cloud inference routing. Early alpha. The same capability serves a Layer 2C governance function as the policy enforcement sandbox that constrains agent behavior.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Layer 2B is NVIDIA's deepest software authority and the layer where the platform ambition is most visible. NIM, Dynamo, NeMo, NeMo Guardrails, and NemoClaw/OpenShell constitute the complete inference, model lifecycle, and agent runtime stack. NeMo Guardrails and NemoClaw/OpenShell appear at both Layer 2B and Layer 2C because they serve dual architectural functions. At 2B they are runtime capabilities — content filtering inline with inference, agent execution environment. At 2C they are governance capabilities — policy enforcement for agent behavior, sandbox constraints on agent access. The same code, two architectural purposes. This dual-layer presence is itself evidence of NVIDIA's platform transition: a components company's software stays within one layer; a platform company's software spans layers. The open-source strategy is deliberate: Dynamo (Apache 2.0) and NemoClaw/OpenShell (Apache 2.0) are open-source, meaning the enterprise Retains the code. But both are optimized for NVIDIA hardware and NVIDIA's CUDA ecosystem. Running Dynamo on AMD or Intel GPUs is theoretically possible but practically disadvantaged. The open-source license provides code portability; the hardware optimization provides silicon lock-in. NIM is the more significant authority claim: it's closed-source, NVIDIA-only, and requires an AI Enterprise license for production. The enterprise using NIM to serve models has Ceded inference runtime authority to NVIDIA. The alternative — vLLM, SGLang, or other open-source serving frameworks — is slower but Retained. The Dell assessment's Layer 2B finding applies directly: Dell does not appear to own the core agent runtime, model-serving runtime, guardrail framework, or distributed inference framework. Those are NVIDIA's. This assessment confirms that observation from NVIDIA's perspective.

◆ Borrowed Judgment

High for NIM (closed-source, NVIDIA-controlled inference optimization decisions). Low for Dynamo (open-source, enterprise can fork and modify). Moderate for NeMo (model lifecycle decisions — training data curation, evaluation metrics, guardrail policies — are NVIDIA's defaults that the enterprise inherits unless explicitly overridden). The inference optimization decisions in NIM and TensorRT-LLM directly affect model output quality, latency, and cost. Quantization choices, batching strategies, and KV cache management are NVIDIA's engineering decisions that the enterprise consumes without visibility. If an NIM container produces different outputs than a vLLM deployment of the same model, the enterprise may not know which is 'correct.'

◆ Working Notes

Dynamo 1.0 GA (March 2026) is NVIDIA's strongest Layer 2B play. Positioning it as 'the operating system of AI factories' is explicitly a platform claim. Combined with Run:ai at Layer 2A and NIM at Layer 2B, NVIDIA controls the infrastructure orchestration, the inference optimization, and the model serving runtime — three layers of the enterprise's AI stack that sit between the hardware (which NVIDIA also provides) and the application (which the enterprise builds). The NemoClaw/OpenShell alpha status is important context: Futurum Research noted that NemoClaw addresses 'the deployment end of the agent trust chain well' but urged enterprises 'not to treat it as a complete governance solution.' Security and accountability need to be embedded throughout the development lifecycle, not just at runtime. This is the gap between NVIDIA's runtime governance (OpenShell) and Microsoft's lifecycle governance (Entra Agent ID + Agent Governance Toolkit).

Layer 2CAgentic Infrastructure — The Reasoning PlaneRuntime Governance Only — Not a Reasoning Plane▼

Policy-driven placement and resource coordination — the Autonomy Layer

Vendor-Provided

NemoClaw + OpenShell Agent Governance (Alpha)Retained

NemoClaw: open-source stack (Apache 2.0) bundling OpenShell governance runtime with Nemotron models and NVIDIA Agent Toolkit. At Layer 2C, OpenShell is the policy enforcement sandbox — kernel-level deny-by-default constraints on filesystem, network, and process access via declarative YAML. Privacy router governs on-device vs. cloud inference routing as a policy decision. The same capability serves a Layer 2B function as the agent execution environment. Alpha-stage — NVIDIA is explicit about rough edges.

NeMo Guardrails (Agent Policy Enforcement)Delegated

At Layer 2C, Guardrails functions as policy enforcement for agent behavior — controlling what agents are permitted to do, say, and access as a governance decision. Topic boundaries, factual grounding requirements, and content policies are defined declaratively and enforced at runtime. The same capability serves a Layer 2B function as inline content safety during model inference.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Applying the 'Routing Is Not Reasoning' test from the VMware assessment: OpenShell provides runtime sandbox governance — it controls WHAT agents can access (filesystem, network, processes). NeMo Guardrails control WHAT models can say (content filtering, topic boundaries). Neither provides policy-driven decisions about WHERE compute runs relative to data, WHICH model serves WHICH request, or HOW cost/compliance/latency are arbitrated. OpenShell is agent runtime security. NeMo Guardrails is model output safety. Neither is a Reasoning Plane. NVIDIA's Layer 2C gap is structural: NVIDIA does not own storage (Layer 1A), data governance (Purview, Lake Formation, Knowledge Catalog), or enterprise identity (Entra, IAM). A Reasoning Plane needs governance metadata — which data is sensitive, which models are approved, which compliance requirements apply. NVIDIA has no data governance to query because it has no data layer. The consequence: NVIDIA's Layer 2C will always depend on another vendor's governance metadata. OpenShell can enforce sandbox policies, but it cannot make placement decisions informed by data classification, compliance status, or cost targets — because that information lives in Purview, Lake Formation, PolicyEngine, or MetadataIQ, none of which NVIDIA owns. This is the fundamental structural limitation of NVIDIA's platform ambition: NVIDIA can build runtime governance (2B/2C boundary) but cannot build a full Reasoning Plane (2C) because it lacks the data governance foundation (1A) that a Reasoning Plane queries.

◆ Borrowed Judgment

Low for OpenShell (open-source, enterprise controls the policies). Low for NeMo Guardrails (configurable by the enterprise). The governance logic is transparent — the enterprise defines what agents can and cannot do. The missing borrowed judgment is the more significant finding: NVIDIA's Layer 2C cannot borrow data governance judgment from itself because it doesn't have a data governance layer. It must borrow from Dell (MetadataIQ), HPE (Data Fabric), VAST (Catalog), AWS (Lake Formation), Google (Knowledge Catalog), or Azure (Purview). NVIDIA's governance is runtime-only; other vendors' governance is data-informed.

◆ Working Notes

The Futurum Research observation is the right framing: OpenShell addresses the 'deployment end of the agent trust chain' but enterprises should not treat it as a complete governance solution. Security and accountability need to be embedded throughout the development lifecycle. Compare to other vendors' Layer 2C: • Microsoft: identity + governance lifecycle (Entra Agent ID + AGT) — the broadest governance scope • Google: model-integrated orchestration (Agent Platform) — the deepest platform integration • AWS: policy + evaluation + registry (AgentCore) — the most modular approach • VAST: data platform governance (PolicyEngine + Polaris) — the most data-informed approach • NVIDIA: runtime sandbox (OpenShell) — the narrowest scope, addressing only execution-time security NVIDIA's Layer 2C is necessary but not sufficient. It complements other vendors' governance — it doesn't replace it.

Layer 3 (+1)AI Application Layer — The Value PlaneModel + Blueprint Enablement▼

AI-powered business capabilities — business logic, workflow automation

Vendor-Provided

Nemotron Open ModelsRetained

Post-trained on Llama, distilled from DeepSeek-R1. Deployment-ready for AI agents. Available through NIM API (build.nvidia.com) and as downloadable containers. Nemotron models are NVIDIA's answer to the model layer — open models optimized for NVIDIA hardware. Competes with OpenAI, Anthropic, Google, and Meta at the model layer while providing the hardware those competitors run on.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

NVIDIA does not build enterprise AI applications. Its Layer 3 presence is a single component: Nemotron open models. NVIDIA also provides application enablement that falls below the Layer 3 threshold: Blueprints (pre-built reference patterns for PDF extraction, digital twins, RAG pipelines, AI-Q agent task decomposition — deployed through Dell, HPE, VMware, and hyperscaler marketplaces) and NIM API endpoints (build.nvidia.com — free API access to 100+ models, 1,000 free inference credits, GPU sandbox instances). Blueprints are reference architectures, not applications — the enterprise builds from them, not on them. NIM API is a developer on-ramp and go-to-market funnel, not an application platform. The Nemotron model strategy is the interesting Layer 3 finding: NVIDIA competes with the AI model providers (OpenAI, Anthropic, Google, Meta) whose models run on NVIDIA hardware. If Nemotron achieves quality parity with proprietary models, enterprises can run inference on NVIDIA hardware with NVIDIA models — a fully vertically integrated stack from silicon to model. No other silicon vendor has this: Intel doesn't have frontier models, AMD doesn't have frontier models, AWS Trainium serves other providers' models. The NIM API funnel is NVIDIA's developer moat: free prototyping creates adoption → adoption creates switching cost → production deployment requires AI Enterprise license on NVIDIA hardware. The funnel is silicon-to-model-to-lock-in.

◆ Borrowed Judgment

Moderate. Nemotron model alignment, training data, and safety decisions are NVIDIA's. The model-to-silicon borrowed judgment is unique to NVIDIA: when the enterprise uses Nemotron on NVIDIA GPUs, both the model and the hardware are NVIDIA's. The enterprise borrows NVIDIA's judgment at every layer of the inference path. No other vendor has this — even Google (Gemini on TPU) separates the model team (DeepMind) from the silicon team.

◆ Working Notes

The NVIDIA-as-model-provider dynamic creates an unusual competitive position: NVIDIA wants enterprises to adopt Nemotron (NVIDIA model revenue) AND wants enterprises to run OpenAI/Anthropic/Meta models on NVIDIA GPUs (NVIDIA hardware revenue). Both outcomes benefit NVIDIA, but they benefit NVIDIA in different ways. If Nemotron succeeds too well, it reduces the model diversity that drives GPU demand from multiple model providers.

✦ Summary Finding

4+1 Layer AI Infrastructure Model · Vendor Assessment Series · The CTO Advisor LLC · thectoadvisor.com