Quick Fix: Adjusting MMIO values in ESXi 8U3 to use Large GPUs

Recently I was asked to deploy a “Monster VM” with eight H200 GPUs on board. Everything looked straightforward, and VMs with small vGPU profiles had never caused any problems, but the first thing I hit when powering on such a large VM was this error:

Error message from esxi-01: The firmware could not allocate 50331648 KB of PCI MMIO. Increase the size of PCI MMIO and try again.

Luckily, a few weeks earlier I had read a recent VMware document, “Deploy Distributed LLM Inference with GPUDirect RDMA over InfiniBand in VMware Private AI”, and this exact issue is covered there.

I strongly recommend this document to anyone utilizing large GPU servers (HGX, DGX), particularly when cross-server communication is necessary.

Running such a large VM requires adjusting the VM’s MMIO configuration by adding two values to its advanced settings:

pciPassthru.use64bitMMIO = TRUE
pciPassthru.64bitMMIOSizeGB = 1024
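
If you prefer to script this instead of editing the advanced settings in the vSphere Client, here is a minimal pyVmomi sketch of one way to do it. The vCenter address, credentials, and VM name are placeholders, and the VM should be powered off when the settings are applied.

# Minimal pyVmomi sketch (hypothetical vCenter, credentials and VM name) that adds
# the two MMIO-related entries to a powered-off VM's advanced settings.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; validate certificates in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ctx)

# Find the VM by name with a simple container-view lookup.
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "monster-vm")

# extraConfig entries are written into the VM's .vmx file on reconfigure.
spec = vim.vm.ConfigSpec(extraConfig=[
    vim.option.OptionValue(key="pciPassthru.use64bitMMIO", value="TRUE"),
    vim.option.OptionValue(key="pciPassthru.64bitMMIOSizeGB", value="1024"),
])
vm.ReconfigVM_Task(spec)

Disconnect(si)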

MMIO size should be calculated based on the number and type of passthrough devices attached to the VM.

According to the doc above, each passthrough NVIDIA H100 (or H200) GPU requires 128 GB of MMIO space.

You can find more details on calculating the MMIO size in KB 323402; the example there walks through the calculation based on the GPU memory size.
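
As a quick sanity check, here is a tiny Python sketch of that arithmetic using the 128 GB-per-GPU figure quoted above. The round-up-to-a-power-of-two step is my assumption based on common sizing guidance, so treat KB 323402 as the authoritative reference.

def mmio_size_gb(num_gpus: int, per_gpu_gb: int = 128) -> int:
    """Estimate pciPassthru.64bitMMIOSizeGB for passthrough H100/H200 GPUs."""
    total = num_gpus * per_gpu_gb
    # Assumption: round up to the next power of two (1024 already is one here).
    size = 1
    while size < total:
        size *= 2
    return size

print(mmio_size_gb(8))  # 1024 -> pciPassthru.64bitMMIOSizeGB = 1024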

After adjusting the MMIO settings, the VM boots successfully.
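
If you want to confirm the values actually landed in the VM’s configuration before powering it on, you can read them back, for example with pyVmomi (reusing the hypothetical vm object from the sketch above):

# Read back the advanced settings from the same 'vm' object used earlier.
wanted = {"pciPassthru.use64bitMMIO", "pciPassthru.64bitMMIOSizeGB"}
for opt in vm.config.extraConfig:
    if opt.key in wanted:
        print(f"{opt.key} = {opt.value}")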
