Wednesday, April 23, 2014

Sockets vs. Cores - VM Configuration


As a VMware engineer, you have probably wondered about this at some point in your career: when you create a virtual machine with 4 vCPUs, should you configure 2 sockets with 2 cores each, or 1 socket with 4 cores? There was conflicting information out there, and the sockets-vs.-cores decision was always a challenge for VMs requiring more compute.

There is now definitive guidance from the VMware vSphere team, which lays out two simple best practices.

  1. At the time of VM creation, vSphere creates as many virtual sockets as the requested vCPUs, with one core per socket. This "wide and flat" configuration lets vNUMA select and present the best virtual NUMA topology to the guest operating system.

  2. If you need to change the cores per socket, mirror the physical server's NUMA topology. Once the default cores per socket is changed, the configuration is no longer "wide and flat", so vNUMA will not automatically pick the best NUMA configuration based on the physical server; it will use the changed configuration, which risks a topology mismatch.
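The two rules above can be sketched as a small helper. This is an illustrative sketch only, not a VMware API: the function name, parameters, and defaults are assumptions made for the example.

```python
def suggest_topology(vcpus, phys_cores_per_node=None):
    """Return (virtual_sockets, cores_per_socket) for a VM.

    Default: "wide and flat" -- one core per virtual socket, so
    vNUMA can pick the best virtual NUMA topology automatically.

    If cores per socket must be changed (e.g. for licensing),
    mirror the physical server's NUMA node size.
    """
    if phys_cores_per_node is None:
        return vcpus, 1  # wide and flat: let vNUMA decide

    # Mirror the physical topology: cores per socket should match
    # the physical NUMA node size (capped at the vCPU count).
    cores = min(phys_cores_per_node, vcpus)
    if vcpus % cores != 0:
        raise ValueError("vCPU count must divide evenly into sockets")
    return vcpus // cores, cores
```

For example, a 16-vCPU VM on a host with 8 cores per NUMA node would map to 2 virtual sockets of 8 cores each, matching the physical node boundaries.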

The details are covered in a recent blog post from the vSphere team.

Some terminology -

NUMA: NUMA systems are advanced server platforms with more than one system bus. A multi-GHz processor needs large memory bandwidth to use its power effectively. The problem becomes more obvious on symmetric multiprocessing (SMP) systems, where many processors compete for memory bandwidth.

NUMA links several small nodes using a high-performance interconnect. Each node contains processors and memory, and a memory controller allows the node to use memory on all other nodes. When a processor accesses memory that is not on its own node, the request traverses the NUMA interconnect, resulting in slower access than local memory.
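The local-vs.-remote penalty described above can be modeled in a few lines. This is purely illustrative; the latency numbers are made up for the example and do not describe any particular hardware.

```python
# Hypothetical access latencies (nanoseconds) for the illustration.
LOCAL_NS, REMOTE_NS = 100, 180

def access_latency_ns(cpu_node, memory_node):
    """Accessing memory on another node crosses the NUMA
    interconnect, so it costs more than a local access."""
    return LOCAL_NS if cpu_node == memory_node else REMOTE_NS
```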

NUMA Scheduling in ESXi

The NUMA scheduler on ESXi dynamically balances processor load and memory locality. The algorithm works as follows.

  1. Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is one of the system's NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT).

  2. When memory is allocated to a virtual machine, the ESXi host preferentially allocates it from the home node. The virtual CPUs of the virtual machine are constrained to run on the home node to maximize memory locality.

  3. The NUMA scheduler can dynamically change a virtual machine's home node in response to changes in system load. The scheduler might migrate a virtual machine to a new home node to reduce processor load imbalance. Because this can leave more of the VM's memory remote, the scheduler might also migrate the virtual machine's memory to the new home node to improve memory locality.
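A toy sketch of the home-node logic described above (not the actual ESXi scheduler, whose internals are not public): place each VM on the least-loaded node, and rebalance by migrating one VM when the load gap grows too large. All names and the threshold are assumptions for the example.

```python
def assign_home_node(node_load, vm_vcpus):
    """Pick the least-loaded NUMA node as the VM's home node and
    account for its vCPU load there."""
    home = min(range(len(node_load)), key=lambda n: node_load[n])
    node_load[home] += vm_vcpus
    return home

def rebalance(node_load, placements, threshold=2):
    """Migrate one VM from the busiest node to the idlest node if
    the load gap exceeds the threshold; return the VM moved."""
    busiest = max(range(len(node_load)), key=lambda n: node_load[n])
    idlest = min(range(len(node_load)), key=lambda n: node_load[n])
    if node_load[busiest] - node_load[idlest] <= threshold:
        return None
    for vm, (node, vcpus) in placements.items():
        if node == busiest:
            placements[vm] = (idlest, vcpus)
            node_load[busiest] -= vcpus
            node_load[idlest] += vcpus
            return vm  # the VM's memory would then follow it
    return None
```

In the real scheduler, step 3's memory migration happens gradually in the background; the sketch only moves the bookkeeping.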

In summary, a VM administrator should choose the maximum number of sockets available for a VM (one core per socket), and if the cores per socket need to be adjusted, it should be done in accordance with the physical server's NUMA topology. NUMA scheduling and memory placement policies in ESXi manage all virtual machines transparently, so administrators do not need to balance virtual machines between nodes explicitly.
