As a VMware engineer, you have probably wondered about this at some point in your career: when you create a virtual machine with 4 vCPUs, should you give it 2 sockets with 2 cores each, or 1 socket with 4 cores? There was conflicting information out there, and the sockets vs. cores decision was always a challenge for VMs requiring more compute.
There is now definitive guidance from the VMware vSphere team, which lays out two simple best practices:
- When a VM is created, vSphere creates as many virtual sockets as there are requested vCPUs, with one core per socket. This lets vNUMA select and present the best virtual NUMA topology to the guest operating system. This default configuration is "wide" and "flat".
- If you need to change the cores per socket, make sure the result mirrors the physical server's NUMA topology. Once the default cores per socket is changed, the configuration is no longer "wide" and "flat", so vNUMA will not automatically pick the best NUMA configuration based on the physical server; it will use the changed configuration instead, which has the potential for a topology mismatch.
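The second practice can be sketched as a small helper. This is not a VMware tool, just an illustration of the rule: if you move away from the one-core-per-socket default, the cores-per-socket value should evenly divide the vCPU count and should not exceed the number of physical cores in one NUMA node.

```python
def suggest_cores_per_socket(vcpus, physical_cores_per_numa_node):
    """Pick the largest cores-per-socket value that divides the vCPU
    count evenly and fits within one physical NUMA node's core count.

    Falls back to 1 (the vSphere default, i.e. a "wide and flat"
    layout) when no larger divisor fits the physical topology."""
    for cores in range(min(vcpus, physical_cores_per_numa_node), 0, -1):
        if vcpus % cores == 0:
            return cores
    return 1

def topology(vcpus, physical_cores_per_numa_node):
    """Return a (sockets, cores_per_socket) pair for the VM."""
    cores = suggest_cores_per_socket(vcpus, physical_cores_per_numa_node)
    return vcpus // cores, cores
```

For example, a 12-vCPU VM on a host with 10 cores per NUMA node would come out as 2 sockets of 6 cores, so each virtual socket fits inside one physical node.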
The details are in a recent blog post by the vSphere team.
Some terminology:
NUMA
NUMA systems are advanced server platforms with more than one system bus. A multi-GHz processor needs high memory bandwidth to use its power effectively. The problem becomes more obvious on symmetric multiprocessing (SMP) systems, where many processors compete for the same memory bandwidth. NUMA links several small nodes using a high-performance connection. Each node contains processors and memory, and a memory controller allows a node to use memory on all other nodes. When a processor accesses memory that is not on its own node, the request must traverse the NUMA connection, resulting in lower access speed compared to local memory.
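The penalty for remote access can be shown with a toy model (the latency figures below are made-up numbers for illustration, not measurements of any real platform): the average memory latency a processor sees is a weighted mix of local and remote access times.

```python
def avg_latency_ns(local_fraction, local_ns=80.0, remote_ns=140.0):
    """Average memory access latency for a given fraction of local
    (same-NUMA-node) accesses. Latency values are illustrative only."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Fully local access is fastest; every remote access adds latency,
# which is why the scheduler works to maximize memory locality.
print(avg_latency_ns(1.0))  # all local
print(avg_latency_ns(0.5))  # half the accesses go over the NUMA link
```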
NUMA Scheduling in ESXi
The NUMA scheduler on ESXi dynamically balances processor load and memory locality. The algorithm works as follows:
1. Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is one of the system's NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT).
2. When memory is allocated to a virtual machine, the ESXi host preferentially allocates it from the home node. The virtual CPUs of the virtual machine are constrained to run on the home node to maximize memory locality.
3. The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes in system load. The scheduler might migrate a virtual machine to a new home node to reduce processor load imbalance. Because this might cause more of its memory to be remote, the scheduler might migrate the virtual machine's memory dynamically to its new home node to improve memory locality.
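The steps above can be sketched as a toy scheduler. This is a deliberate simplification of what ESXi actually does; the function names, load units, and the imbalance threshold are invented for illustration:

```python
def assign_home_node(node_loads):
    """Step 1: assign a new VM the least-loaded NUMA node as its home."""
    return min(range(len(node_loads)), key=lambda n: node_loads[n])

def maybe_rebalance(node_loads, vm_node, vm_load, threshold=2):
    """Step 3: migrate a VM's home node when the load imbalance between
    its current node and the least-loaded node exceeds a threshold.

    Returns the (possibly new) home node; node_loads is updated in
    place. After a migration, memory would be moved to the new home
    node over time to restore the locality that step 2 prefers."""
    best = min(range(len(node_loads)), key=lambda n: node_loads[n])
    if node_loads[vm_node] - node_loads[best] > threshold:
        node_loads[vm_node] -= vm_load
        node_loads[best] += vm_load
        return best
    return vm_node
```

For example, with per-node loads `[5, 1]`, a new VM would get node 1 as its home, and a VM of load 2 on node 0 would be migrated to node 1, leaving the loads balanced at `[3, 3]`.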
In summary, a VM administrator should choose the maximum number of sockets available for a VM, and if the cores per socket needs to be adjusted, it should be done in accordance with the physical server's NUMA topology. NUMA scheduling and memory placement policies in ESXi manage all virtual machines transparently, so administrators do not need to explicitly balance virtual machines between nodes.