Monday, January 21, 2013

VMware ESX Management Services will not restart

I have run into a few occasions with ESXi 4 and ESX 4 where I wanted to restart the management services and they would not come back up. The cause was a stuck process (service). The result is that the host, and all the VMs running on it, show as disconnected from the cluster, although the virtual machines should continue to run without issues.

However, to get that host back into the cluster without rebooting it, you can follow these steps.

  • Log on to the affected host from a KVM, the physical console, or an iLO/DRAC remote console, and change to the following directory by running:
    • #cd /var/run/vmware
  • Get the process ID (PID) of the management service from the following file, then kill the management service:
    • #cat vmware-host.pid
    • Note the PID.
    • Kill the process with #kill -9 <pid value>
  • Delete the files vmware-host.pid and watchdog-host.pid with:
    • #rm vmware-host.pid
    • #rm watchdog-host.pid
  • Start the management service with:
    • #service mgmt-vmware start
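If this happens often, the steps above can be wrapped in a small script. This is only a minimal sketch, not a VMware-supplied tool: it assumes a POSIX `sh` on the host console, reuses the directory and pid file names from the steps (check them with `ls` on your own host first), and the `read_pid` / `restart_mgmt` helpers and the `--run` guard are hypothetical names I made up for the example. Nothing is killed unless you pass `--run`.

```shell
#!/bin/sh
# Sketch only: automates the manual steps above. Run as root on the host console.
# RUN_DIR and the pid file names are copied from the manual steps (assumption:
# verify them with `ls "$RUN_DIR"` before running).

RUN_DIR="${RUN_DIR:-/var/run/vmware}"
HOST_PID_FILE="vmware-host.pid"
WATCHDOG_PID_FILE="watchdog-host.pid"

# Print the PID stored in a pid file; fail if the file is missing or
# does not contain a plain number.
read_pid() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$pid"
}

# Kill the stuck management service, remove its pid files, start it again.
restart_mgmt() {
    cd "$RUN_DIR" || return 1
    if pid=$(read_pid "$HOST_PID_FILE"); then
        echo "Killing management service, pid $pid"
        kill -9 "$pid"
    else
        echo "No valid pid in $RUN_DIR/$HOST_PID_FILE, skipping kill" >&2
    fi
    rm -f "$HOST_PID_FILE" "$WATCHDOG_PID_FILE"
    service mgmt-vmware start
}

# Hypothetical guard: only act when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    restart_mgmt
fi
```

Saved as, say, `restart-mgmt.sh`, it would be run as `sh restart-mgmt.sh --run`. The numeric check in `read_pid` is there so an empty or corrupted pid file never turns into a `kill -9` against the wrong target.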