QuickFix: ESXi 7 and broken vmsyslog

I encountered a situation where ESXi 7 Update 3g (build 20328353) stopped sending logs to a remote syslog server. On further investigation, it turned out that the host had also stopped writing logs locally: the files in /scratch/log were no longer being updated. Free disk space was not the problem.

During diagnostics, errors were detected in the /var/log/.vmsyslogd.err log:

vmsyslog.main            : CRITICAL] Dropping messages due to log stress (qsize = 25000)
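
To see how often this message occurs, the error log can be searched for it. The following is a minimal sketch, assuming grep is available in the ESXi shell; the function name count_log_stress is made up for illustration, and the default path is the log file mentioned above.

```shell
#!/bin/sh
# Count "log stress" drop messages in a vmsyslogd error log.
# count_log_stress is a hypothetical helper name, not an ESXi command.
# Default path is the log mentioned above; pass another file to override.
count_log_stress() {
    log_file="${1:-/var/log/.vmsyslogd.err}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'Dropping messages due to log stress' "$log_file"
}
```

Calling count_log_stress without arguments on the host checks the default file; a non-zero count suggests vmsyslogd is dropping messages.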

I could not find a suitable KB article on this topic for version 7. There was only a KB article for versions 6.5/6.7 that mentions this error and states that “the problem has been fixed”.

The esxcli system syslog config get command correctly displays the syslog settings, but esxcli system syslog reload does not help: logs are still neither written locally nor sent to the remote server.

Restarting the service with the Restart button in the host management interface does not help either. The log shows only:

vmsyslog.main            : ERROR   ] reloading (3200395)

This is similar to what esxcli system syslog reload produces.

Stopping and restarting the service from the ESXi interface fails because:

This service with 'vmsyslogd' is marked as 'required' and cannot be stopped.

All that remains is to kill the process forcibly, directly on the host:

ps -cC | grep vmsyslog
3418096  3418096  vmsyslogd             /bin/python /usr/lib/vmware/vmsyslog/bin/vmsyslogd.pyc 1

We determine the PID of vmsyslog, in this case 3418096, and kill it:

kill -9 3418096
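
The PID lookup can also be scripted instead of read off by eye. This is a minimal sketch, assuming awk is available in the ESXi shell and that the ps -cC output has the column layout shown above (two ID columns, then the process name); the function name vmsyslogd_pids is made up for illustration. It only prints IDs and kills nothing, so the result can be reviewed first.

```shell
#!/bin/sh
# Print the ID(s) of vmsyslogd from `ps -cC`-style output read on stdin.
# vmsyslogd_pids is a hypothetical helper name, not an ESXi command.
# Assumed layout: ID, ID, process name, command line. In the sample above the
# first two columns hold the same value; this sketch prints the first one.
vmsyslogd_pids() {
    awk '$3 == "vmsyslogd" { print $1 }'
}
```

On the host it would be used as ps -cC | vmsyslogd_pids, and the printed ID is then passed to kill -9 as shown above. If more than one line is printed, check the full ps output before killing anything.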

The vmsyslogd log will show that the process was killed and then automatically restarted:

vmsyslog.main            : ERROR   ] Watchdog 3418095 fired (child 3418096 died with status 9)!
vmsyslog.main            : ERROR   ] Watchdog 3418095 exiting
vmsyslog                 : CRITICAL] vmsyslogd daemon starting (3418940)

After restarting, logs start to be written locally and sent to the remote server.
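
One way to confirm this is to write a marker into the logs and then look for it. The sketch below assumes the marker is written with esxcli system syslog mark --message="..." and that a log file such as /scratch/log/vmkernel.log exists on the host; the function name log_has_marker and the marker text are made up for illustration.

```shell
#!/bin/sh
# Return success if the given marker string appears in the given log file.
# log_has_marker is a hypothetical helper name; both arguments are chosen
# by the caller.
#
# Intended use on the host (not run here):
#   esxcli system syslog mark --message="syslog-test-1"
#   log_has_marker "syslog-test-1" /scratch/log/vmkernel.log && echo "local logging works"
log_has_marker() {
    marker="$1"
    log_file="$2"
    # -F: treat the marker as a fixed string; -q: report via exit status only.
    grep -F -q -- "$marker" "$log_file"
}
```

The same marker can then be searched for on the remote syslog server to check that forwarding has resumed as well.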


Quick fix: VMware. Some of the disks of the virtual machine failed to load.

I ran into an issue with one of the VMs running on VMware ESXi 7.0.3 (build 20328353).

Symptoms:

1. The VM is running, and users report no problems;

2. vMotion fails with an error:

The object or item referred to could not be found.

3. After the failed vMotion attempt, hostd.log contains the following:

Failed to find file size for /vmfs/volumes/.../VM_NAME.nvram: No such file or directory

4. In the vCenter UI, the following message is displayed for the VM:

Some of the disks of the virtual machine VM_NAME failed to load. The information present for them in the virtual machine configuration may be incomplete.

5. There are no issues with the storage layer. All of the VM’s files are present on the datastore;

6. Other VMs on the same host and datastore work fine;

7. Recommendations like “Rescan Datastore” don’t work.

Solution.

Before you begin, make sure that you have a backup.

The solution for me was simple, but it required downtime:

  1. Power off the VM;
  2. After that, the VM will be shown in an inaccessible state;
  3. Remove the VM from the vCenter inventory;
  4. Locate the VM’s files on the datastore and find the .vmx file;
  5. Register the VM;
  6. Power on the VM.

After that, the VM should be up and running without issues.
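
The steps above were done in vCenter. On a standalone host, roughly the same sequence could be run from the ESXi shell with vim-cmd. This is an assumption, not what was done here: unregistering a VM directly on a vCenter-managed host can leave vCenter out of sync. The sketch below only prints the commands so they can be reviewed first; the function name print_reregister_plan and both of its arguments are made up for illustration.

```shell
#!/bin/sh
# Print (do not run) the ESXi shell commands for a power-off / unregister /
# register / power-on cycle. print_reregister_plan is a hypothetical helper;
# both arguments are placeholders supplied by the caller.
print_reregister_plan() {
    vmid="$1"       # Vmid as listed by `vim-cmd vmsvc/getallvms`
    vmx_path="$2"   # full datastore path to the VM's .vmx file
    if [ -z "$vmid" ] || [ -z "$vmx_path" ]; then
        echo "usage: print_reregister_plan <vmid> <path-to-vmx>" >&2
        return 1
    fi
    echo "vim-cmd vmsvc/power.off $vmid"
    echo "vim-cmd vmsvc/unregister $vmid"
    # Quote the path in the printed command in case it contains spaces.
    echo "vim-cmd solo/registervm \"$vmx_path\""
    # Registration assigns a new Vmid, so it has to be looked up again.
    echo "vim-cmd vmsvc/getallvms"
    echo "vim-cmd vmsvc/power.on <new-vmid>"
}
```

The printed lines are meant to be run one at a time, checking the result of each step before moving on.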
