VCSA 6.7 Upgrade error - The mystery of how the installer connected to the wrong VM
When trying to upgrade our lab vCenter from 6.5 to 6.7 this week, we encountered a strange error.
Our lab environment runs vSphere 6.5 on VCSA with an external PSC. When I started the upgrade of the PSC, I got an error early in the process, while the installer was connecting to the source VCSA.
I remembered having seen strange errors before when the root password of the appliance had expired. That was not the case here, but I changed the password and rebooted the appliance anyway to see if that solved the problem.
I got the same error on the next try, so I tested an earlier 6.7 build (the GA version) to see whether it had the same problem. It failed in the same way.
After this I set out to read the installer log and saw a lot of messages stating "The guest operations agent is out of date."
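To get a feel for how often that message shows up, the log can be searched from a shell. This is only a minimal sketch, assuming the installer log has been saved as a plain-text file. The helper name count_agent_warnings and the file name installer.log are placeholders of mine, not the installer's real log location.

```shell
#!/bin/sh
# Count the lines in a log file that contain the guest operations agent message.
# count_agent_warnings is a hypothetical helper; the log path is passed in as $1.
count_agent_warnings() {
    # -F: match the text as a fixed string, -c: print the number of matching lines.
    # Note that grep exits with status 1 when nothing matches, but still prints 0.
    grep -cF "The guest operations agent is out of date" "$1"
}

# Example usage (installer.log is a placeholder path):
# count_agent_warnings installer.log
```

A high count only tells you the message is repeated a lot. It says nothing about the cause, which in my case turned out to be something else entirely.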
This led me to check the VMware Tools status on the appliance, but it seemed to be working. The tools are "Guest Managed", so there is no option to update them from vCenter. Anyway, I tried to update the tools manually by running the following command in the appliance shell: tdnf update open-vm-tools
After this I rebooted the appliance again, but I still got the same error.
When going back to the installer log I found something interesting.
It seems that when the installer queried the vCenter managing the source VCSA, it found a totally different VM!
I verified this by checking the VM ID with PowerCLI, and also by checking the ID of the actual VCSA.
I have no idea what is happening here. I noticed the 127.0.0.1 address in the log file and suspect the installer might use that in its queries. As you can also see from the log file, it resolves the hostname to the correct IP, so maybe it uses the IP reported by VMware Tools instead? You can also see that it actually seems to connect to the correct VM, since the output includes the VCSA shell banner.
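One way to double-check the name resolution outside the installer is to ask the operating system directly. The sketch below assumes a Linux machine where getent is available. The helper name resolve_first_ip and the hostname in the usage line are placeholders, not values from my environment.

```shell
#!/bin/sh
# Print the first address a hostname resolves to on this machine.
# resolve_first_ip is a hypothetical helper name.
resolve_first_ip() {
    # getent hosts looks the name up through the system's name service
    # configuration (typically the hosts file and DNS) and prints
    # "ADDRESS NAME ..."; awk keeps only the address from the first line.
    getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# Example usage (vcsa.lab.local is a placeholder hostname):
# resolve_first_ip vcsa.lab.local
```

If this prints the appliance's real IP, as the installer log also suggested in my case, then DNS itself is probably not the culprit and the mix-up happens somewhere later in the installer's lookup.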
Anyway, I ran the installer once more, this time with the IP address of the source VCSA instead of its hostname, and it then found the correct VM and continued with the deployment process.
The rest of the update process went fairly smoothly. Hopefully we do not encounter any issues with this version so we can update our production environment shortly.