Something that came up in the field: an ESXi host lost its connection to our vCenter cluster for no apparent reason. I checked the networking side and nothing stood out (just connection down on the ports connecting to the host). I then KVMed into the server and the NICs looked like they were up as well! What? I couldn't ping or RDP to any VM on the host. What gives??
The decision from mgmt was to reboot the box. After the reboot, I was able to ping/RDP to the VMs again and the host came back into vCenter. OK, so what happened?
Generally I keep SSH turned off on the ESXi firewall for obvious security reasons, so I went into ESXi to turn it on and then SSHed into the host.
login as: root
Using keyboard-interactive authentication.
The time and date of this login have been sent to the system logs.
VMware offers supported, powerful system administration tools. Please see www.vmware.com/go/sysadmintools for details.
The ESXi Shell can be disabled by an administrative user. See the vSphere Security documentation for more information.
OK, so now that I’m in, let’s run a few commands…
~ # esxcfg-nics -l
Name    PCI            Driver     Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0  0000:0b:00.00  bnx2       Up    1000Mbps   Full    34:40:b5:d4:4e:c4  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic1  0000:0b:00.01  bnx2       Down  0Mbps      Half    34:40:b5:d4:4e:c6  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic2  0000:10:00.00  bnx2       Down  0Mbps      Half    5c:f3:fc:9e:c7:d8  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic3  0000:10:00.01  bnx2       Down  0Mbps      Half    5c:f3:fc:9e:c7:da  1500  Broadcom Corporation Broadcom NetXtreme II BCM5709 1000Base-T
vmnic4  0000:1a:00.00  be2net     Up    10000Mbps  Full    00:00:c9:da:3a:62  1500  Emulex Corporation OneConnect 10Gb NIC
vmnic5  0000:1a:00.01  be2net     Up    10000Mbps  Full    00:00:c9:da:3a:66  1500  Emulex Corporation OneConnect 10Gb NIC
vusb0   Pseudo         cdc_ether  Up    10Mbps     Half    36:40:b5:f9:3d:8f  1500  Unknown Unknown
This command lists the NICs on this host, along with the names VMware gives them. In this case, I’m interested in two NICs on this host, vmnic4 and vmnic5. To get more details on them, run the following…
~ # ethtool -i vmnic4
~ # ethtool -i vmnic0
firmware-version: bc 7.4.0 NCSI 2.0.11
OK, this gives me the driver version and the firmware version. Why is this information important for this post? After the VMware admin went through the logs and found nothing showing why the host lost network connectivity, and after I looked on my end and nothing looked out of the ordinary, we called VMware Support and, sure enough, it was a bug with this version. The fix was to upgrade to newer code, and one week after the upgrade, so far so good!
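Since support will ask for the driver and firmware versions anyway, it can help to pull them for every NIC with link up in one pass. Below is a rough sketch of how that might look, not something that came out of the support case. The function names (up_nics, nic_versions, report_up_nic_versions) are made up for illustration, and it assumes awk and grep are available in the ESXi shell, that the Link state is the fourth column of esxcfg-nics -l as in the listing above, and that ethtool -i prints the usual "driver:", "version:" and "firmware-version:" lines.

```shell
#!/bin/sh
# Sketch only: the helper names here are hypothetical, not VMware tooling.

# Read `esxcfg-nics -l` output on stdin and print the name of every NIC
# whose Link column (4th field) is "Up". The header row is skipped
# naturally because its 4th field is "Link".
up_nics() {
    awk '$4 == "Up" { print $1 }'
}

# Read `ethtool -i <nic>` output on stdin and keep only the driver name,
# driver version and firmware version lines.
nic_versions() {
    grep -E '^(driver|version|firmware-version):'
}

# Tie the two together on the host: for each NIC with link up, print a
# small header followed by its driver and firmware details.
# Assumes esxcfg-nics and ethtool are on the PATH (i.e. run on ESXi).
report_up_nic_versions() {
    for nic in $(esxcfg-nics -l | up_nics); do
        echo "== $nic"
        ethtool -i "$nic" | nic_versions
    done
}
```

On the host you would paste the functions into the SSH session and then call report_up_nic_versions. Keeping the two filters as separate functions that read stdin means they can also be run against output saved from an earlier session, which is handy when the host is no longer reachable.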