Like a nasty cockroach or slippery rodent on the subway track, bugs plague Cisco devices just like any other code out there (sorry Jeremy Cioara, there ARE bugs!) lol
Yes, deep in those smelly tunnels are those unspeakable pests! I’ve run into a bunch of them in the past, but I think I’ll start tracking them as I find them, since I’m sure others will run into them as well. Use CATEGORY > BUGS to see if any of the titles match what you’re seeing. You might be looking at a bug!
Today’s blog post is bug # CSCub31212
N5k System restart due to “ipfib” hap reset.
I had a 5k of mine randomly reboot on me for some strange reason. Luckily the backup kicked in, so we didn’t lose anything aside from a bunch of alerts coming in, but when I looked at the logs there was nothing older than the past hour!
I looked at all the BGP/EIGRP neighborships and saw that none of the adjacencies broke except for the ones on the switch that rebooted. (Use #sh ip eigrp nei and #sh ip bgp summ to see how long each neighbor has been up.)
Next, I did a #sh system reset-reason
CORE-02# sh system reset-reason
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 43082 usecs after Mon Aug 25 13:53:09 2014
Reason: Reset triggered due to HA policy of Reset
Service: ipfib hap reset
So for some reason this process caused the switch to reboot?
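If you have a pile of switches to check, here is a rough Python sketch that pulls the timestamp, reason and service out of that output so you can spot an ipfib hap reset quickly. It only handles the three-line entry layout shown above, and the function names (parse_reset_reasons, is_ipfib_hap_reset) are my own made-up helpers, not anything from Cisco. How you collect the output from the switch is up to you.

```python
import re

# Sample text copied from the reset-reason output above.
SAMPLE = """\
1) At 43082 usecs after Mon Aug 25 13:53:09 2014
Reason: Reset triggered due to HA policy of Reset
Service: ipfib hap reset
"""


def parse_reset_reasons(text):
    """Return a list of dicts with 'when', 'reason' and 'service' keys.

    Assumes each entry starts with a line like "1) At ... usecs after <date>"
    followed by "Reason:" and "Service:" lines, as in the sample above.
    """
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"^\d+\)\s+At\s+\d+\s+usecs\s+after\s+(.+)$", line)
        if match:
            current = {"when": match.group(1), "reason": "", "service": ""}
            entries.append(current)
        elif current is not None and line.startswith("Reason:"):
            current["reason"] = line[len("Reason:"):].strip()
        elif current is not None and line.startswith("Service:"):
            current["service"] = line[len("Service:"):].strip()
    return entries


def is_ipfib_hap_reset(entry):
    """True if the entry's service line mentions an ipfib hap reset."""
    service = entry["service"].lower()
    return "ipfib" in service and "hap reset" in service


if __name__ == "__main__":
    for entry in parse_reset_reasons(SAMPLE):
        print(entry["when"], "|", entry["service"], "|", is_ipfib_hap_reset(entry))
```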
After Googling around, it didn’t take me long to find said bug ID.
“The leak is around 1.8-2MB per day which results in a crash after almost 130 days . (250MB is the limit)
To identify the current memory allocation for ipfib process , please use the following command and check
for MemAlloc. If you observe the MemAlloc is close to 250000000(250MB) you are on
the risk of a crash .”
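To put numbers on that, here is a quick back-of-the-envelope sketch in Python. It assumes the leak rate stays constant at the quoted 1.8-2MB per day, treats 1MB as 1,000,000 bytes (to match the 250000000 figure in the quote), and takes whatever MemAlloc value you read off the switch as input. The function names are mine, purely for illustration.

```python
LIMIT_BYTES = 250_000_000  # the 250MB limit quoted in the bug note
BYTES_PER_MB = 1_000_000   # assumption: decimal MB, matching the quote


def days_until_limit(mem_alloc_bytes, leak_mb_per_day):
    """Days left before MemAlloc reaches the limit at a constant leak rate."""
    remaining = max(LIMIT_BYTES - mem_alloc_bytes, 0)
    return remaining / (leak_mb_per_day * BYTES_PER_MB)


def estimate_window(mem_alloc_bytes):
    """Return (soonest, latest) days, using the 2MB/day and 1.8MB/day rates."""
    soonest = days_until_limit(mem_alloc_bytes, 2.0)
    latest = days_until_limit(mem_alloc_bytes, 1.8)
    return soonest, latest


if __name__ == "__main__":
    soonest, latest = estimate_window(200_000_000)
    print(f"Roughly {soonest:.0f} to {latest:.0f} days left")
```

Starting from zero, the two rates give 125 and about 139 days, which brackets the "almost 130 days" in the bug note. With MemAlloc already at 200000000 you would have roughly 25 to 28 days left, so that is a box to upgrade soon.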
Basically it’s a ticking time bomb. The fix is to upgrade the code to a newer version.