Recently I had to upgrade our ESXi hosts from Update 2 to Update 3 due to security patch requirements. This requirement stretches across two separate physical environments, one running IBM blades and the other running on Cisco UCS blade chassis in a Flexpod configuration. The upgrade paths for both are slightly different, and they also run on different vCenter platforms. Both of these also have different upgrade paths as one is running VMware SRM and is in linked mode. I’m not going to discuss the IBM upgrades but I did need to upgrade the firmware of the Infrastructure and Servers for Cisco UCSM.
Before you begin any upgrade process I highly recommend reading the release notes to make sure that a) an upgrade path exists from your current version, b) you are aware of any known issues in the new version and c) the features you want exist in the new version.
UCS Upgrade Prep Work
Check the UCS Release Guides
Check the release notes to make sure all the components and modules are supported. The release notes for UCS Manager can be found on Cisco’s site. The link is listed further below in the documents section.
Some of the things to check within the release notes are:
- Resolved Caveats
- UCS Version Upgrade path
- UCS Infrastructure Hardware compatibility
- Minimum software version for UCS Blade servers
Open a Pre-Emptive Support Call
I opened a call with Cisco TAC to investigate the discrepancy in the firmware versions. The advice was to downgrade the B200 M4 server firmware to 4.0 (1). However, as I was planning on upgrading anyway, I’ve now confirmed that the best option is to upgrade to the planned 3.1 version. As part of this upgrade I will also upgrade all the ESXi hosts on that site the same day. There is a second UCS domain on another site that will be upgraded on another date.
Prior to performing my upgrade I also opened pre-emptive support calls with Cisco for each Cisco UCS domain. This turned out to be instrumental to the successful upgrade of the infrastructure firmware. Within an hour I got a response from Cisco asking to connect via Webex to ensure there were going to be no issues. Cisco downloaded ucs-dplug.5.2.3N2.2.23d.gbin for debugging and uploaded it to the FIs. They then ran a debug to check the usage of the /tmp directory using the ‘df -h’ command. If it’s above 1% there can be issues with the upgrade and it can cause the FI to not reboot correctly. If that happens you’ll need to connect a console cable to the FI and bring it back online manually. Not a pretty situation. From what the support engineer said, this is only an issue for 2.2(6c) and below. Later versions don’t have this issue.
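If you end up with a copy of that ‘df -h’ output, the 1% check is easy to script. The following is only a rough sketch: it assumes the output follows the usual Linux df column layout (Use% second to last, mount point last), and the sample text is made up for illustration rather than captured from a real FI. The function name is my own.

```python
def tmp_usage_percent(df_output, mount="/tmp"):
    """Return the Use% figure for `mount` from `df -h` text, or None if not found."""
    for line in df_output.splitlines():
        fields = line.split()
        # Assumed layout: Filesystem Size Used Avail Use% Mounted-on
        if len(fields) >= 6 and fields[-1] == mount and fields[-2].endswith("%"):
            return int(fields[-2].rstrip("%"))
    return None


# Illustrative output only, not taken from a real fabric interconnect.
sample = """Filesystem      Size  Used Avail Use% Mounted on
none            600M  4.0K  600M   1% /tmp
none             50M  1.2M   49M   3% /var/sysmgr
"""

usage = tmp_usage_percent(sample)
if usage is None:
    print("/tmp not found in the output")
elif usage > 1:
    print(f"/tmp is at {usage}%, raise it with TAC before upgrading")
else:
    print(f"/tmp is at {usage}%, within the 1% guideline")
```

Treat the result as a prompt to talk to TAC rather than a go/no-go decision on its own.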
UCS Firmware 3.1 Upgrade Documents
UCS Firmware Management Guide, Release 3.1
UCS Infrastructure Firmware Upgrade Auto Install
UCS Server Firmware Upgrade Auto Install
Cisco UCS Upgrade
Upgrade Cisco UCS steps:
As per the UCSM Firmware Management Guide it’s not possible to upgrade using Auto Install if you are on a version of UCSM earlier than 2.1(1).
You cannot use Auto Install to upgrade either the infrastructure or the servers in a Cisco UCS domain if Cisco UCS Manager in that domain is at a release prior to Cisco UCS 2.1(1).
If you need to perform a manual upgrade there’s a good guide I did previously.
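The Auto Install cut-off is a simple version comparison, so it can be checked in a script before you plan the change. This is a minimal sketch that assumes version strings in the usual major.minor(maintenance+letter) form such as 2.2(6c); the function names are my own and not part of any Cisco tooling.

```python
import re

# Matches strings like "2.2(6c)" or "2.1(1)": major.minor(maintenance + optional letters)
_VERSION = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_ucs_version(text):
    """Split a UCS version string into (major, minor, maintenance, patch_letters)."""
    match = _VERSION.match(text.strip())
    if not match:
        raise ValueError(f"unrecognised UCS version: {text!r}")
    major, minor, maintenance, patch = match.groups()
    return (int(major), int(minor), int(maintenance), patch)


def supports_auto_install(current_version):
    """True when the running UCSM release is 2.1(1) or later."""
    # Only the numeric parts matter for the 2.1(1) threshold.
    return parse_ucs_version(current_version)[:3] >= (2, 1, 1)


for version in ("2.0(1)", "2.1(1a)", "2.2(6c)"):
    verdict = "Auto Install" if supports_auto_install(version) else "manual upgrade"
    print(f"{version}: {verdict}")
```

Anything the parser rejects is worth checking by hand against the release notes rather than guessing.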
The order of the updates, whether via the auto install or the manual install, is:
– Install Infrastructure Firmware (FI’s, IOMs etc)
– Install Server Firmware (Blades)
Step 1: Perform a backup of your FI’s and create All Configuration and Full-State backup files.
1.1 In UCS Manager go to the Admin tab, select All, click on the General tab and click Backup Configuration
1.2 From the backup configuration dialog box click Create Backup Operation
1.3 In the backup dialog select the following options:
– Admin State: Enabled
– Type: Full State
– Location of the Backup File: Remote File System
– Protocol: FTP
– Hostname: IP/DNS name of your FTP server
– Remote File: Filename you want
1.4 Perform the above steps again but select All Configuration
If Cisco UCS Manager displays a confirmation dialog box, click OK.
1.5 Click Ok to close the Backup Configuration dialog box.
Step 2: Disable Call Home
In UCS Manager go to the Admin tab, expand All > Communication Management > Call Home. Select Off for Call Home.
Step 3: Verify the HA failover settings
3.1 Click on the Equipment tab, expand Equipment > Fabric Interconnects. Click the node for the fabric interconnect that you want to verify.
3.2 In the Status area, verify that the Overall Status is operable.
If the status is not operable do not proceed with the firmware upgrade and open a support call with TAC.
If the fields in the High Availability Details area are not displayed, click the Expand icon to the right of the heading.
Verify the settings for the FIs:
- Ready field – Yes
- State field – Up
Also take note of which FI is the primary
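The same check can be done from the CLI with show cluster state, and the result is easy to parse if you want it in a pre-upgrade script. This is a sketch only: it assumes the output has one "A: UP, PRIMARY" style line per fabric plus an "HA READY" line, and the sample text is illustrative rather than copied from my environment. The function names are my own.

```python
import re


def parse_cluster_state(text):
    """Return ({fabric: (state, role)}, ha_ready) from `show cluster state` text."""
    roles = {}
    for fabric, state, role in re.findall(r"^([AB]):\s*(\w+),\s*(\w+)", text, re.MULTILINE):
        roles[fabric] = (state, role)
    # An exact match avoids treating "HA NOT READY" as ready.
    ha_ready = any(line.strip() == "HA READY" for line in text.splitlines())
    return roles, ha_ready


def cluster_is_healthy(text):
    """True when both fabrics are UP, exactly one is PRIMARY and HA is ready."""
    roles, ha_ready = parse_cluster_state(text)
    primaries = [fabric for fabric, (_, role) in roles.items() if role == "PRIMARY"]
    all_up = len(roles) == 2 and all(state == "UP" for state, _ in roles.values())
    return ha_ready and all_up and len(primaries) == 1


# Illustrative output only.
sample = """Cluster Id: 0x0000000000000000-0x0000000000000000

A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
"""

roles, ready = parse_cluster_state(sample)
primary = next(fabric for fabric, (_, role) in roles.items() if role == "PRIMARY")
print(f"primary FI: {primary}, healthy: {cluster_is_healthy(sample)}")
```

Recording which fabric is primary here saves confusion later when the roles swap during the infrastructure upgrade.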
Step 4: Check the associated Maintenance Policy
4.1 Click on the Servers tab, expand Servers > Policies. Expand the organisation that contains the policy. Expand Maintenance Policies and verify that your maintenance policy is in use by the relevant Service Profile Templates.
If not, you can create a new one and modify the maintenance policy on the Service Profile Template.
Step 5: Verify the Status of the IO modules
5.1 Click on the Equipment tab, expand Equipment > Chassis.
5.2 Click on the chassis you want to verify and click the IO Modules tab. For each IO module check that the overall status is Operable.
Step 6: Verify the Status of Servers
6.1 Click the Equipment tab and click Equipment. Select the Servers Tab and for each server check that the overall status is Ok.
6.2 Click on Equipment and Chassis > Chassis Number > Servers. Click the Inventory tab and select the Adapters sub-tab. Verify that the adapters are all operational.
Step 7: Check on the Ethernet and Fabric paths (provides a baseline for checking later)
7.1 Connect to the FI via SSH. You can then connect to the individual FIs using the connect nxos command. Next run the command to show the number of active ethernet connections. This number can be checked against the count once the upgrade has been completed. The fwm command will also show the number of MAC addresses connected.
UCS-A /fabric-interconnect # connect nxos a
UCS-A(nxos)# show int br | grep -v down | wc -l
UCS-A(nxos)# show platform fwm info hw-stm | grep '1.' | wc -l
The above can also be performed by connecting to nxos b to execute the same commands.
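If you save the raw command output to a file instead of only noting the number, the same count can be reproduced offline. This is a small sketch of the grep -v down | wc -l pipeline in Python; the sample text is a made-up approximation of show int br output, not a real capture. Note that the count includes header lines, so it is only meaningful as a before and after comparison.

```python
def count_not_matching(output, needle="down"):
    """Python equivalent of `grep -v <needle> | wc -l` over captured text."""
    return sum(1 for line in output.splitlines() if needle not in line)


# Illustrative output only; real column layout may differ.
sample = """Ethernet      VLAN   Type Mode   Status  Reason             Speed
Eth1/1        1      eth  trunk  up      none               10G(D)
Eth1/2        1      eth  trunk  down    SFP not inserted   10G(D)
Eth1/3        1      eth  trunk  up      none               10G(D)
"""

print(count_not_matching(sample))
```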
7.2 While connected via SSH you can also check the number of active fibre connections.
UCS-A /fabric-interconnect # connect nxos a
UCS-A(nxos)# show npv flogi-table
UCS-A(nxos)# show npv flogi-table | grep fc | wc -l
Perform the same commands on fabric b.
7.3 Check the flogi database and the number of servers logged into the fabric.
UCS-A /fabric-interconnect # connect nxos a
UCS-A(nxos)# show flogi database
UCS-A(nxos)# show flogi database | grep fc | wc -l
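Since the whole point of Step 7 is a baseline, it helps to keep the counts per fabric in one place and diff them after the upgrade. This is a minimal sketch; the check names and the numbers are hypothetical placeholders for whatever you recorded from the commands above.

```python
def compare_baseline(before, after):
    """Return {check: (before, after)} for every count that differs or is missing."""
    return {
        name: (before[name], after.get(name))
        for name in before
        if after.get(name) != before[name]
    }


# Hypothetical counts recorded before and after the upgrade.
before = {"a_eth_up": 42, "a_macs": 118, "a_npv_flogi": 16, "a_flogi_db": 16}
after = {"a_eth_up": 42, "a_macs": 118, "a_npv_flogi": 15, "a_flogi_db": 16}

for check, (old, new) in compare_baseline(before, after).items():
    print(f"{check}: was {old}, now {new}")
```

An empty result means every count matches. MAC counts in particular can drift a little on their own, so a small difference there is a reason to look closer, not proof of a fault.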
Step 8: Check the Available Space on Fabric Interconnect
If an image download fails, check whether the bootflash on the fabric interconnect or fabric interconnects in the Cisco UCS has sufficient available space.
Step 8.1 Click on the Equipment tab > expand Equipment > Fabric Interconnects. Click the fabric interconnect on which you want to check the available space. Select the General tab.
Step 8.2 Expand the Local Storage Information area and check that the FI bootflash usage is less than 50%. If not, delete some old files from the FI, along with any old tech support files.
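The 50% rule is just arithmetic on the used and total figures shown in Local Storage Information. A tiny sketch, with made-up sizes in MB and a function name of my own:

```python
def bootflash_ok(used_mb, total_mb, limit_percent=50.0):
    """Return (ok, percent_used); ok is True only when usage is below the limit."""
    percent_used = 100.0 * used_mb / total_mb
    return percent_used < limit_percent, percent_used


# Hypothetical figures read from the Local Storage Information area.
ok, percent = bootflash_ok(used_mb=4000, total_mb=16000)
print(f"bootflash {percent:.1f}% used, {'fine' if ok else 'clean up before upgrading'}")
```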
Step 9: Download Software Bundles
Go to Cisco’s website and download the relevant software. You’ll then need to upload it to the FIs. This is called downloading the package even though you’re uploading to the fabric interconnects. Odd I know but what can you do. Once you’ve downloaded the required files you can then perform the upgrades by activating the new firmware versions.
Step 9.1 Click the Equipment tab, select the Equipment node. Click on the Firmware Management tab. Click the Installed Firmware tab and select Download Firmware.
Step 9.2 In the Download Firmware dialog box, click the Local File System radio button in the Location of the Image File field.
Step 9.3 Select the bin files to upload and click Select.
The A bin file is for the infrastructure and the B bin file is for the blade servers
The upload of the firmware will take a few moments.
Step 9.4 Click Ok on the download firmware task
Step 9.5 Click Ok once the bundle has been downloaded
The newly uploaded bundles will show up in Firmware Management > Packages subtab.
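If you keep several downloaded bundles in one folder, the A and B suffixes make it easy to sort them before uploading. This is a sketch that relies only on the .A.bin and .B.bin naming mentioned above; the file names in the example are illustrative and the function name is my own.

```python
from pathlib import PurePosixPath


def bundle_kind(filename):
    """Classify a UCS firmware bundle by its suffix."""
    name = PurePosixPath(filename).name
    if name.endswith(".A.bin"):
        return "infrastructure"
    if name.endswith(".B.bin"):
        return "blade servers"
    return "unknown"


# Example file names for illustration only.
for bundle in ("ucs-k9-bundle-infra.3.1.1e.A.bin", "ucs-k9-bundle-b-series.3.1.1e.B.bin"):
    print(f"{bundle}: {bundle_kind(bundle)}")
```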
Upgrading the Infrastructure Firmware with Auto Install:
Step 10: Install the infrastructure firmware.
10.1 Click on the Equipment tab, select Equipment and click on the Firmware Management tab. Click the Firmware Auto Install tab and in the Actions area click Install Infrastructure Firmware.
10.2 In the Prerequisites I clicked on Ignore All. This was due to some non-critical alerts from the blade servers. Once selected click Next.
10.3 In the Properties area of the Install Infrastructure Firmware dialog box, enter a description, select the firmware version number and select Upgrade Now. Click Finish.
The firmware will begin to install at this point and can be tracked via the FSM.
If there is not enough space under bootflash, a warning will display and the upgrade process will stop.
You will be kicked out of the session and can log in again after approx 5 minutes. Try not to freak out. I’d recommend grabbing a coffee at this point.
Acknowledge the reboot of the primary fabric interconnect. If you do not acknowledge that reboot, Cisco UCS Manager cannot complete the infrastructure upgrade and the upgrade remains pending indefinitely.
10.5 On the toolbar, click Pending Activities. In the Pending Activities dialog box, click the User Acknowledged Activities tab. Select Reboot Now.
10.6 Click OK. UCSM will reboot immediately. You cannot stop this reboot after you click OK.
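Rather than hitting refresh for five minutes after being kicked out of the session, you can poll until the UCSM address answers on HTTPS again. This is a generic sketch, not anything UCS specific: it only tests that the TCP port accepts connections, which does not prove the upgrade has finished, and the host name in the usage note is a placeholder.

```python
import socket
import time


def https_reachable(host, port=443, timeout_s=5):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_until(check, timeout_s=600, interval_s=30, sleep=time.sleep, clock=time.monotonic):
    """Call check() every interval_s seconds until it returns True or time runs out."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, wait_until(lambda: https_reachable("ucsm.example.com")) returns True as soon as the port answers and False after ten minutes. Once it comes back, log in and check the FSM before acknowledging anything.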
Upgrading the Server Firmware with Auto Install:
You cannot cancel a server firmware upgrade process after it begins. UCSM applies the changes immediately.
Step 11: Begin the server installation. Click the Equipment tab, select the Equipment node, click the Firmware Management tab and click Firmware Auto Install tab. In the Actions area click Install Server Firmware.
11.1 On the prerequisites page review the settings. Click Ignore All and click Next.
11.2 Select the newly uploaded server firmware bundle for the B-Series and click Next.
11.3 Select the firmware policy you want to update. Make sure that this policy is used by all the Service Profile Templates you need to update.
11.4 Review the dependency packages and click Next
11.5 Review the impacted endpoints and check the new version is correct. Click Install.
11.6 Click Ok on the auto-install confirmation
11.7 You can check the progress of the upgrade using the FSM tab for each server. Each blade will require a reboot as part of the upgrade. This can be triggered by selecting Pending Activities in the toolbar and choosing which blades to reboot based on your business requirements.
11.8 Once the reboots have completed the server firmware will have completed its upgrade process.
Once the firmware has been updated on both the infrastructure and servers you can then proceed with the ESXi host upgrades.
Thanks for this info. Was wondering, you mentioned below:
“Acknowledge the reboot of the primary fabric interconnect.”
My understanding is that the subordinate goes first on a reboot, is that what you actually acknowledged?
Hi John, apologies for the tardy response. Yes, it is the subordinate that is acknowledged.
Well…this isn’t exactly a no-downtime upgrade. Many resources had indicated that it is, but when it came time to approve the pending changes, it brought my VMware environment completely down.
Hi Jonathan, I’d be interested in knowing why you experienced an outage. At what point did the downtime occur? Was it during the FI upgrade or at the blade changes?
When it failed the FI over it tanked all of my VMware hosts. They lost all network and storage connectivity.
Sounds like everything was pointing at the primary FI rather than the secondary so the reboot/accept change for the FI brought all connectivity down with it. This is just a guess though, I have read about that elsewhere at one point. In the steps it points out to change the FI lead which in most circumstances ensures no outage. Either way, I’m sorry to hear you ran into issues. This guide was written after I had painful upgrades in the past, it’s a bit old now so hopefully it’s still a valuable resource. Thanks for taking the time to respond, I really appreciate the feedback
Nice work Derek. This guide proved helpful with my 2.2(7b) upgrade to 3.1(3f). I’m curious if you have upgraded, or will upgrade, from 3.1x to 3.2x and provide a similar upgrade guide.
Thanks for the feedback. At the moment I don’t have any plans to do an updated version. I’m hoping to revisit the upgrade process later this year and might get around to it then.
This is very useful. Thank you!
Nice article there. Being an old guy, I prefer the old way: a manual upgrade with a controlled reboot of each and every component.
Is there any way that I can control the reboot of the FICs and blades? The reason is that I have production running on top, and my experience with FIC reboots is that we must leave at least 30 minutes between each FIC reboot after changing the subordinate FIC to primary. Once I rebooted them within 5 minutes of each other and that resulted in an outage on half of the NICs.