Over the weekend I had to run a failover test for an application within SRM. As SRM can only replicate down to the datastore level and not to individual VMs, this meant doing a full test failover of all VMs, after first ensuring that every protected VM in the Protection Group was set to Isolated Network on the recovery site. This ensures that even though all VMs are started in the recovery site, they are not accessible on the network and therefore cannot cause any conflicts. The main concern, apart from a VM not connecting to the isolated network, was that the VM being tested and the application that sits on it are running on Windows 2000. No, that’s not a typo: the server is running Windows 2000. The application dates from around that period as well, so if it drops and can’t be recovered it’s a massive headache.
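The pre-check on the recovery network is easy to get wrong when there are many VMs, so it can help to write the mappings down and check them in one pass. The following is only a minimal sketch: it assumes you have noted each VM's recovery network by hand from the SRM console, and the VM names and the `vms_not_isolated` helper are hypothetical. It does not talk to SRM or vCenter at all.

```python
def vms_not_isolated(recovery_networks, isolated_name="Isolated Network"):
    """Return the names of VMs whose recovery network is not the isolated one.

    recovery_networks: dict of VM name -> recovery-site network name,
    filled in by hand from the SRM console (an assumption of this sketch).
    """
    return sorted(
        vm for vm, network in recovery_networks.items()
        if network != isolated_name
    )


# Hypothetical VM names and network labels, for illustration only.
example_mappings = {
    "legacy-app01": "Isolated Network",
    "file01": "Isolated Network",
    "web01": "Production LAN",
}

offenders = vms_not_isolated(example_mappings)
if offenders:
    print("Fix these before running the test:", ", ".join(offenders))
```

An empty list means every VM you recorded is mapped to the isolated network. Anything that is printed should be corrected before the test recovery is started.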
Step 1: Power down the production VM
Step 2: Perform Test Recovery
Go to Recovery Plans -> Protection Groups and select Test
When the prompt to begin the test appears, verify the direction of the recovery: from the protected site to the recovery site. Enable the Replicate recent changes to recovery site option. In most cases you will already be running synchronous writes between the sites, so the data will be very nearly up to date anyway. It is still recommended to replicate recent changes to make sure that all data is current.
Click Next and then click Start to confirm the test recovery.
Step 3: Monitor the failover
In the tasks console within vCenter you will see the VMs being reconfigured and powering on.
Take a look at the Recovery Steps within SRM and you can see the list of tasks as they occur. The Priority 1 VMs will power on first and each VM will power on in order.
Once all the VMs have completed, the Recovery Steps will show as successful. If any VMs have problems powering on, they will also be listed here.
Step 4: Change the DNS record of the server to the new IP address defined in the computer configuration within SRM. To do this, go to a Domain Controller, open DNS and change the IP address of the required server.
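After changing the record it is worth confirming that the name now resolves to the recovery IP from wherever the application testers will connect. A minimal sketch using Python's standard `socket` module follows. The `dns_points_to` helper is hypothetical, and the hostname and IP in the usage note are placeholders, not values from this environment. Cached lookups on the client can still return the old address for a while.

```python
import socket


def dns_points_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname currently resolves to expected_ip (IPv4 only).

    resolver can be swapped out for testing; by default the operating
    system's resolver is used through socket.gethostbyname.
    """
    try:
        resolved_ip = resolver(hostname)
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return False
    return resolved_ip == expected_ip
```

For example, `dns_points_to("legacy-app01.example.local", "192.0.2.10")` would return `True` once the updated record is visible to the machine running the check. The same function can be reused in Step 12 with the original production IP.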
Step 5: Change the network from Isolated Network to an active domain network. Open Edit Settings for the test VM in the recovery site and select a different network for the Network Connection so that it is no longer on the Isolated Network.
Step 6: Send a ping request to the new IP address to ensure it is active. If the VM is still not pingable, log on to the server and check that the auto-IP configuration has taken effect on the vNIC. Because the network was changed, some of the settings may have been lost. Re-enter the IP address if required.
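If you want to script the ping check rather than run it by hand, something like the following would do. It is a rough sketch, not part of SRM: it shells out to the operating system's own `ping` command, the helper names are hypothetical, and the exit code is only a first indication. On Windows in particular, `ping` can exit with success when a router answers "Destination host unreachable", so check the output by eye if in doubt.

```python
import platform
import subprocess


def build_ping_command(ip_address, count=4, system=None):
    """Build the ping argument list; Windows uses -n for the count, others use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), ip_address]


def is_pingable(ip_address, count=4, timeout_seconds=30):
    """Return True if the ping command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            build_ping_command(ip_address, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the ping executable could not be started.
        return False
    return result.returncode == 0
```

Calling `is_pingable("192.0.2.10")` with the VM's recovery IP (the address here is a placeholder) gives a quick yes or no. The same call can be used against the production IP in Step 13.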
Step 7: Next you can hand the VM over to your Applications team to perform testing.
Step 8: Once the applications team has completed testing, and hopefully signed off every test case as successful, you can begin the cleanup. The first task is to shut down the VM in the recovery site.
Step 9: Edit the settings of the VM and put it back on the Isolated Network. Click OK to save the changes.
Step 10: Go to Recovery Plans -> Protection Groups and select Cleanup
Once prompted, verify the direction of the cleanup once again and click Next.
Step 11: You can monitor the progress of the cleanup in the vCenter task window or from the Recovery Steps in SRM. You will see all the VMs in the recovery site that are managed by SRM being reconfigured and then shut down.
The cleanup should only take a few minutes.
Step 12: Change the DNS record for the test server back to its original production IP address from the Domain Controller.
Step 13: Power on the production VM in its source site once again and verify that it responds to pings on its original IP address.
And that’s it. You will have successfully tested the application on just one server in your recovery site.