Simulated Disaster Scenario - EyeglassRunbookRobot Access Zone Failover

Use this procedure to simulate a DR event by performing an Access Zone failover in uncontrolled mode. It assumes that dual delegation has been implemented and that the Runbook Robot Access Zone is fully functional.

Note: This test can be done with or without production data failover in the same maintenance window.

Note: Before carrying out the simulated disaster scenario Access Zone failover steps below, make sure you have followed the instructions in the “Important Note”, “Initial Environment Setup”, “Verify Environment Setup” and “Support Statement” sections.

Pre Simulated Disaster - Cluster1 (prod) is available - Controlled Failover

  1. Review all steps in the “Failover Planning Guide and checklist” before beginning.  This is required to maintain support for this procedure. See support statement above on planning guide requirement.
  2. Using Eyeglass, perform controlled Failover of your Production Access Zone(s) from Cluster1 to Cluster2.  
  3. On Eyeglass, enable each Production SyncIQ mirror-policy job for your Production Access Zone that is in USERDISABLED state.  Consult “Failover Planning Guide and checklist” to maintain support for this procedure.
  4. After the controlled failover of the Production Access Zone, write data from the DFS mount on a Windows client to the share(s) protected by the Production SyncIQ policies (confirm that the Cluster2 share path is the active target).
  5. Do not proceed until the above failover is validated as successful.
  6. Do not fail over the EyeglassRunbookRobot-SMBzone Access Zone on Cluster1.
  7. Procedure complete.  Do not continue to the next steps until the Controlled Failover of your Production Access Zone(s) has completed successfully.
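
The write test in step 4 can be scripted if that is more convenient than copying a file by hand. The following is a minimal sketch and not part of the Eyeglass tooling: it assumes the share is already mounted on the client at a path you supply, and the `write_probe` name and file naming are hypothetical. It writes a small marker file and reads it back to confirm the write landed.

```python
import os
import time
import uuid


def write_probe(share_path):
    """Write a small marker file under share_path and read it back.

    Returns the path of the file written. Raises OSError if the path is
    unreachable or not writable, ValueError if the read-back differs.
    """
    marker = "failover-write-test {} {}".format(
        time.strftime("%Y-%m-%dT%H:%M:%S"), uuid.uuid4()
    )
    # A unique file name avoids clobbering probes from earlier test runs.
    file_path = os.path.join(share_path, "probe_{}.txt".format(uuid.uuid4().hex))
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write(marker)
    with open(file_path, "r", encoding="utf-8") as handle:
        if handle.read() != marker:
            raise ValueError("read-back mismatch on " + file_path)
    return file_path
```

A successful return only shows that the mounted path is writable. It does not show which cluster served the write, so the active target still needs to be confirmed as the step describes.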

Simulated Disaster - Cluster 1 (prod) becomes unavailable: EyeglassRunbookRobot-SMBzone Access Zone

  1. Simulate Cluster 1 failure: See below for the Cluster 1 to Cluster 2 IP pool mapping just before the disaster:
     a. On the Cluster1 OneFS UI, disconnect the node interfaces from the dedicated IP pool used for client access to test data on the EyeglassRunbookRobot-SMBzone Access Zone (the IP pool assigned to the robot Access Zone). If the SMB folder was set up properly, the SMB folder target path from Cluster1 will fail when the node interfaces are removed from the pool. This is required to disconnect SMB sessions from clients to this pool and cause SMB mount failure.
     b. With no IPs left in the pool, step 1a above also simulates DNS response failure for Cluster 1, without actually impacting the SSIP or normal DNS operations. At this point in the process, name resolution is down and NetBIOS sessions to the Cluster 1 EyeglassRunbookRobot-SMBzone Access Zone are disconnected.

                

Notice from the above screenshot that name resolution for the SmartConnect name is down as expected (SERVFAIL is returned). At this point we have simulated a disaster: SmartConnect zone name resolution for the Cluster1 EyeglassRunbookRobot-SMBzone Access Zone is failing, and no shares can be accessed on the Cluster1 EyeglassRunbookRobot-SMBzone Access Zone.
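
The same "resolution is down" check can be repeated from a client without reading nslookup output by eye. This is a minimal sketch using only the Python standard library; the function name is hypothetical. It goes through the client's configured resolver, so it cannot tell SERVFAIL apart from other lookup failures, and a cached answer on the client can mask the real state.

```python
import socket


def name_resolution_is_down(smartconnect_name, resolver=socket.gethostbyname):
    """Return True if the name fails to resolve, False if an address comes back.

    The resolver argument exists so the check can be exercised without a
    live DNS server; by default the operating system resolver is used.
    """
    try:
        resolver(smartconnect_name)
    except socket.gaierror:
        # Any lookup failure (SERVFAIL, NXDOMAIN, timeout) ends up here.
        return True
    return False
```

Pass the SmartConnect zone name of the robot Access Zone as the argument. Use nslookup when the exact DNS response code matters.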

  2. Also, set the schedule for the EyeglassRunbookRobot-SMB Test policy on Cluster 1 to manual, since a policy cannot run anyway if the source cluster has been destroyed. Do not proceed until this step is done.

NOTE: In a real DR event, it is assumed the source cluster is unreachable on the network.

NOTE: Make a note of the schedule; it will need to be reapplied at the end of this procedure.

  3. Perform Failover: Using Eyeglass, perform uncontrolled failover for EyeglassRunbookRobot-SMBzone Access Zone from Cluster1 to Cluster2.

  4. Wait until uncontrolled failover completes.
     a. Check that the SPNs have failed over correctly in AD using ADSI Edit.
     b. Validation: Test using nslookup to make sure DNS now resolves to Cluster 2.
     c. Correct or debug resolution of the SmartConnect name before continuing.
  5. Test Client Access: This step requires unmount and remount of the share to get the new IP address.
     a. Reboot the client machine that was used to validate the share pre-disaster to guarantee that the NetBIOS session to Cluster1 has not been preserved.
     b. Mount the share.
     c. After the uncontrolled failover, write data from the SMB mount on the Windows client to the share protected by the EyeglassRunbookRobot-SMB Test SyncIQ policy.
  6. Uncontrolled Access Failover complete.
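
The nslookup validation above comes down to one question: does the SmartConnect name now resolve to an address in the target cluster's IP pool? The following is a minimal sketch of that comparison using the Python standard library. The function name and the pool ranges are assumptions supplied by the operator; nothing here reads the pool definition from OneFS or Eyeglass.

```python
import ipaddress
import socket


def resolves_into_pool(smartconnect_name, pool_ranges, resolver=socket.gethostbyname):
    """Return (ip, in_pool) for the address the name currently resolves to.

    pool_ranges is a list of (first_ip, last_ip) string pairs describing
    the target cluster's IP pool, as entered by the operator.
    """
    ip = ipaddress.ip_address(resolver(smartconnect_name))
    in_pool = any(
        ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last)
        for first, last in pool_ranges
    )
    return str(ip), in_pool
```

After the failover, call it with the Cluster 2 pool range. The same function applies after failback with the Cluster 1 range. A result of `in_pool` being False is the cue to debug SmartConnect name resolution before continuing.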

Post Simulated Disaster - Cluster1 (prod) becomes available: EyeglassRunbookRobot-SMBzone Access Zone

  1. Simulate Cluster 1 availability: See below for the Cluster 2 to Cluster 1 EyeglassRunbookRobot-SMBzone Access Zone IP pool mapping just after Cluster1 becomes available. Note that the previously removed node interfaces have not been reconnected at this point:
     a. On the Cluster1 OneFS UI, edit the source cluster EyeglassRunbookRobot-SMBzone Access Zone IP pool SmartConnect name and apply the igls-original prefix to the existing SmartConnect name.  This is a required step before reconnecting the previously removed Cluster1 node interfaces.
     b. On the Cluster1 OneFS UI, reconnect the previously removed node interfaces to the IP pool used for client access to test data on the EyeglassRunbookRobot-SMBzone Access Zone.
  2. On the Cluster1 OneFS UI, run resync-prep on the EyeglassRunbookRobot-SMB Test SyncIQ policy.  Consult EMC Documentation.
  3. Verify that the resync-prep process completed without error before proceeding to the next steps.
     a. Resolve any errors before continuing.  Resync-prep must have run successfully before you attempt the remaining steps. Check that the cluster reports show no errors before continuing.
  4. Check Eyeglass job state
     a. From Eyeglass, verify that the EyeglassRunbookRobot-SMB Test policy and the mirror policy are in the correct state.  The mirror policy should be Enabled and the Cluster 1 policy should be Disabled.
     b. Allow Eyeglass Configuration Data Replication to run at least once.
     c. Note: The Configuration Data Replication task must complete before continuing with the steps below. Verify from the running jobs window that a config sync task has run without errors.
     d. Verify that the Eyeglass jobs show the policy state correctly, with the Cluster 1 policy showing Disabled and the Cluster 2 mirror policy showing Enabled and OK (green).
     e. Wait for config sync to correctly show the above state.
     f. Do not continue until the above validations are done.
  5. From the Eyeglass Jobs window, select Run Now from the Select a Bulk Action menu and run the Zone Failover Readiness jobs. This allows a new Access Zone Failover Readiness Audit to be computed.
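
The job-state checks listed under "Check Eyeglass job state" above amount to a go/no-go gate before failback. As an illustration only, that gate can be written as a small function. This is a sketch with hypothetical names: the arguments are values the operator reads from the Eyeglass Jobs window, and nothing here queries Eyeglass itself.

```python
def failback_blockers(cluster1_policy_state, mirror_policy_state,
                      mirror_policy_status, config_sync_clean):
    """Return the reasons not to proceed; an empty list means all checks pass."""
    blockers = []
    if cluster1_policy_state != "Disabled":
        blockers.append("Cluster 1 policy is not Disabled")
    if mirror_policy_state != "Enabled":
        blockers.append("Cluster 2 mirror policy is not Enabled")
    if mirror_policy_status != "OK":
        blockers.append("Cluster 2 mirror policy is not OK (green)")
    if not config_sync_clean:
        blockers.append("no config sync run has completed without errors")
    return blockers
```

Returning the full list of blockers, instead of stopping at the first one, gives the operator everything that still needs attention in a single pass.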

        

  6. On the Eyeglass DR Dashboard, confirm that zone readiness looks good for the EyeglassRunbookRobot-SMBzone Access Zone.
  7. Perform Controlled Failback: Using Eyeglass, perform controlled failback of the EyeglassRunbookRobot-SMBzone Access Zone from Cluster 2 to Cluster 1.

  8. Wait until controlled failback completes.
     a. Check that the SPNs have failed over correctly in AD using ADSI Edit.
     b. Validation: Test using nslookup to make sure DNS now resolves to Cluster 1.
     c. Correct or debug resolution of the SmartConnect name before continuing.
  9. Test Client Access: This step requires unmount and remount of the share to get the new IP address.
     a. Reboot the client machine that was used to validate the share pre-disaster to guarantee that the NetBIOS session to Cluster2 has not been preserved.
     b. Mount the share.
     c. After the controlled failback, write data from the SMB mount on the Windows client to the share protected by the EyeglassRunbookRobot-SMB Test SyncIQ policy.
  10. Controlled failback procedure complete.
  11. If performing failback of Production data, follow the planning guide process to maintain support.

Copyright Superna LLC 2017
