Simulated Disaster Scenario - DFS Test SyncIQ Policy Failover

Use this procedure to run a DFS policy failover in uncontrolled mode, simulating a real DR event. It assumes a DFS mode policy has been enabled inside the Runbook Robot Access Zone.

Note: This test can be done with or without production data failover in the same maintenance window.

Note: Before starting the simulated disaster scenario steps below, make sure you have followed the instructions in the “Important Note”, “Initial Environment Setup”, “Verify Environment Setup” and “Support Statement” sections.

Pre Simulated Disaster - Cluster1 (prod cluster) is available - Controlled Failover

Review all steps in the Failover Planning Guide and checklist before beginning. This is required to maintain support for this procedure. See support statement above on planning guide requirement.
Perform Microsoft DFS controlled failover for Production SyncIQ policies (the uncontrolled test policy will NOT be failed over at this step) from Cluster1 to Cluster2 using Eyeglass.
On Eyeglass, enable the Production SyncIQ mirror-policy job in the Jobs window if it is in USERDISABLED state post failover.
After the controlled failover, write data from a DFS mount on a Windows client to the production shares protected by the Production SyncIQ policy, and confirm that the Cluster2 share path is the active target.
The production data controlled failover is now complete. See the sample DFS Readiness below for the expected state at this stage.

Simulated Disaster - Cluster1 (prod cluster) becomes unavailable - Uncontrolled Failover

This procedure simulates a source cluster that has been destroyed or is unreachable on the network for a long period of time and requires a failover to the secondary site.

Note: To maintain support for this procedure, perform this step only against the test policy and Access Zone created in the initial setup section.

Simulate Cluster 1 failure:

On the Cluster1 OneFS UI, remove the node interfaces from the dedicated IP pool used for client access to the EyeglassRunbookRobot-DFSzone Access Zone (NOTE: perform this on this one IP pool only). Consult EMC documentation. Once the node interfaces are removed from the pool, the DFS folder target path on Cluster1 becomes unavailable.
Step 1a above simulates a DNS response failure for the Cluster1 EyeglassRunbookRobot-DFSzone Access Zone without actually impacting the SSIP or normal DNS operations. At this point in the process, name resolution is down and NetBIOS sessions are disconnected from the Cluster1 EyeglassRunbookRobot-DFSzone Access Zone.

As the screenshot above shows, the SmartConnect zone name no longer resolves (SERVFAIL is returned). The disaster is now simulated: name resolution for the SmartConnect zone of Cluster1’s EyeglassRunbookRobot-DFSzone Access Zone fails, and no shares in this Access Zone can be accessed on Cluster1.
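The resolution failure described above can also be confirmed from a client with a short script. The following is a minimal sketch, not part of the Eyeglass procedure: the function name zone_name_resolves is hypothetical, and it only reports whether the client's own resolver returns an address, so a locally cached answer could still make the name appear to resolve.

```python
import socket
from typing import Callable


def zone_name_resolves(name: str,
                       resolver: Callable[..., object] = socket.getaddrinfo) -> bool:
    """Return True if `name` resolves to at least one address.

    A lookup failure (for example, the resolver receiving SERVFAIL) surfaces
    in Python as socket.gaierror, which is reported here as False.
    The `resolver` parameter exists only so the check can be exercised
    without touching real DNS.
    """
    try:
        return bool(resolver(name, None))
    except socket.gaierror:
        return False
```

Call zone_name_resolves with the SmartConnect zone name of the test Access Zone. After the node interfaces are removed from the pool, the expected result is False; True suggests the simulated failure is not yet in effect for this client.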

Set the schedule for the EyeglassRunbookRobot DFS policy on the source cluster to manual, because the policy would not be able to run anyway if the source cluster had been destroyed. Do not proceed until this step is done.

Perform a DFS uncontrolled failover for the EyeglassRunbookRobot Test DFS mode SyncIQ policy from Cluster1 to Cluster2 using Superna Eyeglass (see the documentation for more details).

Wait until the uncontrolled failover completes.
After the uncontrolled failover, write data from the DFS mount on a Windows client to the share protected by the EyeglassRunbookRobot Test DFS SyncIQ policy, and confirm that the Cluster2 share path is the active target.

Uncontrolled DFS Failover is complete.
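The write tests in this procedure can be made repeatable with a small helper that writes a timestamped marker file through the DFS mount and reads it back. This is a sketch under assumptions: the function name write_marker and the marker file naming are hypothetical, and share_root stands for whatever DFS path the Windows client uses. A successful write does not show which cluster served it, so the active DFS target still has to be confirmed separately.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_marker(share_root: str, label: str) -> Path:
    """Write a timestamped marker file under share_root and verify the read-back.

    share_root: path to the DFS-mounted share (assumption: supplied by the operator).
    label:      free-text tag for the test phase, e.g. "uncontrolled-failover".
    Raises OSError if the write fails or the content read back differs.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    marker = Path(share_root) / f"failover-test-{label}-{stamp}.txt"
    text = f"{label} write test at {stamp}\n"
    marker.write_text(text, encoding="utf-8")
    if marker.read_text(encoding="utf-8") != text:
        raise OSError(f"read-back mismatch for {marker}")
    return marker
```

Running the helper once after each failover or failback step leaves one marker file per phase, which makes it easy to see later which writes landed while each cluster was active.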

Post Simulated Disaster - Cluster1 (prod cluster) Recovery Steps for DFS

These steps are executed to restore the uncontrolled policies to a working state. The production data is currently failed over to Cluster 2 using controlled failover. Some customers may choose to stay on Cluster 2 as production for some period of time before planning a failback. The test policies can be recovered by following the steps below:

Simulate Cluster 1 returning to service:

On the Cluster1 OneFS UI, rename the shares within the Test SyncIQ policy path to the igls-dfs-<sharename> format (perform this step after the "uncontrolled failover" step).
On the Cluster1 OneFS UI, add the previously removed node interfaces back to the IP pool used for DFS client access to test data on the EyeglassRunbookRobot-DFSzone Access Zone.
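When several shares sit under the test policy path, it can help to work out the target names before renaming them in the OneFS UI. The sketch below only computes names in the igls-dfs-<sharename> format from the step above; it does not talk to the cluster, and the function names dfs_renamed and rename_plan are hypothetical.

```python
PREFIX = "igls-dfs-"


def dfs_renamed(share_name: str) -> str:
    """Return share_name in igls-dfs-<sharename> form.

    Names that already carry the prefix (compared case-insensitively)
    are returned unchanged so the helper is safe to run twice.
    """
    if share_name.lower().startswith(PREFIX):
        return share_name
    return PREFIX + share_name


def rename_plan(share_names):
    """Map each share that needs renaming to its new name."""
    return {name: dfs_renamed(name)
            for name in share_names
            if dfs_renamed(name) != name}
```

For example, rename_plan(["data", "igls-dfs-home"]) returns {"data": "igls-dfs-data"}, because the second share already has the prefix.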

On Cluster1 OneFS UI, run resync-prep on EyeglassRunbookRobot-DFS Test SyncIQ policy (consult EMC Documentation).
Verify that resync-prep process was completed without error before proceeding to next steps.

Check the OneFS SyncIQ reports tab to make sure all steps passed successfully.

Check the job state in Eyeglass

From the Eyeglass Jobs icon, verify both policies on Cluster 1 and Cluster 2, then re-enable the Eyeglass job for the EyeglassRunbookRobot-DFS Test Policy on Cluster 1 and the mirror policy on Cluster 2.
Allow Eyeglass Configuration Data Replication to run at least once.
From Eyeglass Jobs-->“Running Jobs” window, verify that Eyeglass Configuration Data Replication in step 4b above has completed without errors.

Note: As stated in step 4b above, Eyeglass Configuration Data Replication task must complete before continuing with steps below.

Verify that the Eyeglass jobs show the policy state correctly: the Cluster 1 policy shows Disabled, and the Cluster 2 policy shows Enabled and OK (green).
Wait for Config sync to correctly show the above state.
Do not continue until the above validations are done.
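The validation gate above can be written down as a simple checklist function. This is an illustration only: JobView and failback_blockers are hypothetical names, the values are assumed to be transcribed by the operator from the Eyeglass Jobs window, and the strings "Disabled", "Enabled" and "OK" are taken from the text above rather than from any Eyeglass API.

```python
from dataclasses import dataclass


@dataclass
class JobView:
    state: str        # policy state as read from the Jobs window
    status: str = ""  # job status as read from the Jobs window


def failback_blockers(cluster1: JobView, cluster2: JobView) -> list[str]:
    """Return the reasons not to continue; an empty list means the checks pass."""
    problems = []
    if cluster1.state.strip().lower() != "disabled":
        problems.append("Cluster 1 policy is not shown as Disabled")
    if cluster2.state.strip().lower() != "enabled":
        problems.append("Cluster 2 policy is not shown as Enabled")
    if cluster2.status.strip().lower() != "ok":
        problems.append("Cluster 2 job status is not OK")
    return problems
```

If the returned list is not empty, wait for Config sync to run again and re-check before starting the failback.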

Perform Microsoft DFS-type controlled failback from Cluster 2 to Cluster 1 for EyeglassRunbookRobot DFS Test SyncIQ mirror-policy using Superna Eyeglass DR Assistant.

Wait for the failback to complete.
Write data to the share protected by EyeglassRunbookRobot DFS Test SyncIQ Policy from a DFS mount (confirm that Cluster1 share path is the active target) on Windows Client after controlled failback.

Perform Microsoft-DFS-type controlled failback of all Production SyncIQ mirror-policies from Cluster 2 to Cluster 1 using Superna Eyeglass.
After the controlled failback, write data from the DFS mount on a Windows client to the share protected by the Production SyncIQ policy, and confirm that the Cluster1 share path is the active target.