Failover Planning  


Are you planning a failover?

We recommend reviewing our planning checklist for a proven process to fail over successfully:

Failover Planning Guide and checklist

For a summary of best practices for Eyeglass and PowerScale, refer to Eyeglass and PowerScale DR Best Practices.


Do I need to remount shares after Access Zone failover or IP pool failover?

This is a common question. The Data Integrity failover option reduces the impact on client machines during Access Zone and IP pool failover. It disconnects the NetBIOS session for all shares involved in a failover, which also removes the cached IP session to the source cluster. After the DNS redirect step completes, Windows machines can mount the DR cluster correctly, with the exceptions listed below.

Windows machines can re-establish the NetBIOS session and query DNS to get an IP address from the DR cluster. This avoids the need for a remount on the Windows machine.

Limitations

  1. A machine with an open file will continue to cache the source cluster NetBIOS session.
  2. Machines with no active open file can switch clusters without a remount.
  3. A user actively using Explorer on a share that was failed over will still cache the NetBIOS session of the source cluster and will need to remount the share.
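The limitations above reduce to a simple per-client rule. The following is a minimal sketch, assuming you can already collect two facts about each Windows client at failover time (whether it holds an open file on a failed-over share, and whether a user is browsing that share in Explorer). The `ClientState` type, `needs_remount` function, and client names are hypothetical and are not part of Eyeglass.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Hypothetical snapshot of one Windows client at failover time."""
    name: str
    has_open_file: bool          # an open file keeps the source NetBIOS session cached
    browsing_in_explorer: bool   # an Explorer window on the share also keeps it cached


def needs_remount(client: ClientState) -> bool:
    """Return True if the client is expected to need a manual remount.

    Clients with no open file and no Explorer window on the failed-over share
    can re-establish the NetBIOS session and resolve the DR cluster via DNS.
    """
    return client.has_open_file or client.browsing_in_explorer


def clients_to_remount(clients: list[ClientState]) -> list[str]:
    """Names of clients that should be remounted after the DNS redirect step."""
    return [c.name for c in clients if needs_remount(c)]


if __name__ == "__main__":
    fleet = [
        ClientState("ws-01", has_open_file=False, browsing_in_explorer=False),
        ClientState("ws-02", has_open_file=True, browsing_in_explorer=False),
        ClientState("ws-03", has_open_file=False, browsing_in_explorer=True),
    ]
    print(clients_to_remount(fleet))  # ['ws-02', 'ws-03']
```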


How do I determine the best approach to quota failover?

Quotas present some challenges for failover with OneFS 8.x. The quota scan job runs as soon as new quotas are created and sets a flag on each newly created quota to indicate when its quota domain has been created. SyncIQ operations conflict with quotas whose flag indicates the quota domain has not been created yet. This can cause the make-writable or resync-prep step of a failover to fail.

Quota failover options:

  1. Fail over quotas before a planned failover and leave the quotas on both clusters after failover.
    1. During failover, use the new skip option by unchecking the quota check box; the quota step of the failover is skipped and quotas remain on the source cluster. Use this when failing over and back within a weekend to avoid interference between the quota scan job and SyncIQ. The quotas are not required for failover testing, and it is safer to leave them on the source cluster.
      1. NOTE: On failback, make sure to uncheck the quota failover option.
    2. New option in release 2.5.3 or later: see the CLI guide to enable the quota inventory job, which collects quotas on a default twice-per-day schedule. Quotas can also be pre-synced using a new quota sync schedule job (see the CLI guide). This ensures quotas do not need to be failed over because they are pre-staged on the target DR cluster at all times.
      1. NOTE: Use the skip quota option in DR Assistant if pre-syncing quotas.
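The quota options above amount to one decision: skip the quota step whenever quotas are already on the target or are not needed for a short test. The sketch below encodes that decision under stated assumptions; `QuotaAction` and `choose_quota_action` are hypothetical names, and the function only returns advice about the DR Assistant check box rather than driving Eyeglass.

```python
from enum import Enum


class QuotaAction(Enum):
    """What to do with the quota step in DR Assistant (illustrative labels)."""
    FAIL_OVER_QUOTAS = "leave the quota check box checked"
    SKIP_QUOTA_STEP = "uncheck the quota check box"


def choose_quota_action(quotas_prestaged: bool, quick_round_trip: bool) -> QuotaAction:
    """Pick a quota handling option for a failover or failback.

    quotas_prestaged: quotas already exist on the target cluster, either failed
        over ahead of time or kept there by the quota sync schedule job
        (release 2.5.3 or later).
    quick_round_trip: a test that fails over and back within a short window,
        such as a weekend.
    """
    if quotas_prestaged or quick_round_trip:
        # Quotas are already on the target, or not needed for the test;
        # skipping avoids quota scan and SyncIQ interference.
        return QuotaAction.SKIP_QUOTA_STEP
    return QuotaAction.FAIL_OVER_QUOTAS


if __name__ == "__main__":
    print(choose_quota_action(quotas_prestaged=True, quick_round_trip=False).value)
```

Run the same check again before failback; when quotas were left on both clusters, the quota option should stay unchecked on the return trip as well.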

Access Zone Failover

Eyeglass uses the Access Zone as the basis for grouping data for failover when customers choose not to use DFS mode or per-SyncIQ-policy failover. Using the Access Zone as the unit of failover simplifies DR readiness, planning, and failover operations to the Access Zone level. Shares, exports, and quotas can be failed over with this mode of failover.

Access Zone failover includes networking failover of SmartConnect Zones and any existing SmartConnect Zone aliases. Eyeglass must fail over ALL IP pools that are members of the Access Zone and all aliases, which means all SyncIQ policies and ALL shares, exports, and quotas must fail over at the same time. The SmartConnect failover process requires the source cluster zone names to be renamed (not deleted) during failover, to avoid SPN collisions in Active Directory and to prevent clients from mounting the source cluster after failover.

This requires planning and mapping IP pools from the source to the target cluster before the Access Zone is marked as ready for failover.
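Because an Access Zone fails over as one unit, its readiness depends on every member pool having a target mapping. The sketch below models that rule and lists what moves together; the `IpPool` and `AccessZone` types, pool names, and SmartConnect names are hypothetical and do not reflect Eyeglass's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class IpPool:
    name: str
    smartconnect_zone: str
    aliases: list[str] = field(default_factory=list)


@dataclass
class AccessZone:
    name: str
    pools: list[IpPool]
    synciq_policies: list[str]


def unmapped_pools(zone: AccessZone, pool_mapping: dict[str, str]) -> list[str]:
    """Source pools in the zone that have no target pool mapping yet."""
    return [p.name for p in zone.pools if p.name not in pool_mapping]


def failover_scope(zone: AccessZone, pool_mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return everything that moves together when the Access Zone fails over."""
    missing = unmapped_pools(zone, pool_mapping)
    if missing:
        raise ValueError(f"{zone.name} is not ready: unmapped pools {missing}")
    names: list[str] = []
    for pool in zone.pools:
        names.append(pool.smartconnect_zone)  # every SmartConnect Zone name in the zone
        names.extend(pool.aliases)            # plus every alias
    return {
        "pools": [f"{p.name} -> {pool_mapping[p.name]}" for p in zone.pools],
        "smartconnect_names": names,
        # The policy list stands in for all policies and their shares, exports, quotas.
        "synciq_policies": list(zone.synciq_policies),
    }


if __name__ == "__main__":
    zone = AccessZone(
        name="zone1",
        pools=[IpPool("pool-a", "data.example.com", ["files.example.com"]),
               IpPool("pool-b", "nfs.example.com")],
        synciq_policies=["zone1-home", "zone1-projects"],
    )
    print(unmapped_pools(zone, {"pool-a": "dr-pool-a"}))  # ['pool-b']
    print(failover_scope(zone, {"pool-a": "dr-pool-a", "pool-b": "dr-pool-b"}))
```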

In addition, SMB authentication depends on the AD machine account having the correct SPN values for SmartConnect Zones; failover and authentication depend on SPNs being registered with the cluster that is writable. Eyeglass Access Zone failover automates SPN management. It also creates the SmartConnect Zone aliases required to access data with a simple DNS update that delegates the SmartConnect Zone to the PowerScale cluster. (NOTE: DFS mode does not require DNS, SPN, and SmartConnect Zone changes during failover.)
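Eyeglass performs SPN management automatically; the sketch below only illustrates what moving SPNs to the writable cluster involves when it is done by hand. It assumes SPNs of the standard `HOST/<short name>` and `HOST/<FQDN>` form and builds Windows `setspn` command strings without running them. The SmartConnect name and the machine account names `PRODCLUSTER` and `DRCLUSTER` are hypothetical.

```python
def spns_for(smartconnect_names: list[str]) -> list[str]:
    """HOST/ SPNs (short name and FQDN) for each SmartConnect name, without duplicates."""
    spns: list[str] = []
    for fqdn in smartconnect_names:
        short_name = fqdn.split(".", 1)[0]
        for spn in (f"HOST/{short_name}", f"HOST/{fqdn}"):
            if spn not in spns:
                spns.append(spn)
    return spns


def spn_move_commands(smartconnect_names: list[str], source_account: str,
                      target_account: str) -> list[str]:
    """setspn commands that move SPNs from the source to the writable target cluster."""
    commands: list[str] = []
    for spn in spns_for(smartconnect_names):
        # Delete first: setspn -S refuses to add an SPN another account still holds.
        commands.append(f"setspn -D {spn} {source_account}")
        commands.append(f"setspn -S {spn} {target_account}")
    return commands


if __name__ == "__main__":
    for cmd in spn_move_commands(["data.example.com"], "PRODCLUSTER", "DRCLUSTER"):
        print(cmd)
```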

The following figure shows the cluster configuration before Access Zone failover. This is the normal state, with primary and secondary clusters available. Preparation for failover consists of creating mapping hints before failover.

The second figure shows the Access Zone failover steps when the primary cluster is not accessible (for example, a real DR event).

Eyeglass DR Assistant - Access Zone Failover - Summary

  1. Ensure that there is no live access to data, OR enable the Data Integrity failover option to disable access to SMB Shares before failover.
  2. Begin Failover (Eyeglass automated).
  3. Validation (Eyeglass automated).
  4. Set configuration replication for policies to USERDISABLED (Eyeglass automated).
  5. Provide write access to data on target (Eyeglass automated).
  6. Move SmartConnect Zone to Target (Eyeglass automated).
  7. Update SPN to allow for authentication against target (Eyeglass automated).
  8. Repoint DNS to the Target cluster IP address (use post failover script) (Eyeglass automated with scripting).
  9. Refresh session to pick up DNS change (use post failover script) (Eyeglass automated with scripting).
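The automated steps above run as an ordered sequence. The following is a minimal sketch of such a runner, assuming each step can report success or failure as a boolean; the step labels paraphrase the list above, and the placeholder actions are hypothetical stand-ins for calls to Eyeglass or the cluster. Step 1 is treated as a precondition rather than an automated step.

```python
from typing import Callable

# Step 1 (no live access, or Data Integrity enabled) is a precondition, not an automated step.
ACCESS_ZONE_STEPS = [
    "Begin failover",
    "Validation",
    "Set configuration replication to USERDISABLED",
    "Provide write access to data on target",
    "Move SmartConnect Zone to target",
    "Update SPNs on target",
    "Repoint DNS (post-failover script)",
    "Refresh sessions (post-failover script)",
]


def run_steps(steps: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, list[str]]:
    """Run steps in order and stop at the first one that reports failure."""
    log: list[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'OK' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


if __name__ == "__main__":
    # Placeholder actions that always succeed.
    demo = [(name, lambda: True) for name in ACCESS_ZONE_STEPS]
    success, log = run_steps(demo)
    print(success, len(log))  # True 8
```

This sketch halts on the first failed step so that later steps, such as repointing DNS, never run against a target that was not made writable; it is an illustrative design choice, not a statement of how Eyeglass handles step failures.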

For details on this failover mode, consult the Access Zone Failover Guide.

IP Pool Failover

Eyeglass now offers the IP pool as a new failover unit within an Access Zone. Using the IP pool as the unit of failover simplifies DR readiness, because each IP pool now has its own DR readiness calculation and failover operations. Shares, exports, and quotas can be failed over with this mode of failover.

IP pool failover includes networking failover of SmartConnect Zones and any existing SmartConnect Zone aliases. Eyeglass must fail over ALL policies mapped to the pool using the IP pool policy mapping UI in the DR Dashboard. All SmartConnect names and aliases configured on the pool, all mapped SyncIQ policies, and ALL shares, exports, and quotas associated with those SyncIQ policies fail over at the same time. The SmartConnect failover process requires the source cluster zone names to be renamed (not deleted) during failover, to avoid SPN collisions in Active Directory and to prevent clients from mounting the source cluster after failover.

This requires planning and mapping IP pools from the source to the target cluster before the pools are marked as ready for failover.

It also requires converting the Access Zone to IP pool failover, which means every pool within the Access Zone must have a policy mapped to it before ANY pool in the zone can be failed over.
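The conversion rule above is an all-or-nothing check across the zone's pools. The sketch below expresses it with plain dictionaries, assuming the policy-to-pool mapping from the DR Dashboard is available as a `{policy: pool}` dict; the function names, policy names, and pool names are hypothetical.

```python
def pools_without_policies(zone_pools: list[str], policy_to_pool: dict[str, str]) -> list[str]:
    """Pools in the Access Zone that no SyncIQ policy is mapped to yet."""
    mapped = set(policy_to_pool.values())
    return sorted(p for p in zone_pools if p not in mapped)


def can_fail_over_pool(pool: str, zone_pools: list[str], policy_to_pool: dict[str, str]) -> bool:
    """A pool can fail over only once every pool in its zone has a mapped policy."""
    if pool not in zone_pools:
        raise ValueError(f"{pool} is not in this Access Zone")
    return not pools_without_policies(zone_pools, policy_to_pool)


def policies_for_pool(pool: str, policy_to_pool: dict[str, str]) -> list[str]:
    """SyncIQ policies (with their shares, exports, quotas) that fail over with the pool."""
    return sorted(policy for policy, p in policy_to_pool.items() if p == pool)


if __name__ == "__main__":
    zone = ["pool-a", "pool-b"]
    mapping = {"home": "pool-a"}
    print(can_fail_over_pool("pool-a", zone, mapping))  # False: pool-b has no policy
    mapping["projects"] = "pool-b"
    print(can_fail_over_pool("pool-a", zone, mapping))  # True
```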

In addition, SMB authentication depends on the AD machine account having the correct SPN values for SmartConnect Zones. Failover and authentication depend on SPNs being registered with the cluster that is writable. Eyeglass IP pool failover automates SPN management, along with creating the SmartConnect Zone aliases needed to access data with a simple DNS update that delegates the SmartConnect Zone to the PowerScale cluster. (NOTE: DFS mode does not require DNS, SPN, and SmartConnect Zone changes during failover.) DFS IP pools can be failed over with the IP pool failover feature.

The following figure shows IP pool failover when the primary cluster is not accessible (for example, a real DR event):

Eyeglass DR Assistant - IP pool Failover - Summary

  1. Ensure that there is no live access to data, OR enable Data Integrity failover option to disable access to SMB Shares before failover.
  2. Begin Failover (Eyeglass automated).
  3. Validation (Eyeglass automated).
  4. Set configuration replication for policies to USERDISABLED (Eyeglass automated).
  5. Provide write access to data on target (Eyeglass automated).
  6. Move SmartConnect zone to Target (Eyeglass automated).
  7. Update SPN to allow for authentication against target (Eyeglass automated).
  8. Repoint DNS to the Target cluster IP address (use post failover script) (Eyeglass automated with scripting).
  9. Refresh session to pick up DNS change (use post failover script) (Eyeglass automated with scripting).

For details on this failover mode, consult the Access Zone Failover Guide and look for the IP pool failover section.

SyncIQ DFS Mode with Eyeglass

This mode enables the most seamless failover and failback operations, with full quota failover/failback integration (exports are excluded). It enables zero-touch client failover: clients always mount the writable copy of the SyncIQ data with quotas active, and no DNS updates, remounts, or re-authentication are required.

This is achieved using DFS folder UNC targets (with the same share name) and a SmartConnect Zone for each cluster, with DFS set up to use both clusters. Eyeglass ensures shares exist on only one cluster at a time and moves them during failover events. The DFS folder target path to the secondary cluster becomes active automatically once Eyeglass creates the shares.

NOTE: It is possible to use two different SmartConnect Zones on the source and destination clusters so that nothing needs to change on either cluster during failover. The following figure shows a typical DFS folder setup:


Eyeglass DR Assistant - DFS Mode Failover - Summary

  1. Ensure that there is no live access to data, OR enable Data Integrity failover option to disable access to SMB Shares before failover.
  2. Begin Failover (Eyeglass automated).
  3. Validation (Eyeglass automated).
  4. Set configuration replication for policies to USERDISABLED (Eyeglass automated).
  5. Provide write access to data on target (Eyeglass automated).
  6. (Not performed and not required) Move SmartConnect zone to Target (Eyeglass automated).
  7. (Not performed and not required) Update SPN to allow for authentication against target (Eyeglass automated).
  8. (Not performed and not required) Repoint DNS to the Target cluster IP address (use post failover script) (Eyeglass automated with scripting).
  9. Fail over Shares and Quotas - shares and quotas are created on target and deleted from the source cluster (Eyeglass automated).
  10. DFS clients automatically switch to the DR cluster using the second DFS folder UNC target path.
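DFS mode relies on each share existing on only one cluster at a time. The sketch below models the share move in step 9 with two dictionaries (share name to path), one per cluster; it is a hypothetical illustration of the invariant, omits quotas, and does not call any Eyeglass or PowerScale API. Share names and paths are made up.

```python
def fail_over_shares(source: dict[str, str], target: dict[str, str], share_names: list[str]) -> None:
    """Move shares (name -> path) from the source cluster to the target cluster."""
    for name in share_names:
        if name in target:
            raise ValueError(f"share {name!r} already exists on the target cluster")
        target[name] = source[name]  # create on target with the same share name and path
        del source[name]             # delete from source so only one copy exists


def one_cluster_only(source: dict[str, str], target: dict[str, str]) -> bool:
    """True when no share name exists on both clusters at the same time."""
    return not (source.keys() & target.keys())


if __name__ == "__main__":
    prod = {"home": "/ifs/data/home", "projects": "/ifs/data/projects"}
    dr: dict[str, str] = {}
    fail_over_shares(prod, dr, ["home", "projects"])
    print(prod, dr, one_cluster_only(prod, dr))
```

With the share gone from the source, clients can no longer use the first DFS folder target and follow the second target, which points at the DR cluster.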

For details on this failover mode, consult the Microsoft DFS Mode Failover Guide.

SyncIQ Mode with Eyeglass

This mode of failover allows targeted failover, with some manual steps, so that selected policies can fail over without failing over the entire Access Zone of policies. Because no SPN management is performed with this failover type, it is better suited to NFS export failover plus quotas. Shares and exports are both pre-synced by Eyeglass, so both protocols are supported with this mode.

This failover mode does not automate SmartConnect Zone failover as Access Zone failover does. Selected SmartConnect Zones can be failed over, but completing the failover requires manually creating SmartConnect Zone aliases and updating DNS.

This mode of failover is also useful with the post-failover script engine, which can execute host-side unmount and remount commands using scripts based on the samples provided with Eyeglass. Superna Professional Services can also be engaged to build host-side scripts for customer requirements.

Review the Script Engine Overview section in the Eyeglass Administration Guide.

These scripts allow simple SSH-based automation of remote host unmount and remount. The remount can also be done without updating DNS, because the target cluster SmartConnect Zone can be mounted directly once the SyncIQ policy is marked writable on the target cluster.

We recommend this option for automation when there are fewer than 30 hosts. If the host count is higher, we recommend Access Zone failover with DNS updates.
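The host-count guidance and the SSH remount approach above can be sketched together. This is not one of the sample scripts shipped with Eyeglass: it assumes Linux NFS clients reachable over SSH, uses the standard `umount` and `mount -t nfs` commands, and only builds the command (for example, to pass to `subprocess.run(cmd, check=True)`) rather than running it. Host names, mount points, export paths, and the target SmartConnect name are hypothetical.

```python
import shlex

HOST_THRESHOLD = 30  # from the guidance above: script-based remounts for fewer than 30 hosts


def recommended_mode(host_count: int) -> str:
    """Failover mode recommended for the given number of client hosts."""
    if host_count < 0:
        raise ValueError("host_count must be non-negative")
    if host_count < HOST_THRESHOLD:
        return "SyncIQ policy failover with post-failover remount scripts"
    return "Access Zone failover with DNS updates"


def remount_command(host: str, mount_point: str, target_sc_name: str,
                    export_path: str, user: str = "root") -> list[str]:
    """Build (but do not run) an SSH command that remounts an export from the target cluster.

    The target SmartConnect name is mounted directly, so no DNS update is needed.
    """
    source = f"{target_sc_name}:{export_path}"
    remote = (f"umount {shlex.quote(mount_point)} && "
              f"mount -t nfs {shlex.quote(source)} {shlex.quote(mount_point)}")
    return ["ssh", f"{user}@{host}", remote]


if __name__ == "__main__":
    print(recommended_mode(12))
    print(remount_command("app01", "/mnt/projects", "dr-cluster.example.com", "/ifs/data/projects"))
```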

The following diagrams show the failover flow and steps, with sample commands that would be run during an Eyeglass policy failover. SPN commands are shown for the case where SMB failover is performed manually.

For details on this failover mode, consult the SyncIQ Policy Failover Guide.


DR Rehearsal Mode

This is a new failover mode that makes the target cluster file system writable while the production cluster stays in production. During this time only one copy of the data exists. When DR Rehearsal mode is disabled, changes made on the target cluster are discarded and re-synced from the production cluster. A different DNS name is required to mount the data.

Pros:

  1. Faster failover and testing are possible.
  2. Production stays operational.
  3. AD and network cloning is possible to mirror production.

Cons:

  1. Data is not synced during testing.



Failover Readiness

Eyeglass-assisted failover includes diagnostics that detect when failover is not possible or not recommended, and it updates a simple DR Dashboard to show your current state.

For Access Zones or IP pools, the DR Dashboard indicates when any of the following need attention: data sync issues, configuration sync issues, SPN out-of-sync conditions, and invalid IP pool mapping for IP pool or Access Zone failover.

The DR Dashboard also provides per-SyncIQ-policy and DFS mode policy readiness views covering SyncIQ and configuration sync readiness. This allows failover readiness to be assessed for part of an Access Zone rather than the entire Access Zone. Eyeglass validates your DR readiness at regular intervals and notifies you via Eyeglass external alarming (if configured) when a problem is detected.
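Readiness is effectively the conjunction of several independent checks, with any failed check flagged for attention. The sketch below shows that aggregation; the check names paraphrase the conditions listed above, and the `Readiness` type and `evaluate` function are hypothetical rather than the DR Dashboard's actual logic. As an assumption of this sketch, a check with no result is treated as failed.

```python
from dataclasses import dataclass

# Conditions reported for Access Zones and IP pools (names are illustrative).
CHECKS = ("data_sync", "configuration_sync", "spn_sync", "ip_pool_mapping")


@dataclass
class Readiness:
    unit: str                   # Access Zone, IP pool, or SyncIQ policy name
    needs_attention: list[str]  # checks that failed or were not reported

    @property
    def ready(self) -> bool:
        return not self.needs_attention


def evaluate(unit: str, results: dict[str, bool],
             checks: tuple[str, ...] = CHECKS) -> Readiness:
    """Collect failed checks; a check with no result is treated as failed."""
    return Readiness(unit, [c for c in checks if not results.get(c, False)])


if __name__ == "__main__":
    print(evaluate("zone1", {"data_sync": True, "configuration_sync": True}))
    # A per-policy view might only need SyncIQ and configuration sync checks.
    print(evaluate("policy1", {"data_sync": True, "configuration_sync": True},
                   checks=("data_sync", "configuration_sync")).ready)
```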

The Eyeglass Runbook Robot feature is another way to validate readiness: it automates a failover of a dedicated, non-production “EyeglassRunbookRobot” Access Zone or SyncIQ policy every night at midnight. This exercises the actual failover steps in your environment daily and notifies you via Eyeglass external alarming (if configured) when a problem is detected.

This feature operates as a cluster witness: it mounts the cluster over NFS, writes test data, and reads it back to verify failover from the client's view of the cluster. It can be configured in basic or advanced mode. See the Runbook Robot Admin Guide.

Basic mode uses only a SyncIQ policy for failover, with no other logic running. It is easy to set up and provides a quick test of failover and failback.

Advanced mode tests all logic and operates with the Access Zone failover mode. It provides the same NFS write and read-back test, in addition to SPN management, SmartConnect Zone mapping, and failover logic.
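The client-side check described above boils down to writing a unique token and reading it back. The following is a minimal sketch, not the Runbook Robot implementation: it assumes the export is already mounted at a local path, and the probe file name is hypothetical. The demo uses a temporary directory in place of an NFS mount; a production check might also need to bypass client-side caching before reading back, which this sketch does not attempt.

```python
import secrets
import tempfile
from pathlib import Path


def write_read_back(mount_point: str, filename: str = "runbook_robot_probe.txt") -> bool:
    """Write a random token under mount_point, read it back, and remove the file."""
    probe = Path(mount_point) / filename
    token = secrets.token_hex(16)
    try:
        probe.write_text(token)
        return probe.read_text() == token
    finally:
        probe.unlink(missing_ok=True)  # clean up even if the read fails


if __name__ == "__main__":
    # A local temporary directory stands in for the NFS mount of the writable cluster.
    with tempfile.TemporaryDirectory() as mount:
        print(write_read_back(mount))  # True
```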

© Superna Inc