DR Design Guides

How to Validate and Troubleshoot a Successful Failover When Data Is Not Accessible on the Target Cluster

Quick and Simple Debug Data Access Steps:

NOTE: Follow the steps in the order below to find the root cause.

  1. READ ME FIRST: Have you rebooted the PC you are using to test SMB share access? Do not proceed until you have done this! This is the number one issue when testing data access.
  2. Did you get the error message below? It means your PC is still connected to the source cluster, which is read-only and locked by SyncIQ. Reboot the PC and test again.

  3. Check DNS and SmartConnect by pinging the FQDN SmartConnect name (NEVER USE THE SHORT NAME TO PING) and verify that the IP address returned is from the target cluster.
    1. If you get a "DNS failed to resolve" error message, continue with the detailed nslookup steps below to check DNS resolution of SmartConnect names. If the IP address is correct, go to the next step.
  4. Mount the share from a client (DFS or non-DFS) and check for a password login popup.
    1. Did you get a popup asking you to log in with a user name and password? This is most likely an Active Directory SPN missing from the target cluster's AD computer object. Use ADSI Edit to verify that HOST/<FQDN SmartConnect name> is present in the SPN property of the target cluster. See the animated GIF example of how to check.

    2. Enter an administrator account to authenticate and test data access before fixing the SPN, to verify that this is the only issue.
  5. Done. If none of these steps restores data access, proceed to the detailed debugging steps below.
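
The ping check in step 3 can also be scripted. The sketch below is a minimal Python illustration, not part of the product: it resolves a SmartConnect FQDN through the client's normal DNS and tests whether a returned address falls inside the target cluster's IP pools. The function names and the choice to describe the target pools as CIDR networks are assumptions made for this example; substitute the real values from your environment.

```python
import ipaddress
import socket


def resolve_ipv4(fqdn):
    """Return the sorted IPv4 addresses the client's DNS returns for fqdn.

    Raises socket.gaierror if the name does not resolve.
    """
    infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def on_target_cluster(ip, target_pools):
    """True if ip falls inside any of the target cluster's pool networks.

    target_pools is a list of CIDR strings (an assumption of this sketch).
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(pool) for pool in target_pools)


def report(fqdn, target_pools):
    """Print one line per returned address, flagging non-target answers."""
    try:
        for ip in resolve_ipv4(fqdn):
            where = "target" if on_target_cluster(ip, target_pools) else "NOT target"
            print(f"{fqdn} -> {ip} ({where} cluster)")
    except socket.gaierror as err:
        print(f"{fqdn} failed to resolve: {err}")
```

For example, report("userdata.ad1.test", ["172.16.81.0/24"]) would print each address returned for that name, where the pool network shown is a placeholder. Always pass the full FQDN, never the short name.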

Detailed Steps to Debug SmartConnect and DNS Resolution failures:


Test DNS response on the clusters: This test verifies that SmartConnect names were failed over successfully, and it can also verify that dual delegation is set up correctly in your DNS environment. Because it queries the clusters directly, it rules out any issue with your internal DNS and verifies that the Isilon SmartConnect zones failed over successfully.

  1. If the ping fails, or the name does not resolve to the correct IP address of the TARGET cluster, continue with the steps below to debug DNS.
    1. From any Windows client machine, type "nslookup" <press enter key>.
    2. Source Cluster DNS Test:
      1. Type "server x.x.x.x" <press enter key>, where x.x.x.x is the subnet service IP (SSIP) of the source cluster.
      2. Type the FQDN of a SmartConnect zone used in the failover <press enter key>. Hint: Refer to the failover log from DR Assistant for the full list of SmartConnect names that were failed over.
      3. The expected response is a failed resolution, since failover disables the SOURCE cluster DNS response. It is important that clients do not receive an IP address from the source cluster.
      4. Example of a failed nslookup on the cluster you failed away from: "** server can't find userdata.ad1.test: REFUSED"
      5. NOTE: If the lookup does NOT return a REFUSED response, the SmartConnect name did not fail over correctly. Consult the Networking section of the recovery guide to fix the SmartConnect names.
  2. Target Cluster DNS Test:
    1. Test DNS against the TARGET cluster SSIP (subnet service IP).
    2. Type "server y.y.y.y" <press enter key>, where y.y.y.y is the subnet service IP of the target cluster.
    3. Type the FQDN of a SmartConnect zone used in the failover <press enter key>. Refer to the failover log for the list of SmartConnect names that were failed over.
    4. The expected response is a SUCCESSFUL NAME RESOLUTION RETURNING AN IP ADDRESS OF THE TARGET CLUSTER. This means SmartConnect was failed over correctly to the target cluster.
    5. If the lookup fails, or returns the wrong IP address, consult the Networking section of the recovery guide to fix the SmartConnect names.
    6. Double check that dual delegation is configured correctly.
      1. On a Windows PC, type "nslookup" at the command prompt.
      2. Type "set type=ns" <press enter key>.
      3. Type the FQDN of the SmartConnect name <press enter key>.
      4. You should receive two name server records in the response, and they should be the SSIPs of the source and target clusters. If you do not receive two name server records, dual DNS delegation is not configured correctly. Escalate to your DNS administrator.
      5. Root Cause: Your internal DNS is not set up correctly for dual delegation, since the SSIP on the cluster answers DNS queries correctly. Stop here and correct it using the guide and video below.
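
The source and target SSIP tests above can be sketched as a small Python helper that runs nslookup against a specific server and classifies the answer. This is an illustration under stated assumptions, not a supported tool: the function names are hypothetical, the parsing relies on the word "refused" and a "Name:" line appearing in the nslookup output (as in the REFUSED example above), and target_ips is a set of target cluster addresses that you supply.

```python
import re
import subprocess

IPV4 = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"


def query_ssip(fqdn, ssip, timeout=15):
    """Run `nslookup <fqdn> <ssip>` and return its combined output text."""
    result = subprocess.run(
        ["nslookup", fqdn, ssip],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr


def parse_answer(output):
    """Return ("refused", []), ("resolved", [ips]) or ("failed", [])."""
    if re.search(r"refused", output, re.IGNORECASE):
        return "refused", []
    # Only look after the "Name:" line so the queried server's own
    # address in the header is not mistaken for the answer.
    _, found, answer = output.partition("Name:")
    ips = re.findall(IPV4, answer) if found else []
    return ("resolved", ips) if ips else ("failed", [])


def source_check_passed(output):
    """After failover the SOURCE SSIP is expected to refuse the query."""
    return parse_answer(output)[0] == "refused"


def target_check_passed(output, target_ips):
    """The TARGET SSIP is expected to return only target cluster addresses."""
    status, ips = parse_answer(output)
    return status == "resolved" and all(ip in target_ips for ip in ips)
```

In practice you would pass the text returned by query_ssip(fqdn, ssip) for each cluster's SSIP to the matching check. If either check fails, follow the guidance in the steps above rather than relying on the script's verdict alone.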

    Detailed Steps to Test Mounting a DFS-Protected Share with DFS Failover Mode:

    1. From a Windows client machine connected to Active Directory, mount a DFS folder, for example \\<domain name>\<dfs root name>\<DFS folder name>.
    2. Verify file write access by creating a file.
      1. If successful, you are done.
    3. If the write test fails, or the mount fails or returns an error:
      1. Log in to the source cluster and verify that the SMB share used for the DFS referral UNC has been renamed igls-dfs-xxx, where xxx is the name of the share used for the referral UNC on the DFS folder. If the share name does not have the igls-dfs prefix, add the prefix and save the share.
        1. This is a common error when the DFS client is still connected to the source cluster.

      2. Retest data access.
      3. If the data write still fails, log in to the target cluster and verify that the SMB share name DOES NOT have the igls-dfs prefix. If it does, remove the prefix and save the SMB share.
      4. Retest data access.
      5. Ask support for a list of SMB shares that failed to rename during failover, so that you have a complete list of shares that need the remediation steps above.
    4. Double check that the DFS folder is set up correctly.
      1. If the above steps did not show a failed rename on the source or target cluster, continue below to double check the DFS folder configuration.
      2. DFS Folder Configuration Validation
        1. Verify that DFS referrals are correctly configured in the Microsoft DFS Management snap-in.
        2. Check each item below to verify the configuration:
        3. Open the DFS Management snap-in and right-click the DFS folder you are validating.
        4. See example.
        5. Verify that both DFS referrals exist, that they point at the source and target cluster SmartConnect names, and that the share name matches the screenshot example. Then continue to the next step. MAKE SURE DFS REFERRALS USE THE FQDN SMARTCONNECT NAME, NEVER USE SHORT DNS NAMES.
        6. Test each referral UNC path DIRECTLY by mounting it.
          1. Example from above, tested from a Windows client: \\dr.ad1.test\smb2 (the failover target cluster SmartConnect name is used in this test).
          2. If the share mounts and data is visible, verify that you can write data.
          3. This test verifies that DNS and SmartConnect are configured correctly and that AD authentication to the SMB share is correctly configured.
          4. If this step fails, continue below.
      3. Follow the steps earlier in this guide (Quick and Simple Debug Data Access Steps, then the detailed SmartConnect and DNS steps).
        1. If those steps find a DNS resolution issue, fix the issue and retest the direct share referral UNC mount to the DR target cluster, or mount the DFS folder again.
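
The share rename convention checked in step 3 above can be expressed as a short Python sketch: after failover the source cluster should hold the share under its igls-dfs- prefixed name, and the target cluster should hold it under its plain name. The function name is hypothetical, and the two share lists are assumed to have been collected by you from each cluster; the comparison ignores case on the assumption that SMB share names are treated case-insensitively.

```python
PREFIX = "igls-dfs-"


def rename_problems(share, source_shares, target_shares):
    """Return a list of rename problems for `share` after failover.

    share          plain share name used in the DFS referral UNC
    source_shares  share names found on the source cluster
    target_shares  share names found on the target cluster
    """
    source = {name.lower() for name in source_shares}
    target = {name.lower() for name in target_shares}
    plain = share.lower()
    prefixed = PREFIX + plain

    problems = []
    if prefixed not in source:
        problems.append(
            f"source cluster: '{prefixed}' not found, add the prefix to '{plain}'"
        )
    if plain not in target:
        problems.append(
            f"target cluster: '{plain}' not found, remove the prefix from '{prefixed}'"
        )
    return problems
```

An empty list means both clusters are in the expected post-failover state for that share. For example, rename_problems("smb2", ["igls-dfs-smb2"], ["smb2"]) returns an empty list, while a share that was not renamed on the source produces one entry describing the fix from step 3.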



    Detailed Steps to Test NFS Export Remount After Failover:

    1. NFS exports must be remounted so that the Linux host resolves the name to an IP address again and connects to the export.
    2. The command to unmount an export when a file is open is "umount -fl <export-path-here>" (the flags are force and lazy, to unmount even if an open file exists).
    3. The command to re-read /etc/fstab and mount any export that is not already mounted is "sudo mount -a" (run as the root user).
    4. If a mount error occurs, review the error message for the reason.
    5. Type "mount".
    6. Find the export in the output and verify that the IP address shown is from the target cluster. If not, follow the SmartConnect debugging steps.
    7. To test data write access:
      1. Change directory to the mount point defined in /etc/fstab.
      2. Example only, if the mount point is /mnt/appdata:
        1. cd /mnt/appdata
        2. touch test.txt
        3. The above command should complete without error. If a read-only filesystem error is returned, DNS returned the source cluster IP address. Follow the SmartConnect debugging steps in this guide.
        4. Verify the file was created: "ls test.txt"
        5. If the file is present, data access is confirmed.
        6. Clean up the test file:
          1. rm test.txt
    8. NOTE: IF YOU CHANGED THE TARGET PATH WHEN REPLICATING AN EXPORT, THE REMOUNT PATH WILL NEED TO BE UPDATED. EXAMPLE: source path /ifs/data/export replicated to /ifs/data/dr/export will require the host to change the mount path to /ifs/data/dr/export.
      1. To avoid this issue, do not change the target cluster path when creating the SyncIQ policy.
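
The mount check and write test above can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the parser assumes the usual Linux `mount` output format ("<source> on <mount point> type nfs (options)") with the server address in the addr= option, and the write test mirrors the touch, ls, rm sequence but refuses to overwrite an existing test.txt.

```python
import os
import re

MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (nfs\d?) \((.*)\)$")


def nfs_mount_info(mount_output, mount_point):
    """Return (source, server_ip) for an NFS mount point, or None.

    server_ip comes from the addr= mount option and is None when the
    option is absent.
    """
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group(2) == mount_point:
            # Anchor on a comma or the start so that clientaddr= and
            # mountaddr= are not matched by mistake.
            addr = re.search(r"(?:^|,)addr=([^,]+)", match.group(4))
            return match.group(1), (addr.group(1) if addr else None)
    return None


def write_test(mount_point, name="test.txt"):
    """Create, verify, and remove a test file; return True on success."""
    path = os.path.join(mount_point, name)
    try:
        with open(path, "x"):
            pass  # creates a new empty file, fails if it already exists
        created = os.path.exists(path)
        os.remove(path)
        return created
    except OSError:
        # For example a read-only file system error when the host is
        # still connected to the source cluster.
        return False
```

Feed nfs_mount_info the captured output of the `mount` command and compare the returned address with the target cluster's addresses. A False result from write_test on a mounted export is the scripted equivalent of the read-only filesystem error described in step 7.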



    Copyright Superna LLC