Golden Copy Configuration Steps

Supported S3 Protocol Targets

Review supported S3 Targets and limitations here

How to prepare a Cluster for Golden Copy Management

  1. Mandatory Step - Create the service account for the Search & Recover/Golden Copy products on the source cluster, and configure the sudo file on the source cluster.
  2. Mandatory Step - Time Sync:
    1. The clocks on the cluster and on Golden Copy must be accurate and within 15 minutes of an NTP-synced time source.  Otherwise, S3 targets will reject all uploads with an error stating that the time skew is too great.
    2. Set up NTP on your source Isilon/PowerScale cluster and on Golden Copy to ensure accurate time.
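
The 15-minute tolerance described above can be illustrated with a short Python sketch. The function name and the idea of comparing two timestamps by hand are assumptions made for illustration; they are not part of Golden Copy, and in practice NTP on both systems is what keeps the clocks aligned.

```python
from datetime import datetime, timedelta, timezone

# Tolerance taken from the guide: clocks must be within 15 minutes.
MAX_SKEW = timedelta(minutes=15)


def within_skew(local: datetime, reference: datetime,
                max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True when the two timestamps differ by no more than max_skew."""
    return abs(local - reference) <= max_skew


# Hypothetical timestamps: one node 5 minutes ahead, one 20 minutes behind.
reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(within_skew(reference + timedelta(minutes=5), reference))   # True
print(within_skew(reference - timedelta(minutes=20), reference))  # False
```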

Quick Start Steps for Golden Copy Setup  

This quick setup guide provides the steps to configure a folder for archive copy or sync jobs, and a link to learn more about the detailed CLI options.

Prerequisites:

  1. All searchctl commands must be run as the ecaadmin user from Golden Copy node 1
  2. Adding the PowerScale by IP address is recommended. 
  3. The eyeglassSR service account must be created following the minimum permissions guide for Golden Copy.  The guide is here.
  4. Default login is user ecaadmin and password 3y3gl4ss

License Keys

  1. Copy the license zip file to the /home/ecaadmin directory on the Search node and change its permissions with chmod 777.
  2. searchctl licenses add --path /home/ecaadmin/<name of zip>.zip
    1. Verify the license is installed.
  3. searchctl licenses list 
  4. Get help on licenses
    1. searchctl licenses --help 
  5. searchctl licenses uninstall (Removes all of the licenses on the system).
  6. searchctl licenses applications list (Lists all the application types that are licensed on the System,  used on the unified Search & Recover and Golden Copy cluster).
  7. searchctl isilons license --name NAME --applications APPLICATIONS - Use this command to assign a license key to a cluster.  Example to assign the advanced license key
    1. searchctl isilons license --name <cluster name> --applications GC

How to Add a cluster to Inventory

  1. searchctl isilons add --host <ip address of Isilon in system zone> --user eyeglassSR --applications {GC, SR}
    1. [--goldencopy-recall-only] [Advanced or Backup Bundle license key required] Use this option to add a cluster for a redirected recall job.  This cluster type is available with the Backup Bundle or the upgrade to the Advanced license key.
    2. [--applications APPLICATIONS]  This is a required parameter that assigns the cluster to the Search application or the Golden Copy application.  For the Golden Copy product, enter GC.
    3. NOTE: Once a license is assigned to a cluster it is locked and cannot be removed.  An unlock license key must be purchased to re-assign a license.

Archiving to a Dell ECS

  1. searchctl archivedfolders add --isilon gcsource --folder /ifs/archive --accesskey CqX2--bcZCdRjLbOlZopzZUOsl8 --secretkey zZMDh3P7fcMD2GUvLNK5md7NVk --endpoint https://x.x.x.x:9021 --bucket <bucket name> --cloudtype ecs  --endpoint-ips x.x.x.x,y.y.y.y,k.k.k.k
  2. See the walk-through on setting up the S3 bucket here.
    1. NOTE: --isilon is the name of the cluster, not the IP address.  Get the name using searchctl isilons list.
    2. NOTE: Replace the example values in the command above with the correct values for your environment.
    3. A load balance option is supported that health checks the ECS nodes and balances copies evenly across the ECS nodes provided.  It requires the --endpoint-ips flag, specifying the IP addresses of the ECS data nodes.
    4. NOTE: The following setting is needed in /opt/superna/eca/eca-env-common.conf to enable the round robin load balance algorithm.  The S3 heartbeat is enabled by default and will remove an ECS node from the load balance table if it becomes unreachable.  A future release will support an alternate load balancing mode for even connections.
      1. ssh to node 1
      2. nano /opt/superna/eca/eca-env-common.conf
      3. Paste the following setting: export ARCHIVE_ENDPOINTS_ROUND_ROBIN=true
      4. Press Control+x and answer yes to save.
      5. A cluster down and up is required for the setting to take effect.
    5. NOTE: A service account user should be created.
    6. NOTE: Replace the --endpoint-ips values with the data nodes on the ECS that can receive copied data.  The alternate syntax x.x.x.x-y.y.y.y specifies a range of IPs.
    7. NOTE: A dedicated View should be created for S3 storage without any other protocols enabled.
    8. NOTE: The access key and secret key can be found by logging into the console and selecting Admin menu --> Access Management, then clicking on your user ID and recording the access ID and secret ID.  You must log in to the console as the user configured for authentication to get the S3 keys.
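
To make the two --endpoint-ips syntaxes (comma-separated list and x.x.x.x-y.y.y.y range) and the round robin idea concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, the IP addresses are placeholders, and the way Golden Copy actually parses the flag or tracks node health is not documented here. The static `healthy` set merely stands in for the S3 heartbeat.

```python
import ipaddress
from itertools import cycle


def expand_endpoint_ips(spec: str) -> list[str]:
    """Expand 'a,b,c' or 'a-b' (inclusive range) into a list of IPv4 strings."""
    ips: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start = ipaddress.IPv4Address(start_s.strip())
            end = ipaddress.IPv4Address(end_s.strip())
            if end < start:
                raise ValueError(f"range end is before range start: {part}")
            ips.extend(str(ipaddress.IPv4Address(n))
                       for n in range(int(start), int(end) + 1))
        else:
            # Constructing the address validates the syntax.
            ips.append(str(ipaddress.IPv4Address(part)))
    return ips


def round_robin(ips: list[str], healthy: set[str]):
    """Return an endless iterator over the healthy endpoints, in order."""
    pool = [ip for ip in ips if ip in healthy]
    if not pool:
        raise RuntimeError("no healthy endpoints available")
    return cycle(pool)


# Placeholder addresses; 10.0.0.2 is treated as unreachable.
nodes = expand_endpoint_ips("10.0.0.1-10.0.0.3")
picker = round_robin(nodes, healthy={"10.0.0.1", "10.0.0.3"})
print([next(picker) for _ in range(4)])
```

Each upload would take the next address from the iterator, so copies are spread evenly over the nodes that are still reachable.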

Archiving to AWS

  1. searchctl archivedfolders add --isilon prod-cluster --folder /ifs/data/policy1/aws --accesskey AKIAIs3GQ --secretkey AGV7tMlPOmqaP7k6Oxv --endpoint s3.ca-central-1.amazonaws.com --region ca-central-1 --bucket mybucketname --cloudtype aws
    1. NOTE: The region is a mandatory field with Amazon S3.
    2. NOTE: The endpoint must use the region-encoded URL.  In the example above the region is ca-central-1, which is used to create the endpoint URL.
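
The relationship between the --region and --endpoint values can be sketched in Python as follows. The helper name is hypothetical; the host name pattern is the standard regional S3 form used in the example above, and special AWS partitions (for example China regions) use a different domain suffix that this sketch does not cover.

```python
def aws_s3_endpoint(region: str) -> str:
    """Build the regional S3 endpoint host name for a standard AWS region."""
    region = region.strip().lower()
    if not region:
        raise ValueError("a region is required for AWS targets")
    return f"s3.{region}.amazonaws.com"


print(aws_s3_endpoint("ca-central-1"))  # s3.ca-central-1.amazonaws.com
```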

Archiving to Google Cloud Storage

  1. Get the Google authentication key by following the Google Cloud Storage Walk Through Guide.  Copy the key file to /home/ecaadmin on node 1 of Golden Copy (for example, using WinSCP).
  2. searchctl archivedfolders add --isilon gcsource --folder /ifs/archive --secretkey /home/ecaadmin/<service account gcs keyname.json> --bucket mybucketname --cloudtype gcs
    1. Replace the example values with the correct values for your environment.

Archiving to Azure Blob Storage

  1. searchctl archivedfolders add --isilon gcsource --folder /ifs/archive --secretkey NdDKoJffEs9UOzdSjTxUlaE9Xg== --endpoint blob.core.windows.net --container gc1  --accesskey <storage account name> --cloudtype azure 
    1. NOTE: The storage account name is used as the access key with Azure.
    2. NOTE: Get the access keys from the Azure console.
    3. NOTE: The container name must be added with the --container flag.
    4. NOTE: The --cloudtype flag must be azure.


Archive to Cohesity

  1. searchctl archivedfolders add --isilon gcsource --folder /ifs/archive --accesskey CqX2--bcZCdRjLbOlZopzZUOsl8 --secretkey zZMDh3P7fcMD2GUvLNK5md7NVk --endpoint https://x.x.x.x:3000 --bucket <bucket name> --cloudtype other 
  2. See the walk-through on setting up the S3 bucket here.
    1. NOTE: Replace the example values in the command above with the correct values for your environment.
    2. NOTE: A service account user should be created.
    3. NOTE: A dedicated View should be created for S3 storage without any other protocols enabled.
    4. NOTE: The access key and secret key can be found by logging into the console and selecting Admin menu --> Access Management, then clicking on your user ID and recording the access ID and secret ID.  You must log in to the console as the user configured for authentication to get the S3 keys.

Archive to BackBlaze

  1. searchctl archivedfolders add --isilon gcsource --folder /ifs/archive --accesskey 000257a71c6ed0001 --secretkey K000xlgzxveQ --endpoint https://s3.us-west-000.backblazeb2.com --bucket gctest --cloudtype other
    1. NOTE: The URL may be different for your bucket; replace all example values with your own settings.

Most Common Every Day CLI Commands to Manage Archive jobs

  1. These are the commands most commonly used day to day.  Many advanced flags exist on these commands and should be reviewed; the commands below use the minimum flags needed to complete the task.
  2. Listing running jobs
    1. searchctl jobs running
  3. Listing history of all previous jobs
    1. searchctl jobs history
  4. Listing all configured folders and the folder ID's and paths
    1. searchctl archivedfolders list
  5. How to run an archive job on a folder 
    1. NOTE: This will not recopy everything.  It checks the target for each file, skips files that already exist, and copies only new files missing from the target or files that have changed.  It will not detect deleted files; that requires an incremental job to be configured on the folder.  The steps to configure this are in this guide.
    2. searchctl archivedfolders archive --id xxx  (xxx is the folder ID found using the list folders command above)
  6. Follow the progress of a running job
    1. searchctl jobs running   (get the list of running job IDs)
    2. searchctl jobs view --follow --id xxxxx (xxxxx is the job ID from the step above)
  7. Get the processing rates of a folder that has a running job
    1. searchctl archivedfolders list  (get the folder ID)
    2. searchctl stats --folder  xxxx  (folder ID of the path with a running job from the above step)
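
If these day-to-day commands are scripted, one possible approach is to build the argument lists in Python and hand them to subprocess on Golden Copy node 1. The sketch below only composes and prints the command lines; the helper names are hypothetical, the flags are the ones listed above, and the folder ID is an example value only.

```python
import shlex


def archive_cmd(folder_id: str) -> list[str]:
    """Command line that starts an archive job on a configured folder."""
    return ["searchctl", "archivedfolders", "archive", "--id", folder_id]


def follow_cmd(job_id: str) -> list[str]:
    """Command line that follows the progress of a running job."""
    return ["searchctl", "jobs", "view", "--follow", "--id", job_id]


def stats_cmd(folder_id: str) -> list[str]:
    """Command line that shows processing rates for a folder."""
    return ["searchctl", "stats", "--folder", folder_id]


# Example folder ID only; use the ID from searchctl archivedfolders list.
print(shlex.join(archive_cmd("3fe1c53bdaa2eedd")))
```

Keeping each command as a list of arguments, rather than one string, avoids quoting problems if the list is later passed to subprocess.run.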



How to Test Data Copy and Start the copy Job for a folder that has been added to Golden Copy 

How to test file copy target permissions before starting a copy job:

  1. Before starting a copy job, it is best practice to test file copy permissions with the test feature.
  2. searchctl archivedfolders test --id <folderID>  (get the folder ID with searchctl archivedfolders list).  The command must pass all tests before you start an archive job; resolve all issues first.
  3. This command will complete the following validations:
    1. Isilon/PowerScale S3 connectivity test (port test)
    2. Test file upload to the S3 target
    3. Validation that the test file was copied to the S3 target
    4. Deletion of the test file from the S3 target


How to Modify the Cluster service account or password

  1. searchctl isilons modify
    1. --name NAME (name of cluster to modify)
    2. [--ip IP]  (change ip in system zone to another cluster node)
    3. [--user]  (service account name change , should always use eyeglassSR unless directed by support)
    4. [--update-password]  (service account password)

How to add folders to be copied or synced

  1. searchctl archivedfolders [-h] {add,history,rerun,errors,archive,metadata,s3stat,recall,list,modify,remove,test,configure,getConfig,stats,export,audit,notifications} 
    --isilon HOST --folder PATH [--accesskey ACCESSKEY] [--secretkey SECRETKEY] [--endpoint ENDPOINT] [--bucket BUCKET] [--region REGION]
    [--container CONTAINER] --cloudtype {aws,ecs,azure,other,gcs,blackhole} [--recyclebucket TRASHBUCKET] [--skip-s3-file-exists {True,False,true,false}]
    [--endpoint-ips ENDPOINT_IPS] [--meta-prefix METAPREFIX] [--includes INCLUDES] [--excludes EXCLUDES] [--incremental-schedule INCREMENTALCRON]
    [--full-archive-schedule FULLCRON] [--archive-data-audit-schedule ARCHIVEDATAAUDIT] [--tier TIER] [--backup-num BACKUPNUM] [--cluster-name CLUSTERNAME]
    1. add  (add a folder for ECS, AWS, Azure)
    2. --folder (PowerScale path to copy or sync, example /ifs/data/projectx)
    3. --force (used to add AWS Snowball device and bypass connectivity checks to AWS, only use when following the Snowball guide)
    4. [--tier] default is standard (AWS), Cool (Azure)
      1. Requires Golden Copy Advanced license or Backup Bundle license
      2. Requires 1.1.6
      3. Azure
        1. Flag to specify the tier the API calls are sent to.  This should match the container tier configuration; the options are the Azure access tiers (e.g. hot, cool, archive).  Without this flag the default is cool.
      4.  AWS
        1. specify AWS tier using (standard (default), standard_IA, glacier, glacier_deeparchive)
    5. --cloudtype {aws,ecs,azure,other, gcs, blackhole}   (type of target storage) 
      1. The other type can be used to add S3 storage targets for testing before they are formally qualified, and some S3 targets will require other.
      2. other - AWS v4 S3 authentication signature
      3. otherv2 - AWS v2 S3 authentication signature
    6. [--region REGION]  (required for AWS)
    7. --cluster-name - Allows creating an alias for the root folder name in the storage bucket.   Use a string to replace the actual cluster name in the folder used to copy all the data under this root folder name.  See the use cases for this feature here.
    8. --bucket BUCKET  (required for all storage targets, except Azure which uses the --container flag)
    9. --container <container name> (Azure only requires the container flag)
    10. --endpoint ENDPOINT (required for ecs to include URL and port used for storage buckets, for azure the URL needs to include the storage account name example blob.core.windows.net)
    11. --secretkey SECRETKEY (required for all storage targets)
    12. --accesskey ACCESSKEY (required for all storage targets for Azure this is the storage account name) 
    13. [--skip-s3-file-exists] {true, false} (This option defaults to false, which means S3 is checked first to verify whether the file to be copied already exists in the S3 bucket with the same last modified date stamp.  If set to true, this check is skipped and all files are copied to the S3 bucket in a full copy mode.  This overwrites all files in the storage bucket even if they already exist.)
      1. Use this option to avoid AWS or Azure fees for issuing LIST and GET commands if you have a large number of files in the path.
    14. [--recyclebucket TRASHBUCKET] - Enter the name of the storage bucket used to store deleted files detected during sync mode operations.  Deleted files are copied to the recycle bucket, where a TTL can be set on the bucket to keep them for a period of time before they are permanently deleted.  Example: --recyclebucket mytrashbucketname
    15. [--prefix xxx]  This option allows Copy Mode to run while inserting a prefix into the storage bucket path.  This is useful for making a copy in a new location in the storage bucket without updating the existing path.
      1. <bucket_root>/<cluster_name>/[optional prefix]/ifs/
      2. example --prefix temp  will insert temp into the path when you want a temporary copy in the S3 bucket.
    16. [--endpoint-ips]  (List of IPs or an IP range used to load balance across ECS nodes; only used for ECS targets or Other targets that support multiple endpoints.)  Check Limitations and Requirements for S3 Targets that support load balancing of multi-part uploads.
      1. NOTE: Replace the --endpoint-ips values with the data nodes that can receive copied data.  The alternate syntax x.x.x.x-y.y.y.y specifies a range of IPs.
    17. [--meta-prefix METAPREFIX] - The default prefix for storing metadata properties on objects is x-amz-meta-.  Some S3 targets require a custom metadata prefix in the HTTP headers; for example, open-io requires oo-.  This flag changes the metadata HTTP header prefix for S3 targets that require it.
    18. Checksum control on upload is now managed globally.  A checksum is calculated and included in the headers so that the target can validate it before confirming the upload was successful.  See the link here to configure.
    19. Scheduling jobs
      1. For the cron syntax used by all jobs, see the scheduling examples here.
      2. [--incremental-schedule INCREMENTALCRON]
      3. [--full-archive-schedule FULLCRON]
      4. [--archive-data-audit-schedule ARCHIVEDATAAUDIT] - Compares the file system data to the target bucket and ensures they are in sync by adding new files, uploading modified files and deleting files on the target.  This is a long running job and should be scheduled at most weekly or monthly.
    20. [--includes INCLUDES] [--excludes EXCLUDES] - These optional flags use pattern matching to include files and folders in, or exclude them from, the copy process. (Release 1.1.4 or later)
      1. --includes - File paths matching this glob will be included in the archiving operation. If not specified, all files will be included.
      2. --excludes - File paths matching this glob will be excluded from archiving. This flag only applies to those files that are included by the --includes flag. If not specified, no files will be excluded.
      3. Examples:
        1. Exclude everything in the user's AppData profile:
          --excludes '/ifs/home/*/AppData/**'

          Only archive docx and pdf files, and exclude everything in a tmp directory:
          --includes '*.pdf,*.docx' --excludes '/ifs/data/home/tmp/**'

          Only archive docx, pdf and bmp files:
          --includes '*.pdf,*.docx,*.bmp'

          Archive all files except those in AppData, but only do full content for pdf and docx:
          --excludes '/ifs/home/*/AppData/**'
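
The include-then-exclude order described above (excludes only apply to files that passed the includes) can be approximated in Python. This is a rough sketch that uses the standard fnmatch module as a stand-in; the glob engine Golden Copy actually uses is not specified in this guide and may treat `*` and `**` differently (fnmatch lets `*` match across `/`). The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _patterns(value):
    """Split a comma-separated glob list such as '*.pdf,*.docx'."""
    return [p.strip() for p in value.split(",") if p.strip()] if value else []


def is_archived(path, includes=None, excludes=None):
    """Return True if path passes the include filter and no exclude matches."""
    include_globs = _patterns(includes)
    exclude_globs = _patterns(excludes)
    # No include patterns means every file is included.
    included = not include_globs or any(
        fnmatchcase(path, glob) for glob in include_globs)
    if not included:
        return False
    # Excludes are only evaluated for files that were included.
    return not any(fnmatchcase(path, glob) for glob in exclude_globs)


print(is_archived("/ifs/data/report.pdf",
                  includes="*.pdf,*.docx",
                  excludes="/ifs/data/home/tmp/**"))   # True
print(is_archived("/ifs/data/home/tmp/draft.pdf",
                  includes="*.pdf,*.docx",
                  excludes="/ifs/data/home/tmp/**"))   # False
```

Testing patterns this way before adding them to a folder can catch an exclude that is broader than intended, but the result should still be confirmed against an actual job.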


Overview of Archive Job Types

Two folder job modes exist: full archive and incremental.

  1. Full Archive: The full archive tree walks a folder, identifies which files already exist on the target object store and skips them, and copies new files or updates modified files.  It will not delete files that exist on the target but do not exist on the source file system path.
    1. If you run a full archive job multiple times, it will only copy the new and modified files detected on the tree walk.  This acts like an incremental and can locate missing data in the S3 target.
  2. Incremental Archive:  This mode is enabled with a schedule on the folder.  It uses the change list and creates snapshots that are compared to detect created, modified and deleted files to copy to the target S3 object store.  Whether files deleted on the file system are also deleted from the object store is controlled by a global setting.
    1. See the procedure below to configure deleting files from S3 storage during incremental sync jobs.
    2. An alternate delete mode exists called delayed deletes.  This mode allows a folder to be configured with a second storage bucket to hold deleted objects when they are deleted from the file system.  This can be reviewed here.
    3. The default setting will sync deletes from the file system to the S3 target during incremental.
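
The difference between a job that never deletes (full archive) and one that also removes objects whose source file is gone can be modeled with a small Python sketch. It is a simplified model under stated assumptions, not the product's implementation: files are represented as a mapping of path to last-modified time stamp (the comparison the guide describes for full archive), the function name is hypothetical, and a real incremental job finds changes from snapshot change lists rather than by comparing full listings.

```python
def plan_job(source: dict, target: dict, delete_missing: bool = False):
    """Return (to_copy, to_delete) for a source and target listing.

    source and target map a file path to its last-modified time stamp.
    """
    # Copy files that are new on the source or whose time stamp differs.
    to_copy = sorted(path for path, mtime in source.items()
                     if path not in target or target[path] != mtime)
    # Only delete-aware jobs remove objects that no longer exist on the source.
    to_delete = (sorted(path for path in target if path not in source)
                 if delete_missing else [])
    return to_copy, to_delete


source = {"/ifs/a": 100, "/ifs/b": 200, "/ifs/c": 300}
target = {"/ifs/a": 100, "/ifs/b": 150, "/ifs/old": 50}

print(plan_job(source, target))                       # full archive style
print(plan_job(source, target, delete_missing=True))  # delete-aware style
```

In this example both calls copy /ifs/b (modified) and /ifs/c (new); only the second call also schedules /ifs/old for deletion.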

Concurrent Job Prioritization Incremental vs Full 

  1. When full and incremental jobs run at the same time, a QOS setting can be changed to prioritize one job type over the other.
  2. The default setting will prioritize incremental jobs over full archive jobs.
    1. This means resources will be used to archive incremental queues at the cost of copying files from the backlog of the full archive jobs.  The full archive job may stop completely or progress very slowly until the incremental backlog is completely copied.  This process will repeat each time an incremental job starts.
  3. How to change the default to prioritize full archive jobs.
    1. Login to node 1 as ecaadmin
    2. nano /opt/superna/eca/eca-env-common.conf
      1. add these variables to the file
      2. export ARCHIVE_FULL_TOPIC_REGEX="archivecontent-*"
      3. export ARCHIVE_INCREMENTAL_TOPIC_REGEX="nomatch"
    3. save with control+x  answer yes to save
    4. Restart cluster for the change to take effect
      1. ecactl cluster down
      2. ecactl cluster up

How to configure Incremental jobs to Sync Deleted Files to S3 Object Store

  1. This is a global setting for all incremental jobs.  When enabled, files deleted on the file system are detected and the corresponding objects are deleted in the S3 target storage.
  2. NOTE: The default will sync deletes.
  3. Log in to Golden Copy as ecaadmin
    1. nano /opt/superna/eca/eca-env-common.conf
    2. Add a line to this file by copying and pasting the line below:
    3. export ARCHIVE_INCREMENTAL_IGNORE_DELETES=false (false is the default; change to true to leave deleted files as objects)
  4. Save the file with control+x and answer yes
  5. Shut down the cluster and start it up again for the change to take effect
    1. ecactl cluster down (wait until it finishes)
    2. ecactl cluster up
  6. Done
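
Before restarting the cluster it can be useful to confirm what the configuration file actually contains. The Python sketch below parses `export KEY=value` lines from text in the style of eca-env-common.conf and reports the effective delete behavior. The parser and helper names are hypothetical, the sample text is made up for illustration, and the fallback value of false reflects the default stated in this guide.

```python
def parse_exports(text: str) -> dict[str, str]:
    """Collect KEY=value pairs from 'export KEY=value' lines; later lines win."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("export "):
            continue  # skip comments, blank lines and anything else
        key, sep, value = line[len("export "):].partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip('"')
    return settings


def ignores_deletes(settings: dict[str, str]) -> bool:
    """True when incremental jobs leave deleted files as objects."""
    value = settings.get("ARCHIVE_INCREMENTAL_IGNORE_DELETES", "false")
    return value.lower() == "true"


# Made-up file content for illustration.
sample = """# local overrides
export ARCHIVE_INCREMENTAL_IGNORE_DELETES=true
export ARCHIVE_FULL_TOPIC_REGEX="archivecontent-*"
"""
settings = parse_exports(sample)
print(ignores_deletes(settings))  # True
```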

How to start a Full or Incremental Archive Job 

  1. searchctl archivedfolders archive
  2. --force (used to add AWS Snowball device and bypass connectivity checks to AWS, only use when following the Snowball guide)
  3. --id ID (folder id)
  4. --uploads-files  (requires 1.1.6 build > 21124)  Accepts a text file containing full paths to files, one path per line (carriage return after each path).  Each listed file will have its metadata queried and included with the copied object.
    1. NOTE: Each file path must be the full path to the file (example /ifs/xxx) and is case sensitive.  The case must match the actual file system case for the path and file name.
    2. NOTE: The search export file format has been deprecated in favor of a flat file format containing only file paths.
    3. NOTE:  Make sure only 1 file per line with a carriage return at the end of each line.
  5. --incremental (requires 1.1.4 build > 21002) This option will run an on demand snapshot based changelist to detect created, modified and deleted files since the last incremental job ran and copy these changes to the S3 target configured.  
  6. --follow (Requires 1.1.4 build > 21002) the archive job will be started and will move directly to a monitor UI to view progress without needing to use searchctl jobs view. 
  7. --auto-rerun  Queues all failed copies into a new job to automatically retry them.  Requires 1.1.4 or later.
  8. [--recursive {true,false}] (can be used to update a single path only or recursive update to copy all data under a path to the storage bucket, this is optional and the default without using this flag is recursive copy of all data under the archived path entered)
  9. [--subdir SUBDIR] (if a recopy of some data is required under an existing archived folder, or a new folder was added under an archived folder, this option allows entering a path under the archived path to copy only this subfolder,  this can be combined with --recursive option if required.)
  10. [--s3update] File System to S3 Audit Feature compares S3 bucket to PowerScale path and fixes any differences between the folders and files and will delete files in the S3 bucket that no longer exist in the file system path.  
    1. NOTE: Use this to audit the file system path against the S3 bucket and remove files from S3 that do not exist in the file system.  This can be used on folders configured for copy mode, as opposed to sync mode, which already syncs deleted files to the storage bucket.
  11. --snapshot <snapshotName>  if the name of the snapshot is added with this option, then no new snapshot will be created for the archive path and the snapshot name path will be used as the source of the file copy.  Normally, a snapshot is created when a folder archive job is started and data is copied from snapshot.   This option allows existing snapshots to be used as the source of an archive job.
    1. When to use this option:
      1. This option allows an existing snapshot to be used. This option would be used to copy a previous snapshot of a path to S3 storage for long term storage and allow deletion of the snapshot to free up usable space on the PowerScale once the file copy completes.
      2. The path referenced by the snapshot name must match exactly the folder path added to Golden Copy
      3. The path copied in the S3 bucket will match the folder path added to Golden Copy
      4. NOTE: The full inflated size of the path the snapshot is protecting will be copied. Snapshot copies are not space efficient when copied from the PowerScale to external storage.
  12. --skip-acl Requires 1.1.4 update 2 or later. This will skip ACL permissions encoding but will encode all other metadata.  This would be used if it is desired to not copy security information into objects where it is visible.
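
The file passed to --uploads-files has a few format rules (one full path per line, a line ending after every path). A pre-check along the following lines could catch formatting mistakes before a job is started. This is a sketch only: the function name is hypothetical, the /ifs/ root is taken from the example path in this guide, and case sensitivity cannot be verified without access to the file system itself.

```python
def check_upload_list(text: str, root: str = "/ifs/"):
    """Return a list of (line_number, problem) for a candidate upload list."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
        elif line != line.strip():
            problems.append((number, "leading or trailing whitespace"))
        elif not line.startswith(root):
            problems.append((number, f"not a full path under {root}"))
    # Every path, including the last one, should be followed by a line ending.
    if text and not text.endswith(("\n", "\r")):
        problems.append((len(lines), "missing line ending on last line"))
    return problems


good = "/ifs/data/a.txt\n/ifs/data/B.txt\n"
bad = "/ifs/data/a.txt\ndata/b.txt\n\n/ifs/c.txt"
print(check_upload_list(good))  # []
print(check_upload_list(bad))
```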


How to Schedule Jobs on Folders (Full Archive, Incremental Archive or Archive Data Audit jobs)

Overview of Archive Job Types

  1. 2 scheduled job types for backup, plus a data audit job type
    1. --full-archive-schedule     Use this flag to identify this job type.  This will walk a file path and copy all files, checking the S3 target to see whether each file already exists and skipping files as needed while copying.  The comparison is done using the last modified date stamp on a file.
    2. --incremental-schedule  Use this flag to identify this job type.  This job will use the Isilon/PowerScale change list API to snapshot the folder path and offload the comparison to the cluster, which returns the file system changes for incremental-always syncing.  Created and modified files will be synced to the S3 target.
      1. NOTE: Whether deleted files are synced to the S3 target is controlled by a global setting.  If deletes are not synced, deleted files will be detected but will remain on the S3 target.  To change this behavior see the advanced configuration here.
    3. --archive-data-audit   Use this flag to identify this job type.  This job will audit the file system path and the S3 bucket to identify files that are missing in S3, and it will also identify files that no longer exist on the file system but still exist on the S3 target.  This will delete objects when a file has been deleted from the file system.  Recommendation:  Run this job on demand or on a schedule to audit the S3 object store data.

Configuring Archive Job Schedules on Folders

  1. This feature requires 1.1.4 build > 178
  2. NOTE: Scheduled times are in the GMT time zone.
  3. The input is a cron string that defines the interval for the schedule.  See the cron web site https://crontab.guru/examples.html for assistance in creating a cron string.
  4. In the examples below, (other parameters) is a placeholder for any other options being added or modified.

Add an Incremental Schedule to a folder

  1. Run Every Hour
    1. searchctl archivedfolders add (other parameters) --incremental "0 * * * *"
  2. Run Every 2 hours
    1. searchctl archivedfolders add (other parameters) --incremental "0 */2 * * *"
  3. Run Every 6 hours
    1. searchctl archivedfolders add (other parameters) --incremental "0 */6 * * *"
  4. Run Once a Day at midnight
    1. searchctl archivedfolders add (other parameters) --incremental "0 0 * * *"
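
To check what a cron string from the examples above means before applying it, the following Python sketch tests whether a given GMT time matches a five-field expression. It is a deliberately small matcher written for illustration, not the scheduler Golden Copy uses: it supports only `*`, `*/n` and single numbers (no lists, ranges or names), and it treats 0 as Sunday in the day-of-week field. The function names are hypothetical.

```python
from datetime import datetime, timezone

# (minimum, maximum) for minute, hour, day-of-month, month, day-of-week
_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def _field_matches(field: str, value: int, low: int, high: int) -> bool:
    """Match one cron field against one component of the time."""
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError(f"invalid step in {field!r}")
        return (value - low) % step == 0
    number = int(field)
    if not low <= number <= high:
        raise ValueError(f"{number} is outside {low}-{high}")
    return number == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the timezone-aware time 'when' matches expr in GMT."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    when = when.astimezone(timezone.utc)
    values = [when.minute, when.hour, when.day, when.month,
              when.isoweekday() % 7]  # cron uses 0 for Sunday
    return all(_field_matches(f, v, low, high)
               for f, v, (low, high) in zip(fields, values, _RANGES))


four_am = datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc)
five_am = datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc)
print(cron_matches("0 */2 * * *", four_am))  # True
print(cron_matches("0 */2 * * *", five_am))  # False
```

Because schedules are evaluated in GMT, a local time should be converted before it is compared with the cron string.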

Modify a Schedule on a Folder

  1. This command uses a different syntax; see below.
  2. Once per week (Sunday at midnight) examples of the incremental and data audit job types
    1. searchctl archivedfolders modify --id <folder id> (other parameters) --incremental "0 0 * * 0"
    2. searchctl archivedfolders modify --id <folder id> (other parameters) --archive-data-audit "0 0 * * 0"

Disable a Schedule on a folder

  1. searchctl archivedfolders modify --id <folder-id> --full-archive "NEVER" --incremental "NEVER"


Add a Data Audit job Schedule to a folder

This job type compares all data on the file system to the object store target and ensures that missing data is added, deleted data is removed from the object store and updated files are updated in the object store.  This is a data compare job that can be scheduled to run once a week or once a month on a folder.  NOTE: This can take hours to run.
  1. searchctl archivedfolders add (other parameters) --archive-data-audit "0 0 * * 0"
    1. Suggested schedule is weekly on Sunday 




Monitor, View running Jobs, show Job History, Show folder job history, Summarize Job stats, Monitor Progress, Auto Email Progress and Cancel a Copy Job

  1. These commands will show progress of files as they are copied, with real-time updates every few seconds.
    1. searchctl jobs view --id job-1574623981252553755794 --follow  (replace the job name with the name returned from the archive command).
    2. searchctl archivedfolders stats --id <folder ID>  (NOTE: replace <folder ID> with the folder ID from searchctl archivedfolders list, for example only: 3fe1c53bdaa2eedd).
    3. searchctl stats --folder xxxx
  2. Learn more about - Monitoring Copy Jobs, View Job History, Viewing folder job History:
    1. searchctl jobs running (shows running jobs).
    2. searchctl jobs view --id (id is returned from the command above, monitors progress).
    3. searchctl jobs history (shows the history of previous jobs including incremental sync jobs, inventory jobs).
      1. [--folderid] - allows filtering the history to a specific folder to list the jobs
      2. [--type TYPE] - allows filtering to show a specific job type "Incremental", "Full Archive", "Inventory"
      3. [--output OUTPUT] - Json format output of the history data
      4. [--tail x  ]  where x is last x jobs executed example --tail 100 (100 last jobs executed)
      5. Example command
        1. searchctl jobs history --folderid <folderid>  - This command can be very useful for listing all the jobs that have run against a single folder, which allows viewing the complete job history of that folder.
        2. searchctl jobs history --tail 100  (return the most recent 100 job executions)
    4. searchctl jobs cancel --id  (with a job ID provided a copy job can be canceled. Note: it takes time for a cancel to take effect).
  3. Show summary statistics for jobs, all folders and Golden Copy nodes
    1. This command provides stats for all jobs executed on all defined folders or all Golden Copy nodes, or a summary of all jobs.
    2. It provides a global view of statistics based on each of the view options, with live updates to the CLI based on active jobs.
    3. searchctl jobs summary [-h] [--no-stream] (--nodes | --folders | --jobs)
    4. searchctl jobs summary --folders  (shows the summary of all job stats for all folders)
    5. searchctl jobs summary --nodes (shows the summary of all job stats for all Golden Copy VMs)
    6. searchctl jobs summary --jobs (shows the summary of all job stats for all nodes and all folders)
  4. Export a completed job into an HTML report  with steps here.


How to Enable Auto Progress Emails for copy Jobs

This feature allows administrators to monitor copy job progress from their email.  Once enabled, running job progress will be emailed every 24 hours (this can be changed).  This avoids the need to log in simply to monitor progress.  It can also assist with support, since the emails can be attached to a support case.  The feature allows a group email to be configured.

Prerequisite:

  1. Enable a notification channel with SMTP; see the guide here.
    1. Search & Recover Cluster Operations
    2. Get the email channel groups with this command
      1. searchctl notifications groups list
  2. searchctl archivedfolders notifications addgroup --isilon [host] --group [notification-group-name]
    1. Adds a group email to be notified with job progress summary reports
    2. Defaults to disabled status
    3. Enable the notification with:
      1. searchctl archivedfolders notifications modify --isilon <cluster name> --enabled true
  3. searchctl archivedfolders notifications list
    1. lists all notifications that are configured
  4. searchctl archivedfolders notifications removegroup --group [notification-group-name]
    1. removes the email group
  5. searchctl archivedfolders notifications modify --isilon [host] --enabled [true or false] --groups [group-names-comma-separated]
    1. The modify command allows the option to disable the notifications without removing it.
  6. The default interval to receive updates is 24 hours.  To change this default:
    1. Use the schedules CLI to change the frequency; this example shows 5 minutes.
    2. searchctl schedules modify --id JOBS_SUMMARY --schedule "*/5 * * * *" 
    3. searchctl schedules list 


How to Manage Folders (List , Modify and Remove)

  1. List , modify and remove commands and examples

list (list configured folders)

  1. searchctl archivedfolders list
  2. searchctl archivedfolders list --verbose (adds all flags to the output)

modify (change configuration of an existing folder)

  1. --id ID (folder id)
  2. [--cloudtype {aws,ecs,azure, gcs, other,blackhole}]   (type of target storage) 
  3. [--region REGION]   (Required for AWS)
  4. [--tier]  specify storage tier (Advanced license or backup bundle required)
  5. [--bucket BUCKET]  (Required for all storage targets, except Azure)
  6. --cluster-name - Allows creating an alias for the root folder name in the storage bucket. Use a string to replace the actual cluster name in the folder used to copy all the data under this root folder name. See the use cases for this feature here.
  7. [--container CONTAINER] (Required for Azure only and should list container name).
  8. [--endpoint ENDPOINT] (Required for Azure, AWS, ECS).
  9. [--secretkey SECRETKEY] (Required for all storage targets).
  10. [--accesskey ACCESSKEY] (Required for all storage targets, for ECS this is the user id name, for Azure this is the storage account name) .
  11. [--skip-s3-file-exists {true, false}]  (Optional: see the explanation of this flag in the "How to add folders to be copied or synced" section above).
  12. [--recyclebucket TRASHBUCKET ] (See create explanation in the "How to add folders to be copied or synced" section above) .
  13. [--meta-prefix xxx] - (See add folder in the "How to add folders to be copied or synced" section for explanation ). 
  14. [--endpoint-ips]  (List of IP addresses or an IP range used to load balance; ECS targets only).
  15. [--includes INCLUDES] [--excludes EXCLUDES]  - (see the "How to add folders to be copied or synced" section for detailed explanation and examples ). 
  16. Scheduled jobs - See job schedule examples here.
    1. [--incremental-schedule INCREMENTALCRON] - incremental sync job
    2. [--full-archive-schedule FULLCRON] - full copy job
    3. [--archive-data-audit-schedule ARCHIVEDATAAUDIT] - data audit source and target comparison

How to remove an archive folder 

  1. searchctl archivedfolders remove --id ID (folder id)  
  2. [-h] get help

How to Re-run Only Failed Files from a Copy Job

  1. This feature retries only the failed files listed in a copy job instead of running the entire copy job again, which is faster and more efficient.
  2. First list the job history for an archive folder.
  3. searchctl archivedfolders history
    1. Then select a job with a status of failed to get the job ID, and run the command below:
      1. searchctl archivedfolders rerun --id xxxxx (where xxxxx is the job ID, in the form job-xxxxxx)
    2. This will locate all the failed files in that job and reattempt to copy these files in a new job. This will generate a new summary and progress report.

How to Recall Data from Object Back to File

Overview:

Once data is archived to objects, you may need to recall some or all of the data to the same or a different cluster.  This section covers how that is done from the CLI.  All recall jobs recall data to a staging recall path, from which the administrator can move the tree of data back into the main file system or share the data directly.

The recall NFS mount is on /ifs/goldencopy/recall; this is the path where all recall jobs recall data.  The recalled data will be created with its fully qualified path (for example /ifs/data/datarecalled) under the mount path above.

Backup and Restore Use Cases

The backup and restore use case capabilities depend on which licenses are installed.  The enhanced recall features require the Backup Bundle or the Advanced license key upgrade.

License Key Dependencies

  1. The base Golden Copy license key allows recall of data only, recall of data plus metadata, and recall to the same cluster the data was copied from.  It also allows the NFS recall mount to be changed to a different cluster.  This is a manual process.
  2. The Golden Copy Advanced (Backup Bundle) key allows additional options: object-version-aware recall based on a file metadata date range, redirected recall to an alternate cluster other than the source cluster, and storage-tier-aware recall (for example, recall from an archive tier before recall to on-site storage).

Limitations:

  1. Golden Copy base license recall to an unlicensed cluster will only recall data, without metadata.


Requirements:

  1. 1.1.4 build 2006x Release or later
  2. Some recall features require the Advanced license key as explained below.  These commands will be identified as requiring the advanced license key.

Prerequisite NFS Export Configuration Steps

  1. Make sure the NFS export is created on the clusters that are targets of a recall, and that the mount is added to Golden Copy and all VAN nodes.
  2. Basic Golden Copy licenses can switch the NFS mount using /etc/fstab entry to point at a different cluster to recall the data.
    1. Create NFS mount on the new target cluster. Review installation requirements here.
    2. Change the mount used for recall
      1. sudo -s (enter ecaadmin password)
      2. nano /etc/fstab (to edit the file)
      3. find the recall mount entry: <CLUSTER_NFS_FQDN>:/ifs/goldencopy/recall /opt/superna/mnt/recall/<GUID>/<NAME> nfs defaults,nfsvers=3 0 0
      4. Replace <CLUSTER_NFS_FQDN> with the IP address or DNS SmartConnect name of the target cluster
      5. control+x and answer yes to save the file
      6. Type exit (to return to the ecaadmin session)
      7. unmount the recall mount
        1. ecactl cluster exec "sudo umount /opt/superna/mnt/recall/<GUID>/<NAME>"
          1. enter the admin  password when prompted on each node.
      8. mount new cluster recall mount
        1. ecactl cluster exec "sudo mount -a"  
          1. enter the admin  password when prompted on each node.
      9. The recall job will now recall data to the new cluster target.
  3. Advanced or Backup Bundle Licensed appliances can use the add cluster target recall only option and create unique NFS recall mounts for each cluster to allow selecting the target cluster when building the recall job.
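
The /etc/fstab change in step 2 above amounts to replacing the host portion of the recall mount entry. The following is a minimal Python sketch of that text substitution, assuming the entry has the layout shown in the step (host, then :/ifs/goldencopy/recall, then the mount point and options). The function name retarget_recall_mount is hypothetical, and the sketch works on text only; it does not write /etc/fstab or remount anything.

```python
RECALL_EXPORT = ":/ifs/goldencopy/recall"


def retarget_recall_mount(fstab_text: str, new_host: str) -> str:
    """Replace the NFS host on every recall mount line, leaving other lines untouched."""
    out = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        is_recall_entry = (
            bool(fields)
            and not stripped.startswith("#")      # skip commented-out entries
            and fields[0].endswith(RECALL_EXPORT)  # first field is host:/export
        )
        if is_recall_entry:
            fields[0] = new_host + RECALL_EXPORT
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "oldcluster.example.com:/ifs/goldencopy/recall "
        "/opt/superna/mnt/recall/GUID/NAME nfs defaults,nfsvers=3 0 0\n"
    )
    print(retarget_recall_mount(sample, "newcluster.example.com"), end="")
```

Review the resulting text before saving it, then unmount and remount as described in the steps above.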

Logical Diagram of a Recall


Recall command syntax and options

  1. searchctl archivedfolders recall [-h] --id ID [--subdir SUBDIR]
    1. --id  - Is the folder id.
    2. [--subdir] - Allows entering a path below the folder path to select a subset of the data. For example, if the archive folder is /ifs/archive, --subdir can be /ifs/archive/subdirectory.
    3. [--apply-metadata]  If this flag is not used only data will be recalled and the files will not have owner, group or mode bits updated, folders will not have ACL permissions applied. 
      1. NOTE:  Skipping metadata will recall data faster.  The recall starts with data and then separately runs a 2nd job to apply metadata,  this 2 stage approach allows for fast data recall.  The metadata will be applied after the data is recalled completely.  The metadata is placed in a queue and can be applied later using a special job that will run on demand.  If this flag is omitted you can still run the 2nd job type to apply metadata later.
      2. See How to run a metadata job 
    4. (Advanced License) [--target-cluster] - This is the name of the target cluster that the recall will use to store the recalled data.  The cluster must be added to Golden Copy first to use this flag.
    5. (Advanced License) [--start-time STARTTIME] - This allows selection of data to recall based on the metadata time stamps on the objects.
      1. This command is version aware and will check the version of objects to select the best match.
      2. This is the date and time for the beginning of the date range (the start last modified time). Example: "2020-05-21T17:44:40-04:00"
    6. (Advanced License) [--end-time ENDTIME] - This is used with --start-time to specify the end date and time of the date range.
      1. This command is version aware and will check the version of objects to select the best match.
      2. This uses the same date format. Example: "2020-05-21T17:44:40-04:00"
    7. (Advanced License) [--timestamps-type {modified,created}] - This flag is used with the date range flags to specify which time stamp stored with the object metadata to use: the created time stamp or the modified time stamp.
      1. This command is version aware and will check the version of objects to select the best match.
  2. Running the job will recall data and re-apply all possible meta data to the file system including ACL's folder and file owner, group and mode bits.
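
The date range flags expect a time stamp with a UTC offset, as in the examples above. The following is a minimal Python sketch that formats such a time stamp and assembles a recall command line from the flags documented in this section. The helper names recall_timestamp and recall_command are hypothetical, and the sketch only builds the command string; it does not run searchctl.

```python
import shlex
from datetime import datetime, timedelta, timezone
from typing import Optional


def recall_timestamp(moment: datetime) -> str:
    """Format an offset-aware datetime like 2020-05-21T17:44:40-04:00."""
    if moment.tzinfo is None:
        raise ValueError("datetime must carry a UTC offset")
    return moment.replace(microsecond=0).isoformat()


def recall_command(folder_id: str,
                   subdir: Optional[str] = None,
                   apply_metadata: bool = False,
                   start: Optional[datetime] = None,
                   end: Optional[datetime] = None,
                   timestamps_type: Optional[str] = None) -> str:
    """Assemble a recall command using only the flags listed in this section."""
    argv = ["searchctl", "archivedfolders", "recall", "--id", folder_id]
    if subdir:
        argv += ["--subdir", subdir]
    if apply_metadata:
        argv.append("--apply-metadata")
    if start is not None:
        argv += ["--start-time", recall_timestamp(start)]
    if end is not None:
        argv += ["--end-time", recall_timestamp(end)]
    if timestamps_type is not None:
        if timestamps_type not in ("modified", "created"):
            raise ValueError("timestamps_type must be 'modified' or 'created'")
        argv += ["--timestamps-type", timestamps_type]
    return shlex.join(argv)


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-4))
    print(recall_command("xxxx",
                         start=datetime(2020, 5, 21, 17, 44, 40, tzinfo=eastern),
                         timestamps_type="modified"))
```

The date range flags require the Advanced license key, as noted above.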

How to recall object data to the Recall Staging Area

  1. This example will recall data to the staging area located at /ifs/goldencopy/recall.  There are flags to handle overwriting data that already exists on the destination path and applying or reapplying metadata to files and folders.
    1. Metadata includes the following:
      1. Files - owner, group, mode bits
      2. Folders - owner, group, ACL on the folder
    2. Example command to recall only a specific path
      1. searchctl archivedfolders recall --id xxxx  --subdir /ifs/bigfile  --apply-metadata 
      2. NOTE:  Recall jobs will always overwrite data in the staging area if a previous recall job had already been executed.
    3. Recall data and apply metadata to files and folders
      1. searchctl archivedfolders recall --id <folderid> --apply-metadata
      2. Note:  Metadata will be applied after all the data has been recalled first
    4. Recall metadata only or re-apply metadata to a previous recall job
      1. searchctl archivedfolders metadata --jobid <recall Job Id>
      2. NOTE: This command requires the jobid of a previous recall job that has completed already. Use searchctl jobs history to get previous job ID's.



How to Monitor Copy Job Performance and Job Summary logs and Error Logs

  1. Stats Command for real time job throughput monitoring 

    1. The following command will monitor real time stats for an archive folder with bytes copied, files copied and error rates.
    2. searchctl archivedfolders stats --id xxx (where the xxx is the folder id found from searchctl archivedfolders list)
    3. OR use the feature-rich command searchctl stats --folder xxx
    4. NOTE: The rate statistics columns are per second but averaged over the last minute, hour or day.
    5. NOTE: Files Retry Pending, Bytes Retry Pending requires the --auto-rerun flag on archive jobs to queue failed files to be retried at the end of the archive job.  This is available in release 1.1.4 builds > 178.
    6. NOTE: Rerun stats show files that were retried at the end of the archive job and will only display if --auto-rerun flag was used. This is available in release 1.1.4 builds > 178. 
    7. NOTE: Accepted stats is related to tree walking the folder that is configured for archiving and is based on REST API retrieval of files and folders.
    8. NOTE: Full statistics are recorded during full archive jobs.  Incremental stats will appear if incremental jobs are configured on the folder.
    9. NOTE: Files and folders have separate statistics
    10. NOTE: Metadata is (owner, group, mode bits, created, modified, accessed, ACL)

  2. Job View command monitors progress in MB and file count with % completion

    1. This command can check the progress of a running job, shows the MB of files queued for copy, the MB of files archived, and % completed.  The same information is shown for file count queued, archived and %.  Error rate is also shown 
    2. searchctl jobs view --id job-1591744261081-1928956557 --follow    (use searchctl jobs running to get the job id)
    3. New in builds later than 1.1.4 build 178 is the Retry Pending stat, which tracks files that had an error and are queued to be retried at the end of the archive job.  This stat is only active if the --auto-rerun flag is used on the archive job.


How to Monitor Ethernet Interface Mbps during a Copy Job

  1. This tool will be available in new OVF builds.  In the current product, it must be installed and requires Internet access to repositories
  2. ssh to the Golden Copy vm as ecaadmin
  3. sudo -s (enter ecaadmin password)
  4. zypper install nload  (answer yes to install)
  5. exit
  6. type nload
  7. Once the UI loads use the right arrow key until you see Eth0 displayed at the top left of the UI.  This will now display TX and RX bandwidth current, average, min and max values with a graph showing relative bandwidth usage.

How to Generate the HTML Report After a Copy Job Completes

As of 1.1.4 or later it is required to run the export report command to produce the HTML report for a copy job.  The recommended method to monitor a copy is using the CLI monitor command.  NOTE: This command will only execute when copy job is completed.

  1. Get the job id with "searchctl jobs history"   (Use the job id for the next command).
  2. searchctl archivedfolders export --jobid <job-id> --errors-only true
    1. Recommended to use the --errors-only  flag to extract only errors and reasons for the error 
    2. NOTE:  The json files created can be very large since they will include each file copied, errored or skipped.    
  3. The report is queued to extract the report data that is stored in the archive report Web folder.  Follow the steps below to view the exported HTML report.
  4. Login to the report page with https://x.x.x.x/downloads/archivereport/<folder name>/Full/ .
  5. Locate the <folder name>-<date and time>-summary-checkpoint.html and click on this file to view it.


  1. How to view the Detailed json Copy Job Logs

    1. The HTML export command also generates detailed json logs that store each file that was copied.
      1. searchctl archivedfolders export --jobid <job-id>  
    2. This will start a job to generate the json files.  These files can be very large and may take many minutes to create. 
    3. The jobs logs can be accessed from https://x.x.x.x/downloads/archivereport (where x.x.x.x is the ip address of the Golden Copy VM).
    4. The default login is: "ecaadmin", with password: "3y3gl4ss".
    5. A log directory is created for each execution of a job report export command.  The folder is named after the path that was archived, for example ifs-archive:
      1. The job logs will be contained  in the full and incremental folders sorted by date.
        1. 3 files are created for each copy job. See example screen shot below.
        2. date-time.JSON (full log with all files that successfully copied, many numbered versions of this file will exist).
        3. -summary.html (report view that shows the report for the entire job).
        4. -summary.json (JSON version of the summary report used by the HTML file to display).
        5. -errors.json (if there are failed copy files this file extension will appear and stores all the files that failed to copy)
  2. How to view copy job errors on failed copies

    1. Use this command to find the reason for the failed copies.  The job id can be found with searchctl jobs history.  NOTE: Requires 1.1.4 or later.
    2. searchctl archivedfolders errors <--Id JOBID> [--head | --tail | --at TIME] [ --count N ], where:
      1. JOBID is the ID of the job that was run
        --count N prints N records (default 10)
        --head (default) starts printing from the earliest detected error
        --tail prints up to the last detected error in the job
        --at TIME will print errors starting from the given time. Use the same time format as T15023
        --head, --tail, and --at are mutually exclusive.
      2. Example command to quickly find the last 20 reasons files failed to copy.
        1. searchctl archivedfolders errors --Id xxxx --tail --count 20

How to Manage File Copy Performance

There are 3 methods that can be used to control performance of a copy job. 

  1. The first is the number of virtual accelerator nodes deployed to distribute copy tasks to more virtual machines.  See the installation guide on how to deploy virtual accelerator nodes.  The guide is here.
  2. The 2nd is the parameter that controls how many files will be copied concurrently per Golden Copy VM or per Accelerator node.
  3. The 3rd option applies to large file copies (greater than 10 MB), and allows increasing the number of threads that copy byte ranges of the file.  A large value will increase bandwidth requirements.     

Best Practice: Test a copy job and use the performance statistics command to monitor the files per second counter (searchctl archivedfolders stats --id <folder id>).   This command shows bytes per second and files per second over the last minute.  A higher value indicates higher performance.

How to Increase Copy Performance with concurrent file and large file thread count

The Golden Copy VM defaults to the values shown below for concurrent file copies and threads per large file.   These values apply to each Golden Copy VM and Virtual Accelerator node.   They can be changed globally by following the steps below:

  1. ssh to Golden Copy VM as ecaadmin .
  2. nano /opt/superna/eca/eca-env-common.conf 
  3. Add the lines below to this file and set the desired concurrent file copy limit and the parallel thread count for large file copies. NOTE:  Consult with support first; increasing these numbers may not increase performance in some scenarios.
    1. export ARCHIVE_PARALLEL_THREAD_COUNT=100 (number of concurrent files per Golden Copy VM or Accelerator Nodes, increasing this number may not increase performance unless sufficient bandwidth is available) 
    2. export ARCHIVE_PARALLEL_THREAD_SDK=10 (Number of separate threads used to copy a single large file, higher number will increase bandwidth utilization)
  4. control key + x key to exit .
  5. Answer yes to save the file .
  6. ecactl cluster push-config
  7. ecactl cluster services restart --container archiveworker --all
  8. done
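
The edit in step 3 above adds or updates export lines in eca-env-common.conf. The following is a minimal Python sketch of that update, assuming the file consists of plain "export KEY=VALUE" lines as shown in the steps. The function name set_exports is hypothetical, and the sketch works on text only; it does not write the file or restart any containers.

```python
def set_exports(conf_text: str, settings: dict) -> str:
    """Update or append `export KEY=VALUE` lines, keeping all other lines as they are."""
    remaining = dict(settings)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("export ") and "=" in stripped:
            key = stripped[len("export "):].split("=", 1)[0].strip()
            if key in remaining:
                # Key already present: rewrite the line with the new value.
                line = f"export {key}={remaining.pop(key)}"
        out.append(line)
    # Keys that were not present are appended at the end of the file.
    for key, value in remaining.items():
        out.append(f"export {key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    current = "export ARCHIVE_PARALLEL_THREAD_COUNT=50\n"
    print(set_exports(current, {
        "ARCHIVE_PARALLEL_THREAD_COUNT": 100,
        "ARCHIVE_PARALLEL_THREAD_SDK": 10,
    }), end="")
```

After saving the real file, the push-config and restart steps above are still required for the change to take effect.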


How to Shape Bandwidth of Archive Jobs 

Overview:

  1. The feature uses traffic shaping, not rate limiting.  Traffic shaping (also known as packet shaping) is a bandwidth management technique that delays the flow of certain types of network packets in order to ensure network performance.  In the case of Golden Copy, the file copies are delayed so that, over time, bandwidth averages out to the configured value.
    1. NOTE: Monitoring the interface will show the network usage will be above the set shaping value which is expected with traffic shaping.  This is because the interface is 10Gbps and allows the data to leave the VM at a high rate for short bursts.
  2. In Release Build 1.1.4 21002
    1. nano /opt/superna/eca/eca-env-common.conf
    2. export ARCHIVE_NETWORK_RATE_LIMIT_MB=xx (xx is the MB per second value applied to all copy bandwidth)
    3. control+x answer yes to save
    4. ecactl cluster push-config
    5. ecactl cluster services restart archiveworker
    6. done
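
The difference between shaping and rate limiting described in the overview can be illustrated with a small calculation: data is allowed to leave in a burst, and the next copy is delayed until the average since the start falls back to the configured value. The following Python sketch is an illustration of that idea only, not the product's implementation. The function name shaping_delay is hypothetical, and 1 MB is assumed to be 1,000,000 bytes.

```python
def shaping_delay(bytes_sent: int, elapsed_s: float, limit_mb_per_s: float) -> float:
    """Seconds to pause so the average rate since the start drops to the limit."""
    if limit_mb_per_s <= 0:
        raise ValueError("limit must be positive")
    limit_bytes_per_s = limit_mb_per_s * 1_000_000  # assumption: 1 MB = 10^6 bytes
    # Earliest elapsed time at which this much data is allowed on average.
    earliest_allowed_s = bytes_sent / limit_bytes_per_s
    return max(0.0, earliest_allowed_s - elapsed_s)


if __name__ == "__main__":
    # 100 MB left the VM in 1 second with a 50 MB/s shaping value:
    # the next copy waits 1 second so the average is 50 MB/s over 2 seconds.
    print(shaping_delay(100_000_000, 1.0, 50))
```

This is why an interface monitor can show short bursts above the shaping value while the long-run average stays at the configured rate.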

How to Monitor Network bandwidth Usage from the appliance

To monitor utilization of the Ethernet interface in real time:


  1. ssh to the appliance as ecaadmin
  2. sudo -s (enter admin password)
  3. zypper in nload         (requires internet access to the appliance to install the package)
    1. answer yes to install
  4. nload -m   (this will display tx and rx bandwidth)
  5. Use the Eth0 values (the other interfaces are the internal docker networks)
    1. Incoming - From the cluster over NFS mounts
    2. Outgoing - Leaving the VM towards the S3 target
  6. Per IP flow tool
    1. zypper in iftop
    2. answer yes
    3. run the tool as root
    4. iftop



How to Copy a Snapshot to S3 Storage

Because Golden Copy is logged in via ssh, any snapshot can be used as a source path to copy data. Therefore, copying a snapshot is no different than copying any other path on the PowerScale.

NOTE: This will not support continuous sync mode since that depends on snapshot change tracking. This mode is a copy of a snapshot for a long term archive.

  1. Prerequisites
    1. Add an archived folder with S3 target configuration.  See below on path considerations when copying a snapshot. 
    2. If the snapshot is higher up the file system than the archived folder path (i.e. a snapshot on /ifs/data and an archived folder on /ifs/data/toarchive), then the archived folder's base path will be used for the file system walk of files to copy.
    3. If the snapshot is lower down the file system (i.e. a snapshot on /ifs/data/toarchive/somesubdir and an archived folder of /ifs/data/toarchive), then the snapshot path will be used as the root for the file system walk to copy files.
  2. Command to specify a snapshot path to be copied:
    1. searchctl archivedfolders archive --id xxxx --snapshot <snapshotName>  (where xxxx is the ID of the archive folder and <snapshotName> is the name of the snapshot in the OneFS GUI).
  3. Done.
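
The two path rules in the prerequisites can be expressed as a small function. The following Python sketch is an illustration of those rules only, not the product's code. The function name walk_root is hypothetical, and two behaviors are assumptions not stated in this guide: equal paths return the folder path, and unrelated paths raise an error.

```python
from pathlib import PurePosixPath


def walk_root(snapshot_path: str, folder_path: str) -> str:
    """Return the path used as the root of the file system walk."""
    snap = PurePosixPath(snapshot_path)
    folder = PurePosixPath(folder_path)
    if folder == snap or snap in folder.parents:
        # Snapshot is at or above the archived folder: walk the folder's base path.
        return str(folder)
    if folder in snap.parents:
        # Snapshot is below the archived folder: walk from the snapshot path.
        return str(snap)
    # Assumption: a snapshot outside the archived folder tree is treated as an error.
    raise ValueError("snapshot path and archived folder path are unrelated")


if __name__ == "__main__":
    print(walk_root("/ifs/data", "/ifs/data/toarchive"))
    print(walk_root("/ifs/data/toarchive/somesubdir", "/ifs/data/toarchive"))
```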


How to Configure Delayed Deletes with Sync Mode

  1. Overview
    1. This feature protects deleted files that need to be retained for a period of time.  It uses a 2nd storage bucket to hold deleted files, and uses the time to live feature on a storage bucket to auto delete files created in the bucket after x days.  This storage bucket life cycle policy is configured manually on the storage bucket following S3 target documentation.
    2. This provides a recovery location for deleted data in sync mode and allows a retention period in days to allow recovery using the life cycle management feature of S3 storage buckets.

  2. Requirements:

    1. The S3 storage provider must be the same for the target storage bucket and the trash can storage bucket
    2. The TTL expiry policy must be created on the trash can storage bucket using the S3 target device documentation.  This value is generally set in days to retain a file before it is deleted automatically.
  3. How to Configure Delayed Delete Mode

    1. Add a folder and use the --recyclebucket option to specify the name of the S3 storage bucket that will act as the trash can for deleted files
    2. Example only: searchctl archivedfolders add --isilon gcsource --folder /ifs/gcdeletetest --accesskey xxxxxxxxx --secretkey yyyyyyyyyy --endpoint s3.regionname.amazonaws.com --region region --bucket targetbucketname --cloudtype aws --recyclebucket name-of-trashcan-bucket
    3. See the screen shots of a storage bucket source and target after a file is deleted on the source cluster path. NOTE: The incremental job runs to detect file changes to the path, including deletes. When a file is detected as deleted and the --recyclebucket option is used, the file is moved to the trash can bucket and then deleted from the target S3 storage bucket configured on the archive path.
    4. File is deleted from source path:

    5. File in the trash can bucket showing the expiry property is set:

    6. Done
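
The TTL expiry policy on the trash can bucket is configured on the S3 target, not in Golden Copy. For Amazon S3, the life cycle configuration is a JSON document; the following Python sketch generates one that expires every object in the bucket after a chosen number of days. This assumes an AWS S3 target; other S3 targets may use a different format, so follow the target's own documentation. The function name trash_expiry_lifecycle and the rule ID are hypothetical.

```python
import json


def trash_expiry_lifecycle(days: int, rule_id: str = "expire-deleted-files") -> dict:
    """Build an AWS S3 life cycle configuration that expires all objects after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # empty prefix applies the rule to every object
                "Expiration": {"Days": days},  # retention period before automatic delete
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(trash_expiry_lifecycle(30), indent=2))
```

On AWS, the printed JSON can be saved to a file and applied to the trash can bucket with the AWS CLI (aws s3api put-bucket-lifecycle-configuration) or pasted into the life cycle rule settings in the console.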

How to list and change System Scheduled Tasks

Job Definitions
  1. INVENTORY -  This job collects networking and user information from the cluster
  2. PERSIST_JOBS_HISTORY -  This job syncs job status and history to an on-disk backup copy
  3. UPDATE_SNAPSHOTS  - Used for Search & Recover 1.1.5 or later releases to make sure the content indexing snapshot alias always points to a current snapshot, which avoids content ingestion errors due to expired snapshots.   Defaults to daily.
  4. SYNC_ARCHIVEDFOLDER_SCHEDULES - This job polls the Golden Copy configuration to update the master scheduling container.   Consult with support.

  1. To list all system scheduled tasks
    1. searchctl schedules list


How to Configure a Folder Alias to Handle: Cluster Decommission Use Case, Source Cluster Name Change, Switch to DR cluster as data source and Data full Copy

  1. Use Cases:  A cluster is decommissioned, a cluster's name is changed, the DR cluster becomes the source for copying data, or a new full copy of the data is created under a new folder.
    1. Each of these use cases leaves the data in the bucket under the old cluster name folder at the root of the storage bucket. This solution allows creating an alias to override the folder name used to store data in the bucket.  The objective is to allow a new cluster to use the older folder name for archive and recall jobs to gain access to data previously copied from another cluster.
      1. The screenshot below shows data stored under a cluster named gcsource

    2. The solution is to copy data or recall data under the previous cluster folder name of "gcsource" while using data stored on a new cluster.  An alias is created on the folder definition to copy data into the same tree structure in the storage bucket when the cluster name and the folder name in the bucket do not match.
      1. Example 1:  old cluster name "gcsource"  and new cluster is "gctarget".  Use this command to change the folder name used to copy or recall data on a folder definition. 
        1. A new folder definition is created to connect to the new cluster but adds an alias that references the old cluster name.
          1. searchctl archivedfolders add --isilon gctarget --folder path yyy (add authentication flags, bucket and endpoint flags)  --cluster-name gcsource 
          2. This folder definition is connecting to cluster gctarget but using a folder alias of gcsource, this will copy data into the storage bucket under the folder gcsource where the old data is stored.
      2. Example 2: Create a new full copy of data under a new folder alias.
        1. In this example a folder definition is modified to add an alias for the cluster name,  this will cause data to be copied under a new folder at the base of storage bucket.
        2. searchctl archivedfolders modify  --id xxxx  --cluster-name newcopy
        3. In this example an existing folder is modified to copy data under a new folder root named newcopy.  The original data is left under the original folder named after the cluster. All copy and archive jobs will now use the new folder name.
        4. NOTE: The old data will remain under the old folder name and will not be used for copy or recall jobs.
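
The effect of the --cluster-name alias can be pictured as choosing the root folder name in the bucket. The following Python sketch is an illustration only: this guide states that data is stored under a folder named after the cluster (or the alias) at the root of the bucket, but the exact layout beneath that root is an assumption made here for the example. The function names bucket_root and object_prefix are hypothetical.

```python
from typing import Optional


def bucket_root(cluster_name: str, alias: Optional[str] = None) -> str:
    """Root folder in the bucket: the alias when one is set, otherwise the cluster name."""
    return alias if alias else cluster_name


def object_prefix(cluster_name: str, folder_path: str, alias: Optional[str] = None) -> str:
    """Assumed layout: <root>/<folder path without leading slash>."""
    return bucket_root(cluster_name, alias) + "/" + folder_path.strip("/")


if __name__ == "__main__":
    # Example 1: new cluster gctarget writes under the old cluster folder gcsource.
    print(object_prefix("gctarget", "/ifs/archive", alias="gcsource"))
    # Example 2: an existing folder starts a new full copy under the root "newcopy".
    print(object_prefix("gcsource", "/ifs/archive", alias="newcopy"))
```

With no alias set, the root is the cluster name, which is why a renamed or replaced cluster would otherwise start a new tree in the bucket.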
           

Storage Target Configuration Examples

Dell ECS Bucket Creation Walk Through

  1. Login to ECS and open the Manage tab and then the Users tab.
  2. Edit object_user1 .
  3. Generate a new secret key :

  4. Now click on the buckets menu tab.
  5. Create a new bucket and enter a name, with the owner set to the object_user1 user. Click next.

  6. Enable meta data search and enter meta data tags that will be indexed.
    1. Click add,  select user, and enter tags as per screen shot below with type set to string:

  7. Done .


Amazon AWS Bucket Creation Walk Through

  1. Complete steps to create the storage bucket
    1. Login to Amazon web console .
    2. Select S3 service .

    3. Click Create bucket.
    4. Enter the bucket name, select a region for your bucket, and click create.


How to setup minimum permissions in AWS for Golden Copy

S3 Policy Permissions Requirements
  1. S3 permissions lists the following policy scope options, indicating optional and mandatory resource scope:
    1. Access Point (Optional) - This is used with the AWS SDK, CLI or REST API.  Access point creation and usage is documented here https://docs.aws.amazon.com/AmazonS3/latest/dev/using-access-points.html   If you use access points, you need to specify the access point ARN in the policy and assign the access point to the bucket (see AWS documentation).
      1. Example:  The access point URL has the following syntax
        1. Access point ARNs use the format arn:aws:s3:region:account-id:accesspoint/resource
        2. Example: AWS URL that must be used when adding an archive folder: s3-accesspointname.Region.amazonaws.com
      2. NOTE: The sample policy sets the Access Point scope to * (all).
    2. Bucket (Mandatory) - The sample policy file includes a sample bucketname that must be replaced with your bucket name.  This resource scope is mandatory in a policy.
    3. Jobs (Optional) - The jobs resource is not used by Golden Copy and is not required; the sample policy file sets this to * (all).  Jobs is used to automate tasks against S3 buckets.
    4. Objects (Mandatory) - Golden Copy requires access to all objects with the permissions set in the sample policy file.  Access to objects should not be restricted; blocking access to objects in a storage bucket dedicated to Golden Copy is unsupported.  (AWS documentation)
      1. Sample policy sets this resource scope to * to allow access to all objects.


Quick Start Method

NOTE: We recommend following the guide to learn how to create permissions.   AWS supports JSON format policies.   The sample policy can be downloaded here and pasted into the JSON tab of the AWS console.

NOTE:  You must edit this file and replace the sample storage bucket name gcdemosystem with the name of the storage bucket.  Find this string in the file and change the bucket name for your environment "arn:aws:s3:::gcdemosystem"

  1. Quick start Steps:
    1. Login to AWS console
    2. Go to IAM
    3. Click Policies in the left side menu
    4. Click Create policy
    5. Click the JSON tab
    6. Copy and paste the edited JSON example file into the policy editor
    7. Click Review Policy
    8. Give the policy a name, for example Goldencopy
    9. Click Create Policy
    10. Now click Users in the left menu
    11. Click Add user and enter the name goldencopy
    12. Click the check box for Access type Programmatic access
    13. Click Next for permissions
    14. Click Attach existing policies directly and search for the policy name you created above, for example Goldencopy. Select the check box to assign it.
    15. Click through the rest of the options to create the user and record the access key and secret key needed to add archive folders.
    16. Done 
Complete Steps to Create User and Policy Following All Steps (skip if you used quick start above)
  1. Open the IAM User screen in AWS:

  2. Create new user with User name "goldencopy" and select the programmatic check box.

  3. Click Next.
  4. Change to "Attach existing policies directly" option.

  5. Click the Create Policy button.
    1. Click on "Service" and type "S3" and select this service

  6. Now Click "Select Actions".
    1. Select the permissions as per the screen shot.

  7. Now specify the bucket(s) created to store Golden Copy target data by adding each ARN to the policy.  Click on "Add ARN" next to "bucket".

  8. The Add ARN window appears, enter the storage "Bucket name" created for Golden Copy (i.e. "gcdemosystem" as shown in the sample screenshot below).

  9. Optional - Source IP Restriction to the Bucket by Selecting the "Request conditions" option
    1. Enter a specific public IP address or a range of addresses.   This would be the public facing IP address used for any Internet access from the Data Center where the PowerScale is located.   The example uses a specific IP address, but a range can also be added.
  10. Now Click the "Review Policy" button bottom right of the UI (shown in the screenshot above).
  11. Enter the Name of the policy,  Description,  and click "Create Policy".

  12. Now return to the IAM Create User browser tab and click the Refresh Icon to reload new policies.  Type "Goldencopy" into the "Filter policies" dialog box and select the Goldencopy policy you created above.

  13. Click Next, and then click Next on the tags screen .
  14. Click the "Create User" button .
  15. On the final screen you need to record the Access ID and the secret key for the Goldencopy user. Record this in a secure location for use when adding archive paths to Golden Copy.

  16. NOTE: You will need your bucket name, region, access key and secret key to configure Amazon S3 target.
  17. Done.


How To restrict S3 Bucket Access by Source Public IP of your Data Center

  1. NOTE: An IAM policy can also restrict access to a list of IP addresses.  Use only 1 method to limit access.  This section provides a 2nd option that limits access using a bucket policy.
  2. NOTE: This assumes that a proxy or source IP NAT is in use and the Cloud provider will only see your public IP address.   If you have a range or pool of IP addresses, then you need to include all IP addresses used by your NAT or proxy implementation.
  3. An S3 bucket policy can also be used to allow access from a range of IPs or a specific IP address.  Use this example to restrict access to your Data Center public IP address.
  4. Get the public facing IP address that will be used by the Golden Copy or Virtual Accelerator Nodes.
    1. Method #1 - curl :
      1. Login to Golden Copy VM over ssh and run this command "curl ifconfig.io" .
      2. This should return the IPv4 ip address configured for public Internet access to use with the policy.
    2. Method #2 - visit an ip locate website from a data center subnet :
      1. Google for "what is my ip address" to get the IP v4 ip address.
  5. Replace the x.x.x.x with your ip address in the example policy below.
  6. In the Amazon S3 service console click on the storage bucket configured for Golden Copy:
  7. Replace x.x.x.x with your ip address,  replace goldencopybucketname with your storage bucket name.
  8. NOTE: to get the Bucket ARN for the resource property. You can see this next to the Bucket Policy Editor in the screen shot above.
  9. Edit the policy text shown below and save to the bucket policy .
  10. Done.
{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::goldencopybucketname/*",
      "Condition": {
        "NotIpAddress": {"aws:SourceIp": "x.x.x.x/32"}
      }
    }
  ]
}
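
When the NAT or proxy uses several public addresses, the policy can be generated instead of edited by hand. The Python sketch below is an illustration only and is not part of Golden Copy: the function name build_ip_restriction_policy is hypothetical, and the addresses in the usage section are documentation placeholders. It assumes the same statement layout as the example policy, validates each address with the standard ipaddress module, and prints JSON that can be pasted into the bucket policy editor.

```python
import ipaddress
import json


def build_ip_restriction_policy(bucket_name, public_ips):
    """Return a bucket policy dict that denies S3 access from any source
    address outside public_ips (hypothetical helper, same layout as the
    example policy in this guide)."""
    # ip_network() raises ValueError on a malformed entry. A bare address
    # such as "203.0.113.10" is normalized to "203.0.113.10/32".
    cidrs = [str(ipaddress.ip_network(ip, strict=False)) for ip in public_ips]
    if not cidrs:
        raise ValueError("at least one public IP address is required")
    return {
        "Version": "2012-10-17",
        "Id": "S3PolicyId1",
        "Statement": [
            {
                "Sid": "IPAllow",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and addresses; substitute your own values.
    policy = build_ip_restriction_policy(
        "goldencopybucketname", ["203.0.113.10", "198.51.100.0/24"]
    )
    print(json.dumps(policy, indent=2))
```

Listing every NAT or proxy address in one condition keeps a single Deny statement, which matches the note above about including all addresses in the pool.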

How to Enable and Use Accelerated Transfer Mode on a bucket

  1. Requirements:
    1. Release 1.1.6 or later
  2. Why use this mode? (AWS documentation reference)
    1. You might want to use Transfer Acceleration on a bucket for various reasons, including the following:

      1. You have customers that upload to a centralized bucket from all over the world.

      2. You transfer gigabytes to terabytes of data on a regular basis across continents.

      3. You are unable to utilize all of your available bandwidth over the Internet when uploading to Amazon S3.

  3. To enable Transfer Acceleration on an Amazon S3 bucket, follow the AWS guide "Enabling Transfer Acceleration on an Amazon S3 Bucket".

  4. How to Configure in Golden Copy
    1. When adding the folder, the --aws-accelerated-mode true flag must be used.

    2. The endpoint must use the accelerated endpoint FQDN (e.g. bucketname.s3-accelerate.amazonaws.com) to access an acceleration-enabled bucket.
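
The accelerated endpoint name follows a fixed pattern, so it can be derived from the bucket name. The Python sketch below is illustrative; accelerated_endpoint is a hypothetical helper and not a Golden Copy command. It also rejects bucket names containing periods, since AWS does not support Transfer Acceleration on such buckets.

```python
def accelerated_endpoint(bucket_name):
    """Return the Transfer Acceleration endpoint FQDN for a bucket
    (hypothetical helper for illustration)."""
    if not bucket_name:
        raise ValueError("bucket name is required")
    # Transfer Acceleration requires a bucket name without periods.
    if "." in bucket_name:
        raise ValueError("bucket names with periods cannot use Transfer Acceleration")
    return f"{bucket_name}.s3-accelerate.amazonaws.com"


if __name__ == "__main__":
    # "goldencopybucketname" is the placeholder bucket name used in this guide.
    print(accelerated_endpoint("goldencopybucketname"))
```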

Google Cloud Storage Creation Walk Through

  1. Requirements:
    1. Storage buckets should use the Uniform Access Control method when setting up a new storage bucket. See the guide here for details.
  2. Log in to the Google Cloud Platform console and open IAM to create a service account. Click Create and then Done.
  3. Click the created service account, click "Create new key" and select JSON as the key type.
  4. The key file is downloaded to your PC and is used later.
  5. Locate the storage bucket created to store Golden Copy data (bucket creation is not covered here, see the Google documentation) and select the Permissions tab. Click Add. NOTE: You will need the email ID of the service account created in the steps above.
  6. Use the Storage Object Creator role.
  7. Repeat these steps on each storage bucket that will be used with Golden Copy.
  8. Copy the JSON authentication file to the /home/ecaadmin path on Golden Copy node 1 with WinSCP.
    1. Authenticate using the ecaadmin user.
    2. This completes the Google console steps. The remaining steps, covered in the quick start section of this guide, explain how to add a Google Cloud Storage endpoint.
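
The service account email needed when granting bucket permissions is also stored in the downloaded JSON key, in its client_email field. The Python sketch below is a convenience illustration, not a Golden Copy tool: service_account_email is a hypothetical helper and the file path in the usage section is an assumed example name.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a Google service account JSON key
    (hypothetical helper for illustration)."""
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Service account key files carry "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key file")
    return key["client_email"]


if __name__ == "__main__":
    import sys

    # Example: python3 show_email.py /home/ecaadmin/<name of key>.json
    print(service_account_email(sys.argv[1]))
```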




Azure Blob Storage Creation Walk Through

Azure Blob Storage is not compatible with the S3 protocol and uses a proprietary REST API. The configuration is similar to S3 bucket concepts: a container is similar to a bucket, and a blob is the same as an object.

How to create Azure Storage Account 

This process creates a storage account if one does not already exist.  The storage account contains a Container that will store data copied from Golden Copy.

  1. Log in to the Azure console.
  2. Select Home --> Storage accounts --> Create storage account --> + sign to create.
  3. Assign or create a "Resource group" (e.g. "goldencopy"), enter a "Storage account name" (e.g. "goldencopy2"), fill in the remaining fields and click "Next: Networking".

  4. Select a networking configuration; note that authentication will be used. This section limits the networks or virtual networks from which this blob storage is reachable. Configure networking so that your on-site data center network can reach the endpoints. Consult the Azure documentation on how to secure access from your on-premises network to Azure.

  5. Click "Next: Advanced" and accept the default settings or change them according to your requirements.
  6. Click "Next: Tags".
  7. Leave the Tags screen at defaults and click "Next".
  8. On the final screen, review all settings and click Create.
  9. This completes the storage account configuration. Continue to the next section to create a container.


How to create an Azure Blob Container in a Storage Account

  1. Select the Storage Account in the Azure console.
  2. On the menu on the left, under "Blob service", click the "Containers" option.

  3. Click the + to create a "New container". Under "Name" enter a unique name and leave the default setting as "Private no anonymous access". Click "Create".

  4. After creation, click the "Properties" tab and record the URL of the container for use when configuring Golden Copy folders.

  5. Authentication keys for Blob access can be found under Storage account --> Settings --> Access Keys.

  6. On this screen, record the Key 1 access key for use when configuring Golden Copy.
  7. Done. You will need the following information from these steps:
    1. Storage Account.
    2. Container URL.
    3. Access Key for the Storage Account.
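
The recorded container URL already contains two of these values, because Azure container URLs take the form https://<storage account>.blob.core.windows.net/<container>. The Python sketch below is illustrative; parse_container_url is a hypothetical helper and "gcdata" is an assumed container name. It splits a container URL into the storage account and container names as a quick check that the recorded value is complete.

```python
from urllib.parse import urlsplit


def parse_container_url(url):
    """Split an Azure Blob container URL into (storage_account, container).
    Hypothetical helper for illustration."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The first host label is the storage account name.
    account, _, suffix = host.partition(".")
    container = parts.path.strip("/")
    if (
        parts.scheme != "https"
        or not account
        or not suffix.startswith("blob.")
        or not container
        or "/" in container
    ):
        raise ValueError(f"not a container URL: {url!r}")
    return account, container


if __name__ == "__main__":
    # "goldencopy2" is the example account name from this guide;
    # "gcdata" is a placeholder container name.
    print(parse_container_url("https://goldencopy2.blob.core.windows.net/gcdata"))
```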


Cohesity Walk Through Example

How to Create the Storage View and Service Account User on a Cohesity Storage Array

  1. Create Service Account for Authentication with Golden Copy:
    1. Under the Admin menu --> Access Management.
    2. Create a new "Service Account User" with the local user option and the admin role assigned (e.g. "eyeglassSR").
    3. NOTE: To ensure the storage view gets the bucket ACLs assigned, the storage view must be created while logged in as the service account user created in this step.
  2. Log in to the console as the new Service Account User (see note above).
  3. A Storage View with only the S3 protocol enabled is required for write access over the S3 protocol.
  4. The name of the view generates the S3 bucket name. Use all lowercase and no special characters in the Storage View name.
  5. This Storage View name is needed when adding the folder to the Golden Copy folder configuration, using the bucket parameter.
  6. In the console, use the Platform Menu --> View to create the new Storage View for S3.
    1. Logged in as the new user, open Admin menu --> Access Management, click the user and record the access ID and the secret ID.
    2. Use the values from the step above when adding the folder using the example above.
  7. Done.
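
The naming rule for the Storage View can be checked before the view is created. The Python sketch below is an illustration only: is_valid_view_name is a hypothetical helper, and it assumes that "all lowercase and no special characters" means lowercase letters and digits. Cohesity may enforce further rules that are not covered here.

```python
import re

# Assumption: only lowercase ASCII letters and digits are accepted.
_VIEW_NAME = re.compile(r"[a-z0-9]+")


def is_valid_view_name(name):
    """Return True if name is all lowercase with no special characters
    (hypothetical check based on the rule stated in this guide)."""
    return _VIEW_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # Placeholder names for illustration.
    for candidate in ("goldencopy1", "GoldenCopy", "golden_copy"):
        print(candidate, is_valid_view_name(candidate))
```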

Appliance Global Settings Configuration

How to set Default Settings for Snapshot Expiry for all Folders

Folders have default settings that can be viewed and set using commands in this section.

  1. How to set the Global Default snapshot Expiry for long running copy jobs
    1. searchctl archivedfolders configure --full-copy-snap-expiry x  (where x is in days)
    2. NOTE: Use this command to extend the expiry of the source snapshot taken for a long running copy job that could take many days to complete. The default is 10 days unless changed.
  2. How to view the Global Default settings on folders
    1. searchctl archivedfolders getConfig
  3. NOTE: Each folder add command also allows a per folder override. See the folder add command to set the value per folder.

How to set File Checksum Global Settings for All Folders

  1. This option enables a software MD5 checksum and includes the checksum in the S3 headers so that AWS, ECS or other S3 targets can recompute the MD5 checksum before saving the object.
    1. Commands
      1. The checksum property is set with searchctl archivedfolders configure (use -h for help).
        1. searchctl archivedfolders configure --checksum <ON | OFF | default>
          1. value ON computes the checksum and adds a visible metadata property for files and folders. NOTE: This requires a file to be read twice, once to compute the checksum and a second time to upload it with the checksum visible in the metadata. This option is preferred because it allows independent data integrity checks on stored objects after they have been copied.
          2. value OFF does not compute the checksum and skips any built-in default checksum computation.
          3. value DEFAULT is the default behavior for each cloud target:
            1. Azure: OFF.
            2. AWS and S3 targets of type other: partial ON (the checksum is calculated but there is no visible property on the object).
            3. Google Cloud: TBD.
      2. checksum property can be viewed with
        1. searchctl archivedfolders getConfig
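
To make the checksum step concrete, the Python sketch below computes an MD5 digest of a file in chunks and encodes it the way the standard S3 Content-MD5 request header expects (base64 of the raw 16-byte digest). This is a generic illustration and not Golden Copy's implementation; the guide does not state which header or metadata key Golden Copy uses, and content_md5 is a hypothetical helper. The separate read pass also shows why the ON setting reads each file twice.

```python
import base64
import hashlib


def content_md5(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a file, in the format used by
    the S3 Content-MD5 header (hypothetical helper for illustration)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


if __name__ == "__main__":
    import sys

    # Example: python3 md5_header.py /path/to/file
    print(content_md5(sys.argv[1]))
```

An S3 target that receives this value recomputes the digest over the uploaded bytes and rejects the upload if the two differ.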



Advanced Configurations to Appliance Configuration

These parameters allow advanced changes to the appliance and should only be changed when directed by support.

  1. nano /opt/superna/eca/eca-env-common.conf  Note: Only make changes if advised by support!
    1. export ARCHIVE_PARALLEL_MULTI_UPLOAD=True   (enables multi-part upload for files over a certain size)
    2. export ARCHIVE_MAX_PART_SIZE=   (part size for larger files; the default is 10MB, leave at default)
    3. export ARCHIVE_PARALLEL_THREAD_COUNT=10   (sets the number of concurrent files to copy across all PowerScale nodes in the pool) 
    4. export ARCHIVE_S3_PREFIX="igls-"   (OneFS 9 requirement. Sets the string used to prefix folder objects that store ACL information, which avoids a collision with directory names on NAS devices. "igls-" is an example value.)
      1. Each folder object has this prefix applied to avoid a collision with duplicate folder names. For each folder uploaded you will see one folder that stores the files and an additional object with the same name as the folder. This additional object stores the ACL for the folder that contains the files.
      2. When using a file system based S3 target (for example OneFS 9 with S3 support), 2 folders with the same name are not supported, so the folder object must have a different name. The prefix is applied to folder objects when the target storage enforces unique object keys and object names. Consult support before using this flag.
    5. export ARCHIVE_ENABLE_SDK=true   (Enables Native SDK mode - default mode as of 1.1.4 > 178 build)
    6. export ARCHIVE_PARALLEL_THREAD_COUNT=100 (number of concurrent files per Golden Copy VM or Accelerator Nodes, increasing this number may not increase performance unless sufficient bandwidth is available)
    7. export ARCHIVE_PARALLEL_THREAD_SDK=10 (Number of separate threads used to copy a single large file, higher number will increase bandwidth utilization)
    8. export ARCHIVE_SMART_INCREMENTAL=false (default value)
      1. Change the value to true to enable. The setting must be added to the common conf file, followed by a cluster down and up, to take effect.
      2. This enables a fast incremental feature that skips collecting metadata that requires additional API calls. When this is true, metadata only includes owner and group (SID/GID only, no text value), and no mode bits are collected.
      3. This is only recommended if a very high change rate of files is expected and the performance of fast incremental is required.
    9. export ARCHIVE_BLOCK_PARALLEL_JOBS=true (default true; blocks parallel jobs on a single VM Golden Copy. Multi VM deployments must be used to allow parallel jobs.)
      1. Set to false to allow multi VM deployments to run multiple copy jobs.
    10. export ARCHIVE_FULL_PARALLEL_JOBS_ALLOWED=x
      1. Default is 1. Change to a value no greater than 30. A multi VM deployment is required for a supported configuration.
    11. export ARCHIVE_INCREMENTAL_PARALLEL_JOBS_ALLOWED=x
      1. Default is 1. Change to a value no greater than 30. A multi VM deployment is required for a supported configuration.
    12. export ARCHIVE_TOTAL_JOBS_ALLOWED=60
      1. Default is 2 (1 full and 1 incremental). The maximum supported value is 60, which allows 30 folders with full copy jobs and 30 incremental jobs.
    13. Special character encoding for S3 targets that do not support special characters in metadata tags. Set to true, then cluster down and up to take effect. The user and group metadata will be base64 encoded in the metadata tags.
      1. export ENCODE_OWNER_GROUP=true 
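
The parallel job limits above interact, so it can help to check planned values before asking support to apply them. The Python sketch below is illustrative and not a Golden Copy tool: check_job_limits is a hypothetical helper. The 30 and 60 maximums come from this section; the rule that the total must cover the sum of full and incremental jobs is an inference from the stated defaults (2 = 1 full + 1 incremental) and should be confirmed with support.

```python
def check_job_limits(full=1, incremental=1, total=2, block_parallel=True):
    """Return a list of problems with planned parallel job settings
    (hypothetical check based on the limits stated in this guide)."""
    problems = []
    if not 1 <= full <= 30:
        problems.append("ARCHIVE_FULL_PARALLEL_JOBS_ALLOWED must be 1 to 30")
    if not 1 <= incremental <= 30:
        problems.append("ARCHIVE_INCREMENTAL_PARALLEL_JOBS_ALLOWED must be 1 to 30")
    if total > 60:
        problems.append("ARCHIVE_TOTAL_JOBS_ALLOWED must not exceed 60")
    # Inferred rule: the total should cover full plus incremental jobs.
    if total < full + incremental:
        problems.append("ARCHIVE_TOTAL_JOBS_ALLOWED is lower than full + incremental")
    # Parallel jobs stay blocked while ARCHIVE_BLOCK_PARALLEL_JOBS is true.
    if block_parallel and (full > 1 or incremental > 1):
        problems.append("set ARCHIVE_BLOCK_PARALLEL_JOBS=false (multi VM deployment)")
    return problems


if __name__ == "__main__":
    for issue in check_job_limits(full=5, incremental=5, total=10, block_parallel=True):
        print(issue)
```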


© Superna LLC