Administration Guides

Archive Engine - Configuration


How to Configure Archive Engine


  1. Add a folder definition to archive data to a target S3 bucket with all required parameters, including the storage tier for cloud providers
    1. Create three buckets
      1. One for holding archived data, for example smartarchiver-data
      2. One for staging temporarily recalled data, for example smartarchiver-staging
      3. One as a trashcan bucket for deleted recalled data in the cloud, for example smartarchiver-trash
  2. Specify the authorized user paths that will be enabled for end-user archive capabilities
  3. Add a pipeline configuration to present archived data to end users for recall.
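Assuming an AWS S3 target and configured credentials, the three buckets from step 1 might be set up with a sketch like the following. The bucket names and region are the examples used in this guide; the boto3 call is shown but commented out so the sketch runs without AWS access.

```python
# Illustrative sketch: the three S3 buckets used by Archive Engine.
# Names follow the examples in this guide; adjust for your environment.
buckets = {
    "smartarchiver-data": "holds archived data",
    "smartarchiver-staging": "stages temporarily recalled data",
    "smartarchiver-trash": "trashcan for deleted recalled data",
}

def bucket_names():
    """Return the bucket names in creation order."""
    return list(buckets)

# With boto3 available and credentials configured, creation would look like:
# import boto3
# s3 = boto3.client("s3", region_name="ca-central-1")
# for name in bucket_names():
#     s3.create_bucket(
#         Bucket=name,
#         CreateBucketConfiguration={"LocationConstraint": "ca-central-1"},
#     )
```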

Steps to Prepare the VM

  1. Add the Archive Engine license key
    1. searchctl license add --path /home/ecaadmin/<zip file name>
  2. Add support for Archive Engine
  3. nano /opt/superna/eca/eca-env-common.conf
  4. add this variable
    1. export ENABLE_SMART_ARCHIVER=true
  5. nano /opt/superna/eca/docker.compose.overrides.yml
  6. Add the text below and save the file with control+x

version: '2.4'

  7. ecactl cluster down
  8. ecactl cluster up



Steps to Configure a Path for User Archiving

  1. Overview
    1. An archive definition is built using two folder definitions
      1. On premise to the cloud
      2. Cloud to on premise
    2. The folder definitions use specific types and features to enable archive and recall workflows. The security of what data is presented to users for archive and recall is also defined within the folder definitions
    3. The security model allows archive and recall security to be defined separately 
  2. Add the folder definition and change the placeholder values below
    1. <cluster name> the source cluster name
    2. --folder - The location where archive data will be staged before it's moved to the cloud. We recommend using /ifs/tocloud
    3. access and secret keys
    4. endpoint URL - See the configuration guide on how to specify different target endpoint URLs
    5. <region>  - the region parameter based on the s3 target used
    6. cloudtype - set the cloud type to match your s3 target
    7. --smart-archiver-sourcepaths - This parameter authorizes any path in the file system to appear to end users as an eligible path when selecting folders to archive to S3. Enter a comma-separated list of paths; example below
      1. --smart-archiver-sourcepaths  /ifs/projects, /ifs/home 
    8. --full-archive-schedule - Set the schedule used to archive data selected for archive. A daily archive at midnight is recommended to move the data copy to off-peak hours
      1. example --full-archive-schedule "0 0 * * *" 
    9. --delete-from-source - This ensures data is deleted after it's copied. It is required for move operations, as opposed to the end-user backup-to-S3 use case
    10. --smart-archiver-type staging - This flag indicates the folder definition is in the on-premise-to-cloud direction; data is moved here before it's archived to the cloud
    11. --smart-archiver-type recall - This flag indicates the folder definition is in the cloud-to-on-premise direction, used when users recall data
    12. --storeFileHistory - This tracks the lifecycle of file data. With each archive or recall, the history of what happened to the file is encoded into a custom S3 property, allowing admins to see what has happened to the data.
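Both schedule flags take standard five-field cron expressions. A minimal sketch of how the fields break down (illustrative only; Archive Engine's own scheduler is not shown here):

```python
# Break a five-field cron expression into named fields.
FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def cron_fields(expr):
    """Map each cron field to its value; raises if not five fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected 5 cron fields, got %d" % len(parts))
    return dict(zip(FIELDS, parts))

# "0 0 * * *"   -> minute 0, hour 0, any day: daily at midnight
# "*/5 * * * *" -> every 5 minutes
```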

  3. On premise to Cloud - example Archive Engine command 
    1. searchctl archivedfolders add --isilon gcsource --folder /ifs/tocloud --accesskey yyyy --secretkey yyyyy --endpoint --region ca-central-1 --bucket smartarchiver-data --cloudtype aws --smart-archiver-sourcepaths /ifs/projects --full-archive-schedule "0 0 * * *" --delete-from-source --smart-archiver-recall-bucket smartarchiver-staging --smart-archiver-type staging  --storeFileHistory 
  4. Cloud to on-premise - Add Pipeline from Object to File configuration for user recall jobs
    1. Create the staging area where recalled data will be copied during the recall process.  This is a location on the cluster.
      1. Log in as root to the cluster with the data that will be archived
      2. mkdir -p /ifs/fromcloud/sa-staging 
    2. Add the Pipeline configuration for Archive Engine
      1. searchctl archivedfolders add --isilon gcsource --folder /ifs/fromcloud/sa-staging --accesskey xxxxxxxx --secretkey yyyyyyyyy --endpoint --region <region> --bucket smartarchiver-demo --cloudtype aws --source-path gcsource/ifs/projects --recall-schedule "*/5 * * * *" --recall-from-sourcepath  --smart-archiver-type recall --trash-after-recall --recyclebucket smartarchiver-trash --storeFileHistory 
        1. --source-path - This parameter authorizes the data path in the archive bucket that will be presented to the user as recall-eligible data. The syntax is <clustername>/pathtoarchivedata; all data under this path will be visible to end users for recall.
          1. In the example above, gcsource is the cluster name, and /ifs/projects is authorized for the end user to see data in the archive and recall it.
        2. --bucket   <bucket name> This must be the same bucket used in the folder definition above where the data is archived.  This ties the two folder definitions together.
        3. --trash-after-recall - Ensures staging data is deleted after it's recalled
        4. --recyclebucket smartarchiver-trash   - moves recalled data to a trash bucket with a life cycle policy to delete the data after x days.
        5. --storeFileHistory - This tracks the lifecycle of file data. With each archive or recall, the history of what happened to the file is encoded into a custom S3 property, allowing admins to see what has happened to the data.
      2. done
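The trash bucket's "delete the data after x days" behavior is implemented with an S3 lifecycle rule on the bucket. A sketch of such a rule, assuming a 30-day expiration (the day count is an example, not a value mandated by Archive Engine; the boto3 call that would apply it is commented out so the sketch runs without AWS access):

```python
# Illustrative S3 lifecycle configuration: expire everything in the
# trash bucket after 30 days.
TRASH_BUCKET = "smartarchiver-trash"

lifecycle = {
    "Rules": [
        {
            "ID": "expire-recalled-trash",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},        # apply to the whole bucket
            "Expiration": {"Days": 30},
        }
    ]
}

# With boto3 available and credentials configured:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket=TRASH_BUCKET, LifecycleConfiguration=lifecycle
# )
```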

How to Archive data as an end user

  1. Login to the Archive Engine GUI as an Active Directory User
  2. Select the Archive Engine icon
  3. Browse to the path entered above that was authorized for end users to archive. Folders that are eligible will show an Archivable label.
  4. Select a folder
    1. The Up Cloud icon appears
    2. Click the button to archive the data
    3. The user action is completed. The data will not be moved until the scheduled job runs.
      1. After the job is launched, the folders will show as staged until the archive job completes.
    4. After the job completes and the user browses to the folder, they will see the archived data on the right-hand screen.

How to Recall Archived Data back to the File System

  1. A user can log in to the Archive Engine portal and view the file system in the left-hand pane; on the right, they will see data archived from the currently selected folder. In the screenshot below, the child1 folder is visible in the left pane and also on the right side, because the data was recalled back to the file system. The cloud down-arrow button on the right is used to recall data with a single click.
    1. NOTE:  If the folder name already exists in the file system, a collision will occur.  This is handled by renaming the folder before it's restored.  The folder will be renamed <original folder name>-smart-archiver-recall-collision-<month>-<day>-<time>-<year>
    2. NOTE: The user adds the recall staging bucket first. After that, the user can add a second folder with -smart-archive-recall-folder  
  2. To recall data, click the button next to the folder in the right pane.
  3. This will launch a recall job that will be processed on the next scheduled recall run.
  4. done
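The collision rename described in the NOTE above can be sketched as follows. The exact <time> format Archive Engine uses is not documented here, so the timestamp layout below is an assumption for illustration only:

```python
from datetime import datetime

def collision_name(folder, now=None):
    """Build a collision-safe folder name of the form
    <folder>-smart-archiver-recall-collision-<month>-<day>-<time>-<year>.
    The time component's format is assumed, not taken from the product."""
    now = now or datetime.now()
    return (
        f"{folder}-smart-archiver-recall-collision-"
        f"{now.month}-{now.day}-{now:%H%M%S}-{now.year}"
    )

# e.g. collision_name("child1", datetime(2024, 7, 4, 13, 5, 9))
#   -> "child1-smart-archiver-recall-collision-7-4-130509-2024"
```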

How to Track Data Lifecycle Movement

  1. The --storeFileHistory flag adds tracking data to each file's movement from file to object and from object back to file. This provides full traceability of all data movement within the Archive Engine system. The history is encoded directly in the objects, making it easy to inspect what happened to the data. Future releases will use this metadata to report data movement across the file and object systems.
  2. In this example, the data is moved from PowerScale (isi)  to the AWS storage target.  The tag shows the date and time the object was moved from file to object.
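How the per-object history kept by --storeFileHistory might be modeled (purely illustrative; the actual custom S3 property name and encoding used by Archive Engine are not documented here, so the key name and JSON layout below are assumptions):

```python
import json
from datetime import datetime, timezone

def append_history(metadata, action, source, target):
    """Append an archive/recall event to a JSON-encoded history list
    stored under an (assumed) custom metadata key. Returns new metadata."""
    key = "x-amz-meta-file-history"   # assumed key name, for illustration
    history = json.loads(metadata.get(key, "[]"))
    history.append({
        "action": action,             # "archive" or "recall"
        "source": source,
        "target": target,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return {**metadata, key: json.dumps(history)}

# An archive followed by a recall leaves two events on the object:
meta = append_history({}, "archive", "isi:/ifs/projects", "s3://smartarchiver-data")
meta = append_history(meta, "recall", "s3://smartarchiver-data", "isi:/ifs/fromcloud")
```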

Error handling

When a user archives on the order of a million files, it is possible, for example, to lose the connection and leave some files undeleted at the source (there is always some percentage of errors).

However, if the user runs a recurring archive process, those files will eventually be deleted, because each run has less remaining data to archive.

Sometimes Isilon receives too many requests to manage, or the token session has expired. New INFO-level error messages have been added to describe these conditions in more detail:

  • Timeout error: Too many requests sent to Isilon

  • Connection error: Isilon token session has expired

© Superna Inc