Administration Guides

Cloud Browser for Data Mobility



Overview

This capability brings cloud object storage and on-premise data storage into a secure, centrally managed, self-service portal where end users can copy, move, or sync cloud object data into SMB shares. The portal gives IT a centralized tool to monitor user data transfers, and each user's SMB and S3 permissions are enforced to keep the data secure. An audit trail of who did what and when is available, along with detailed file- and object-level reporting on the data orchestrated between cloud and on-premise storage.

  • The web-based user interface stores a view of S3 bucket-level access, detected dynamically from the AWS S3 service; SMB shares are detected dynamically based on the Active Directory login.
  • The user can select data from the cloud and have it moved, copied, or synced to on-premise storage for editing workflows.
  • Golden Copy has advanced data orchestration features that allow JED (Just Enough Disk) editing workflows to cache data on premise for high-speed, low-latency playback and editing. End-of-day work is moved back to the cloud automatically, and data that stays on premise is backed up automatically.
  • End users see an endless pool of storage in the cloud and can launch data mobility jobs from the self-service portal without IT assistance. A job can email the user when the data movement is completed, or the user can monitor the job from the GUI.
  • Full multi-tenant user access lets administrators monitor jobs for all users and track the security of data management tasks from a single pane of glass that shows data assets on premise or in the cloud, and who did what and when.


Use Cases

  1. Media & Entertainment - Editors and content creators can select digital assets from cloud repositories and check out the data for on-premise editing workflows.
  2. Life Sciences - Researchers can retrieve archived data generated by instruments for use in on-premise research and analytics workflows.
  3. Security Professionals - Security logs and artifacts archived by security tools can be retrieved from cloud storage archives and presented locally on SMB file systems for analysis.

Capabilities Overview Summary

 


How Data AnyWhere Orchestration Works

NOTE: Concurrent user job orchestration requires release 1.1.20 or later.




Prerequisites

  1. Pipeline license subscription
  2. Advanced License key
  3. Per Seat license for each user that will use the workflow
  4. AWS S3 service

AWS IAM Policy Example for User and System S3 buckets

  1. The following examples show the minimum permissions required.
  2. These policies were created based on this guideline: grant the absolute minimum permissions necessary for a user to use CBDM.
    1. Ensure these permissions are scoped to the bucket wherever possible. NOTE: Not all permissions support resource scoping. For example, ListAllMyBuckets cannot be scoped to a specific bucket in an AWS IAM policy.
  3. Golden Copy Service account IAM:
    1. Added to folder definitions
    2. Staging Bucket: This account copies objects from the User bucket to the Staging bucket. It needs ListBucket and GetObject permissions on all User buckets. The scope can be limited to buckets with certain prefixes.
    3. Staging Data: This account is used to read data from the staging bucket and copy or move the data on-premise.
    4. Trash Bucket: This account is used to copy data to the Trash bucket and then delete the data from the staging bucket. The trash bucket provides a way to recover data if necessary. It is recommended to use a lifecycle policy on the trash bucket to delete objects older than 30 days.
  4. End User IAM Accounts:
    1. Add user key credentials to the profiles. A profile is created for each user that logs in to Golden Copy with an Active Directory user ID. This profile allows S3 credential(s) to be added.
    2. Used to browse buckets.
    3. Used to move job data on-premise, which requires Delete permission on objects in the source bucket.
    4. No permissions are needed for the Staging and Trash buckets, which are managed by the Golden Copy service account. This ensures staging data is secure and invisible to end users.
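
The lifecycle recommendation for the trash bucket can be illustrated with a small Python sketch that builds an S3 lifecycle configuration document. This is a minimal example, not part of Golden Copy: the function name and the rule ID are hypothetical, and the retention period is a parameter so it can be set to whatever your site requires.

```python
import json


def trash_lifecycle_config(days: int = 30) -> dict:
    """Build an S3 lifecycle configuration that expires objects after `days` days."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-trash-after-{days}-days",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: rule applies to the whole bucket
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the document so it can be saved to a file such as lifecycle.json.
    print(json.dumps(trash_lifecycle_config(30), indent=2))
```

If the output is saved to a file, it could be applied with the AWS CLI, for example `aws s3api put-bucket-lifecycle-configuration --bucket trashbucketname.trash --lifecycle-configuration file://lifecycle.json`, assuming the account running the command is allowed to manage the bucket's lifecycle configuration.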





Golden Copy Service IAM (archivedfolders)

The following policies are attached to an IAM user:

  • Staging Bucket Policy (GET/PUT/DELETE)

  • Trash/Recycle Bucket Policy (PUT)

  • ALL User Bucket (GET)



Each policy below lists, in order: policy name, bucket name, description, policy code, and comments.

Staging Bucket Policy

stagingbucketname

This policy enables the Service IAM to GET/PUT/DELETE objects in the Staging bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ForCopyingObjectsFromUserBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::stagingbucketname.staging",
        "arn:aws:s3:::stagingbucketname.staging/*"
      ]
    }
  ]
}

ListBucket - List objects inside the Staging bucket.

PutObject - Put objects into the Staging bucket when copying files from a User bucket.

GetObject - Get objects from the Staging bucket for `--trash-after-recall`.

DeleteObject - Delete objects from the Staging bucket for `--trash-after-recall`.

staging - Applies ListBucket to the Staging bucket.

staging/* - Get/Put/Delete objects in the Staging bucket.

Trash/Recycle Bucket Policy

trashbucketname

This policy enables the Service IAM to PUT objects on the Trash/Recycle bucket.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ForStoringDeletedObjectsFromStaging",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::trashbucketname.trash",
        "arn:aws:s3:::trashbucketname.trash/*"
      ]
    }
  ]
}

ListBucket - List objects inside the Trash bucket.

PutObject - Put objects into the Trash bucket.

trash - Applies ListBucket to the Trash bucket.

trash/* - Put objects into the Trash bucket.

ALL User Bucket Policy

userbucketname


This policy enables the Service IAM to copy from the user buckets, in this case from all buckets whose names start with the prefix userbucketname.user.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ToAllowCopyingObjectsFromUserBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::userbucketname.user*",
        "arn:aws:s3:::userbucketname.user*/*"
      ]
    }
  ]
}

ListBucket - List objects in user buckets.

GetObject - Get objects from user buckets.

user* - Matches every bucket whose name starts with `userbucketname.user`.

user*/* - Matches every object inside every bucket whose name starts with `userbucketname.user`.
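
The three service account policies above share one shape: a single Allow statement with a list of actions and two resources, the bucket ARN (which ListBucket applies to) and the object ARN under it (which Get/Put/Delete apply to). The Python sketch below generates them from that pattern. It is an illustration only; the helper name `bucket_policy` is hypothetical and the bucket names are the placeholders used in this guide.

```python
import json


def bucket_policy(sid: str, actions: list[str], bucket_pattern: str) -> dict:
    """Return an IAM policy allowing `actions` on one bucket or bucket-name pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": [f"s3:{name}" for name in actions],
                "Resource": [
                    f"arn:aws:s3:::{bucket_pattern}",    # bucket-level actions (ListBucket)
                    f"arn:aws:s3:::{bucket_pattern}/*",  # object-level actions
                ],
            }
        ],
    }


# Placeholder bucket names taken from the examples in this guide.
staging = bucket_policy(
    "ForCopyingObjectsFromUserBuckets",
    ["ListBucket", "PutObject", "GetObject", "DeleteObject"],
    "stagingbucketname.staging",
)
trash = bucket_policy(
    "ForStoringDeletedObjectsFromStaging",
    ["ListBucket", "PutObject"],
    "trashbucketname.trash",
)
users = bucket_policy(
    "ToAllowCopyingObjectsFromUserBuckets",
    ["ListBucket", "GetObject"],
    "userbucketname.user*",
)

if __name__ == "__main__":
    print(json.dumps(users, indent=2))
```

Generating the documents from one helper keeps the bucket and object ARNs in step, which is the detail most easily mistyped when the policies are edited by hand.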




End User IAM (Credentials)

The following policies are attached to an IAM user:

  • User Bucket (GET/DELETE)



IAM Policy attached to IAM User (user01)

userbucketname


{

  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "UserPermissionToTheirBucket",
          "Effect": "Allow",
          "Action": [
              "s3:GetObject",
              "s3:ListBucket",
              "s3:DeleteObject",
              "s3:GetBucketLocation"
          ],
          "Resource": [
              "arn:aws:s3:::userbucketname.user01",
              "arn:aws:s3:::userbucketname.user01/*"
          ]
      },
      {
          "Sid": "CBDMCredentialsRequirement",
          "Effect": "Allow",
          "Action": "s3:ListAllMyBuckets",
          "Resource": "*"
      }
  ]
}

GetObject - Get object. Needed by the CBDM Browser to browse into "folders".

ListBucket - List objects inside the bucket. Needed by the CBDM Browser.

DeleteObject - Delete the original objects for a MOVE job.

GetBucketLocation - Get bucket location. Needed to add credentials.

user01 - Applies ListBucket and GetBucketLocation to the User bucket.

user01/* - Get/Delete objects in the User bucket.

ListAllMyBuckets - List all buckets owned by the account. NOTE: CBDM credentials need this.

* - Cannot be resource constrained; the resource must be all (*).
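
Before adding end user credentials to a profile, it can help to confirm that the attached policy names every action listed above. The Python sketch below does a simple name comparison. It is a hypothetical helper, not a CBDM feature: it does not expand wildcard actions, evaluate Deny statements or conditions, or check which resources each action is scoped to.

```python
# Actions the end user policy in this guide grants (DeleteObject is for MOVE jobs).
REQUIRED_ACTIONS = {
    "s3:GetObject",
    "s3:ListBucket",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:ListAllMyBuckets",
}


def allowed_actions(policy: dict) -> set[str]:
    """Collect the action names found in the policy's Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    found: set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        found.update(actions)
    return found


def missing_actions(policy: dict) -> set[str]:
    """Return the required actions the policy does not name explicitly."""
    return REQUIRED_ACTIONS - allowed_actions(policy)


# The example end user policy from this guide.
USER_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UserPermissionToTheirBucket",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket",
                       "s3:DeleteObject", "s3:GetBucketLocation"],
            "Resource": ["arn:aws:s3:::userbucketname.user01",
                         "arn:aws:s3:::userbucketname.user01/*"],
        },
        {
            "Sid": "CBDMCredentialsRequirement",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        },
    ],
}

if __name__ == "__main__":
    print(sorted(missing_actions(USER_POLICY)))
```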




How to configure Cloud Browser for Data Mobility

NOTE: The gcdm prefix must be lowercase; it is case-sensitive.
  1. Add the icon to the GUI for users that need to use Cloud Browser for Data Mobility workflows
    1. On Golden Copy node 1
    2. Log in over SSH as ecaadmin
    3. nano /opt/superna/eca/eca-env-common.conf
    4. Add this variable
      1. export ENABLE_CONTENT_CREATOR=true
  2. Pipeline NFS mount configured for data recall on Golden Copy
    1. Not covered in this guide: creating the NFS mount in fstab on all Golden Copy nodes to set up the recall NFS mount, using the /ifs/fromcloud/ path on the target cluster.
    2. NOTE: The --recall-sourcepath flag on the searchctl isilons modify command MUST match the NFS mount path above.
    3. See the Guide here.
  3. Set the recall data location for cloud data on the cluster
    1. searchctl isilons modify --name xxxx --recall-sourcepath /ifs/fromcloud (where xxxx is the cluster name added to Golden Copy)
    2. This path will uniquely store all data arriving from the cloud for the CBDM pipeline configuration created below. All SMB shares that expose data to end users must be created under this path. More information on the exact paths needed for SMB shares for different user projects is given below.
  4. Configure a pipeline from the Cloud to the on premise cluster with specific parameters needed for CBDM
    1. Prerequisites
      1. Access to create the 2 S3 buckets needed as infrastructure for the GCDM workflow
      2. Create service account keys used by Golden Copy to manage data operations in the S3 buckets that end users will be using. The keys used require the following:
        1. Pipeline S3 bucket permissions: full data permissions on the buckets named below
        2. End user data S3 bucket permissions: ListBucket, GetObject, DeleteObject
      3. At least one SMB share for testing user access
        1. Create an SMB share using CBDM naming syntax (Mandatory)
          1. For this example, create an SMB share on the PowerScale on the path /ifs/fromcloud/movie-a for a project called movie-a. Assign an AD user or AD group to this SMB share to limit access to data in this share.
          2. Name the test share gcdm-movie-A
          3. NOTE: The SMB share names shown in the CBDM GUI are limited to SMB shares that begin with gcdm-xxxx (where xxxx is the user-defined share name).
    2. Configuration Example
      1. Create an IAM user with keys that have access to the infrastructure buckets below
      2. Cloud staging bucket name - gc-cbdm-staging
        1. This bucket is used to stage data for on-premise data orchestration. Users don't access this bucket; it stores data that has been moved or copied by all user jobs.
      3. Cloud trash bucket name - gc-cbdm-trash
        1. This bucket is where all data that moves from the cloud to on-premise storage ends up after it is copied and trashed, for review by administrators.
        2. A lifecycle policy should be set to purge this bucket after 1-5 days.
      4. Path on the on-premise PowerScale file system to store all cloud data. SMB shares will be used to secure the data for each user or group of users using AD groups.
        1. /ifs/fromcloud - this is the recall NFS mount path and the pipeline location where all cloud data from user jobs will be stored.
        2. NOTE: All SMB shares will be created under this path; the data that is copied from the cloud source buckets to the file system will use this naming convention.
        3. NOTE: The pipeline will use the option to delete data from the staging bucket after jobs finish, which keeps the bucket size managed automatically and cleans up after jobs.
        4. NOTE: All SMB shares that should be visible to end users in the GUI need to have the string gcdm- at the start of the share name AND fall under the pipeline path /ifs/fromcloud.
  5. Example pipeline CLI command using the example above
    1. searchctl archivedfolders add --isilon gcsource --folder /ifs/fromcloud --accesskey xxxxx --secretkey yyyyy --endpoint s3.ca-central-1.amazonaws.com --region ca-central-1 --bucket gc-cbdm-staging --cloudtype aws --source-path gcsource/pipeline --recall-from-sourcepath --recyclebucket gc-cbdm-trash --trash-after-recall
      1. NOTE: The property --source-path gcsource/pipeline is a prefix in the gc-cbdm-staging bucket. It ensures that all data directed at the on-premise cluster is created under one prefix in the staging bucket, which avoids data collisions. Golden Copy also deletes all data under this prefix, which is another reason the prefix is used: it ensures other data is not accidentally deleted. In this release the prefix is hardcoded and cannot be changed.
  6. Configuration complete
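
The share naming rules in the steps above (the share name begins with the lowercase prefix gcdm-, and the share path falls under the pipeline path /ifs/fromcloud) can be checked before shares are created. The Python sketch below restates those two rules as a small function. It is illustrative only: the function and constant names are hypothetical, the product's own matching logic is not documented here, and treating the pipeline path itself as not "under" the pipeline path is an assumption.

```python
from pathlib import PurePosixPath

SHARE_PREFIX = "gcdm-"                           # case-sensitive, lowercase
PIPELINE_PATH = PurePosixPath("/ifs/fromcloud")  # recall path used in this guide


def is_visible_share(name: str, path: str,
                     pipeline_path: PurePosixPath = PIPELINE_PATH) -> bool:
    """Return True if a share name and path follow the CBDM naming rules."""
    share_path = PurePosixPath(path)
    # The name must start with the prefix and carry a user-defined part after it.
    name_ok = name.startswith(SHARE_PREFIX) and len(name) > len(SHARE_PREFIX)
    # The path must sit strictly below the pipeline path; ".." is rejected
    # because PurePosixPath does not resolve it.
    path_ok = pipeline_path in share_path.parents and ".." not in share_path.parts
    return name_ok and path_ok


if __name__ == "__main__":
    print(is_visible_share("gcdm-movie-A", "/ifs/fromcloud/movie-a"))
    print(is_visible_share("GCDM-movie-a", "/ifs/fromcloud/movie-a"))
    print(is_visible_share("gcdm-logs", "/ifs/data/logs"))
```

Comparing path components rather than raw strings avoids accepting a sibling directory such as /ifs/fromcloud2, which a plain string prefix test would let through.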


Testing end-user Cloud Browser Data Mobility


  1. This assumes all configuration steps, SMB shares, and AD accounts have already been configured.
  2. Log in to CBDM as an AD user on Golden Copy node 1 at https://x.x.x.x.
  3. Select profile.
    1. Add S3 keys and endpoint
    2. Add an email address to be notified about job completions
  4. Log out and log in again.
  5. Open the Cloud Browser for Data Mobility workflow icon.
  6. Select Cloud source in Step 1. Select a bucket from the drop-down list for Step 2.
    1. Browse the bucket in Step 3 to select data to Copy or Move to on-premise storage and click the Add to Job button.
    2. Click the Job icon at the top of the workflow
    3. Click the Run Job button and select Cloud to On Premise Destination. Select an SMB share on the on-premise storage to copy or move the data to
    4. Click Submit and wait for the GUI to show the running job. The job can be monitored from the GUI.
    5. The user will get an email with the completed copy status.
    6. Administrators can monitor with more detail from the CLI
  7. The user's mounted SMB share will show the files once they are copied or moved from the cloud.
  8. NOTE: The data will be copied under a folder that references the source S3 bucket.
    1. Email the user receives
    2. In the example above, the source bucket was called smartarchiver-demo-xxxxx (the unique number at the end ensures data collisions do not occur on the file system).
    3. Inside the bucket folder, the user's selected data will appear
  9. Cloud infrastructure buckets after a Copy.
    1. The staging bucket has versioning enabled to show the data that was copied here temporarily and then copied to the trash bucket before it was deleted. This allows the staging bucket to remain empty unless data is in transit to on-premise storage.
    2. The trash bucket contains the job's files
  10. Done.
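
The state of the buckets after a job, as described in step 9, can be pictured with a toy in-memory model in Python. This is not how Golden Copy is implemented: buckets and the SMB share are modeled as plain dictionaries, the function name is hypothetical, and the point at which a Move job deletes the source object is an assumption.

```python
def run_job(user_bucket: dict, keys: list, staging: dict, trash: dict,
            share: dict, move: bool = False) -> None:
    """Simulate one Copy or Move job; each store is a dict of key -> bytes."""
    # 1. Copy the selected objects from the user bucket into the staging bucket.
    for key in keys:
        staging[key] = user_bucket[key]
    for key in keys:
        # 2. Recall the staged object to the on-premise SMB share path.
        share[key] = staging[key]
        # 3. Copy the staged object to the trash bucket, then delete it from staging.
        trash[key] = staging.pop(key)
        # 4. A Move job also deletes the original object (ordering is an assumption).
        if move:
            del user_bucket[key]


if __name__ == "__main__":
    user = {"clips/a.mov": b"A", "clips/b.mov": b"B"}
    staging, trash, share = {}, {}, {}
    run_job(user, ["clips/a.mov"], staging, trash, share)
    print("staging:", staging)
    print("trash:", sorted(trash))
    print("share:", sorted(share))
```

After the job, the staging store is empty and both the trash store and the share hold the job's files, which matches the bucket contents described above.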

Monitoring User job Activity

  1. Golden Copy administrators can monitor all user jobs in the cluster by using the History tab of the Cloud Browser for Data Mobility workflow icon.





© Superna Inc