Administration Guides

S3 Storage Bucket Configuration Options, Operations, and Settings

Overview

This section covers optional storage bucket configurations and operational procedures.

How to Cancel a Running Archive Job

  1. Get a list of running jobs: searchctl jobs running (this command returns job IDs).
  2. Cancel the job: searchctl jobs cancel --id job-xxxxxxxx (enter the job ID you want to cancel. NOTE: this stops copying files, but files that were already copied before the cancellation are left in the S3 storage).
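The two steps above can be sketched as a short shell session using the searchctl commands named in this guide; the job ID shown is a placeholder:

```shell
# List running jobs to find the ID of the job to cancel (IDs look like job-xxxxxxxx)
searchctl jobs running

# Cancel the job; files copied before the cancel remain in the S3 bucket
searchctl jobs cancel --id job-12345678
```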

How to Configure S3 Storage Bucket Retention and Versioning to Meet Your File Copy Archive Use Case

Use Case: Data Retention with Scheduled Copies, with or without Bucket Versioning

  1. This copy mode runs on a schedule and copies all folders configured without the --manual flag.
  2. The full SmartCopy checks whether each target S3 file exists and skips existing files that are the same version.
  3. Files that are modified AND have a previous version in the S3 storage are updated in the S3 storage.
  4. Object Retention:
    1. Best practice is to set the S3 retention policy at the bucket level so that all new objects automatically get retention set per object. Use different storage buckets with different retention policies, or use the versioning feature on the S3 storage target to set retention per version of the file. See the vendor documentation on how to configure retention policies.
  5. (Optional) Enable bucket versioning to keep previous versions, with an expiry set for each version in days. This allows modified files to be uploaded while preserving the old version of the file should it need to be restored.
    1. Example: For Amazon S3, configure each version of an uploaded file to expire after 45 days.

    2. NOTE: If a file changes several times between Full Copy jobs, only the last version modified prior to the copy job will be stored.
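For Amazon S3, the versioning-plus-expiry setup described above can also be done from the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured; the bucket name is a placeholder:

```shell
# Enable versioning on the bucket so modified files keep their previous versions
aws s3api put-bucket-versioning \
  --bucket my-archive-bucket \
  --versioning-configuration Status=Enabled

# Expire noncurrent (previous) versions 45 days after they are superseded
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-archive-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 45 }
    }]
  }'
```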

Use Case: One Time Copy Data Archiving

  1. Use this to clean up data by copying it to S3 storage for long-term archive and legal retention. This use case is a two-step process.
  2. Create a folder target configuration and add the --manual flag, which excludes the folder from scheduled copies.
  3. Run the archive job (see the guide here on running archive jobs).
  4. After the copy job runs, review the copy results for successes and failures. See the guide here.
  5. Once you have verified that all the data was copied, using an S3 browser tool (e.g. Cyberduck or S3 Browser), run the PowerScale Tree Delete command to submit a job that deletes the data that was copied. Consult the Dell documentation on the Tree Delete command.
  6. Recommendation: Create a storage bucket for long-term archive and set the Time to Live on the storage bucket to meet your data retention requirements. Consult the S3 target device documentation on how to configure the TTL.
  7. Copy to a new target path:
    1. The run archive job has a --prefix xxx flag, where xxx is a prefix added to the S3 storage bucket path (see the guide here). This is useful for copying the data to a second path in the S3 storage without overwriting the existing path, and allows creating multiple copies of the same source path in the target S3 storage. This can be used to present the S3 data over file sharing protocols (e.g. the ECS feature) or for direct access to the S3 data for application use.
  8. NOTE: You can run a copy job multiple times using the CLI; if data has changed on the source path, a full smart copy job will run again.
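On Amazon S3, the bucket "Time to Live" recommended in step 6 maps to a lifecycle Expiration rule. A minimal sketch, assuming the AWS CLI; the bucket name and retention period are placeholders to adapt to your retention requirements:

```shell
# Delete objects a fixed number of days after creation (2555 days ~ 7 years)
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-legal-archive \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-ttl",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": { "Days": 2555 }
    }]
  }'
```

Other S3 targets expose the same idea under different names (TTL, bucket expiry); consult the target device documentation as noted above.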

How to Clean Up Orphaned Multipart File Uploads

A multipart upload can fail for various reasons and leave orphaned, incomplete files in the storage bucket; these should be cleaned up since they are not complete files. See the procedures below for each S3 provider. NOTE: Golden Copy issues an Abort Multipart Upload API command to instruct the target to delete partial uploads. The procedures below should still be enabled in case the S3 target does not complete the cleanup or does not support the Abort Multipart Upload API command.

Amazon S3 Procedure

  1. Login to Amazon S3 console.
  2. Click on the storage bucket name.
  3. Click on Management tab.

  4. Click "Add lifecycle rule".
  5. Name the rule, for example "incomplete uploads".
  6. Leave the defaults on the next screen.
  7. Enable clean up of incomplete multipart uploads and set the number of days after which incomplete uploads are deleted (for example, 1 day).
  8. Click "Next" and save the rule.
  9. Done.
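The same lifecycle rule can be created from the AWS CLI instead of the console. A sketch, assuming the AWS CLI; the bucket name is a placeholder:

```shell
# Optional: list any in-progress/orphaned multipart uploads first
aws s3api list-multipart-uploads --bucket my-archive-bucket

# Abort incomplete multipart uploads 1 day after they were initiated
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-archive-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "incomplete-uploads",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
    }]
  }'
```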


How to Configure Amazon AWS Glacier

  1. Glacier is an S3 service tier that offers long-term, low-cost archive. Once an object has been moved to Glacier, it will no longer be accessible from an S3 browser or by Golden Copy for restore until the restore (inflate) process has completed.
  2. The steps below explain how to edit a single file's storage class and how to create a lifecycle policy to move all uploaded data to Glacier.

How to Apply a Single File Storage Class Change for Testing Glacier

  1. Log in to the Amazon S3 portal, click on a file in the storage bucket, and select the "Properties" tab.

  2. Click on the "Storage class" option, select "Glacier", and save the change.

  3. Note: after this change, the file will no longer be downloadable from an S3 browser or Golden Copy.
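The same single-file storage class change can be made from the AWS CLI by copying the object onto itself with a new storage class. A sketch; the bucket and key names are placeholders:

```shell
# Copy the object in place, changing only its storage class to Glacier
aws s3api copy-object \
  --bucket my-archive-bucket \
  --copy-source my-archive-bucket/path/to/file.txt \
  --key path/to/file.txt \
  --storage-class GLACIER
```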

How to Recall a File from Glacier

  1. Select the file's check box in the Amazon S3 bucket and click "Restore".

    1. Note: Bulk operations or API-based restores are also possible; consult the AWS documentation.
  2. Complete the restore request by choosing the restore speed option and the number of days the restored file will be available for access, then submit the request.
  3. Once the restore has completed, the property on the file is updated to show how long the file can be downloaded.

  4. Done.
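The API-based restore mentioned above can be sketched with the AWS CLI; the bucket, key, availability window, and retrieval tier are placeholders:

```shell
# Request a temporary restore: available for 7 days, Standard retrieval tier
aws s3api restore-object \
  --bucket my-archive-bucket \
  --key path/to/file.txt \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'

# Check restore progress; the "Restore" field shows the ongoing request and expiry
aws s3api head-object --bucket my-archive-bucket --key path/to/file.txt
```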

How to configure Glacier Lifecycle policies on a storage bucket

  1. Select the bucket Management tab.
  2. Click "Add Lifecycle Rule".
  3. Enter a rule name.

  4. Select the "Current version" or previous versions check box, depending on the versioning setting on the bucket.
  5. Click the "Object creation" rule drop-down and select Glacier.

  6. Click "Next".
  7. Optional but recommended: enable clean up of incomplete multipart uploads and set it to 1 day.

  8. Click "Next" and save the policy.
  9. Note: all newly created objects will move to the Glacier tier 1 day after creation.
  10. Done.
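The console steps above can be expressed as one lifecycle configuration from the AWS CLI. A sketch for the unversioned ("current version") case; the bucket name is a placeholder:

```shell
# Move new objects to Glacier 1 day after creation, and clean up incomplete uploads
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-archive-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "to-glacier",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{ "Days": 1, "StorageClass": "GLACIER" }],
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
    }]
  }'
```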


How to Configure Storage Tier Lifecycle Policies with Azure Blob Storage

  1. Centralized Azure lifecycle management allows simple policies to move large numbers of objects.
  2. This guide shows how to create storage policies that use the last-accessed or last-modified dates on objects to determine when to move them to a lower-cost storage tier.
  3. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal#azure-portal-list-view 
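The Azure portal steps in the linked guide can also be done with the Azure CLI. A sketch, assuming the Azure CLI is installed and logged in; the storage account, resource group, rule name, and day thresholds are placeholders:

```shell
# Tier blobs down by age since last modification: Cool after 30 days, Archive after 90
az storage account management-policy create \
  --account-name mystorageaccount \
  --resource-group my-resource-group \
  --policy '{
    "rules": [{
      "enabled": true,
      "name": "tier-down-cold-blobs",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"] },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 }
          }
        }
      }
    }]
  }'
```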


© Superna LLC