Golden Copy Open File Format Definition

Overview

Golden Copy backs up data in an open format, using native features of S3 target storage to provide a self-describing metadata architecture. This benefits customers by avoiding vendor lock-in and allows them to build solutions that extract the rich metadata captured from the file system. The use cases below are examples of solutions that can leverage the open metadata format.

Use Cases

  1. Read the S3 object properties to extract the created, modified, or last accessed date stamps of the backed up file system data
  2. View the owner and group of the backed up data directly from the object properties
  3. View folder ACLs that were backed up separately from the data, using a JSON representation of the ACLs
  4. View the file mode bits that existed at the time of backup from the S3 object properties
  5. Restore S3 data to a file system and re-apply the metadata (owner, group, mode bits, and folder ACLs)
  6. Validate the integrity of backed up data using the MD5 checksum stored as a property of the object. The checksum is stored during the backup process, allowing automated data integrity validation or one-time validation of specific files.
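
The integrity check in use case 6 can be sketched as follows. This is a minimal example, assuming the stored checksum property is a hexadecimal MD5 digest; the function names are hypothetical and not part of Golden Copy.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, stored_checksum):
    """Compare a local file with a stored MD5 value.

    Assumption: the stored value is a hex digest; surrounding whitespace
    and letter case are ignored.
    """
    return md5_of_file(path) == stored_checksum.strip().lower()
```

A restored or downloaded copy of an object can be passed to `checksum_matches` together with the checksum value read from the object's properties.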

How to view the file metadata

  1. Numerous tools allow browsing objects in S3 and viewing their custom properties, for example the AWS console and Cyberduck.
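
The same properties can also be read programmatically. The sketch below assumes a client object that behaves like a boto3 S3 client, whose `head_object` call returns user-defined metadata under the `Metadata` key; the function name is hypothetical.

```python
def fetch_object_metadata(s3_client, bucket, key):
    """Return the custom properties of one object as a dict.

    s3_client is expected to behave like a boto3 S3 client. head_object
    returns the object's headers without downloading the object body.
    """
    response = s3_client.head_object(Bucket=bucket, Key=key)
    # User-defined metadata is returned under the "Metadata" key.
    return response.get("Metadata", {})
```

With boto3 installed and credentials configured, the client would typically be created with `boto3.client("s3")` and passed in along with the bucket name and object key.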


MetaData Format Definition

  1. The baseline file metadata consists of fields that every file has after upload by Golden Copy. Dates are encoded in epoch time; an epoch converter tool can translate them into human-readable dates.
  2. owner - the Active Directory or Linux user that owns the file
  3. group - the Active Directory or Linux group that owns the file
  4. mode - the Linux mode bits for read, write, and execute permissions of users and groups
  5. changetime - last change date of the file, encoded in epoch time
  6. createdtime - creation date of the file, encoded in epoch time
  7. modifiedtime - last modified date of the file, encoded in epoch time
  8. islongpathlink - (true/false) identifies whether the file was on a path longer than 1024 bytes, which cannot be uploaded to S3 as-is because of the S3 key length limit. In this case Golden Copy stores the file's path in JSON format, which preserves POSIX paths of up to 4K in length in S3 and allows the file to be restored with its entire path.
  9. issymlink - (true/false) Media mode in Golden Copy preserves symlinks and hard links so that a single instance of the data is stored along with pointers to the real file. This has two benefits: A) it reduces the cost of storing duplicate data during the backup process, and B) it ensures symlinks and hard links can be restored in the file system, which avoids data expansion.
  10. ishardlink - (true/false) same as issymlink, but for hard links
  11. inode - the inode from the file system, stored for symlinks and hard links when media mode is enabled to preserve links. The .gc_inodes prefix under the backup folder path in the S3 bucket stores objects named after the inode, each containing the real file that the symlinks or hard links point to. The link objects are created in the path where the links were found and contain a JSON payload that lists the inode object's path under the .gc_inodes prefix.
  12. isfolder - (true/false) indicates whether the object in the bucket was a folder on the file system. Each folder has a dedicated object that contains its file system permissions in JSON.
  13. checksum - if checksum mode is enabled, an MD5 checksum of the file is computed during the backup and stored as a property for future data integrity validation
  14. history_full - as data moves from on-premises storage to object storage, this property records the data flow history, for example PowerScale --> AWS, Azure, or ECS. If data is recalled back on premises, that workflow is appended to the property, showing that the data has been backed up and restored at some point.
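
To illustrate how the fields above could be consumed, here is a minimal sketch that converts raw property strings into typed values. The value encodings are assumptions, not confirmed by this guide: epoch times are treated as seconds, mode as an octal string such as "0644", and flags as the strings "true" or "false". The function name is hypothetical.

```python
import stat
from datetime import datetime, timezone

TIME_FIELDS = ("changetime", "createdtime", "modifiedtime")
FLAG_FIELDS = ("islongpathlink", "issymlink", "ishardlink", "isfolder")


def decode_metadata(raw):
    """Return a copy of the raw properties with typed, readable values."""
    decoded = dict(raw)
    for name in TIME_FIELDS:
        if name in raw:
            # Assumption: the value is epoch time in seconds.
            decoded[name] = datetime.fromtimestamp(float(raw[name]), tz=timezone.utc)
    for name in FLAG_FIELDS:
        if name in raw:
            # Assumption: flags are stored as "true" or "false".
            decoded[name] = str(raw[name]).strip().lower() == "true"
    if "mode" in raw:
        # Assumption: mode is an octal permission string such as "0644".
        bits = stat.S_IMODE(int(str(raw["mode"]), 8))
        decoded["mode"] = bits
        # Render the permission bits as text, as for a regular file.
        decoded["mode_text"] = stat.filemode(stat.S_IFREG | bits)
    return decoded
```

Fields that are not recognized, such as owner, group, or history_full, are passed through unchanged.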


How can this metadata be used?

  1. Automation tools can retrieve the metadata properties with the S3 HEAD Object API call, and applications or scripts can then re-apply the owner, group, mode bits, or folder-level ACLs on an alternate file system. Golden Copy supports this automation for Dell PowerScale and fully integrates the Dell API to apply the metadata during a restore operation.
  2. This open metadata format allows any developer to read the information and automate the same task for any file system.
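
As an illustration of such automation, the sketch below re-applies mode bits, the modified time, and optionally the owner and group to a restored file on a POSIX file system. It is not Golden Copy's restore implementation. It assumes mode is an octal string, modifiedtime is epoch seconds, and the owner and group names resolve on the local system; the function name is hypothetical.

```python
import os
import shutil
import stat


def apply_metadata(path, meta, change_owner=False):
    """Re-apply backed up metadata to a restored file."""
    if "mode" in meta:
        # Assumption: mode is an octal permission string such as "0644".
        os.chmod(path, stat.S_IMODE(int(str(meta["mode"]), 8)))
    if "modifiedtime" in meta:
        # Assumption: modifiedtime is epoch time in seconds.
        mtime = float(meta["modifiedtime"])
        # Keep the current access time and replace only the modified time.
        os.utime(path, (os.stat(path).st_atime, mtime))
    if change_owner and "owner" in meta and "group" in meta:
        # Usually requires elevated privileges; names must exist locally.
        shutil.chown(path, user=meta["owner"], group=meta["group"])
```

Ownership changes are opt-in here because they typically require root privileges, and Active Directory names may need to be mapped to local identities first.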
© Superna Inc