Administration Guides

Understanding Copy Performance Configurations

Overview

Review this section when deciding how to configure Golden Copy for copying files and which IP pool interfaces will be used to copy them. Decisions on parallel copy settings also influence the speed of copies and the bandwidth consumed by a copy job. Golden Copy supports multipart uploads and downloads; if the file system contains mostly large files over 64 MB, a multipart upload memory upgrade is required.


Performance FAQ

  1. Most file systems consist mostly of small files, so raw throughput is not a useful metric for measuring performance. Golden Copy tracks files-per-second statistics over the last 15 minutes, hour, and 24 hours.
  2. Files are copied by threads, and Golden Copy defaults to 100 threads per VM. Vertically scaling more threads on a VM does not scale as well as scaling out, which is why 6 VMs is the starting point for higher throughput requirements. 
    1. Golden Copy will attempt 600 file copies in parallel across its threads; depending on file size, it is possible to reach 1,200 files a second or more (NOTE: this statement is generic, and file size determines peak files-per-second rates).
    2. Scale-out is always the best option for increased small file performance.
    3. Streaming Copy Mode - Coming in a dot release is streamed small-file copying, which will require a small VM at the target cloud provider. Streamed copies nail up a single TCP connection per Golden Copy VM and place byte streams of small files onto that connection; the small VM at the cloud provider unpacks the stream and performs local S3 puts to the bucket, where latency to the S3 bucket is low and the internal cloud backbone provides high bandwidth.
    4. NOTE: Thread count changes to Golden Copy should be done by Support. The obvious approach of increasing the threads per VM is not that simple: there is a crossover point where more threads slow down throughput, and finding it requires a careful understanding of memory, CPU, and threaded applications. Customers should not change any values without Support; in most cases the bottleneck is somewhere else in the end-to-end path from source storage to target S3 bucket. Support has tools to test end-to-end performance and uncover bottlenecks.
      1. The short answer is to open a support case.
  3. Throughput requires large files of 64 MB or larger to take advantage of multipart uploads (default 10 threads per file), which copy portions of the file in parallel; S3 combines the parts into a single object at the target. This offers the best throughput, but your data needs to consist of media or other large file types (see the worked example after this list).
  4. Compression - Golden Copy does not offer compression because the time to compress outweighs the time to simply send the file. In general, compression will not increase throughput; it will slow throughput down regardless of file size. As for storage costs, cloud tiers offer pricing options that make compressing data a wasteful, time-consuming step that slows down copies, and many modern file types (for example Office documents and PDFs) are already compressed by the application.
  5. Encryption - In-flight encryption is offered by Golden Copy end to end with TLS 1.3, and at-rest encryption is offered by all cloud providers. There is no value in applying additional encryption during the copy process, given the CPU required to maintain throughput. Cloud providers offer at-rest encryption seamlessly with managed keys, which is the default option from all cloud providers.
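
To make the multipart arithmetic in item 3 concrete, the sketch below is a hypothetical illustration (not a Golden Copy tool); the 640 MB file size is an example value, while the 64 MB part size and 10-thread default come from the FAQ above:

# Hypothetical multipart math: a 640 MB file with 64 MB parts
# yields 10 parts, matching the default 10 multipart threads per file.
FILE_SIZE_MB=640
PART_SIZE_MB=64
PARTS=$(( (FILE_SIZE_MB + PART_SIZE_MB - 1) / PART_SIZE_MB ))   # ceiling division
echo "A ${FILE_SIZE_MB} MB file uploads as ${PARTS} parts of up to ${PART_SIZE_MB} MB each"

With all 10 parts in flight at once, the whole file moves in roughly the time of one part, which is why this path favors media and other large file types.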


How to scale out and scale up backup performance

  1. Deploy Virtual Accelerator Nodes (VANs) to increase copy performance. The configuration is 1 VM, 6 VMs, or up to 99 VMs depending on requirements. The main Golden Copy VM is the control VM; in a 6-VM configuration, the remaining 5 VMs are used only for copying. All VMs must have an NFS mount to the source cluster.
    1. See the install guide for steps to deploy VAN nodes.
  2. Parallel threads can be increased to use more threads on multipart uploads; the default is 10 threads per file. Memory must be upgraded before increasing threads; contact Support. This is specific to AWS or generic S3 targets (type other); it does not apply to Azure or Google Cloud targets.
    1. This can be changed following the steps below; a consolidated example of the resulting configuration appears after this list.
    2. ssh to node 1 as ecaadmin
    3. nano /opt/superna/eca/eca-env-common.conf
    4. Add the following tags to increase the threads for file systems that are mostly large files over 64 MB. 
    5. export ARCHIVE_PARALLEL_THREAD_SDK=32
    6. export ARCHIVE_PARALLEL_THREAD_COUNT=100
    7. export AWS_HTTP_CONN_POOL_MAX_SIZE=200
    8. Only set the variable below if your Golden Copy cluster is 12 nodes or greater. Set it to VM count * 4 * 2; for example, on an 18-node cluster: 18 * 4 * 2 = 144.
      1. export UNDERTOW_WORKER_THREADS=xx
    9. control+x, answer yes to save
    10. ecactl cluster down 
    11. ecactl cluster up
  3. On ANY folder definition that has never been copied before, disable the check-before-upload step to increase copy throughput. This flag can be re-enabled at a later date, once the main full copy is completed and incremental mode is enabled on the folder.
    1. searchctl archivedfolders list  (get the folder id)
    2. searchctl archivedfolders modify --id xxxxx --skipS3FileExists=true
  4. Done. 
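
Putting the steps above together, the added stanza in /opt/superna/eca/eca-env-common.conf would look like the sketch below (values are the examples from this section; UNDERTOW_WORKER_THREADS applies only to clusters of 12 or more nodes):

# Multipart upload tuning for large-file (> 64 MB) file systems
# (contact Support before changing thread counts; memory upgrade required)
export ARCHIVE_PARALLEL_THREAD_SDK=32
export ARCHIVE_PARALLEL_THREAD_COUNT=100
export AWS_HTTP_CONN_POOL_MAX_SIZE=200
# 12+ node clusters only: VM count * 4 * 2 (example shown for an 18-node cluster)
export UNDERTOW_WORKER_THREADS=144

The cluster down and up in steps 10 and 11 applies the change.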


How to increase recall (restore) performance

  1. Follow these steps to configure high-performance restore.
  2. NOTE: Requires the 1.1.17 release or later and is used with AWS S3 services.
  3. Log in to Golden Copy node 1 as ecaadmin
  4. nano /opt/superna/eca/eca-env-common.conf
  5. Add these variables to increase restore performance. Skipping files that already exist on the file system is useful when retrying a recall job, to avoid recalling data that was already recalled; it defaults to enabled, and the setting below disables it. The other settings enable more parallel threads for multipart downloads, enable 200 files per Golden Copy VM along with AWS SDK Transfer Manager 2.0, and ensure that a large Golden Copy cluster keeps 20M files in the backlog for the worker nodes. (A quick way to verify the saved variables is shown after this list.)
    1. export RECALL_PARALLEL_THREAD_SDK=32
    2. export ARCHIVE_PARALLEL_THREAD_COUNT=200 
    3. export AWS_HTTP_CONN_POOL_MAX_SIZE=200
    4. export INDEX_WORKER_RECALL_SKIP_EXISTENT_FILES=false
    5. export TOPIC_PARTITION_LAG_THRESHOLD=20000000
  6. control + x (to save and exit)
  7. ecactl cluster down
  8. ecactl cluster up
  9. Done
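
As a quick sanity check before restarting the cluster, the saved variables can be confirmed with a generic shell command (not a Golden Copy tool):

# Confirm the recall tuning variables were written to the common conf file
grep -E 'RECALL_PARALLEL_THREAD_SDK|ARCHIVE_PARALLEL_THREAD_COUNT|AWS_HTTP_CONN_POOL_MAX_SIZE|INDEX_WORKER_RECALL_SKIP_EXISTENT_FILES|TOPIC_PARTITION_LAG_THRESHOLD' /opt/superna/eca/eca-env-common.conf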



 

How to Increase VM Memory and Parallel Copy Threads on a Single Golden Copy VM or for Large-File File Systems

Single Golden Copy VM

  1. NOTE: This is a lab-only system configuration and should not be used for production unless throughput is not a concern.
  2. Increase memory on the Golden Copy VM to 32 GB and 12 vCPUs.
  3. Before power on, modify RAM and CPU to match the above settings.
    1. NOTE: More than 4 folder definitions require additional disk space to store file copy history for each folder; add an additional 110 GB for 10 folders.
    2. Disk read and write latency should be < 20 ms (test with the command iostat -xyz -d 3).
  4. Modify the following file to expand the parallel file copies per VM
  5. nano /opt/superna/eca/eca-env-common.conf
  6. Add a line
    1. export ARCHIVE_PARALLEL_THREAD_COUNT=400
  7. control+x to save and exit
  8. Change the memory configuration (note: the spacing must be exactly as shown below, with 2 spaces for the service name and 4 spaces for parameters)
  9. nano /opt/superna/eca/docker-compose.overrides
version: '2.4'
services:
  indexworker:
    mem_limit: 8GB
    mem_reservation: 8GB
    memswap_limit: 8GB

  archiveworker:
    mem_limit: 14GB
    mem_reservation: 14GB
    memswap_limit: 14GB

  kafka:
    mem_limit: 4GB
    mem_reservation: 4GB
    memswap_limit: 4GB
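
After restarting the cluster (ecactl cluster down, then ecactl cluster up), one way to confirm the limits took effect is the standard Docker stats command (a generic Docker check, not a Golden Copy tool):

# Show per-container memory limits and current usage, one snapshot
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"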

Large-File File System Optimization for Performance

  1. Increase memory on ALL Golden Copy VMs to 24 GB of RAM.
  2. Support requirement:
    1. This is required because the amount of data copied in parallel must allow for multipart uploads or downloads when the majority of the data is over 64 MB.
  3. Before power on, modify RAM and CPU to match the above settings.
    1. NOTE: Disk read and write latency should be < 20 ms (test with the command iostat -xyz -d 3).
  4. Change the large file monitor timer to 8 hours to allow for large files that take longer to download.
    1. nano /opt/superna/eca/eca-env-common.conf
    2. add this variable
    3. export ECA_KAFKA_LAG_CHECK_MINUTES=480   (480 minutes = 8 hours)
    4. control+x
    5. save and exit
    6. This change requires a cluster restart (see the command sequence after the file contents below).
  5. Change the memory configuration (note: the spacing must be exactly as shown below, with 2 spaces for the service name and 4 spaces for parameters)
    1. nano /opt/superna/eca/docker-compose.overrides
    2. Update the file to contain the sections below
  6. Save the file with Control+x
version: '2.4'
services:
  archiveworker:
    mem_limit: 14GB
    mem_reservation: 14GB
    memswap_limit: 14GB
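
Putting the timer change and the required restart together (a sketch using the same commands as elsewhere in this guide; the echo append is an alternative to editing the file with nano):

# Allow large files up to 8 hours (480 minutes) before the lag monitor flags them
echo 'export ECA_KAFKA_LAG_CHECK_MINUTES=480' >> /opt/superna/eca/eca-env-common.conf
# Apply the change with a cluster restart
ecactl cluster down
ecactl cluster up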

How to create a Golden Copy Cluster > 6 nodes for high throughput requirements

  1. Boot all the Golden Copy OVA's and allow first boot to complete. Any number of VMs can be configured, up to 99 nodes.
  2. Login to node 1 and update the eca-env-common.conf
    1. nano /opt/superna/eca/eca-env-common.conf
    2. Set the value x to 2 times (the number of Golden Copy nodes minus 1); for example, with 12 nodes enter 22, since (12 - 1) x 2 = 22. (A worked 12-node example appears after this list.)
    3. export ECA_GOLDENCOPY_ARCHIVEWORKER_PARTITIONS=x
    4. export ECA_INDEXWORKER_PARTITIONS=x
    5. add the node entries and ip addresses of each of the Golden Copy VM's
    6. export ECA_LOCATION_NODE_1=x.x.x.x
    7. export ECA_LOCATION_NODE_N=x.x.x.x   (enter an entry for every node in the cluster, incrementing the N value)
    8. control+x to save and exit
  3. Bring up the cluster 
    1. ecactl components configure-nodes    (sets up keyless ssh on all nodes)
    2. ecactl cluster up
  4. Expand an existing cluster (impacts current copy jobs)
  5. Login to kafkahq
    1. https://x.x.x.x/kafkahq  (enter ecaadmin and default password 3y3gl4ss)
    2. Click on Topics.
    3. Click the trashcan icon for all topics (NOTE: this will delete all backlog files queued for copying, and a new full job will be required on the expanded cluster).
    4. ecactl cluster down
    5. Boot additional Golden Copy OVA's to get the final cluster count
      1. Follow the steps from step #2 above
      2. Restart a full archive job after the cluster has been restarted.
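
As a worked example, a 12-node cluster (using the formula and variables from step 2; the IP addresses below are placeholders) would add a stanza like this to /opt/superna/eca/eca-env-common.conf:

# Partition counts: (12 - 1) x 2 = 22
export ECA_GOLDENCOPY_ARCHIVEWORKER_PARTITIONS=22
export ECA_INDEXWORKER_PARTITIONS=22
# One location entry per VM, incrementing the node number
export ECA_LOCATION_NODE_1=10.0.0.1
export ECA_LOCATION_NODE_2=10.0.0.2
# ... continue for nodes 3 through 11 ...
export ECA_LOCATION_NODE_12=10.0.0.12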


How to Increase Incremental job Performance

  1. If the change rate on the cluster is high and produces over 3M files per daily incremental, we recommend switching to fast incremental mode. This mode skips the second API call needed to collect metadata for each file and uses only the metadata included in the change list.
    1. Login to node 1 and add the variable below.
    2. nano /opt/superna/eca/eca-env-common.conf
    3. export ARCHIVE_SMART_INCREMENTAL=true
  2. Changing the value to true enables this new mode. The variable must be added to the common conf file, followed by a cluster down and up to take effect; see the sketch after this list. This enables a fast incremental feature that skips collecting metadata that requires additional API calls.
    1. NOTE: Metadata that will not be supported: owner, group, mode bits.
    2. NOTE: Folder ACLs are still collected, which takes an extra API call for each folder.
    3. NOTE: The file date stamps are available in the change list and are included in the metadata.
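
Putting it together, the fast incremental change follows the same pattern as the other tunables in this guide (a sketch of the conf stanza and restart; add the line on node 1):

# Enable fast incremental mode (skips per-file metadata API calls)
export ARCHIVE_SMART_INCREMENTAL=true
# Then restart the cluster to take effect:
#   ecactl cluster down
#   ecactl cluster up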



© Superna Inc