Administration Guides

Bulk Ingest Old Audit Data from PowerScale to Easy Auditor



Overview

Use these instructions to re-ingest audit data from PowerScale's audit directory into Easy Auditor's index.

Limitations

  1. Not supported for ingesting days, weeks, months, or years of data; only targeted ingestion of a specific date of an event is supported. Support will not assist with bulk ingestion, and any use case other than a targeted day is not covered under the support contract.
  2. There is no way to predict how long ingesting audit data will take.
  3. IMPORTANT: A maximum of 20 files can be added to the json file for any single run. ANY HIGHER NUMBER IS NOT SUPPORTED. Do initial testing with only 1 file, and use cron to run the ingestion at a non-busy time.
  4. NOTE: Submitting more than one json file queues the jobs; only 1 ingestion job executes at a time.
  5. NOTE: Bulk ingestion is a background task, and processing of current audit data has higher priority. This priority cannot be changed, and completion times cannot be predicted, since active audit data always takes priority.



PowerScale Steps

  1. SSH to the PowerScale cluster you intend to re-ingest audit logs from.
  2. Navigate to the per-node protocol directory that holds the audit logs. The example below uses node008 because it was the most recent node in this case; your node directory will vary:
    cd /ifs/.ifsvar/audit/logs/node008/protocol

  3. List the contents. This helps determine, based on dates and times, which audit data to re-ingest. The audit logs are listed as .gz files.
    ls -lT

  4. Each cluster node will have GZ files for the date you are ingesting from, and the GZ files from all cluster nodes must be ingested. Repeat the steps above for each cluster node folder: locate the GZ files for the date and note the file names for each node.

  5. Build the json file described in the Eyeglass steps below and submit GZ files in groups of at most 20 files at a time. Multiple json files can be queued for ingestion, but only 1 will ingest at a time.
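
The per-node file collection in the steps above can be sketched as a small script. This is an illustrative sketch only, not Superna tooling: the function name `audit_files_for_date` is hypothetical, and it assumes the audit log layout shown above (`/ifs/.ifsvar/audit/logs/<node>/protocol`) with file modification times indicating the event date.

```python
#!/usr/bin/env python3
"""Sketch: list PowerScale audit .gz files per node for a target date.

Assumes the directory layout from the steps above; names are illustrative.
"""
import os
from datetime import date, datetime


def audit_files_for_date(logs_root, target):
    """Return {node_dir: [gz file names]} whose mtime falls on `target` (a date)."""
    result = {}
    for node in sorted(os.listdir(logs_root)):
        protocol_dir = os.path.join(logs_root, node, "protocol")
        if not os.path.isdir(protocol_dir):
            continue
        matches = []
        for name in sorted(os.listdir(protocol_dir)):
            if not name.endswith(".gz"):
                continue
            mtime = datetime.fromtimestamp(
                os.path.getmtime(os.path.join(protocol_dir, name)))
            if mtime.date() == target:
                matches.append(name)
        if matches:
            result[node] = matches
    return result


if __name__ == "__main__":
    # Example invocation on the cluster, using the path from the steps above:
    print(audit_files_for_date("/ifs/.ifsvar/audit/logs", date(2023, 5, 1)))
```

The resulting per-node file names feed directly into the json file built in the Eyeglass steps below.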


Eyeglass steps

  1. SSH to Eyeglass CLI as admin
  2. Navigate to
    cd /home/admin
  3. Create a file
    touch bulkingest.json
  4. Open the file in vim editor
    vim bulkingest.json
  5. Copy and paste the content below (if ingesting from a single node), substituting your own cluster name, cluster GUID, node ID, and audit file names:
    [{
     "cluster_name": "YOUR_PowerScale_CLUSTER_NAME",
     "cluster_guid": "YOUR_PowerScale_CLUSTER_GUID",
     "node": [{
         "node_id": "node008",
         "audit_files": ["node_audit_file.gz", "node_audit_file.gz"]
       }
     ]
    }]

  6. Save the file
    :wq!
  7. If you wish to ingest from multiple nodes, use the format in the example below

    EXAMPLE:

    [{
     "cluster_name": "YOUR_PowerScale_CLUSTER_NAME",
     "cluster_guid": "YOUR_PowerScale_CLUSTER_GUID",
     "node": [{
         "node_id": "node008",
         "audit_files": ["node_audit_file.gz", "node_audit_file.gz"]
       },
       {
         "node_id": "node003",
         "audit_files": ["node_audit_file.gz", "node_audit_file.gz", "node_audit_file.gz", "node_audit_file.gz"]
       }
     ]
    }]



  8. Save the file
    :wq!
  9. Execute the ingestion using the bulkingest.json file. The --file argument must be an absolute path; the file itself can be located anywhere:

    igls rswsignals bulkLoadTAEvents --file=/home/admin/bulkingest.json

    NOTE: Depending on how large a period of time is being ingested, it can take some time to complete.
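
The json file from steps 5-8 can also be generated programmatically, which makes it easier to enforce the 20-file limit from the Limitations section. This is a sketch under assumptions: the function name `build_bulk_ingest` and the placeholder cluster values are illustrative; only the json layout comes from this guide.

```python
#!/usr/bin/env python3
"""Sketch: generate bulkingest.json in the layout shown in the steps above."""
import json

MAX_FILES_PER_JOB = 20  # documented maximum number of files per json file


def build_bulk_ingest(cluster_name, cluster_guid, files_by_node):
    """files_by_node: {"node008": ["file.gz", ...]} -> json string.

    Raises ValueError if the total file count exceeds the supported maximum.
    """
    total = sum(len(names) for names in files_by_node.values())
    if total > MAX_FILES_PER_JOB:
        raise ValueError(
            f"{total} files exceeds the supported maximum of {MAX_FILES_PER_JOB}")
    payload = [{
        "cluster_name": cluster_name,
        "cluster_guid": cluster_guid,
        "node": [
            {"node_id": node_id, "audit_files": names}
            for node_id, names in files_by_node.items()
        ],
    }]
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Placeholder values; substitute your own cluster name, GUID, and file names.
    print(build_bulk_ingest("YOUR_PowerScale_CLUSTER_NAME",
                            "YOUR_PowerScale_CLUSTER_GUID",
                            {"node008": ["node_audit_file.gz"]}))
```

Write the output to /home/admin/bulkingest.json and submit it with the igls command shown in step 9.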

How to View Progress of Bulk Audit Data Ingestion

  1. NOTE: Each PowerScale node has historical audit data; each compressed file is 1 GB in size, and each node may have multiple files to ingest for a given day. The higher the audit rate, the more GZ files will need to be ingested. This can be a slow process for multiple days of historical audit data.
  2. Start up Queue Monitor Process

    1. Log in to node 1 of the ECA cluster and run this command:
      ecactl containers up -d kafkahq
  3. View Ingestion Jobs

    1. Open the Eyeglass Jobs icon and the Running Jobs tab to verify the bulk ingestion task is running. Each CLI command starts an audit job to process the files in the json configuration file.
      1. The RUNNING Jobs screen shows the job currently processing.
      2. You need to wait for this job to finish before submitting more files.
      3. The Spark Job step must show a blue checkmark to indicate it has finished processing this job.
        1. A spinning symbol means it is still in progress.

  4. View Event Ingestion progress

    1. Using a browser, open http://x.x.x.x:9000 (where x.x.x.x is the IP address of node 1 of the ECA cluster).
    2. Topics - in this case, the specific bulkingestion topic is being tracked.
    3. Bulkingestion - the topic used to track current progress on the ingestion task.
    4. Lag - this value can go up or down depending on ingestion speed. A value of 0 means the ingestion job is finished and no more file events are being processed for the current active job. Check the Running Jobs icon to verify the active job shows finished. Any queued jobs will start automatically.
    5. Count - this field should always increase as new events are processed from all jobs. As more json files are added to the queue, this value will continue to increase as events are ingested.
© Superna LLC