Performance Auditor
Performance Auditor Product Specification
Use of this document
This document is the functional specification of the product. It covers what the product can do, what it cannot do, operating instructions, and functional use cases describing how it works.
Overview
The Performance Auditor product collects audit data, analyzes performance in real time, and produces summary metrics in a relational format. Because the relationships are stored with the performance data, you can view performance data from any metric and see the impact of the other metrics. The related metrics are cluster nodes, users (AD or NFS), files, source subnets and applications.
This performance data can be viewed in real time, and the relationships between the metrics can be navigated. The performance data is also stored in a circular log, so historical performance data can be replayed, paused, fast-forwarded and rewound to troubleshoot issues that happened in the past.
The application also provides a view of application protocol requests over the SMB, NFS or HDFS protocols. This provides insight into how an application requests data, as either read or write requests. This information can be used to determine the relative performance of the application based on how it asks the storage for data.
Terms
Eyeglass GUI - User monitoring, alarms and user interface to interact with product configuration and threats raised.
ECA VMs - VMs that process audit data and determine threats to production data.
Functional Specification Description
Dependencies
Cluster REST API
SSH access to cluster CLI
NTP
Cluster can reach AD
Audit data can be processed
License is assigned correctly to the monitored cluster
Event types are set according to documentation for expected product functionality
NFS mount to monitored clusters
DNS
- Only available as a VMware OVA or on Hyper-V
Installation
- This product is not customer installable and must be installed and configured by professional services for correct deployment and validation of the software integration with the cluster.
Functional Description
Process audit data
Split audit data into 5 different metrics (nodes, users, files, subnets, applications)
Make relationships between the metrics
Identify the top 5 resource consumers in each metric category (find all consumers, sort from highest to lowest, and store the top 5 in each category)
Break down all metrics into reads and writes in MB, and into application requests (reads and writes)
Perform the above functions over a 1-second interval and build a time series record of performance metrics with all relationships between them maintained.
The historical records should be purged automatically after a period of time to avoid filling the disk with old historical data
Publish a subset of the performance records into a longer-term time series database so that Grafana can be used to graph trends in the time series performance data
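The aggregation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the event layout (one key per metric category plus "op" and "bytes") and the names summarize_window, new_stats and TOP_N are hypothetical, and real audit records will look different.

```python
from collections import defaultdict

# The five related metric categories named in this specification.
CATEGORIES = ("node", "user", "file", "subnet", "application")
TOP_N = 5


def new_stats():
    # Per-consumer counters: throughput in bytes and request counts.
    return {"read_bytes": 0, "write_bytes": 0, "read_reqs": 0, "write_reqs": 0}


def summarize_window(events):
    """Summarize the audit events of one 1-second window.

    Each event is a dict (hypothetical layout) with one key per category
    plus "op" ("read" or "write") and "bytes".

    Returns (top, related):
      top     maps each category to its TOP_N consumers as (name, stats)
              pairs, sorted by total bytes, highest first.
      related maps a (category, name) pair to the set of (category, name)
              pairs from other categories seen in the same events.
    """
    totals = {cat: defaultdict(new_stats) for cat in CATEGORIES}
    related = defaultdict(set)
    for ev in events:
        members = [(cat, ev[cat]) for cat in CATEGORIES]
        for cat, name in members:
            stats = totals[cat][name]
            stats[ev["op"] + "_bytes"] += ev["bytes"]
            stats[ev["op"] + "_reqs"] += 1
            # Record which entities in the other categories share this event.
            related[(cat, name)].update(m for m in members if m[0] != cat)
    top = {}
    for cat, by_name in totals.items():
        ranked = sorted(
            by_name.items(),
            key=lambda item: item[1]["read_bytes"] + item[1]["write_bytes"],
            reverse=True,
        )
        top[cat] = ranked[:TOP_N]
    return top, related
```

Calling summarize_window once per second and appending each result to a time-ordered store would give a time series in which every top consumer can be followed to its related nodes, users, files, subnets and applications.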
The product can also target data collection at nodes, users, files or subnets that are not in the top 5. This triggers audit data collection and analysis for the targeted endpoint so that it is included in the time series data. Because of the processing demands of the analysis, only a single endpoint can be targeted at a time. The targeted endpoint can be changed to another endpoint at any time. Data collected will be available in the historical data.
Multiple clusters can be monitored, but the UI can only display a single cluster at a time.
Switching between clusters is possible in the UI
The UI displays the status of its connection to the stats stream as connected or not connected, and shows not connected if a network issue is blocking the connection to the backend stats
The UI displays the freshness of the data by showing the timestamp of the data that is being viewed.
Settings allow sorting the UI from high to low or low to high
The lag is also shown in seconds, indicating how far behind the event time the dashboard is. For example, the processing of audit data could be several minutes behind, and this information is important for interpreting the performance data.
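As a rough illustration of the lag figure, the sketch below assumes lag is the difference between the current time and the event timestamp of the data being displayed. The function name lag_seconds and the use of UTC datetimes are assumptions made for this example, not details taken from the product.

```python
from datetime import datetime, timezone


def lag_seconds(event_time, now=None):
    """Return how many whole seconds the displayed data trails real time."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Clamp at zero so small clock differences never show a negative lag.
    return max(0, int((now - event_time).total_seconds()))


now = datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc)
shown = datetime(2024, 1, 1, 12, 2, 30, tzinfo=timezone.utc)
print(lag_seconds(shown, now))  # 150
```

A lag of 150 seconds here means the dashboard is showing activity from two and a half minutes ago, so a spike seen now happened that long before the current time.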
The UI can display throughput rates in KB, MB or GB, and the unit can be switched if the metric value is easier to understand in a different unit.
Switching between rate-based metrics and application protocol request metrics is possible in the settings.
Applications are defined by file extensions and all performance data is collected and summarized by the application extension when stored.
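The extension-based grouping described above could look like the following sketch. The function names, the lower-casing of extensions and the "(no extension)" label are assumptions for illustration only; the product's actual rules for mapping files to applications are not documented here.

```python
import os
from collections import Counter


def application_for(path):
    """Map a file path to an application label based on its extension."""
    ext = os.path.splitext(path)[1]
    # Assumption: extensions are compared case-insensitively.
    return ext.lstrip(".").lower() or "(no extension)"


def bytes_by_application(records):
    """Sum bytes per application for an iterable of (path, bytes) pairs."""
    totals = Counter()
    for path, nbytes in records:
        totals[application_for(path)] += nbytes
    return dict(totals)
```

With this grouping, traffic to video.MP4 and clip.mp4 is summarized under the same application, while files without an extension fall into a single catch-all label.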
The UI allows selecting a main metric “View”. The Views align to the 5 key metrics: nodes, files, users, subnets and applications
After a view is selected, clicking an entry in the UI displays the details of the selected object (node, file, user, subnet or application), showing its read and write MB rate.
If you unselect the object, you are viewing a cluster-wide analysis of the top 5 consumers in the selected view.
The display shows 1-second updates of all 5 metric categories related to each other from the real-time analysis engine.
The analysis uses stream-based analytics technology
The rewind historical view allows moving through the data forward or backward with control buttons, including pause, play, step forward and step backward buttons.
Jumping to a date and time is also possible with the playback controls. 14 days of data is stored until overwritten.
A time series bookmark can be copied and shared so that the exact time period can be loaded into another session at a later time or by someone using a different login.
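The 14-day historical storage and jump-to-time behavior can be sketched with a simple time-ordered store. This is an assumption-heavy illustration: the class HistoryLog, its methods, and the choice to return the first record at or after the requested time are hypothetical, and the sketch purges by age rather than reproducing the product's actual circular log format.

```python
from bisect import bisect_left
from collections import deque

RETENTION_SECONDS = 14 * 24 * 60 * 60  # 14 days, the retention stated above


class HistoryLog:
    """Time-ordered store that drops records older than the retention window."""

    def __init__(self, retention=RETENTION_SECONDS):
        self.retention = retention
        self.records = deque()  # (timestamp, summary) pairs, oldest first

    def append(self, timestamp, summary):
        """Add a record and purge everything older than the retention window."""
        self.records.append((timestamp, summary))
        cutoff = timestamp - self.retention
        while self.records and self.records[0][0] < cutoff:
            self.records.popleft()

    def seek(self, timestamp):
        """Return the first record at or after `timestamp`, or None."""
        times = [t for t, _ in self.records]
        i = bisect_left(times, timestamp)
        return self.records[i] if i < len(self.records) else None
```

In this sketch, seeking to a time that has already been purged returns the oldest record still held, so the returned timestamp differs from the requested one.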
Operational Expectations for all deployments
Monitor the performance of the UI to verify performance data is being collected.
If you select a historical date and time and the displayed date and time is different, the data for the selected time no longer exists in the time series database because it has already been purged.
Verify all alarms daily to ensure that audit data is being collected correctly and, if not, take corrective action.
Ensure auditing is enabled with the audit event types required to support the Performance Auditor product.
Ensure the cluster has been licensed in the GUI and a license has been assigned to the cluster you want to monitor
Ensure the NFS mount is in place for the cluster to be monitored
Ensure auditing is enabled on the cluster for all access zones that require performance monitoring
If network issues block reading audit data, the license was not assigned, or the NFS mount is not in place, it will not be possible to collect or re-ingest old audit data to be analyzed.
If the event rate or number of monitored clusters exceeds available resources, performance data will not be written correctly or will lag behind by minutes or hours. Additional resources are expected to be added in this case to maintain real-time processing.
Patching
The product does not support hotfix patching and requires a complete upgrade of the software version or build number to apply any patch
Operating system patches are not provided and must be downloaded directly from the official online openSUSE repositories
Compatibility
The product does not support forward compatibility with target devices and will require a software upgrade to support a forward version of a target device. This includes minor or build number changes of the target device.
Appliance Modifications
Modifying the operating system packages, removing or adding packages, and changing the OS configuration are not covered by support. Customers must support their own OS modifications and perform the necessary testing. The only exceptions are applying openSUSE OS patches for packages that shipped with the original appliance, or following a procedure published in the documentation.
Operational Procedures
If documentation does not list a procedure, it is explicitly unsupported unless support provides a procedure.
© Superna Inc