Index Update Intervals, Index Granularity, Index Ignored Words, Indexing Feature Options
- Indexing Features
- Index Update Interval
- Committing Indexed Documents to Disk Interval
- Index Search Results Interval
- Unsupported Characters in Folder Names, Files and Content
Indexing Features
- Multiple clusters are supported by a single Search & Recover cluster with Per Path ingestion capabilities.
- Alternate Data Streams
- Only the main data stream is indexed; the content of any alternate data streams is not indexed.
- Per Path Ingestion Capabilities:
- File system metadata only (default mode).
- Full content + file system metadata.
- Snapshot Aware mode ingests files in snapshots for historical searching and file version support in the search results:
- This mode incrementally indexes each new snapshot it discovers and retains the history of files within those snapshots.
- Include or Exclude Override: allows paths, wildcards, or file types to be skipped or added to the index.
Example 1: A path configured for metadata only can add an include option so that a specific path or file type is full content indexed.
Example 2: A path configured for full content can add an exclude option for a path or file types to avoid spending processing time on low-value data. This improves performance and reduces the index size.
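The two examples above can be sketched as a small rule resolver. This is a minimal illustration only: the `PathRule` class, its field names, the `resolve_mode` function, and the sample paths are all hypothetical and are not the product's configuration schema. It also assumes that an exclude match wins over an include match and that excluded files are skipped entirely.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class PathRule:
    """Hypothetical per-path ingestion rule (not the product's real schema)."""
    root: str                                     # path the rule applies to
    mode: str = "metadata"                        # "metadata" (default) or "content"
    include: list = field(default_factory=list)   # patterns promoted to full content
    exclude: list = field(default_factory=list)   # patterns skipped entirely


def resolve_mode(rule, file_path):
    """Return "content", "metadata", "skip", or None if the rule does not apply."""
    prefix = rule.root.rstrip("/") + "/"
    if not file_path.startswith(prefix):
        return None                               # file is outside this rule's path
    # Assumption: exclude is evaluated before include.
    if any(fnmatchcase(file_path, pat) for pat in rule.exclude):
        return "skip"
    if any(fnmatchcase(file_path, pat) for pat in rule.include):
        return "content"
    return rule.mode


# Example 1: metadata-only path with an include override for PDF files.
meta_rule = PathRule(root="/data/projects", mode="metadata", include=["*.pdf"])

# Example 2: full-content path with exclude overrides for low-value data.
full_rule = PathRule(root="/data/home", mode="content",
                     exclude=["*.tmp", "/data/home/scratch/*"])
```

With these rules, `resolve_mode(meta_rule, "/data/projects/a/report.pdf")` resolves to full content while other files under `/data/projects` stay metadata only, and anything under `/data/home/scratch` is skipped.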
Index Update Interval
While full and incremental indexing jobs run, it is important to understand when documents are committed to disk and when indexed content becomes available for searching.
Committing Indexed Documents to Disk Interval
As documents are being indexed, results are logged in a transaction log. A hard commit, which applies all transaction log entries to the master index, is completed when either of the following criteria has been met:
- 20,000 documents have been submitted to the index - OR
- 5 minutes have passed since the last hard commit to the index
Whichever threshold is hit first triggers a disk write to the master index from the transaction logs.
NOTE: A hard commit does not by itself make documents appear in search results. It protects the integrity of the index and ensures that, after a cluster crash, the boot process does not require a lengthy log replay of uncommitted documents to the index.
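The "whichever comes first" rule for hard commits can be modeled as follows. This is a sketch for reasoning about timing, not the product's implementation: the `HardCommitPolicy` class and its `tick` method are hypothetical names, and only the two thresholds (20,000 documents, 5 minutes) come from the text above.

```python
MAX_PENDING_DOCS = 20_000        # threshold from the text above
MAX_COMMIT_AGE_SECONDS = 5 * 60  # threshold from the text above


class HardCommitPolicy:
    """Illustrative model of the two commit triggers (hypothetical names)."""

    def __init__(self, now=0.0):
        self.pending = 0          # documents logged since the last hard commit
        self.last_commit = now    # time of the last hard commit, in seconds

    def tick(self, now, new_docs=0):
        """Record new documents and return True if a hard commit fires."""
        self.pending += new_docs
        hit_doc_limit = self.pending >= MAX_PENDING_DOCS
        hit_time_limit = now - self.last_commit >= MAX_COMMIT_AGE_SECONDS
        if hit_doc_limit or hit_time_limit:
            # Whichever threshold is hit first resets both counters.
            self.pending = 0
            self.last_commit = now
            return True
        return False
```

For example, a policy created at time 0 that receives 19,999 documents at 10 seconds does not commit, but one more document at 11 seconds triggers a commit on the document threshold; a slow trickle afterwards commits once 300 seconds have elapsed since that commit.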
Index Search Results Interval
Indexed content becomes available for searching as described below:
- A background process updates the search results cache, allowing newly added documents to appear in search results. This operation is an expensive cluster-wide process, so the cache is updated only when either of these criteria is met:
- 10 million documents have been added to the index - OR
- 30 minutes have passed since a document was indexed
Whichever threshold is hit first makes the new documents available for searching.
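As a worked example of these two thresholds, the sketch below estimates the longest a newly indexed document might wait before it is searchable. It assumes a steady ingest rate and that both counters restart at each cache update; neither assumption is stated in the text, and the function name is hypothetical.

```python
CACHE_REFRESH_DOCS = 10_000_000   # threshold from the text above
CACHE_REFRESH_SECONDS = 30 * 60   # threshold from the text above


def max_search_delay_seconds(docs_per_second):
    """Estimate the worst-case wait before a new document is searchable.

    Assumes a constant ingest rate and counters that reset at each cache
    update; treat the result as a rough planning figure only.
    """
    if docs_per_second <= 0:
        # No ingest: only the time threshold can trigger the update.
        return CACHE_REFRESH_SECONDS
    seconds_to_doc_threshold = CACHE_REFRESH_DOCS / docs_per_second
    return min(CACHE_REFRESH_SECONDS, seconds_to_doc_threshold)


for rate in (1_000, 10_000, 20_000):
    print(rate, "docs/s ->", max_search_delay_seconds(rate), "seconds")
```

Under these assumptions the 30 minute timer dominates below roughly 5,556 documents per second (10,000,000 / 1,800); above that rate the document count threshold triggers the update sooner.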
Unsupported Characters in Folder Names, Files and Content
The following characters are currently unsupported in file names and directory names. If a directory name contains any of the characters listed below, no files in any of its child subfolders will be indexed.
- { } < > ' / [ ] ( ) : " \ # $
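Before indexing, a path can be pre-checked against the character list above. The following is a minimal sketch that assumes POSIX-style paths; the function names are hypothetical and not part of the product. Because `/` is the path separator, it never appears inside a single path component here.

```python
from pathlib import PurePosixPath

# The unsupported characters from the list above: { } < > ' / [ ] ( ) : " \ # $
UNSUPPORTED = set("{}<>'/[]():\"\\#$")


def unsupported_in_name(name):
    """Return the sorted unsupported characters found in one file or folder name."""
    return sorted(set(name) & UNSUPPORTED)


def check_path(path):
    """Return (component, characters) pairs for each offending path component."""
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    findings = []
    for part in parts:
        bad = unsupported_in_name(part)
        if bad:
            findings.append((part, bad))
    return findings


print(check_path("/data/projects (old)/report#1.txt"))
```

An empty list means no component of the path contains a listed character. A non-empty list identifies the names to rename, starting with the highest-level directory, since an offending directory name affects the folders beneath it.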