Administration Guides

Index Update Intervals, Index Granularity, Index Ignored Words, Indexing Feature Options



    Indexing Features

    1. Multiple clusters are supported by a single Search & Recover cluster with Per Path ingestion capabilities.
    2. Alternate Data Streams
      1. Only the main data stream is indexed; the content of alternate data streams is not indexed.
    3. Per Path Ingestion Capabilities:
      • File system metadata only (default mode).
      • Full content + file system metadata.
    4. Snapshot Aware mode ingests files in snapshots for historical searching and file version support in the search results:
      • Snapshot Aware mode indexes the files in a snapshot. It incrementally indexes each new snapshot as it is discovered and retains the history of files within the snapshot.
    5. Include or Exclude Override: allows paths, wildcards, or file types to be skipped or added to the index.

    Example 1: A path configured for metadata only can add an include option so that a specific path or file type is full content indexed.

    Example 2: A path configured for full content can add an exclude option for a path or file types, so that no processing time is spent on low-value data. This improves performance and reduces the index size.
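
    The two examples above can be sketched as a small rule lookup. This is an illustration only, not the product's configuration format or API: the `PathRule` class, its field names, the sample paths, and the use of shell-style wildcards are all hypothetical. It also assumes that an excluded file is left out of the index entirely, and that an exclude wins over an include; the product may behave differently.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class PathRule:
    """Hypothetical model of one ingestion path and its overrides."""
    root: str                                     # ingestion path
    mode: str = "metadata"                        # "metadata" (default) or "full"
    include: list = field(default_factory=list)   # patterns promoted to full content
    exclude: list = field(default_factory=list)   # patterns skipped (assumption)


def resolve_mode(rule: PathRule, file_path: str) -> str:
    """Return "full", "metadata", or "skip" for a file under the rule."""
    if not file_path.startswith(rule.root.rstrip("/") + "/"):
        return "skip"                             # file is outside this ingestion path
    if any(fnmatchcase(file_path, p) for p in rule.exclude):
        return "skip"                             # exclude override
    if any(fnmatchcase(file_path, p) for p in rule.include):
        return "full"                             # include override
    return rule.mode                              # fall back to the path's own mode


# Example 1: metadata-only path, PDFs promoted to full content.
ex1 = PathRule("/ifs/data", mode="metadata", include=["*.pdf"])
# Example 2: full-content path, ISO images and a tmp folder skipped.
ex2 = PathRule("/ifs/archive", mode="full", exclude=["*.iso", "/ifs/archive/tmp/*"])
```

    With these sample rules, `resolve_mode(ex1, "/ifs/data/a/report.pdf")` returns `"full"`, while `resolve_mode(ex2, "/ifs/archive/tmp/x.docx")` returns `"skip"`.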

    Index Update Interval 

    As full and incremental indexing jobs are running and indexing documents, it is important to understand when documents are committed to disk and when indexed content is available for searching.


    Committing Indexed Documents to Disk Interval

    As documents are being indexed, results are logged in a transaction log. A hard commit, which applies all transaction log entries to the master index, is completed when either of the following criteria has been met:

    • 20,000 documents have been submitted to the index - OR
    • 5 minutes have passed since the last hard commit to the index

    Whichever threshold is hit first triggers a disk write to the master index from the transaction logs.
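
    The "whichever comes first" rule can be sketched as follows. Only the two thresholds come from this guide; the function names, the batch-based simulation, and the choice to treat a threshold as met when the value is reached exactly are assumptions made for illustration.

```python
HARD_COMMIT_MAX_DOCS = 20_000     # documents submitted since the last hard commit
HARD_COMMIT_MAX_AGE_S = 5 * 60    # seconds since the last hard commit


def hard_commit_due(pending_docs: int, seconds_since_commit: float) -> bool:
    """True when either documented threshold has been reached."""
    return (pending_docs >= HARD_COMMIT_MAX_DOCS
            or seconds_since_commit >= HARD_COMMIT_MAX_AGE_S)


def simulate(batches):
    """batches: iterable of (elapsed_seconds, doc_count) pairs in time order.

    Returns the elapsed time of every hard commit that would be triggered
    if the rule were evaluated once per incoming batch (an assumption).
    """
    commits, pending, last_commit = [], 0, 0.0
    for now, count in batches:
        pending += count
        if hard_commit_due(pending, now - last_commit):
            commits.append(now)
            pending, last_commit = 0, now
    return commits


# Third batch crosses 20,000 documents; the last one crosses the 5-minute limit.
print(simulate([(60, 8000), (120, 8000), (180, 8000), (500, 10)]))  # [180, 500]
```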

    NOTE: A hard commit does not by itself make documents appear in search results. It protects the integrity of the index and ensures that, after a cluster crash, the boot process does not require a lengthy log replay of uncommitted documents to the index.

    Index Search Results Interval

    Indexed content becomes available for searching as described below:

    1. A background process updates the search results cache so that newly added documents appear in search results. This is an expensive cluster-wide operation, so the cache is updated only when either of these criteria is met:
    • 10 million documents have been added to the index, at which point the results cache is updated to include them - OR
    • 30 minutes have passed since a document was indexed

    Whichever threshold is hit first will trigger new documents to be available in the index for searching.
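
    As a rough worked example, the two thresholds put an upper bound on how long a newly indexed document can wait before it is searchable. The sketch below assumes a steady ingest rate and that the 30-minute timer starts when the document is indexed; the function name and the sample rates are hypothetical.

```python
CACHE_REFRESH_MAX_DOCS = 10_000_000   # documents added before a refresh is forced
CACHE_REFRESH_MAX_AGE_S = 30 * 60     # seconds after which a refresh is forced


def max_visibility_delay_s(docs_per_second: float) -> float:
    """Upper bound, in seconds, before a new document appears in results."""
    if docs_per_second <= 0:
        # No ingest traffic: only the time threshold can trigger the refresh.
        return float(CACHE_REFRESH_MAX_AGE_S)
    seconds_to_doc_threshold = CACHE_REFRESH_MAX_DOCS / docs_per_second
    return min(float(CACHE_REFRESH_MAX_AGE_S), seconds_to_doc_threshold)


print(max_visibility_delay_s(10_000))  # 1000.0 (document threshold hit first)
print(max_visibility_delay_s(1_000))   # 1800.0 (30-minute threshold hit first)
```

    Under these assumptions, the document-count threshold only shortens the wait at ingest rates above roughly 5,556 documents per second (10,000,000 / 1,800); below that, the 30-minute limit applies.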

    Index Ignored Words

    When documents are indexed, stop words are skipped. These are words that take up index space but offer little value in searching. For example, English stop words include: the, their, a, an, etc. Eyeglass Search has stop words configured for English, German, Spanish, Portuguese and French, as defined in the stop words file below.

    stop words
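
    The effect of stop word filtering can be illustrated with a minimal sketch. The word set below contains only the four English examples named above, not the contents of the actual stop words file, and the tokenizer (lowercase, split on non-word characters) is an assumption rather than the product's analyzer.

```python
import re

# Only the examples named in this guide; the real stop words file is larger.
SAMPLE_STOP_WORDS = {"the", "their", "a", "an"}


def index_terms(text: str, stop_words=SAMPLE_STOP_WORDS) -> list:
    """Return the lowercase terms that would remain after stop word removal."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(index_terms("The report and an appendix"))  # ['report', 'and', 'appendix']
```

    A practical consequence is that a search made up only of stop words has nothing in the index to match against.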


    Unsupported Characters in Folder Names, Files and Content

    The following characters are currently unsupported in file names and directory names. If a directory name contains any of the special characters listed below, no files in any of its child subfolders are indexed.

    1. { } < > ' / [ ] ( ) : " \ # $
    2. Content special characters that cannot be used in search syntax: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / # 
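
    A pre-flight check against the two lists above might look like the sketch below. The function names and sample paths are hypothetical; the character sets are copied from the lists. Because "/" is the path separator, the sketch splits on it and checks each path component for the remaining characters.

```python
# List 1: characters unsupported in file and directory names ("/" is the separator).
UNSUPPORTED_NAME_CHARS = set("{}<>'[]():\"\\#$")

# List 2: characters and operators that cannot be used in search syntax.
RESERVED_SEARCH_CHARS = set('+-!(){}[]^"~*?:\\/#')
RESERVED_SEARCH_PAIRS = ("&&", "||")


def unsupported_components(path: str) -> list:
    """Return the path components that contain an unsupported name character."""
    return [part for part in path.split("/")
            if part and UNSUPPORTED_NAME_CHARS & set(part)]


def reserved_in_term(term: str) -> list:
    """Return the reserved search characters and operators found in a term."""
    found = sorted(RESERVED_SEARCH_CHARS & set(term))
    found += [pair for pair in RESERVED_SEARCH_PAIRS if pair in term]
    return found


print(unsupported_components("/ifs/data/proj#1/sub/file.txt"))  # ['proj#1']
print(reserved_in_term("a&&b"))                                 # ['&&']
```

    In the first call, the directory `proj#1` is flagged, which per the rule above means files under its subfolders would not be indexed.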


    © Superna LLC