Using the InsightIQ iiq_data_export Utility

InsightIQ includes a very useful data export tool: iiq_data_export. It can be used with any version of OneFS beginning with 7.x. While the tool is compatible with those older versions of the operating system, running OneFS 8.0 or higher brings a much needed performance improvement. That improvement makes iiq_data_export practical for daily use, and for quick reports it’s much faster than relying on the web interface.

Applications of this tool include daily reports for application teams to monitor their data consumption, charge-back reporting processes, or administrative trending reports. The output is in CSV format, so there are plenty of options for data manipulation and reporting in your favorite spreadsheet application.

The utility is a command line tool, so you will need to log in to the CLI with an SSH session to the Linux InsightIQ server; I generally use PuTTY for that purpose. The utility works with either root or non-root users, so you won’t need elevated privileges: I log in with the standard administrator user account. It can export both performance stats and file system analytics (FSA) data, but I’ll review some uses of iiq_data_export for file system analytics first, more specifically the directories data-module export option.

The default command line options for file system analytics include list, describe, and export:

iiq_data_export fsa [-h] {list,describe,export} ...

 -h, --help Show this help message and exit.

 FSA Sub-Commands
 list List valid arguments for the different options.
 describe Describes the specified option.
 export Export FSA data to a specified .csv file.

Listing FSA results for a specific Cluster

First we’ll need to review the reports that are available on the server. Below is the command to list the available FSA results for the cluster:

iiq_data_export fsa list --reports IsilonCluster1

Here are the results of running that command on my InsightIQ Server:

[administrator@corporate_iq1 ~]$ iiq_data_export fsa list --reports IsilonCluster1

Available Reports for: IsilonCluster1 Time Zone: PST
 | ID    | FSA Job Start         | FSA Job End           | Size     |
 | 57430 | Jan 01 2018, 10:01 PM | Jan 01 2018, 10:03 PM | 115.49M  |
 | 57435 | Jan 02 2018, 10:01 PM | Jan 02 2018, 10:03 PM | 115.53M  |
 | 57440 | Jan 03 2018, 10:01 PM | Jan 03 2018, 10:03 PM | 114.99M  |
 | 57445 | Jan 04 2018, 10:01 PM | Jan 04 2018, 10:03 PM | 116.38M  |
 | 57450 | Jan 05 2018, 10:00 PM | Jan 05 2018, 10:03 PM | 115.74M  |
 | 57456 | Jan 06 2018, 10:00 PM | Jan 06 2018, 10:03 PM | 114.98M  |
 | 57462 | Jan 07 2018, 10:01 PM | Jan 07 2018, 10:03 PM | 113.34M  |
 | 57467 | Jan 08 2018, 10:00 PM | Jan 08 2018, 10:03 PM | 114.81M  |

The ID column is the job number associated with that particular FSAnalyze job engine job. We’ll use that ID number when we run iiq_data_export to extract the capacity information.
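If you’d rather not look up the ID by hand each run, the newest ID can be parsed out of the list output. Below is a minimal sketch, assuming the table format shown above (data rows begin with a pipe and a numeric ID); the cluster name in the usage line is just the example from this post.

```shell
# Pull the newest report ID from `iiq_data_export fsa list` output.
# Assumes data rows look like " | 57467 | ... |" as shown above.
latest_report_id() {
  grep -E '^[[:space:]]*\|[[:space:]]*[0-9]+' | tail -n 1 | cut -d'|' -f2 | tr -d ' '
}

# Usage:
# REPORT_ID=$(iiq_data_export fsa list --reports IsilonCluster1 | latest_report_id)
```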

Using iiq_data_export

Below is the command to export the first-level directories under /ifs from a specified cluster for a specific FSA job:

iiq_data_export fsa export -c <cluster_name> --data-module directories -o <jobID>

If I want to view the /ifs subdirectories from job 57467, here’s the command syntax and its output:

[administrator@corporate_iq1 ~]$ iiq_data_export fsa export -c IsilonCluster1 --data-module directories -o 57467

Successfully exported data to: directories_IsilonCluster1_57467_1515522398.csv

Below is the resulting file. The output shows the directory count, file count, and the logical and physical capacity consumption.

[administrator@corporate_iq1 ~]$ cat directories_IsilonCluster1_57467_1515522398.csv

path[directory:/ifs/],dir_cnt (count),file_cnt (count),ads_cnt,other_cnt (count),log_size_sum (bytes),phys_size_sum (bytes),log_size_sum_overflow,report_date: 1515470445
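Since the export is plain CSV, quick reports fall out of awk easily. Here’s a small sketch, assuming the column order in the header above (path in column 1, phys_size_sum in column 7) and that paths contain no commas; it prints each directory with its physical size in GiB:

```shell
# Print "path <tab> physical-size-GiB" for each data row of a directories export.
# Column positions assume the header layout shown above.
report_sizes() {
  awk -F, 'NR > 1 { printf "%s\t%.2f GiB\n", $1, $7 / (1024*1024*1024) }' "$1"
}

# Usage: report_sizes directories_IsilonCluster1_57467_1515522398.csv
```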

While that is a useful top-level report, we may want to dive a bit deeper and report on second- or third-level directories as well. To gather that info, use the directory filter option, “-r”:

iiq_data_export fsa export -c <cluster_name> --data-module directories -o <jobID> -r directory:<directory_path_in_ifs>

As an example, if we wanted more detail on the subfolders under the /NFS_exports/warehouse/warehouse_dec2017 directory, we’d run the following command:

[administrator@corporate_iq1 ~]$ iiq_data_export fsa export -c IsilonCluster1 --data-module directories -o 57467 -r directory:/NFS_exports/warehouse/warehouse_dec2017

Successfully exported data to: directories_IsilonCluster1_57467_1515524307.csv

Below is the output from the csv file that I generated:

[administrator@corporate_iq1 ~]$ cat directories_IsilonCluster1_57467_1515524307.csv

path[directory:/ifs/NFS_exports/warehouse/warehouse_dec2017/],dir_cnt (count),file_cnt (count),ads_cnt,other_cnt (count),log_size_sum (bytes),phys_size_sum (bytes),log_size_sum_overflow,report_date: 1515470445

Diving Deeper into subdirectories

Note that how deep you can go down the /ifs subdirectory tree depends on the FSA configuration in InsightIQ. By default InsightIQ sets the “directory filter maximum depth” option to 5, allowing directory information as deep as
/ifs/dir1/dir2/dir3/dir4/dir5. If you need to dive deeper, the FSA configuration will need to be updated. To do so, go to the Configuration page, select FSA Configuration, then adjust the “Directory Filter (path_squash) maximum depth” setting. Note that the larger the maximum depth, the more storage space an individual FSA result will consume.

Scripting Reports

For specific subdirectory reports it’s fairly easy to script the output.

First, let’s create a text file with a list of the subdirectories under /ifs that we want to report on. I’ll create a file named “directories.txt” in the /home/administrator folder on the InsightIQ server. You can use vi to create and save the file.

[administrator@corporate_iq1 ~]$ vi directories.txt

[add the following in the vi editor...]

NFS_exports/warehouse/warehouse_dec2017/dir_t01
NFS_exports/warehouse/warehouse_dec2017/dir_cat
NFS_exports/warehouse/warehouse_dec2017/dir_set
I’ll then use vi again to create the script itself (I’ll refer to it as <script_name> below; use whatever file name you like). You will need to substitute the cluster name and the job ID to match your environment.

[administrator@corporate_iq1 ~]$ vi <script_name>

[add the following in the vi editor...]

#!/bin/bash
for i in `cat directories.txt`; do
 echo "Processing Directory $i..."
 j=`basename $i`
 echo "Base Folder Name is $j"
 date_time="`date +%Y_%m_%d_%H%M%S_`"
 iiq_data_export fsa export -c IsilonCluster1 --data-module directories -o 57467 -r directory:$i -n direxport_$date_time$j.csv
done

We can now set the file to executable and run the script (substitute your script’s file name for <script_name>). An output example is below.

[administrator@corporate_iq1 ~]$ chmod +x <script_name>
 [administrator@corporate_iq1 ~]$ ./<script_name>

Processing NFS_exports/warehouse/warehouse_dec2017/dir_t01...
 Base Folder Name is dir_t01

Successfully exported data to: direxport_2017_01_19_085528_dir_t01.csv

Processing NFS_exports/warehouse/warehouse_dec2017/dir_cat...
 Base Folder Name is dir_cat

Successfully exported data to: direxport_2017_01_19_0855430_dir_cat.csv

Processing NFS_exports/warehouse/warehouse_dec2017/dir_set...
 Base Folder Name is dir_set

Successfully exported data to: direxport_2017_01_19_085532_dir_set.csv
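Each run leaves one CSV per directory. If you want a single charge-back style report instead, the exports can be stitched back together; below is a minimal sketch (the combine_exports helper name is my own) that keeps one header row and appends the data rows from every export file:

```shell
# Merge several FSA export CSVs that share a header row into one file.
# $1 = output file; remaining args = the per-directory export CSVs.
combine_exports() {
  out=$1; shift
  head -n 1 "$1" > "$out"       # header row from the first export
  for f in "$@"; do
    tail -n +2 "$f" >> "$out"   # data rows from each export
  done
}

# Usage: combine_exports summary.csv direxport_*.csv
```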

Performance Reporting

As I mentioned at the beginning of this post, this command can also export performance related information. Below are the default command line options for the perf list sub-command.

usage: iiq_data_export perf list [-h] [--breakouts] [--clusters] [--data-modules]

 -h, --help Show this help message and exit.

Mutually Exclusive Options:
 --breakouts Displays the names of all breakouts that InsightIQ supports for
 performance data modules. Each data module supports a subset of
 these breakouts.
 --clusters Displays the names of all clusters that InsightIQ is monitoring.
 --data-modules Displays the names of all available performance data modules.
 iiq_data_export perf list: error: One of the mutually exclusive arguments is required.

Here are the data modules you can export:

 iiq_data_export perf list --data-modules
 | Data Module Label                       | Key 
 | Active Clients                          | client_active 
 | Average Cached Data Age                 | cache_oldest_page_age 
 | Average Disk Hardware Latency           | disk_adv_access_latency 
 | Average Disk Operation Size             | disk_adv_op_size 
 | Average Pending Disk Operations Count   | disk_adv_io_queue 
 | Blocking File System Events Rate        | ifs_blocked
 | CPU % Use                               | cpu_use 
 | CPU Usage Rate                          | cpu_usage_rate 
 | Cache Hits                              | cache_hits 
 | Cluster Capacity                        | ifs_cluster_capacity 
 | Connected Clients                       | client_connected 
 | Contended File System Events Rate       | ifs_contended 
 | Deadlocked File System Events Rate      | ifs_deadlocked 
 | Deduplication Summary (Logical)         | dedupe_logical 
 | Deduplication Summary (Physical)        | dedupe_physical 
 | Disk Activity                           | disk_adv_busy 
 | Disk IOPS                               | disk_iops 
 | Disk Operations Rate                    | disk_adv_op_rate 
 | Disk Throughput Rate                    | disk_adv_bytes 
 | External Network Errors                 | ext_error 
 | External Network Packets Rate           | ext_packet 
 | External Network Throughput Rate        | ext_net_bytes 
 | File System Events Rate                 | ifs_heat 
 | File System Throughput Rate             | ifs_total_rate 
 | Job Workers                             | worker 
 | Jobs                                    | job 
 | L1 Cache Throughput Rate                | cache_l1_read 
 | L1 and L2 Cache Prefetch Throughput Rate| cache_all_prefetch 
 | L2 Cache Throughput Rate                | cache_l2_read 
 | L3 Cache Throughput Rate                | cache_l3_read 
 | Locked File System Events Rate          | ifs_lock 
 | Overall Cache Hit Rate                  | cache_all_read_hitrate 
 | Overall Cache Throughput Rate           | cache_all_read 
 | Pending Disk Operations Latency         | disk_adv_io_latency
 | Protocol Operations Average Latency     | proto_latency 
 | Protocol Operations Rate                | proto_op_rate 
 | Slow Disk Access Rate                   | disk_adv_access_slow 

As an example, if I want to review the CPU utilization for the cluster, I’d type the command below. It will export all of the CPU performance information for the specified cluster name. Once I’ve had more time to dive into the performance reporting aspect of InsightIQ, I’ll revisit and add to this post.

[administrator@corporate_iq1~]$ iiq_data_export perf export -c IsilonCluster1 -d cpu_use

Successfully exported data to: cpu_IsilonCluster1_1515527709.csv

Below is what the output looks like:

[administrator@corporate_iq1 ~]$ cat cpu_IsilonCluster1_1515527709.csv
 Time (Unix) (America/Chicago),cpu (percent)
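With only a timestamp and a percent column, summary statistics are a one-liner. Here’s a sketch (assuming the two-column layout shown in the header above; the avg_cpu helper name is my own) that averages the cpu column:

```shell
# Average the second column of a two-column perf export (time, cpu percent).
avg_cpu() {
  awk -F, 'NR > 1 { sum += $2; n++ } END { if (n) printf "avg cpu: %.1f%%\n", sum / n }' "$1"
}

# Usage: avg_cpu cpu_IsilonCluster1_1515527709.csv
```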

2 thoughts on “Using the InsightIQ iiq_data_export Utility”

  1. Thanks for the handy article 🙂  I wanted to do a recursive “du”, which amazingly doesn’t seem to be implemented with the FSA report tool, so wrote a small script to call it repeatedly:

    #!/bin/bash
    # if you have to manage an EMC Isilon cluster and want to use the
    # File System Analytics (FSA) tool to do a "du --depth X", you'll quickly
    # realise they unbelievably haven't implemented a recursive scan.
    # This script does that.
    # Must be run on the isilon cluster machines with sufficient privileges to
    # access the FSA reports
    # known issues:
    # - probably won't like funny characters in directory names..
    # Refs:
    if [ $# -ne 4 ] ; then
    echo >&2
    echo " CLUSTER_NAME = tcenas, eumetsat [=DSNNAS]" >&2
    echo " BASE_DIR = /ifs/ERA-CLIM/Repro" >&2
    echo " MAX_DEPTH = depth to scan to, limited also by FSA resolution" >&2
    echo " OUTPUT_FILE = where to write the final outputs to" >&2
    echo >&2
    echo "e.g. $0 eumetsat /ifs/ERA-CLIM/Repro 2 results.csv" >&2
    exit 1
    fi
    CLUSTER_NAME=$1
    BASE_DIR=$2
    MAX_DEPTH=$3
    OUTPUT_FILE=$4
    # must be /ifs/MODULE/dir; verify this
    echo $BASE_DIR | grep -q '^/ifs/[^/]*/[^/]'
    if [ $? -ne 0 ] ; then
    echo "BASE_DIR must be at 3 levels deep (e.g. /ifs/MODULE/xxx) because the FSA export tool requires a different usage for the MODULE level" >&2
    echo "(actually, I've not tested this, so maybe it works - edit the script and try if you like..)" >&2
    exit 1
    fi
    # make a unique place to dump temporary files
    TMPOUT=$(mktemp -d)
    # need to get the id number of the latest File System Analytics (FSA) report
    # output looks like this:
    # Available Reports for: tme-sandbox Time Zone: EDT
    # |ID |FSA Job Start |FSA Job End |Size |
    # |473 |Jun 10 2016, 10:00 PM |Jun 10 2016, 10:30 PM |92.933G |
    # |492 |Jun 13 2016, 10:00 PM |Jun 13 2016, 10:32 PM |4.794G |
    # |498 |Jun 14 2016, 10:00 PM |Jun 14 2016, 10:30 PM |4.816G |
    # (space/empty line)
    # get the last id
    iiq_data_export fsa list --reports ${CLUSTER_NAME} > ${TMPOUT}/reports.txt
    REPORT_ID=$(tail -n 3 ${TMPOUT}/reports.txt | head -n 1 | cut -f2 -d\|)
    echo "Using report id $REPORT_ID ($(tail -n 3 ${TMPOUT}/reports.txt | head -n 1))"
    # first get the base dir contents
    iiq_data_export fsa export -c ${CLUSTER_NAME} --data-module directories -o ${REPORT_ID} -r "directory:${BASE_DIR}" -n ${TMPOUT}/basedir_with_header.csv
    # extract header for later
    head -n 1 ${TMPOUT}/basedir_with_header.csv > ${TMPOUT}/header.csv
    # strip header for following work
    tail -n +2 ${TMPOUT}/basedir_with_header.csv > ${TMPOUT}/level0.csv
    # output of all these reports looks like
    # path[directory:/ifs/ERA-CLIM/Repro_Temp/mviri/],dir_cnt (count),file_cnt (count),ads_cnt,other_cnt (count),log_size_sum (bytes),phys_size_sum (bytes),log_size_sum_overflow,report_date: 1558306942
    # for each depth after the first, request reports on each item listed in the previous depth
    for depth in $(seq 1 ${MAX_DEPTH}); do
    echo Depth $depth
    # scan through previous report, request dumps on each directory listed and combine into a single report for this level
    for dir in $(cut -f1 -d, ${TMPOUT}/level$((depth - 1)).csv); do
    iiq_data_export fsa export -c ${CLUSTER_NAME} --data-module directories -o ${REPORT_ID} -r "directory:${dir}" -n ${TMPOUT}/temp_fsa_dump.csv
    tail -n +2 ${TMPOUT}/temp_fsa_dump.csv >> ${TMPOUT}/level${depth}.csv
    rm -f ${TMPOUT}/temp_fsa_dump.csv
    done
    done
    # final step, combine all levels, sort and add the header. Dump to stdout.
    cat ${TMPOUT}/header.csv > $OUTPUT_FILE
    sort ${TMPOUT}/level*.csv >> $OUTPUT_FILE
    # clean up
    rm -rf $TMPOUT

  2. Hello,

    I am interested in doing this data collection daily for a list of directories within /ifs, but I am not sure how I can get it to run daily when the report ID changes and it is not going up in any consistent increment. How do you account for that when doing the -o with the report ID?
