Diving in to Isilon SyncIQ and SnapshotIQ Management

In this post I’m going to review the most useful commands for managing SyncIQ replication jobs and SnapshotIQ snapshots on the Isilon.  While this will primarily be a CLI administration reference, I’ll look at some WebUI options as well when I get to Snapshots, as well as some additional notes and caveats regarding snapshot management.  I’d highly recommend reviewing EMC’s SnapshotIQ best practices page, as well as the SyncIQ best practices guide if you’re just starting a new implementation.  For a complete Isilon Command line reference you can reference this post.

Creating a Replication policy

# isi sync policies create sync –schedule “” –target-snapshot-archive on –target-snapshot-pattern “%{PolicyName}-%{SrcCluster}-%Y-%m-%d_%H-%M”

Viewing active replication jobs

# isi sync jobs list

Policy Name ID State Action Duration
 Replica1 32375 running run 1M1W5D14H55m
 Total: 1

# isi sync jobs view

Policy Name: Replica1
 ID: 32375
 State: running
 Action: run
 Duration: 1M1W5D14H55m9s
 Start Time: 2017-10-27T17:00:25

# isi_classic sync job rep

Name | Act | St | Duration | Transfer | Throughput
 Replica1 | sync | Running | 42 days 14:59:23 | 3.0 TB | 6.8 Mb/s

# isi_classic sync job rep –v [Provides a more verbose report]

Creating a SyncIQ domain [Required for failback operations]

# isi job jobs start –root –dm-type SyncIQ

Reviewing a replication Job before starting it

Replication policy status can be reviewed with the ‘test’ option. It is useful for previewing the size of the data set that will be transferred if you run the policy.

# isi sync jobs start –test
# isi sync reports view 1

Replication policy Enable/Disable/Delete

# isi sync policies enable # isi sync policies disable # isi sync policies delete

Replication Job Management

# isi sync jobs start # isi sync jobs pause # isi sync jobs resume # isi sync jobs cancel

Replication Policy Management

# isi sync policies list
# isi sync policies view

Viewing replication policies that target the local cluster

# isi sync target list
# isi sync target view

Managing replication performance rules

# isi sync rules create

Create network traffic rules that limit replication bandwidth

# isi sync rules create bandwidth 00:00-23:59 Sun-Sat 19200 [Limit consumption to 19200 kbps per second, 24×7]
# isi sync rules create file_count 08:00-18:00 M-F 7 [Limit file-send rate to 7 files per second 8-6 on weekdays]

Managing replication performance rules

# isi sync rules list
# isi sync rules view –id bw-0
# isi sync rules modify bw-0 –enabled true
# isi sync rules modify bw-0 –enabled false

Managing replication reports

# isi sync reports list
# isi snapshots list | head -200 [list the first 200 snapshots]
# isi sync reports view 2
# isi sync reports subreports list 1 [view sub-reports]

Managing failed replication jobs

# isi sync policies resolve [Resolve a policy error]
# isi sync policies reset If the issue can’t be resolved, the job can be reset. Resetting a policy results in a full or differential replication the next time the policy is run.

Creating Snapshots

# isi snapshot snapshots create

# isi snapshot snapshots delete {|–schedule |–type {alias|real}|–all}
[{–force|-f}] [{–verbose|-v}]

Modifing Snapshots

# isi snapshot snapshots modify

Listing Snapshots

# isi snapshot snapshots list –state {all | active | deleting}
# isi snapshot snapshots list –limit | -l [Number of snapshots to display]
# isi snapshot snapshots list –descending | -d [Sort data in descending order]

Viewing Snapshots

# isi snapshot snapshots view

Deleting Snapshots

Deleting a snapshot from OneFS is an all-or-nothing event, an existing snapshot cannot be partially deleted. Snapshots are created at the directory level, not at the volume level, which allows for a higher degree of granularity. Because they are a point in time copy of a specific subset of OneFS data they can’t be changed, only fully deleted. When deleting a snapshot OneFS immediately modifies some of the tracking data and the snapshot dissappears from view. Despite the fact that the snap is no longer visible, the behind the scenes cleanup of the snapshot will still be pending. It is performed in the ‘SnapshotDelete’ job.

OneFS frees disk space occupied by deleted snapshots only when the snapshot delete job is run. If a snapshot is deleted that contains clones or cloned files, the data in a shadow store may no longer be referenced by files on the cluster. OneFS deletes unreferenced data in a shadow store when the shadow store delete job is run. OneFS automatically runs both the shadow store delete and snapshot delete jobs, but you can also run them manually any time. Follow the procedure below to force the snapshot delete job to more quickly reclaim array capacity.

Deleting Snapshots from the WebUI

Go to Data Protection > SnapshotIQ > Snapshots and specify the snapshots that you want to delete.

• For each snapshot you want to delete, in the Saved File System Snapshots table, in the row of a snapshot, select the check box.
• From the Select an action list, select Delete.
• In the confirmation dialog box, click Delete.
• Note that you can select more than one snapshot at a time, and clicking the delete button on any of the snapshots will result in the entire checked list being deleted.
• If you have a large number of snapshots and want to delete them all, you can run a command from the CLI that will delete all of them at once: isi snapshot snapshots delete –all.

Increasing the Speed of Snapshot Deletion from the WebUI

It’s important to note that the SnapshotDelete will only run if the cluster is in a fully available state. There can be no drives or nodes down and it cannot be in a degraded state. To increase the speed at which deleted snapshot data is freed on the cluster, run the snapshot delete job.

• Go to Cluster Management > Operations.
• In the Running Jobs area, click Start Job.
• From the Job list, select SnapshotDelete.
• Click Start.

Increasing the Speed of Cloned File deletion from the WebUI

Run the shadow store delete job only after you run the snapshot delete job.

• Go to Cluster Management > Operations.
• In the Running Jobs area, click Start Job.
• From the Job list, select ShadowStoreDelete.
• Click Start.

Reserved Space

There is no requirement for reserved space for snapshots in OneFS. Snapshots can use as much or little of the available file system space as desirable. The oldest snapshot can be deleted very quickly. An ordered deletion is the deletion of the oldest snapshot of a directory, and is a recommended best practice for snapshot management. An unordered deletion is the removal of a snapshot that is not the oldest in a directory, and can often take approximately twice as long to complete and consume more cluster resources than ordered deletions.

The Delete Sequence Matters

As I just mentioned, avoid deleting snapshots from the middle of a time range whenever possible. Newer snapshots are mostly pointers to older snapshots, and they look like they are consuming more capacity than they actually are. Removing the newer snapshots will not free up much space, while deleting the oldest snapshot will ensure you are actually freeing up the space. You can determine snapshot order by using the isi snapshot list -l command.

Watch for SyncIQ Snaps

Avoid deleting SyncIQ snapshots if possible. They are easily identifiable, as they will all be prefixed with SIQ. It is ok to delete them if they are the only remaining snapshots on the cluster, and the only way to free up space is to delete them. Be aware that deleting SyncIQ snapshots resets the SyncIQ policy state, which requires a reset of the policy and may result in either a full sync or initial differential sync. A full sync or initial diff sync could take many times longer than a regular snapshot-based incremental sync.


Using the InsightIQ iiq_data_export Utility

InsightIQ includes a very useful data export tool:  iiq_data_export. It can be used with any version of OneFS beginning with 7.x.  While the tool is compatible with older versions of the operating system, if you’re running OneFS v8.0 or higher it offers a much needed performance improvement.  The improvements allow this to be a much more functional tool that can be used daily, and for quick reports it’s much faster than relying on the web interface.

Applications of this tool could include daily reports for application teams to monitor their data consumption, charge-back reporting processes,  or administrative trending reports. The output is in csv format, so there are plenty of options for data manipulation and reporting in your favorite spreadsheet application.

The utility is a command line tool, so you will need to log in to the CLI with an ssh session to the Linux InsightIQ server.  I generally use putty for that purpose.  The utility works with either root or non-root users, so you won’t need elevated privileges – I log in with the standard administrator user account. The utility can be used to export both performance stats and file system analytics [fsa] data, but I’ll review some uses of iiq_data_export for file system analytics first, more specifically the directory data-module export option.

The default command line option for file system analytics include list, describe, and export:

iiq_data_export fsa [-h] {list,describe,export} ...

 -h, --help Show this help message and exit.

 FSA Sub-Commands
 list List valid arguments for the different options.
 describe Describes the specified option.
 export Export FSA data to a specified .csv file.

Listing FSA results for a specific Cluster

First we’ll need to review the reports that are available on the server. Below is the command to list the available FSA results for the cluster:

iiq_data_export fsa list --reports IsilonCluster1

Here are the results of running that command on my InsightIQ Server:

[administrator@corporate_iq1 ~]$ iiq_data_export fsa list --reports IsilonCluster1

Available Reports for: IsilonCluster1 Time Zone: PST
 | ID    | FSA Job Start         | FSA Job End           | Size     |
 | 57430 | Jan 01 2018, 10:01 PM | Jan 01 2018, 10:03 PM | 115.49M  |
 | 57435 | Jan 02 2018, 10:01 PM | Jan 02 2018, 10:03 PM | 115.53M  |
 | 57440 | Jan 03 2018, 10:01 PM | Jan 03 2018, 10:03 PM | 114.99M  |
 | 57445 | Jan 04 2018, 10:01 PM | Jan 04 2018, 10:03 PM | 116.38M  |
 | 57450 | Jan 05 2018, 10:00 PM | Jan 05 2018, 10:03 PM | 115.74M  |
 | 57456 | Jan 06 2018, 10:00 PM | Jan 06 2018, 10:03 PM | 114.98M  |
 | 57462 | Jan 07 2018, 10:01 PM | Jan 07 2018, 10:03 PM | 113.34M  |
 | 57467 | Jan 08 2018, 10:00 PM | Jan 08 2018, 10:03 PM | 114.81M  |

The ID column is the job number that is associated with that particular FS Analyze job engine job.  We’ll use that ID number when we run the iiq_data_export to extract the capacity information.

Using iiq_data_export

Below is the command to export the first-level directories under /ifs from a specified cluster for a specific FSA job:

iiq_data_export fsa export -c <cluster_name> --data-module directories -o <jobID>

If I want to view the /ifs subdirectores from job 57467, here’s the command syntax and it’s output:

[administrator@corporate_iq1 ~]$ iiq_data_export fsa export -c IsilonCluster1 --data-module directories -o 57467

Successfully exported data to: directories_IsilonCluster1_57467_1515522398.csv

Below is the resulting file. The output shows the directory count, file counts, logical, and capacity consumption.

[administrator@corporate_iq1 ~]$ cat directories_IsilonCluster1_57467_1515522398.csv

path[directory:/ifs/],dir_cnt (count),file_cnt (count),ads_cnt,other_cnt (count),log_size_sum (bytes),phys_size_sum (bytes),log_size_sum_overflow,report_date: 1515470445

While that is a useful top level report, we may want to dive a bit deeper and report on 2nd or 3rd level directories as well. To gather that info, use the directory filter option, which is “-r”:

iiq_data_export fsa export -c <cluster_name> --data-module directories -o <jobID> -r directory:<directory_path_in_ifs>

As an example, if we wanted more detail on the subfolders under the /NFS_exports/warehouse/ directory, we’d run the following command:

[administrator@corporate_iq1 ~]$ iiq_data_export fsa export -c IsilonCluster1 --data-module directories -o 57467 -r directory:/NFS_exports/warehouse/warehouse_dec2017

Successfully exported data to: directories_IsilonCluster1_57467_1515524307.csv

Below is the output from the csv file that I generated:

[administrator@corporate_iq1 ~]$ cat directories_IsilonCluster1_57467_1515524307.csv

path[directory:/ifs/NFS_exports/warehouse/warehouse_dec2017/],dir_cnt (count),file_cnt (count),ads_cnt,other_cnt (count),log_size_sum (bytes),phys_size_sum (bytes),log_size_sum_overflow,report_date: 1515470445

Diving Deeper into subdirectories

Note that how deep you can go down the /ifs subdirectory tree depends on the FSA configuration in InsightIQ. By default InsightIQ will configure the “directory filter maximum depth” option to 5, allowing directory information as low as
/ifs/dir1/dir2/dir3/dir4/dir5. If you need to dive deeper the FSA config will need to be updated. To do so, go to the Configuration Page, FSA Configuration, then the “Directory Filter path_squash) maximum depth setting. Note that the larger the maximum depth the more storage space an individual FSA result will use.

Scripting Reports

For specific subdirectory reports it’s fairly easy to script the output.

First, let’s create a text file with a list of the subdirectories under /ifs that we want to report on. I’ll create a file named “directories.txt” in the /home/administrator folder on the InsightIQ server. You can use vi to create and save the file.

[administrator@corporate_iq1 ~]$ vi directories.txt

[add the following in the vi editor...]


I’ll then use vi again to create the script itself.   You will need to substitute the cluster name and the job ID to match your environment.

[administrator@corporate_iq1 ~]$ vi direxport.sh

[add the following in the vi editor...]

for i in `cat directories.txt`
 echo "Processing Directory $i..."
 j=`basename $i`;
 echo "Base Folder Name is $j"
 date_time="`date +%Y_%m_%d_%H%M%S_`";
 iiq_data_export fsa export -c IsilonCluster1 --data-module directories -o 57467 -r directory:$i -n direxport_$date_time$j.csv

We can now change the permissions and set the file to executable, then run the script.  An output example is below.

[administrator@corporate_iq1 ~]$ chmod 777 direxport.sh
 [administrator@corporate_iq1 ~]$ chmod +X direxport.sh
 [administrator@corporate_iq1 ~]$ ./direxport.sh

Processing NFS_exports/warehouse/warehouse_dec2017/dir_t01...
 Base Folder Name is dir_t01

Successfully exported data to: direxport_2017_01_19_085528_dir_t01.csv

Processing NFS_exports/warehouse/warehouse_dec2017/dir_cat...
 Base Folder Name is dir_cat

Successfully exported data to: direxport_2017_01_19_0855430_dir_cat.csv

Processing NFS_exports/warehouse/warehouse_dec2017/dir_set...
 Base Folder Name is dir_set

Successfully exported data to: direxport_2017_01_19_085532_dir_set.csv

Performance Reporting

As I mentioned at the beginning of this post, this command can also provide performance related information. Below are the default command line options.

usage: iiq_data_export perf list [-h] [--breakouts] [--clusters] [--data-modules]

 -h, --help Show this help message and exit.

Mutually Exclusive Options:
 --breakouts Displays the names of all breakouts that InsightIQ supports for
 performance data modules. Each data module supports a subset of
 --clusters Displays the names of all clusters that InsightIQ is monitoring.
 --data-modules Displays the names of all available performance data modules.
 iiq_data_export perf list: error: One of the mutually exclusive arguments are

Here are the data modules you can export:

 iiq_data_export perf list --data-modules
 | Data Module Label                       | Key 
 | Active Clients                          | client_active 
 | Average Cached Data Age                 | cache_oldest_page_age 
 | Average Disk Hardware Latency           | disk_adv_access_latency 
 | Average Disk Operation Size             | disk_adv_op_size 
 | Average Pending Disk Operations Count   | disk_adv_io_queue 
 | Blocking File System Events Rate        | ifs_blocked
 | CPU % Use                               | cpu_use 
 | CPU Usage Rate                          | cpu_usage_rate 
 | Cache Hits                              | cache_hits 
 | Cluster Capacity                        | ifs_cluster_capacity 
 | Connected Clients                       | client_connected 
 | Contended File System Events Rate       | ifs_contended 
 | Deadlocked File System Events Rate      | ifs_deadlocked 
 | Deduplication Summary (Logical)         | dedupe_logical 
 | Deduplication Summary (Physical)        | dedupe_physical 
 | Disk Activity                           | disk_adv_busy 
 | Disk IOPS                               | disk_iops 
 | Disk Operations Rate                    | disk_adv_op_rate 
 | Disk Throughput Rate                    | disk_adv_bytes 
 | External Network Errors                 | ext_error 
 | External Network Packets Rate           | ext_packet 
 | External Network Throughput Rate        | ext_net_bytes 
 | File System Events Rate                 | ifs_heat 
 | File System Throughput Rate             | ifs_total_rate 
 | Job Workers                             | worker 
 | Jobs                                    | job 
 | L1 Cache Throughput Rate                | cache_l1_read 
 | L1 and L2 Cache Prefetch Throughput Rate| cache_all_prefetch 
 | L2 Cache Throughput Rate                | cache_l2_read 
 | L3 Cache Throughput Rate                | cache_l3_read 
 | Locked File System Events Rate          | ifs_lock 
 | Overall Cache Hit Rate                  | cache_all_read_hitrate 
 | Overall Cache Throughput Rate           | cache_all_read 
 | Pending Disk Operations Latency         | disk_adv_io_latency
 | Protocol Operations Average Latency     | proto_latency 
 | Protocol Operations Rate                | proto_op_rate 
 | Slow Disk Access Rate                   | disk_adv_access_slow 

As an example, if I want to review the CPU utilization for the cluster, I’d type in the command below.   It will show all of the CPU performance information for the specified cluster name.  Once I’ve had more time to dive in to the performance reporting aspect of InsightIQ I’ll revisit and add to this post.

[administrator@corporate_iq1~]$ iiq_data_export perf export -c IsilonCluster1 -d cpu_use

Successfully exported data to: cpu_IsilonCluster1_1515527709.csv

Below is what the output looks like:

[administrator@corporate_iq1 ~]$ cat cpu_STL-Isi0091_1515527709.csv
 Time (Unix) (America/Chicago),cpu (percent)