
Pure Storage data reduction report

I had a request to publish a daily report showing the data reduction numbers for each LUN on our production Pure Storage arrays. I wrote a script that logs in to the Pure CLI and issues the appropriate command, using ‘expect’ to capture the terminal output to a text file, which is then cleaned up and converted to a CSV file. The CSV file is then converted to an HTML table and published on our internal web site.

The ‘expect’ commands (saved as purevol.exp in the same directory as the bash script)

#!/usr/bin/expect -f
# Log in to the array over SSH, list volume space usage, then log out
spawn ssh pureuser@
expect "logon as: "
send "pureuser\r"
expect "pureuser@'s password: "
send "password\r"
expect "pureuser@pure01> "
send "purevol list --space\r"
expect "pureuser@pure01> "
send "exit\r"


The bash script (saved as


#!/bin/bash
# Pure Data Reduction Report Script
# 11/28/16

#Define a timestamp function
#The output looks like this: 06-29-2016/08:45:12
timestamp() {
date +"%m-%d-%Y/%H:%M:%S"
}

#Remove existing output file (-f avoids an error if it does not exist yet)
rm -f /home/data/pure/purevol_532D.txt

#Run the expect script to create the output file
/usr/bin/expect -f /home/data/pure/purevol.exp > /home/data/pure/purevol_532D.txt

#Remove the first twelve lines and the last line of the output file
#The first 12 lines contain login and command execution info not needed in the report
#The last line is a CLI prompt left behind by the expect script
sed -i '1,12d;$d' /home/data/pure/purevol_532D.txt

#Add date to output file, remove previous temp files
rm -f /home/data/pure/purevol_532D-1.csv
rm -f /home/data/pure/purevol_532D-2.csv
echo "Run time: $(timestamp)" > /home/data/pure/purevol_532D-1.csv

#Add titles to new csv file
#The blank titles sit over the extra fields created when each "N to 1" ratio is split on spaces
echo "Volume","Size","Thin Provisioning","Data Reduction"," "," ","Total Reduction"," "," ","Volume","Snapshots","Shared Space","System","Total" >> /home/data/pure/purevol_532D-1.csv

#Convert the space delimited file into a comma delimited file
#(strip leading whitespace, then replace each run of whitespace with one comma)
sed -r 's/^\s+//;s/\s+/,/g' /home/data/pure/purevol_532D.txt > /home/data/pure/purevol_532D-2.csv

#Combine the csv files into one
cat /home/data/pure/purevol_532D-1.csv /home/data/pure/purevol_532D-2.csv > /home/data/pure/purevol_532D.csv

#Use the csv2html perl script to convert the csv to an html table
#csv2html script available here:
./ -e -T -i /home/data/pure/purevol_532D.csv -o /home/data/pure/purevol_532D.html

#Copy the html file to the www folder to publish it
cp /home/data/pure/purevol_532D.html /cygdrive/C/inetpub/wwwroot

Below is an example of what the output looks like after the script is run and the output is converted to an HTML table. The columns to the right have been left out to fit the formatting of this post; the full report also includes the numbers for total reduction and snapshots.

Name                     Size  Thin Provisioning  Data Reduction
LUN001_PURE_0025_ESX_5T  5T    78%                16.4 to 1
LUN002_PURE_0025_ESX_5T  5T    75%                7.8 to 1
LUN003_PURE_0025_ESX_5T  5T    71%                9.3 to 1
LUN004_PURE_0025_ESX_5T  5T    87%                10.5 to 1
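
The clean-up and conversion steps in the bash script (drop the first 12 lines, drop the trailing prompt, turn whitespace into commas) can be sketched in Python as well. This is only an illustration and not part of the original script: the function names `to_csv_row` and `clean_capture` are made up for this example, and the sample row reuses values from the table above with assumed column spacing.

```python
import re


def to_csv_row(line):
    # Same two substitutions as the sed command in the bash script:
    # drop leading whitespace, then turn every remaining run of
    # whitespace into a single comma.
    line = re.sub(r"^\s+", "", line)
    return re.sub(r"\s+", ",", line)


def clean_capture(text, skip_head=12):
    # Drop the login/command lines at the top and the trailing CLI prompt,
    # then convert what is left into comma-delimited rows.
    # skip_head=12 matches the number of lines the bash script removes.
    lines = text.splitlines()
    return [to_csv_row(line) for line in lines[skip_head:-1]]


if __name__ == "__main__":
    # Sample row built from the table above (spacing is an assumption)
    row = to_csv_row("  LUN001_PURE_0025_ESX_5T   5T   78%   16.4 to 1")
    print(row)
    print(len(row.split(",")), "fields")
```

Because every run of whitespace becomes a comma, a ratio such as "16.4 to 1" ends up as three separate fields (16.4, to, 1). That is why the header line in the bash script carries two blank titles after "Data Reduction" and after "Total Reduction".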





Gathering performance data on a virtual Windows server

When troubleshooting a potential storage-related performance problem on a virtual Windows server, it’s a bit more difficult to analyze because many virtual hosts share the same LUN for a datastore in ESX. EMC’s Analyzer and Control Center Performance Manager only give me statistics on specific disks or LUNs; I have no visibility into a specific virtual server with those tools. When this situation arises, I run a Windows batch script directly on the server that gathers data with the typeperf command line utility for a specific time period. Typically I’ll let it run for 24 hours and then analyze the data in Excel, where it’s easy to make charts and graphs to get a visual view of what’s going on.

Sometimes the most difficult thing to figure out is the correct syntax for the command and which parameters to use. For reference, here is the command and its parameters:


Typeperf [Path [path ...]] [-cf FileName] [-f {csv|tsv|bin}] [-si interval] [-o FileName] [-q [object]] [-qx [object]] [-sc samples] [-config FileName] [-s computer_name]


-c { Path [ path ... ] | -cf FileName } : Specifies the performance counter path to log. To list multiple counter paths, separate each command path by a space.
-cf FileName : Specifies the file name of the file that contains the counter paths that you want to monitor, one per line.
-f { csv | tsv | bin } : Specifies the output file format. File formats are csv (comma-delimited), tsv (tab-delimited), and bin (binary). Default format is csv.
-si interval [ mm: ] ss : Specifies the time between samples, in the [mm:] ss format. Default is one second.
-o FileName : Specifies the pathname of the output file. Defaults to stdout.
-q [ object ] : Displays and queries available counters without instances. To display counters for one object, include the object name.
-qx [ object ] : Displays and queries all available counters with instances. To display counters for one object, include the object name.
-sc samples : Specifies the number of samples to collect. Default is to sample until you press CTRL+C.
-config FileName : Specifies the pathname of the settings file that contains command line parameters.
-s computer_name : Specifies the system to monitor if no server is specified in the counter path.
/? : Displays help at the command prompt.

EMC’s Analyzer vs. Windows Perfmon Metrics

I tend to look at response time, disk queue length, total/read/write IO, and service time first. I dive into how to interpret many of the SAN performance metrics in my older post here.

The counters you’ll choose in Windows Performance Monitor aren’t named quite the same as the metrics we commonly look at in EMC’s tools. In addition, you can choose between the ‘LogicalDisk’ and ‘PhysicalDisk’ objects when selecting the counters.

What is the difference between the Physical Disk vs. Logical Disk performance objects in Perfmon, and why monitor both? Their counters are calculated the same way but their scope is different. I generally use both “\LogicalDisk(*)\” and “\PhysicalDisk(*)\” when I run my perfmon script.

The Physical Disk performance object monitors disk drives on the computer. It identifies the instances representing the physical hardware, and the counters are the sum of the access to all partitions on the physical instance.

The Logical Disk performance object monitors logical partitions. Performance Monitor identifies logical disks by their drive letter or mount point. If a physical disk contains multiple partitions, this counter will report the values just for the partition selected and not for the entire disk. On the other hand, when using Dynamic Disks, a logical volume may span more than one physical disk. In that scenario the counter values will include the access to the logical disk on all of the physical disks it spans.

Here are the Performance Monitor counters that I frequently use, and how they compare to EMC’s Navisphere Analyzer (or ECC):

“\LogicalDisk(*)\Avg. Disk Queue Length” – (Named the same as EMC) The average number of outstanding requests when the disk was busy
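“\LogicalDisk(*)\%% Disk Time” – (No direct EMC equivalent) The “% Disk Time” counter is the “Avg. Disk Queue Length” counter multiplied by 100. It is the same value displayed in a different scale. The doubled %% is only how a literal % is written inside a batch file; the counter itself is named “% Disk Time”.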
“\LogicalDisk(*)\Disk Transfers/sec” – Total Throughput (IO/sec) – the total number of individual disk IO requests completed over a period of one second.  We’ll use this value to help determine Disk Service Time.
“\LogicalDisk(*)\Disk Reads/sec” – Read Throughput (IO/sec)
“\LogicalDisk(*)\Disk Writes/sec” – Write Throughput (IO/sec)
“\LogicalDisk(*)\%% Idle Time” –  (No direct EMC equivalent) This counter provides a very precise measurement of how much time the disk remained in idle state, meaning all the requests from the operating system to the disk have been completed and there are zero pending requests. We’ll also use this to calculate disk service time.
“\LogicalDisk(*)\Avg. Disk sec/Transfer” – Response time (sec) – EMC uses milliseconds, Windows uses seconds, so you’ll see 8ms represented as .008 in the results.
“\LogicalDisk(*)\Avg. Disk sec/Read” – Response times for read IO
“\LogicalDisk(*)\Avg. Disk sec/Write” – Response times for write IO

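Disk Service Time is calculated in two steps. First, Disk Utilization = 100 - % Idle Time. Then, with utilization expressed as a fraction (divide the percentage by 100), Disk Utilization / Disk Transfers/sec = Disk Service Time in seconds. For example, a disk that is 40% busy while completing 100 transfers per second has a service time of 0.40 / 100 = 0.004 seconds, or 4ms.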
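
As a rough sketch, the service time formula can be written as a small Python function for use on exported counter values. The function names are hypothetical and the input numbers are made up for illustration. One assumption to call out: utilization is converted to a fraction (percent divided by 100) so that the result comes out in seconds, which is then scaled to milliseconds to match what EMC's tools report.

```python
def disk_service_time_ms(idle_pct, transfers_per_sec):
    # Disk Utilization = 100 - % Idle Time, taken here as a fraction (0.0 to 1.0)
    # so that dividing by transfers per second yields seconds per transfer.
    utilization = (100.0 - idle_pct) / 100.0
    if transfers_per_sec <= 0:
        return 0.0  # no completed IO in this sample, nothing to divide by
    # Busy seconds per transfer, scaled to milliseconds
    return utilization / transfers_per_sec * 1000.0


def seconds_to_ms(seconds):
    # Perfmon reports response times in seconds; EMC tools use milliseconds
    return seconds * 1000.0


if __name__ == "__main__":
    # Illustrative values only: 60% idle (40% busy) at 100 transfers/sec
    print(f"{disk_service_time_ms(60.0, 100.0):.2f} ms service time")
    print(f"{seconds_to_ms(0.008):.1f} ms response time")
```

With the illustrative inputs above, a disk that is 40% busy at 100 transfers per second works out to a service time of about 4ms, and a perfmon response time of .008 converts to 8ms.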

Configuring the Script

This batch script collects all of the relevant data for disk activity. After 24 hours, it will dump the data into a csv file. The length of time is controlled by the combination of the “-sc” and “-si” parameters. To collect data in one minute intervals for 24 hours, you’d set si to 60 (collect data every 60 seconds), and sc to 1440 (1440 minutes = 24 hours). To collect data every minute for 30 minutes, you’d enter “-si 60 -sc 30”. This script assumes you have a local directory on the C: drive named ‘Collection’.

@echo off
rem Collect logical and physical disk counters once a minute for 24 hours
cd /d c:\collection

rem Build date and time stamps for the output file name
for /f "tokens=1,2,3,4 delims=/ " %%A in ('date /t') do set all=%%A%%B%%C%%D
for /f "tokens=1,2,3 delims=: " %%A in ('time /t') do set allm=%%A%%B%%C

rem The carets continue one long command across several lines
typeperf "\LogicalDisk(*)\Avg. Disk Queue Length" "\LogicalDisk(*)\%% Disk Time" "\LogicalDisk(*)\Disk Transfers/sec" ^
  "\LogicalDisk(*)\Disk Reads/sec" "\LogicalDisk(*)\Disk Writes/sec" "\LogicalDisk(*)\%% Idle Time" ^
  "\LogicalDisk(*)\Avg. Disk sec/Transfer" "\LogicalDisk(*)\Avg. Disk sec/Read" "\LogicalDisk(*)\Avg. Disk sec/Write" ^
  "\PhysicalDisk(*)\Avg. Disk Queue Length" "\PhysicalDisk(*)\%% Disk Time" "\PhysicalDisk(*)\Disk Transfers/sec" ^
  "\PhysicalDisk(*)\Disk Reads/sec" "\PhysicalDisk(*)\Disk Writes/sec" "\PhysicalDisk(*)\%% Idle Time" ^
  "\PhysicalDisk(*)\Avg. Disk sec/Transfer" "\PhysicalDisk(*)\Avg. Disk sec/Read" "\PhysicalDisk(*)\Avg. Disk sec/Write" ^
  -si 60 -sc 1440 -o PerfCounters-%All%-%Allm%.csv
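
The relationship between “-si” and “-sc” is simple arithmetic, so it can be handy to compute the sample count instead of working it out by hand. The Python sketch below is an illustration only: the helper names `samples_for` and `typeperf_args` and the output file name are made up, and the code just builds the argument list without running anything. It uses only the typeperf options listed earlier in this post. Note that outside of a batch file the counter names use a single %, not %%.

```python
def samples_for(duration_minutes, interval_seconds):
    # Number of samples needed to cover the duration at the given interval,
    # rounded up so a partial interval still gets a sample.
    total_seconds = duration_minutes * 60
    return -(-total_seconds // interval_seconds)  # ceiling division


def typeperf_args(counters, duration_minutes, interval_seconds, output):
    # Assemble a typeperf command line as a list of arguments.
    # Only -si, -sc and -o are used, as described in the parameter list.
    count = samples_for(duration_minutes, interval_seconds)
    return (["typeperf"] + list(counters)
            + ["-si", str(interval_seconds), "-sc", str(count), "-o", output])


if __name__ == "__main__":
    # 24 hours at one sample per minute
    print(samples_for(24 * 60, 60))
    # 30 minutes at one sample per minute, single counter, hypothetical file name
    print(typeperf_args([r"\LogicalDisk(*)\% Idle Time"], 30, 60, "PerfCounters.csv"))
```

For 24 hours at a 60 second interval this gives 1440 samples, and for 30 minutes it gives 30, matching the values used above. On a Windows server the resulting list could be handed to something like subprocess.run, but that part is left out here.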