Category Archives: Clariion / VNX Block Specific

Archiving NAZ and NAR files from EMC VNX and Clariion arrays

It can be useful to copy and archive the NAZ and NAR files from all of your arrays to a local server.  They come in handy for EMC troubleshooting efforts, general health checks, and researching historical trends.  I use them often with our EMC technical rep when a workload analysis is done, and it's much faster to simply have them all copied somewhere automatically on a daily basis.

Not all of our arrays have an Analyzer license, so on those arrays the files are stored in “naz” format rather than “nar” format.  The naz files need to be sent to EMC for decryption before they can be used by a customer.

The Windows shell script below stores the current date and time in variables, attempts to start Analyzer, and then pulls the current file from each array.  Arrays that don't have an Analyzer license will only run data collection for a maximum of 7 days, but because the script attempts to start it every day, collection is restarted automatically whenever that limit is hit.  I set the archive interval to 600 seconds and run the script every 24 hours.

 

@ECHO OFF
 
For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set date=%%a-%%b-%%c)
For /f "tokens=1-3 delims=: " %%a in ('time /t') do (set time=%%a-%%b-%%c)
for /f "tokens=1-7 delims=:/-, " %%i in ('echo exit^|cmd /q /k"prompt $d $t"') do (
   for /f "tokens=2-4 delims=/-,() skip=1" %%a in ('echo.^|date') do (
      set dow=%%i
      set %%a=%%j
      set %%b=%%k
      set %%c=%%l
      set hh=%%m
      set min=%%n
      set ss=%%o
   )
)
 
echo Array01a
naviseccli -h Array01a analyzer -start
echo Array02a
naviseccli -h Array02a analyzer -start
echo Array03a
naviseccli -h Array03a analyzer -start
echo Array04a
naviseccli -h Array04a analyzer -start
echo Array05a
naviseccli -h Array05a analyzer -start
echo Array06a
naviseccli -h Array06a analyzer -start
 
NaviSECCli.exe -h Array01a analyzer -archiveretrieve -file APM00111100006_SPA_%date%-%time%.naz -Location D:\SAN\narcollection\Array01
NaviSECCli.exe -h Array01b analyzer -archiveretrieve -file APM00111100006_SPB_%date%-%time%.naz -Location D:\SAN\narcollection\Array01
 
NaviSECCli.exe -h Array02a analyzer -archiveretrieve -file APM00111000005_SPA_%date%-%time%.naz -Location D:\SAN\narcollection\Array02
NaviSECCli.exe -h Array02b analyzer -archiveretrieve -file APM00111000005_SPB_%date%-%time%.naz -Location D:\SAN\narcollection\Array02
 
NaviSECCli.exe -h Array03a analyzer -archiveretrieve -file APM00182700004_SPA_%date%-%time%.nar -Location D:\SAN\narcollection\Array03
NaviSECCli.exe -h Array03b analyzer -archiveretrieve -file APM00182700004_SPB_%date%-%time%.nar -Location D:\SAN\narcollection\Array03
 
NaviSECCli.exe -h Array04a analyzer -archiveretrieve -file APM00122600000_SPA_%date%-%time%.naz -Location D:\SAN\narcollection\Array04
NaviSECCli.exe -h Array04b analyzer -archiveretrieve -file APM00122600000_SPB_%date%-%time%.naz -Location D:\SAN\narcollection\Array04
 
NaviSECCli.exe -h Array05a analyzer -archiveretrieve -file APM00122700001_SPA_%date%-%time%.nar -Location D:\SAN\narcollection\Array05
NaviSECCli.exe -h Array05b analyzer -archiveretrieve -file APM00122700001_SPB_%date%-%time%.nar -Location D:\SAN\narcollection\Array05
 
NaviSECCli.exe -h Array06a analyzer -archiveretrieve -file APM00132900002_SPA_%date%-%time%.naz -Location D:\SAN\narcollection\Array06
NaviSECCli.exe -h Array06b analyzer -archiveretrieve -file APM00132900002_SPB_%date%-%time%.naz -Location D:\SAN\narcollection\Array06
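To run this automatically every 24 hours I simply schedule the batch file on the collection server. A minimal sketch (the script path and run time are placeholders) using the built-in Windows Task Scheduler:

schtasks /Create /SC DAILY /ST 01:00 /TN "NAR-NAZ Archive" /TR "D:\SAN\scripts\nar_archive.cmd"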

 


Reporting on the Estimated Job Completion Times for FAST VP Data Relocation

Another of the daily reports I run shows the current estimated time remaining for the FAST VP data relocation jobs on all of our arrays.  I also keep a single backup copy of the report showing the previous day's times so I can get a quick view of the progress made over the previous 24 hours.  Both reports are presented side by side on my intranet report page for easy comparison.

I made a post last year about how to deal with long running FAST VP data relocation jobs (http://emcsan.wordpress.com/2012/01/18/long-running-fast-vp-relocation-job/), and this report helps identify any arrays that could be falling behind.  If your estimated completion time is longer than the time window you have defined for your data relocation job you may need to make some changes; see my previous post for more information about that.

You can get the current status of the data relocation job at any time by running the following command:

naviseccli -h [array_hostname] autotiering -info -state -rate -schedule -opStatus -poolID [Pool_ID_Number]
 

The output looks like this:

Auto-Tiering State:  Enabled
Relocation Rate:  Medium
 
Schedule Name:  Default Schedule
Schedule State:  Enabled
Default Schedule:  Yes
Schedule Days:  Sun Mon Tue Wed Thu Fri Sat
Schedule Start Time:  20:00
Schedule Stop Time:  6:00
Schedule Duration:  10 hours
Storage Pools:  Array1_Pool1_SPB, Array1_Pool0_SPA
 
Storage Pool Name:  Array1_Pool0_SPA
Storage Pool ID:  0
Relocation Start Time:  08/15/13 20:00
Relocation Stop Time:  08/16/13 6:00
Relocation Status:  Inactive
Relocation Type:  Scheduled
Relocation Rate:  Medium
Data to Move Up (GBs):  8.00
Data to Move Down (GBs):  8.00
Data Movement Completed (GBs):  2171.00
Estimated Time to Complete:  4 minutes
Schedule Duration Remaining:  None
 
Storage Pool Name:  Array1_Pool1_SPB
Storage Pool ID:  1
Relocation Start Time:  08/15/13 20:00
Relocation Stop Time:  08/16/13 6:00
Relocation Status:  Inactive
Relocation Type:  Scheduled
Relocation Rate:  Medium
Data to Move Up (GBs):  14.00
Data to Move Down (GBs):  14.00
Data Movement Completed (GBs):  1797.00
Estimated Time to Complete:  5 minutes
Schedule Duration Remaining:  None
 

The output of the command is very verbose; I only want the pool name and the estimated time for the relocation job to complete.  The bash script below trims the output down to show only the pool names and estimated completion times.

The final output of the script generated report looks like this: 

Runtime: Thu Aug 11 07:00:01 CDT 2013
Array1_Pool0:  9 minutes
Array1_Pool1:  6 minutes
Array2_Pool0:  1 hour, 47 minutes
Array2_Pool1:  3 minutes
Array2_Pool2:  2 days, 7 hours, 25 minutes
Array2_Pool3:  1 day, 9 hours, 58 minutes
Array3_Pool0:  1 minute
Array4_Pool0:  N/A
Array4_Pool1:  2 minutes
Array5_Pool1:  5 minutes
Array5_Pool0:  5 minutes
Array6_Pool0:  N/A
Array6_Pool1:  N/A

 

Below is the bash script that generates the report. It is set up to report on six different arrays, but it can easily be modified to suit your environment.

TODAY=$(date)
echo "Runtime: $TODAY" > /reports/tierstatus.txt
echo $TODAY
#
naviseccli -h [array_hostname1] autotiering -info -state -rate -schedule -opStatus -poolID 0 > /reports/[array_hostname1]_tierstatus0.out
naviseccli -h [array_hostname1] autotiering -info -state -rate -schedule -opStatus -poolID 1 > /reports/[array_hostname1]_tierstatus1.out
#
echo `grep "Pool Name:" /reports/[array_hostname1]_tierstatus0.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname1]_tierstatus0.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname1]_tierstatus1.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname1]_tierstatus1.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
#
naviseccli -h [array_hostname2] autotiering -info -state -rate -schedule -opStatus -poolID 0 > /reports/[array_hostname2]_tierstatus0.out
naviseccli -h [array_hostname2] autotiering -info -state -rate -schedule -opStatus -poolID 1 > /reports/[array_hostname2]_tierstatus1.out
naviseccli -h [array_hostname2] autotiering -info -state -rate -schedule -opStatus -poolID 2 > /reports/[array_hostname2]_tierstatus2.out
naviseccli -h [array_hostname2] autotiering -info -state -rate -schedule -opStatus -poolID 3 > /reports/[array_hostname2]_tierstatus3.out
#
echo `grep "Pool Name:" /reports/[array_hostname2]_tierstatus0.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname2]_tierstatus0.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname2]_tierstatus1.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname2]_tierstatus1.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname2]_tierstatus2.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname2]_tierstatus2.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname2]_tierstatus3.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname2]_tierstatus3.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
#
naviseccli -h [array_hostname3] autotiering -info -state -rate -schedule -opStatus -poolID 0 > /reports/[array_hostname3]_tierstatus0.out
#
echo `grep "Pool Name:" /reports/[array_hostname3]_tierstatus0.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname3]_tierstatus0.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
#
naviseccli -h [array_hostname4] autotiering -info -state -rate -schedule -opStatus -poolID 0 > /reports/[array_hostname4]_tierstatus0.out
naviseccli -h [array_hostname4] autotiering -info -state -rate -schedule -opStatus -poolID 1 > /reports/[array_hostname4]_tierstatus1.out
#
echo `grep "Pool Name:" /reports/[array_hostname4]_tierstatus0.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname4]_tierstatus0.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname4]_tierstatus1.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname4]_tierstatus1.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
#
naviseccli -h [array_hostname5] autotiering -info -state -rate -schedule -opStatus -poolID 0 > /reports/[array_hostname5]_tierstatus0.out
naviseccli -h [array_hostname5] autotiering -info -state -rate -schedule -opStatus -poolID 1 > /reports/[array_hostname5]_tierstatus1.out
#
echo `grep "Pool Name:" /reports/[array_hostname5]_tierstatus0.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname5]_tierstatus0.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname5]_tierstatus1.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname5]_tierstatus1.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
#
naviseccli -h [array_hostname6] autotiering -info -state -rate -schedule -opStatus -poolID 0 > /reports/[array_hostname6]_tierstatus0.out
naviseccli -h [array_hostname6] autotiering -info -state -rate -schedule -opStatus -poolID 1 > /reports/[array_hostname6]_tierstatus1.out
#
echo `grep "Pool Name:" /reports/[array_hostname6]_tierstatus0.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname6]_tierstatus0.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
echo `grep "Pool Name:" /reports/[array_hostname6]_tierstatus1.out |awk '{print $4}'`":  "`grep Complete: /reports/[array_hostname6]_tierstatus1.out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
#
#Copy the current report to a new file and rename it, one prior day is saved.
cp /cygdrive/c/inetpub/wwwroot/tierstatus.txt /cygdrive/c/inetpub/wwwroot/tierstatus_yesterday.txt
#Remove the current report on the web page.
rm /cygdrive/c/inetpub/wwwroot/tierstatus.txt
#Copy the new report to the web page.
cp /reports/tierstatus.txt /cygdrive/c/inetpub/wwwroot
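If you have a lot of arrays and pools, the repetition above can be collapsed into a loop. Below is a minimal sketch of the same report; it assumes a hypothetical /reports/pools.list file with one "array_hostname pool_id" pair per line and uses the same grep and awk field positions as the commands above.

TODAY=$(date)
echo "Runtime: $TODAY" > /reports/tierstatus.txt
#pools.list (hypothetical) contains one "array_hostname pool_id" pair per line
while read array pool; do
   out=/reports/${array}_tierstatus${pool}.out
   naviseccli -h $array autotiering -info -state -rate -schedule -opStatus -poolID $pool > $out
   #Pool name is field 4 of the "Storage Pool Name:" line; the estimate follows "Estimated Time to Complete:"
   echo `grep "Pool Name:" $out |awk '{print $4}'`":  "`grep Complete: $out |awk '{print $5,$6,$7,$8,$9,$10}'` >> /reports/tierstatus.txt
done < /reports/pools.list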

 

 

Reporting on Clariion / VNX Block Storage Pool capacity with a bash script

I recently added a post about how to report on Celerra & VNX File pool sizes with a bash script. I’ve also been doing that for a long time with our Clariion and VNX block pools so I thought I’d share that information as well.

I use a cron job to schedule the report daily and copy it to our internal web server. I then run the csv2htm.pl Perl script (from http://www.jpsdomain.org/source/perl.html) to convert it to an HTML output file to add to our intranet report page. This is likely the most viewed report I create on a daily basis, as we always seem to be running low on available capacity.

The output of the script looks similar to this:

[Screenshot: sample PoolReport output]

Here is the bash script that creates the report:

TODAY=$(date)
#Add the current time/date stamp to the top of the report 
echo $TODAY > /scripts/PoolReport.csv

#Create a file with the standard navisphere CLI output for each storage pool (to be processed later into the format I want)

naviseccli -h VNX_1 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_0.csv

naviseccli -h VNX_1 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_1.csv

naviseccli -h VNX_1 storagepool -list -id 2 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_2.csv

naviseccli -h VNX_1 storagepool -list -id 3 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_3.csv

naviseccli -h VNX_1 storagepool -list -id 4 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_6.csv

#

naviseccli -h VNX_2 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_4.csv

naviseccli -h VNX_2 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_5.csv

naviseccli -h VNX_2 storagepool -list -id 2 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_7.csv

naviseccli -h VNX_2 storagepool -list -id 3 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_8.csv

#

naviseccli -h VNX_3 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site1_6.csv

#

naviseccli -h VNX_4 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site2_0.csv

naviseccli -h VNX_4 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site2_1.csv

naviseccli -h VNX_4 storagepool -list -id 2 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site2_2.csv

naviseccli -h VNX_4 storagepool -list -id 3 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site2_3.csv

#

naviseccli -h VNX_5 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site2_4.csv

#

naviseccli -h VNX_6 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site3_0.csv

naviseccli -h VNX_6 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site3_1.csv

#

naviseccli -h VNX_7 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site3_0.csv

naviseccli -h VNX_7 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site3_1.csv

#

naviseccli -h VNX_8 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site4_0.csv

naviseccli -h VNX_8 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site4_1.csv

#

naviseccli -h VNX_9 storagepool -list -id 0 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site5_0.csv

naviseccli -h VNX_9 storagepool -list -id 1 -availableCap -consumedCap -UserCap -prcntFull >/scripts/report_site5_1.csv

#

#Create a new file for each site's storage pool (from the file generated in the previous step) that contains only the info that I want.

#

cat /scripts/report_site1_4.csv | grep Name > /scripts/Site1Pool4.csv

cat /scripts/report_site1_4.csv | grep GBs >>/scripts/Site1Pool4.csv

cat /scripts/report_site1_4.csv | grep Full >>/scripts/Site1Pool4.csv

#

cat /scripts/report_site1_5.csv | grep Name > /scripts/Site1Pool5.csv

cat /scripts/report_site1_5.csv | grep GBs >>/scripts/Site1Pool5.csv

cat /scripts/report_site1_5.csv | grep Full >>/scripts/Site1Pool5.csv

#

cat /scripts/report_site1_0.csv | grep Name > /scripts/Site1Pool0.csv

cat /scripts/report_site1_0.csv | grep GBs >>/scripts/Site1Pool0.csv

cat /scripts/report_site1_0.csv | grep Full >>/scripts/Site1Pool0.csv

#

cat /scripts/report_site1_1.csv | grep Name > /scripts/Site1Pool1.csv

cat /scripts/report_site1_1.csv | grep GBs >>/scripts/Site1Pool1.csv

cat /scripts/report_site1_1.csv | grep Full >>/scripts/Site1Pool1.csv

#

cat /scripts/report_site1_2.csv | grep Name > /scripts/Site1Pool2.csv

cat /scripts/report_site1_2.csv | grep GBs >>/scripts/Site1Pool2.csv

cat /scripts/report_site1_2.csv | grep Full >>/scripts/Site1Pool2.csv

#

cat /scripts/report_site1_7.csv | grep Name > /scripts/Site1Pool7.csv

cat /scripts/report_site1_7.csv | grep GBs >>/scripts/Site1Pool7.csv

cat /scripts/report_site1_7.csv | grep Full >>/scripts/Site1Pool7.csv

#

cat /scripts/report_site1_8.csv | grep Name > /scripts/Site1Pool8.csv

cat /scripts/report_site1_8.csv | grep GBs >>/scripts/Site1Pool8.csv

cat /scripts/report_site1_8.csv | grep Full >>/scripts/Site1Pool8.csv

#

cat /scripts/report_site1_3.csv | grep Name > /scripts/Site1Pool3.csv

cat /scripts/report_site1_3.csv | grep GBs >>/scripts/Site1Pool3.csv

cat /scripts/report_site1_3.csv | grep Full >>/scripts/Site1Pool3.csv

#

cat /scripts/report_site1_6.csv | grep Name > /scripts/Site1Pool6.csv

cat /scripts/report_site1_6.csv | grep GBs >>/scripts/Site1Pool6.csv

cat /scripts/report_site1_6.csv | grep Full >>/scripts/Site1Pool6.csv


#

cat /scripts/report_site2_0.csv | grep Name > /scripts/Site2Pool0.csv

cat /scripts/report_site2_0.csv | grep GBs >>/scripts/Site2Pool0.csv

cat /scripts/report_site2_0.csv | grep Full >>/scripts/Site2Pool0.csv

#

cat /scripts/report_site2_1.csv | grep Name > /scripts/Site2Pool1.csv

cat /scripts/report_site2_1.csv | grep GBs >>/scripts/Site2Pool1.csv

cat /scripts/report_site2_1.csv | grep Full >>/scripts/Site2Pool1.csv

#

cat /scripts/report_site2_2.csv | grep Name > /scripts/Site2Pool2.csv

cat /scripts/report_site2_2.csv | grep GBs >>/scripts/Site2Pool2.csv

cat /scripts/report_site2_2.csv | grep Full >>/scripts/Site2Pool2.csv

#

cat /scripts/report_site2_3.csv | grep Name > /scripts/Site2Pool3.csv

cat /scripts/report_site2_3.csv | grep GBs >>/scripts/Site2Pool3.csv

cat /scripts/report_site2_3.csv | grep Full >>/scripts/Site2Pool3.csv

#

cat /scripts/report_site2_4.csv | grep Name > /scripts/Site2Pool4.csv

cat /scripts/report_site2_4.csv | grep GBs >>/scripts/Site2Pool4.csv

cat /scripts/report_site2_4.csv | grep Full >>/scripts/Site2Pool4.csv

#

cat /scripts/report_site3_0.csv | grep Name > /scripts/Site3Pool0.csv

cat /scripts/report_site3_0.csv | grep GBs >>/scripts/Site3Pool0.csv

cat /scripts/report_site3_0.csv | grep Full >>/scripts/Site3Pool0.csv

#

cat /scripts/report_site3_1.csv | grep Name > /scripts/Site3Pool1.csv

cat /scripts/report_site3_1.csv | grep GBs >>/scripts/Site3Pool1.csv

cat /scripts/report_site3_1.csv | grep Full >>/scripts/Site3Pool1.csv

#

cat /scripts/report_site3_0.csv | grep Name > /scripts/Site4Pool0.csv

cat /scripts/report_site3_0.csv | grep GBs >>/scripts/Site4Pool0.csv

cat /scripts/report_site3_0.csv | grep Full >>/scripts/Site4Pool0.csv

#

cat /scripts/report_site3_1.csv | grep Name > /scripts/Site4Pool1.csv

cat /scripts/report_site3_1.csv | grep GBs >>/scripts/Site4Pool1.csv

cat /scripts/report_site3_1.csv | grep Full >>/scripts/Site4Pool1.csv

#

cat /scripts/report_site4_0.csv | grep Name > /scripts/Site5Pool0.csv

cat /scripts/report_site4_0.csv | grep GBs >>/scripts/Site5Pool0.csv

cat /scripts/report_site4_0.csv | grep Full >>/scripts/Site5Pool0.csv

#

cat /scripts/report_site4_1.csv | grep Name > /scripts/Site5Pool1.csv

cat /scripts/report_site4_1.csv | grep GBs >>/scripts/Site5Pool1.csv

cat /scripts/report_site4_1.csv | grep Full >>/scripts/Site5Pool1.csv

#

cat /scripts/report_site5_0.csv | grep Name > /scripts/Site6Pool0.csv

cat /scripts/report_site5_0.csv | grep GBs >>/scripts/Site6Pool0.csv

cat /scripts/report_site5_0.csv | grep Full >>/scripts/Site6Pool0.csv

#

cat /scripts/report_site5_1.csv | grep Name > /scripts/Site6Pool1.csv

cat /scripts/report_site5_1.csv | grep GBs >>/scripts/Site6Pool1.csv

cat /scripts/report_site5_1.csv | grep Full >>/scripts/Site6Pool1.csv

#

#The last section creates the final output for the report before it is processed into an html table. It creates a single line for each storage pool with the total GB, used GB, available GB, and the percent utilization of the pool.

#

echo 'Pool Name','Total GB ','Used GB ','Available GB ','Percent Full ' >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool0.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool0.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool0.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool0.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool0.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool1.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool1.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool1.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool1.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool1.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool2.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool2.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool2.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool2.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool2.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool3.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool3.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool3.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool3.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool3.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool6.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool6.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool6.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool6.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool6.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool4.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool4.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool4.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool4.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool4.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool5.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool5.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool5.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool5.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool5.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool7.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool7.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool7.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool7.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool7.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site1Pool8.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool8.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool8.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool8.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool8.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site1Pool6.csv |awk '{print $3}'`","`grep -i User /scripts/Site1Pool6.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site1Pool6.csv |awk '{print $4}'`","`grep -i Available /scripts/Site1Pool6.csv |awk '{print $4}'`","`grep -i Full /scripts/Site1Pool6.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site2Pool0.csv |awk '{print $3}'`","`grep -i User /scripts/Site2Pool0.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site2Pool0.csv |awk '{print $4}'`","`grep -i Available /scripts/Site2Pool0.csv |awk '{print $4}'`","`grep -i Full /scripts/Site2Pool0.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site2Pool1.csv |awk '{print $3}'`","`grep -i User /scripts/Site2Pool1.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site2Pool1.csv |awk '{print $4}'`","`grep -i Available /scripts/Site2Pool1.csv |awk '{print $4}'`","`grep -i Full /scripts/Site2Pool1.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site2Pool2.csv |awk '{print $3}'`","`grep -i User /scripts/Site2Pool2.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site2Pool2.csv |awk '{print $4}'`","`grep -i Available /scripts/Site2Pool2.csv |awk '{print $4}'`","`grep -i Full /scripts/Site2Pool2.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site2Pool3.csv |awk '{print $3}'`","`grep -i User /scripts/Site2Pool3.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site2Pool3.csv |awk '{print $4}'`","`grep -i Available /scripts/Site2Pool3.csv |awk '{print $4}'`","`grep -i Full /scripts/Site2Pool3.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site2Pool4.csv |awk '{print $3}'`","`grep -i User /scripts/Site2Pool4.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site2Pool4.csv |awk '{print $4}'`","`grep -i Available /scripts/Site2Pool4.csv |awk '{print $4}'`","`grep -i Full /scripts/Site2Pool4.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site3Pool0.csv |awk '{print $3}'`","`grep -i User /scripts/Site3Pool0.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site3Pool0.csv |awk '{print $4}'`","`grep -i Available /scripts/Site3Pool0.csv |awk '{print $4}'`","`grep -i Full /scripts/Site3Pool0.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site3Pool1.csv |awk '{print $3}'`","`grep -i User /scripts/Site3Pool1.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site3Pool1.csv |awk '{print $4}'`","`grep -i Available /scripts/Site3Pool1.csv |awk '{print $4}'`","`grep -i Full /scripts/Site3Pool1.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site4Pool0.csv |awk '{print $3}'`","`grep -i User /scripts/Site4Pool0.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site4Pool0.csv |awk '{print $4}'`","`grep -i Available /scripts/Site4Pool0.csv |awk '{print $4}'`","`grep -i Full /scripts/Site4Pool0.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site4Pool1.csv |awk '{print $3}'`","`grep -i User /scripts/Site4Pool1.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site4Pool1.csv |awk '{print $4}'`","`grep -i Available /scripts/Site4Pool1.csv |awk '{print $4}'`","`grep -i Full /scripts/Site4Pool1.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site5Pool0.csv |awk '{print $3}'`","`grep -i User /scripts/Site5Pool0.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site5Pool0.csv |awk '{print $4}'`","`grep -i Available /scripts/Site5Pool0.csv |awk '{print $4}'`","`grep -i Full /scripts/Site5Pool0.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site5Pool1.csv |awk '{print $3}'`","`grep -i User /scripts/Site5Pool1.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site5Pool1.csv |awk '{print $4}'`","`grep -i Available /scripts/Site5Pool1.csv |awk '{print $4}'`","`grep -i Full /scripts/Site5Pool1.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#

echo " ",'Total GB','Used GB','Available GB','Percent Full' >> /scripts/PoolReport.csv

#

#

echo `grep Name /scripts/Site6Pool0.csv |awk '{print $3}'`","`grep -i User /scripts/Site6Pool0.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site6Pool0.csv |awk '{print $4}'`","`grep -i Available /scripts/Site6Pool0.csv |awk '{print $4}'`","`grep -i Full /scripts/Site6Pool0.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

echo `grep Name /scripts/Site6Pool1.csv |awk '{print $3}'`","`grep -i User /scripts/Site6Pool1.csv |awk '{print $4}'`","`grep -i Consumed /scripts/Site6Pool1.csv |awk '{print $4}'`","`grep -i Available /scripts/Site6Pool1.csv |awk '{print $4}'`","`grep -i Full /scripts/Site6Pool1.csv |awk '{print $3}'` >> /scripts/PoolReport.csv

#

#Convert the file to HTML for use on the internal web server 

#

./csv2htm.pl -e -T -i /scripts/PoolReport.csv -o /webfolder/PoolReport.html
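As with the tier status report, the repetition here could be collapsed into a loop. A minimal sketch is below, assuming a hypothetical /scripts/pools.list file with one "array_hostname pool_id" pair per line; it mirrors the grep and awk field positions used above, but skips the per-array header rows.

echo 'Pool Name','Total GB ','Used GB ','Available GB ','Percent Full ' >> /scripts/PoolReport.csv
while read array pool; do
   raw=/scripts/report_${array}_${pool}.csv
   tmp=/scripts/pool_${array}_${pool}.csv
   naviseccli -h $array storagepool -list -id $pool -availableCap -consumedCap -UserCap -prcntFull > $raw
   #Keep only the Name, GBs and Percent Full lines, just like the per-pool files above
   grep Name $raw > $tmp
   grep GBs $raw >> $tmp
   grep Full $raw >> $tmp
   #One CSV row per pool: name, user (total) capacity, consumed capacity, available capacity, percent full
   echo `grep Name $tmp |awk '{print $3}'`","`grep -i User $tmp |awk '{print $4}'`","`grep -i Consumed $tmp |awk '{print $4}'`","`grep -i Available $tmp |awk '{print $4}'`","`grep -i Full $tmp |awk '{print $3}'` >> /scripts/PoolReport.csv
done < /scripts/pools.list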

Frequent 0x622 and 0x606 errors in the SP Event Logs

During some routine checking of the SP Event logs on our NS-40 I was noticing a large number of alerts. Every few seconds I was seeing these three alerts pop in:

0x60a Internal Information Only. A logical unit has been enabled
0x622 Background Verify Aborted
0x606 Unit Shutdown for trespass
 

After a bit of investigation, I narrowed down the cause to several large LUNs that had just been added to a new ESX host.  It turns out that the LUNs were still running the background zeroing process, and that’s what was causing all the alerts in the SP Log. When you create a new LUN and the disks have been previously used for other LUNs, the new LUN needs to be “zeroed” (filled with all zeros to clear data). This takes place in the background and it is part of the LUN initialization.  Once this background zeroing process completed on my new LUNs the alert messages stopped.  I was unaware of that process, so I did a bit of research on it.

LUNs are immediately available for use after a bind (using “FastBind”); however, all the operations associated with a bind can take a long time to finish.  The duration of a LUN bind depends on these things:

  • LUN’s bind time background verify priority (rate)
  • Size of the LUN being bound
  • Type of drives in the LUN’s RAID Group
  • Potential disabling of initial verify on bind
  • State of the Storage System (Idle or Load)
  • Position of the LUN on the hard disks of the RAID Group

From that list, the priority, LUN size, drive type, and verification selection have the greatest effect on duration.  You can calculate the approximate duration of the bind process with this formula:

Time = Bound LUN Capacity / Bind Rate

Here are the Average Bind Rates for FC and SATA disks:

Disk Type    ASAP Bind Rate    High Bind Rate    Medium (default) Bind Rate    Low Bind Rate
FC           83 MB/s           7.54 MB/s         5.02 MB/s                     4.02 MB/s
SATA         61.7 MB/s         7.47 MB/s         5.09 MB/s                     3.78 MB/s

If we were to calculate how many hours it would take to bind a 2000GB LUN on a five disk RAID5 group composed of SATA drives set to a medium (default) bind rate, here’s what the formula would look like:

Time = 2000 GB * 1024 MB/GB * (1 / 5.09 MB/s) * (1 hour / 3600 seconds) ≈ 111.76 hours
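To avoid doing that math by hand, the same formula can be dropped into a small bash helper (a sketch; awk does the floating point work):

#Usage: bind_hours <LUN size in GB> <bind rate in MB/s>
bind_hours() {
   awk -v gb=$1 -v rate=$2 'BEGIN {printf "%.2f hours\n", (gb * 1024 / rate) / 3600}'
}
bind_hours 2000 5.09     #2000 GB LUN at the SATA medium bind rate, roughly 112 hours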

There is a detailed white paper that covers this topic from EMC called “The Effect of Priorities on LUN Management Operations” that you can view here:  http://www.emc.com/collateral/hardware/white-papers/h4153-influence-priorities-emc-clariion-lun-wp.pdf.  That’s where I gathered the information above.

Automating VNX Storage Processor Percent Utilization Alerts

Note:  The original post describes a method that requires EMC Control Center and Performance Manager.  That tool has been deprecated by EMC in favor of ViPR SRM.  There is still a method you can use to gather CPU information for use in bash scripts. I don't have a polished script built around this command, but a rough sketch of the calculation is included below; if anyone needs more help send me a comment. The Navisphere CLI command to get busy/idle ticks for the storage processors is naviseccli -h <array_hostname> getcontrol -cbt.

The output looks like this:

Controller busy ticks: 1639432
Controller idle ticks: 1773844

The SP utilization statistics in this output are an average of the utilization across all cores of the SP's processors since the last reset. To get the actual point-in-time SP CPU utilization from this output requires a calculation. You need to poll twice, create a delta for each counter by subtracting the earlier value from the later one, and apply this formula:

Utilization = Busy Ticks / (Busy Ticks + Idle Ticks)
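For anyone who wants to script it, here is a minimal sketch of that two-poll calculation in bash. It assumes the getcontrol -cbt output lines shown above; the SP hostname and the polling interval are placeholders.

#!/bin/bash
SP=array_spa_hostname     #placeholder - replace with your SP hostname or IP
INTERVAL=60               #seconds to wait between the two polls

get_ticks() {
   naviseccli -h $SP getcontrol -cbt > /tmp/cbt.$$
   BUSY=`grep -i "busy ticks" /tmp/cbt.$$ |awk '{print $4}'`
   IDLE=`grep -i "idle ticks" /tmp/cbt.$$ |awk '{print $4}'`
}

get_ticks; BUSY1=$BUSY; IDLE1=$IDLE
sleep $INTERVAL
get_ticks; BUSY2=$BUSY; IDLE2=$IDLE

#Utilization = delta busy ticks / (delta busy ticks + delta idle ticks)
awk -v b=$((BUSY2-BUSY1)) -v i=$((IDLE2-IDLE1)) 'BEGIN {printf "SP Utilization: %.1f%%\n", 100*b/(b+i)}'
rm /tmp/cbt.$$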

What follows is the original method I posted that requires EMC Control Center.

I was tasked with coming up with a way to get email alerts whenever our SP utilization breaks a certain threshold.  Since none of the monitoring tools that we own will do that right now, I had to come up with a way using custom scripts.  This is my 2nd post on the same subject; I removed my post from yesterday as it didn't work as I intended.  This time I used EMC's Performance Manager rather than pulling data from the SP with the Navisphere CLI.

First, I'm running all of my bash scripts on a Windows server using cygwin.  They should run fine on any Linux box as well.  Because I don't have a native sendmail configuration set up on the Windows server, I'm using the control station on the Celerra to actually do the comparison of the utilization numbers in the text files and then email out an alert.  The Celerra control station automatically pulls the files via FTP from the Windows server every 30 minutes and sends out an email alert if the numbers cross the threshold.  A description of each script and the schedule is below.

Windows Server:

Export.cmd:

This first windows batch script runs an export (with pmcli) from EMC Performance Manager that does a dump of all the performance stats for the current day.

For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set date=%%c%%a%%b)

C:\ECC\Client.610\PerformanceManager\pmcli.exe -export -out c:\cygwin\home\scripts\sputil\0999_interval.csv -type interval -class clariion -date %date% -id APM00400500999

Data.cmd:

This cygwin/bash script manipulates the file exported above and ultimately creates two text files (one for SPA and one for SPB), each containing a single numerical value: the most recent SP utilization.  There are a few extra steps at the beginning of the script that are irrelevant to the SP utilization; they're there for other purposes.

#This will pull only the timestamp line from the top

grep -m 1 "/" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/timestamp.csv

# This will pull out only the "disk utilization" line.

grep -i "^% Utilization" /home/scripts/sputil/0999_interval.csv >> /home/scripts/sputil/stats.csv

# This will pull out the disk/LUN title info for the first column

grep -i "Data Collected for DiskStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/diskstats.csv

grep -i "Data Collected for LUNStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/lunstats.csv

# This will create a column with the disk/LUN number

cat /home/scripts/sputil/diskstats.csv /home/scripts/sputil/lunstats.csv > /home/scripts/sputil/data.csv

# This combines the disk/LUN column with the data column

paste /home/scripts/sputil/data.csv /home/scripts/sputil/stats.csv > /home/scripts/sputil/combined.csv

cp /home/scripts/sputil/combined.csv /home/scripts/sputil/utilstats.csv
 

#  This removes all the temporary files
rm /home/scripts/sputil/timestamp.csv
rm /home/scripts/sputil/stats.csv
rm /home/scripts/sputil/diskstats.csv
rm /home/scripts/sputil/lunstats.csv
rm /home/scripts/sputil/data.csv
rm /home/scripts/sputil/combined.csv

# This next line strips the file of all but the last two rows, which are SP Utilization.

# The 1 looks at the first character in the row, the D specifies "starts with D", then deletes rows meeting those conditions.

awk -v FS="" -v OFS="" '$1 != "D"' < /home/scripts/sputil/utilstats.csv > /home/scripts/sputil/sputil.csv

#This pulls the values from the last column, which would be the most recent.

awk -F, '{print $(NF-1)}' < /home/scripts/sputil/sputil.csv > /home/scripts/sputil/sp_util.csv

#pull 1st line (SPA) into separate file

sed -n 1,1p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spAutil.txt

#pull 2nd line (SPB) into separate file

sed -n 2,2p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spButil.txt

#The spAutil.txt/spButil.txt files now contain only a single numerical value, which would be the most recent %utilization from the Control Center/Performance Manager dump file.

#Copy files to web server root directory

cp /home/scripts/sputil/*.txt /cygdrive/c/inetpub/wwwroot
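As a side note, if all you need are the two SP values, they can also be pulled straight from the export in one pass. This assumes, as the steps above do, that the two SP rows are the last "% Utilization" rows in the pmcli dump:

#Most recent %Utilization for each SP: last two "% Utilization" rows, second to last column
grep -i "^% Utilization" /home/scripts/sputil/0999_interval.csv | tail -2 | awk -F, '{print $(NF-1)}'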

Celerra Control Station:

CelerraArray:/home/nasadmin/sputil/ftpsp.sh

The script below connects to the windows server and grabs the current SP utilization text files via FTP every 30 minutes (via a cron job).

#!/bin/bash
cd /home/nasadmin/sputil
ftp windows_server.domain.net <<SCRIPT
get spAutil.txt
get spButil.txt
quit
SCRIPT
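One note on the FTP session: it needs a non-interactive login to work from cron. A sketch of one way to do that (the ftpuser/ftppassword credentials are placeholders) is to turn off auto-login with -n and supply the credentials inside the here-document:

#!/bin/bash
cd /home/nasadmin/sputil
#-n disables auto-login so the user command below can supply the credentials
ftp -n windows_server.domain.net <<SCRIPT
user ftpuser ftppassword
get spAutil.txt
get spButil.txt
quit
SCRIPT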
CelerraArray:/home/nasadmin/sputil/spcheck.sh:

This script does the comparison check to see if the SP utilization is over our threshold. If it is, it sends an email alert that includes the %Utilization number in the subject line of the email. To change the threshold setting, you'd need to change the THRESHOLD=<XX> line in the script.  The line containing printf "%2.0f" converts the floating point value to an integer, as bash integer comparisons don't handle floating point values.

#!/bin/bash

SPB=`cat /home/nasadmin/sputil/spButil.txt`
#Convert the floating point utilization value to an integer for the comparison below
printf "%2.0f" $SPB > /home/nasadmin/sputil/spButil2.txt
SPB=`cat /home/nasadmin/sputil/spButil2.txt`

echo $SPB
THRESHOLD=50
if [ $SPB -eq 0 ] && [ $THRESHOLD -eq 0 ] 
then 
        echo "Both are zero"
 elif [ $SPB -eq $THRESHOLD ]
 then         
        echo "Both Values are equal"
 elif [ $SPB -gt $THRESHOLD ]
 then          
        echo "SPB is greater than the threshold.  Sending alert" 

        uuencode /home/nasadmin/sputil/spButil.txt spButil.txt | mail -s "<array_name> SPB Utilization Alert: $SPB % utilization is above the threshold of $THRESHOLD %" notify@domain.com
else         
echo "$SPB is less than $THRESHOLD"
fi
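The same check can be extended to cover both storage processors by wrapping it in a function. This is a minimal sketch using the spAutil.txt and spButil.txt files pulled down by the FTP script:

#!/bin/bash
THRESHOLD=50

check_sp() {
   #$1 = SP name (SPA or SPB), $2 = file containing that SP's %utilization value
   UTIL=$(printf "%2.0f" $(cat $2))
   if [ $UTIL -gt $THRESHOLD ]
   then
      echo "$1 is greater than the threshold.  Sending alert"
      uuencode $2 $2 | mail -s "<array_name> $1 Utilization Alert: $UTIL % (threshold $THRESHOLD %)" notify@domain.com
   fi
}

check_sp SPA /home/nasadmin/sputil/spAutil.txt
check_sp SPB /home/nasadmin/sputil/spButil.txt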

CelerraArray Crontab schedule:

The FTP script is currently set to pull the SP utilization files.  Run "crontab -e" to edit the scheduler.  I've got the alert script set to run at the top of the hour and at half past the hour, and the updated SP files are FTP'd from the web server a few minutes prior.

[nasadmin@CelerraArray sputil]$ crontab -l
58,28 * * * * /home/nasadmin/sputil/ftpsp.sh
0,30 * * * * /home/nasadmin/sputil/spcheck.sh
 Overall Scheduling:

Windows Server:

Performance Manager Dump runs 15 minutes past the hour (exports data)
Data script runs at 20 minutes past the hour (processes data to get SP Utilization)

Celerra Server:

FTP script pulls new SP utilization text files at 28 minutes past the hour
Alert script runs at 30 minutes past the hour

The cycle then repeats at minute 45, minute 50, minute 58, and minute 0.

 

What’s new in FLARE 32?

We recently installed a new VNX5700 at the end of July and EMC was on-site to install it the day after the general release of FLARE 32. We had planned our pool configuration around this release, so the timing couldn’t have been more perfect. After running the new release for about a month now it’s proven to be rock solid, and to date no critical security patches have been released that we needed to apply.

The most notable new feature for us was the addition of mixed RAID types within the same pool.  We can finally use RAID6 for the large NL-SAS drives in the pool and not have to make the entire pool RAID6.  There also are several new performance enhancements that should make a big impact, including load balancing within a tier and pool rebalancing.

Below is an overview of the new features in the initial release (05.32.000.5.006).

· VNX Block to Unified Services Plug-n-play: This feature allows customers to perform Block to Unified upgrades.

· Support for In-Family upgrades: This feature allows for the In-family Data In Place (DIP) Conversion kits that are to be available for the VNX OE for File 7.1 and VNX OE for Block 05.32 release. In-family DIP conversions across the VNX family will re-use the installed SAS DAEs with associated drives and SLICs.

· Windows 2008 R2: Branch Cache Support: This feature provides support for the branch cache feature in Windows 2008 R2. This feature allows a Windows client to cache a file locally in one of the branch office servers and then use the cached copy whenever the same file is being requested. The VNX for File CIFS Server operates as the Central Office Content Server in support of the Branch Cache feature.

· VAAI for NFS: Snaps of snaps: Supports snap of snap of VMDK files on NFS file systems to at least 1 level of depth (source to snap to snap). Though the functionality will initially only be available through the VAAI for NFS interface, it may in the future be exposed through the VSI plug-in as well. This feature allows VMware View 5.0 to use VAAI for NFS on a VNX system. This feature also requires that file systems be enabled during creation for “VMware VAAI nested clone support”, and the installation of the EMCNasPlugin-1-0.10.zip on the ESX servers.

· Load balance within a tier: This feature allows for redistribution of slices across all drives in a given tier of a pool to improve performance. This also includes proactive load balancing, such that slices are relocated from one RAID group in a tier to another, based on activity level, to balance the load within the tier. These relocations will occur within the user-defined tiering relocation window.

· Improve block compression performance: This feature provides for improved block compression performance. This includes increasing speed of compression operations, decreasing storage system impact, and decreasing host response time impact.

· Deeper File Compression Algorithm: This feature provides an option to utilize a deeper compression algorithm for greater capacity savings when using file-level compression. This option can be leveraged by 3rd party application servers such as the FileMover Appliance Server, based on data types (i.e. per metadata definitions) that are best suited for the deeper compression algorithm.

· Rebalance when adding drives to Pools: This feature provides for the redistribution of slices across all drives in a newly expanded pool to improve performance.

· Conversion of DLU to TLU and back, when enabling Block Compression: This feature provides an internal mechanism (not user-invoked), when enabling Compression on a thick pool LUN, that would result in an in-place conversion from Thick (“direct”) pool LUNs to Thin Pool LUNs, rather than performing a migration. Additionally, for LUNs that were originally Thick and then converted to Thin, it provides an internal mechanism, upon disabling compression, to convert the LUNs back to Thick, without requiring a user-invoked migration.

· Mixed RAID Types in Pools: This feature allows a user to define RAID types per storage tier in pool, rather than requiring a single RAID type across all drives in a single pool.

· Improved TLU performance, no worse than 115% of a FLU: This feature provides improved TLU performance. This includes decreasing host response time impact and potentially decreasing storage system impact.

· Distinguished Compression Capacity Savings: This feature provides a display of the capacity savings due to compression. This display will inform the user of the savings due to compression, distinct from the savings due to thin provisioning. The benefit for the user is that he can determine the incremental benefit of using compression. This is necessary because there is currently a performance impact when using Compression, so users need to be able to make a cost/benefit analysis.

· Additional Tiering Policies: This feature provides an additional tiering option in pools: namely, “Start High, then Auto-tier”. When the user selects this policy for a given LUN, the initial allocation is on the highest tier, and subsequent tiering is based on activity.

· Additional RAID Options in Pools: This feature provides 2 additional RAID options in pools, for better efficiency: 8+1 for RAID 5, and 14+2 for RAID 6. These options will be available for new pools. These options will be available in both the GUI and the CLI.

· E-Trace Enhancements: Top files per fs and other stats: This feature allows the customer to identify the top files in a file system or quota tree. The files can be identified by pathnames instead of ids.

· Support VNX Snapshots in Unisphere Quality of Service Manager: This feature provides Unisphere Quality of Service Manager (UQM) support for both the source LUN and the snapshot LUN introduced by the VNX Snapshots feature.

· Support new VNX Features in Unisphere Analyzer: This feature provides support for all new VNX features in Unisphere Analyzer, including but not limited to VNX Snapshots and 64-bit counters.

· Unified Network Services: This feature provides several enhancements that will improve user experience. The various enhancements delivered by UNS in VNX OE for File 7.1 and VNX OE for Block 05.32 release are as follows:

· Continuous Monitoring: This feature provides the ability to monitor the vital statistics of a system continuously (out-of-box) and then take appropriate action when specific conditions are detected. The user can specify multiple statistical counters to be monitored – the default counters that will be monitored are CPU utilization, memory utilization and NFS IO latency on all file systems. The conditions when an event would be raised can also be specified by the user in terms of a threshold value and time interval during which the threshold will need to be exceeded for each statistical counter being monitored. When an event is raised, the system can perform any number of actions – possible choices are log the event, start detailed correlated statistics collection for a specified time period, send email or send a SNMP trap.

· Unisphere customization for VNX: This feature provides the addition of custom references within Unisphere Software (VNX) via editable source files for product documentation and packaging, custom badges and nameplates.

· VNX Snapshots: This feature provides VNX Snapshots (a.k.a write-in-place pointer-based snapshots) that in their initial release will support Block LUNs only and require pool-based LUNs. VNX Snapshots will support File Systems in a later release. The LUNs referred to below are pool-based LUNs (thick LUNs and Thin LUNs.)

· NDMP V4 IPv6 extension: This feature provides support for the Symantec, NetApp, and EMC authored and approved NDMP v4 IPv6 extension in order to back up using NDMP 2-way and 3-way in IPv6 networked environments.

· NDMP Access Time: This feature provides the last access time (atime). In prior releases, this was not retained during an NDMPCopy and was thus set to the time of migration. So, after a migration, the customer lost the ability to archive “cold” or “inactive” data. This feature adds an optional NDMP variable (RETAIN_ATIME=y/n, the default being ‘n’) which if set includes the atime in the NDMP data stream so that it can be restored properly on the destination.

· SRDF Interoperability for Control Station: This feature provides SRDF (Symmetrix Remote Data Facility) with the ability to manage the failover process between local and remote VNX Gateways. VNX Gateway needs a way to give up control of the failover and failback to an external entity and to suppress the initiation of these processes from within the Gateway.

 

Long Running FAST VP relocation job

I've noticed that our auto-tier data relocation job that runs every evening consistently shows 2+ days for the estimated time of completion. We have it set to run only 8 hours per day, so with our current configuration it's likely the job will never reach a completed state. Based on that observation I started investigating what options I had to try and reduce the amount of time that the relocation job runs.

Running this command will tell you the current amount of time estimated to complete the relocation job data migrations and how much data is queued up to move:

naviseccli -h <clariion_ip> autotiering -info -opStatus

Auto-Tiering State: Enabled
Relocation Rate: Medium
Schedule Name: Default Schedule
Schedule State: Enabled
Default Schedule: Yes
Schedule Days: Sun Mon Tue Wed Thu Fri Sat
Schedule Start Time: 22:00
Schedule Stop Time: 6:00
Schedule Duration: 8 hours
Storage Pools: Clariion1_SPB, Clariion2_SPA
Storage Pool Name: Clariion2_SPA
Storage Pool ID: 0
Relocation Start Time: 12/05/11 22:00
Relocation Stop Time: 12/06/11 6:00
Relocation Status: Inactive
Relocation Type: Scheduled
Relocation Rate: Medium
Data to Move Up (GBs): 2854.11
Data to Move Down (GBs): 1909.06
Data Movement Completed (GBs): 2316.00
Estimated Time to Complete: 2 days, 9 hours, 12 minutes
Schedule Duration Remaining: None
 

I'll review some possibilities based on research I've done in the past few days.  I'm still in the evaluation process and have not made any changes yet; I'll update this blog post once I've implemented a change myself.  If you are having issues with your data relocation job not finishing I would recommend opening an SR with EMC support for a detailed analysis before implementing any of these options.

1. Reduce the number of LUNs that use auto-tiering by disabling it on a LUN-by-LUN basis.

I would recommend monitoring which LUNs have the highest rate of change when the relocation job runs and then evaluate if any can be removed from auto-tiering altogether.  The goal of this would be to reduce the amount of data that needs to be moved.  The one caveat with this process is that when a LUN has auto-tiering disabled, the tier distribution of the LUN will remain exactly the same from the moment it is disabled.  If you disable it on a LUN that is using a large amount of EFD it will not change unless you force it to a different tier or re-enable auto-tiering later.

This would be an effective way to reduce the amount of data being relocated, but the process of determining which LUNs should have auto-tiering disabled is subjective and would require careful analysis.
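For reference, the tiering policy is changed on a per-LUN basis from the CLI. If I recall the syntax correctly it is the lun -modify -tieringPolicy option, but verify it against the CLI reference for your FLARE release before relying on it:

naviseccli -h <clariion_ip> lun -modify -l <lun_number> -tieringPolicy noMovement
naviseccli -h <clariion_ip> lun -modify -l <lun_number> -tieringPolicy autoTier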

2. Reset all the counters on the relocation job.

Any incorrectly labeled “hot” data will be removed from the counters and all LUNs would be re-evaluated for data movement.  One of the potential problems with auto-tiering is with servers that have IO intensive batch jobs that run infrequently.  That data would be incorrectly labeled as “hot” and scheduled to move up even though the server is not normally busy.  This information is detailed in emc268245.

To reset the counters, use the command to stop and start autotiering:

naviseccli -h <clariion_ip> autotiering -relocation -<stop | start>

If you need to temporarily stop relocation and do not want to reset the counters, use the pause/resume command instead:

naviseccli -h <clariion_ip> autotiering -relocation -<pause | resume>

I wanted to point out that changing a specific LUN from “auto-tier” to “No Movement” also does not reset the counters; the LUN will maintain its tiering schedule. It is the same as pausing auto-tiering just for that LUN.

3. Increase free space available on the storage pools.

If your storage pools are nearly 100% utilized there may not be enough space to effectively migrate the data between the tiers.  Add additional disks to the pool, or migrate LUNs to other RAID groups or storage pools.

4. Increase the relocation rate.

This of course could have dramatic effects on IO performance if it’s increased and it should only be changed during periods of measured low IO activity.

Run this command to change the data relocation rate:

naviseccli -h <clariion_ip> autotiering -setRate -rate <high | medium | low>

5. Use a batch or shell script to pause and restart the job with the goal of running it more frequently during periods of low IO activity.

There is no way to set the relocation schedule to run at different times on different days of the week; a script is necessary to accomplish that.  I currently run the job only in the middle of the night during off peak (non-business) hours, but I would be able to run it all weekend as well.  I have done that manually in the past.

You would need to use an external windows or unix server to schedule the scripts.  The relocation schedule should be set to run 24×7, then add the pause/resume command to have the job pause during the times you don’t want it to run.  To have it run on weekends and overnight, set up two separate scripts (one for pause and one for resume), then schedule each with task scheduler or cron to run throughout the week.

The cron schedule below allows it to run from 10PM to 6AM on weeknights and continuously from 10PM Friday until 6AM Monday over the weekend.

pause.sh:       naviseccli -h <clariion_ip> autotiering -relocation -pause

resume.sh:   naviseccli -h <clariion_ip> autotiering -relocation -resume

0 6 * * 1-5   /scripts/pause.sh      #6AM Monday through Friday - pause
0 22 * * 1-5  /scripts/resume.sh     #10PM Monday through Friday - resume
<After the Friday 10PM resume, the job is not paused again until 6AM Monday, so it runs all weekend>
 

Reporting on the state of VNX auto-tiering

 

To go along with my previous post (reporting on LUN tier distribution) I also include information on the same intranet page about the current state of the auto-tiering job.  We run auto-tiering from 10PM to 6AM to avoid the movement of data during business hours or during our normal backup window in the evening.

Sometimes the auto-tiering job will get very backed up and would theoretically never finish in the time slot that we have for data movement.  I like to keep tabs on the amount of data that needs to move up or down, and the amount of time that the array estimates until its completion.  If needed, I will sometimes modify the schedule to run 24 hours a day over the weekend and change it back early on Monday morning.  Unfortunately, EMC did not design the auto-tiering scheduler to allow for creating different time windows on different days. It's a manual process.

This is a relatively simple, one line CLI command, but it provides very useful info and it’s convenient to add it to a daily report to see it at a glance.

I run this script at 6AM every day, immediately following the end of the window for data to move:

naviseccli -h clariion1_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion1_hostname.autotier.txt

naviseccli -h clariion2_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion2_hostname.autotier.txt

naviseccli -h clariion3_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion3_hostname.autotier.txt

naviseccli -h clariion4_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion4_hostname.autotier.txt

(...and so on for any additional arrays)
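If you have more than a handful of arrays, the same commands can be generated with a short loop. A sketch (running under cygwin like the other bash scripts here, with a hypothetical hostnames.list file containing one array hostname per line):

for san in `cat /reports/hostnames.list`
do
   naviseccli -h $san autotiering -info -state -rate -schedule -opStatus > /cygdrive/c/inetpub/wwwroot/$san.autotier.txt
done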
The output for each individual Clariion looks like this:
Auto-Tiering State: Enabled
Relocation Rate: Medium

Schedule Name: Default Schedule
Schedule State: Enabled
Default Schedule: Yes
Schedule Days: Sun Mon Tue Wed Thu Fri Sat
Schedule Start Time: 22:00
Schedule Stop Time: 6:00
Schedule Duration: 8 hours
Storage Pools: Clariion1_SPB, Clariion2_SPA

Storage Pool Name: Clariion2_SPA
Storage Pool ID: 0
Relocation Start Time: 12/05/11 22:00
Relocation Stop Time: 12/06/11 6:00
Relocation Status: Inactive
Relocation Type: Scheduled
Relocation Rate: Medium
Data to Move Up (GBs): 1854.11
Data to Move Down (GBs): 909.06
Data Movement Completed (GBs): 2316.00
Estimated Time to Complete: 9 hours, 12 minutes
Schedule Duration Remaining: None

Storage Pool Name: Clariion1_SPB
Storage Pool ID: 1
Relocation Start Time: 12/05/11 22:00
Relocation Stop Time: 12/06/11 6:00
Relocation Status: Inactive
Relocation Type: Scheduled
Relocation Rate: Medium
Data to Move Up (GBs): 1757.11
Data to Move Down (GBs): 878.05
Data Movement Completed (GBs): 1726.00
Estimated Time to Complete: 11 hours, 42 minutes
Schedule Duration Remaining: None
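If you just want the at-a-glance numbers rather than the whole file, a quick grep against the output works too.  This is just a sketch run from cygwin or any Unix shell against wherever the script writes the file; the patterns match the field labels shown above:

grep -E "Storage Pool Name|Data to Move|Data Movement Completed|Estimated Time" clariion1_hostname.autotier.txt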
 
 

Reporting on LUN auto-tier distribution

We have auto-tiering turned on in all of our storage pools, all of which use EFD, FC, and SATA disks.  I created a script that generates a list of all of our LUNs with the current tier distribution for each one.  Note that this script is designed to run on Unix; it can also be run with cygwin installed on a Windows server if you don’t have access to a Unix-based server.

You will first need to create a text file with a list of the hostnames for your arrays (or the IP of one of the storage processors for each array).  Separate lists must be made for VNX and older Clariion arrays, as the naviseccli output changed with VNX.  For example, the tier labeled “Flash” in the output from a CX is labeled “Extreme Performance” in the output from a VNX for the same command.  I have one file named san.list for the older arrays, and another named san2.list for the VNX arrays.

As I mentioned in my previous post, our naming convention for LUNs includes the pool ID, LUN number, server name, filesystem/drive letter, last four digits of the array’s serial number, and size (in GB).  Having all of that information in the LUN name is what makes this report truly useful; a simple list of LUNs already contains everything I need for reporting.  If I need to look at tier distribution for a certain server, I just filter the list in the spreadsheet on the server name embedded in the LUN name.

Here’s what our LUN names looks like: P1_LUN100_SPA_0000_servername_filesystem_150G
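Because the server name is embedded in the LUN name, the finished csv report can also be filtered straight from the command line instead of in a spreadsheet.  A trivial example (servername is just a placeholder for whatever host you’re looking for):

grep servername /reports/clariion1_hostname.report.csv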

As I said earlier, because of the output differences between VNX arrays and older CX arrays, I have two separate scripts.  I’ll include the complete scripts first, then explain in more detail what each section does.

Here is the script for CX series arrays:

for san in `/bin/cat /reports/tiers/san.list`
do
naviseccli -h $san lun -list -tiers |grep LUN |awk '{print $2}' > $san.out 
     for lun in `cat $san.out`
        do
        sleep 2
        echo $san
        naviseccli -h $san -np lun -list -name $lun -tiers > $lun.$san.dat &
     done 

mv $san.report.csv $san.report.`date +%j`.csv 
echo "LUN Name","FLASH","FC","SATA" > $san.report.csv 
     for lun in `cat  $san.out`
        do
        echo $lun
        echo `grep Name $lun.$san.dat |awk '{print $2}'`","`grep -i flash $lun.$san.dat |awk '{print $2}'`","`grep -i fc $lun.$san.dat |awk '{print $2}'`","`grep -i sata $lun.$san.dat |awk '{print $2}'` >> $san.report.csv
     done
 done

./csv2htm.pl -e -T -i /reports/clariion1_hostname.report.csv -o /reports/clariion1_hostname.report.html

./csv2htm.pl -e -T -i /reports/clariion2_hostname.report.csv -o /reports/clariion2_hostname.report.html

./csv2htm.pl -e -T -i /reports/clariion3_hostname.report.csv -o /reports/clariion3_hostname.report.html

Here is the script for VNX series arrays:

for san in `/bin/cat /reports/tiers2/san2.list`
do
naviseccli -h $san lun -list -tiers |grep LUN |awk '{print $2}' > $san.out
   for lun in `cat $san.out`
     do
     sleep 2
     echo $san.Generating-LUN-List
     naviseccli -NoPoll -h $san lun -list -name $lun -tiers > $lun.$san.dat &
  done

mv $san.report.csv $san.report.`date +%j`.csv
echo "LUN Name","FLASH","FC","SATA" > $san.report.csv
   for lun in `cat  $san.out`
      do
      echo $lun
      echo `grep Name $lun.$san.dat |awk '{print $2}'`","`grep -i extreme $lun.$san.dat |awk '{print $3}'`","`grep -i Performance $lun.$san.dat |grep -v Extreme|awk '{print $2}'`","`grep -i Capacity $lun.$san.dat |awk '{print $2}'` >> $san.report.csv
   done
 done

./csv2htm.pl -e -T -i /reports/VNX1_hostname.report.csv -o /reports/VNX1_hostname.report.html

./csv2htm.pl -e -T -i /reports/VNX2_hostname.report.csv -o /reports/VNX2_hostname.report.html

./csv2htm.pl -e -T -i /reports/VNX3_hostname.report.csv -o /reports/VNX3_hostname.report.html
 Here is a more detailed explanation of the script.

Section 1:

The entire script runs in a loop based on the SAN hostname entries.   We’ll use this list in the next section to get the LUN information from each SAN that needs to be monitored.

for san in `/bin/cat /reports/tiers/san.list`

do

naviseccli -h $san lun -list -tiers |grep LUN |awk '{print $2}' > $san.out
 Section 2:

This section runs the naviseccli command for every LUN listed in each <san_hostname>.out file and writes a separate text file with the tier distribution for each LUN.  If you have 500 LUNs, then 500 text files will be created in the directory you run the script from.

     for lun in `cat $san.out`
        do
        sleep 2
        echo $san
        naviseccli -h $san -np lun -list -name $lun -tiers > $lun.$san.dat &
     done
Each file will be named <lun_name>.<san_hostname>.dat, and the contents of the file look like this:
LOGICAL UNIT NUMBER 962
Name:  P1_LUN962_0000_SPB_servername_filesystem_350G
Tier Distribution: 
Flash:  4.74%
FC:  95.26%
 Section 3:

This line renames the previous day’s output file for archiving purposes.  The %j appends the Julian date to the file name (1-365, the day of the year), so each file is automatically overwritten after one year.  It’s a self-cleaning archive directory.  🙂

mv $san.report.csv $san.report.`date +%j`.csv
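For example, on December 5th (day 339 of the year) the rename would produce a file like clariion1_hostname.report.339.csv, which then gets overwritten on the same day the following year.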

Section 4:

This section processes each individual LUN file, pulling out only the tier information we need, and combines everything into one large output file in csv format.

The first line creates a blank CSV file with the appropriate column headers.

echo "LUN Name","FLASH","FC","SATA" > $san.report.csv

This block of code parses each individual LUN file, using grep to find each line we want in the report and awk to grab only the specific field we need from that line.  For example, if the LUN output file has “Flash:  4.74%” on one line and we only want the “4.74%” with the word “Flash:” stripped off, awk ‘{print $2}’ grabs only the second field.
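A quick illustration of that field extraction (a throwaway example, not part of the script):

echo "Flash:  4.74%" | awk '{print $2}'
# prints 4.74%

And here is the full block from the script: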

     for lun in `cat  $san.out`
        do
        echo $lun
        echo `grep Name $lun.$san.dat |awk '{print $2}'`","`grep -i flash $lun.$san.dat |awk '{print $2}'`","`grep -i fc $lun.$san.dat |awk '{print $2}'`","`grep -i sata $lun.$san.dat |awk '{print $2}'` >> $san.report.csv
     done
done
Once every LUN file has been processed and added to the report, I run the csv2htm.pl perl script (from http://www.jpsdomain.org/source/perl.html) to convert it for our intranet website.  The csv files are also added as download links on the site.
./csv2htm.pl -e -T -i /reports/clariion1_hostname.report.csv -o /reports/clariion1_hostname.report.html

./csv2htm.pl -e -T -i /reports/clariion2_hostname.report.csv -o /reports/clariion2_hostname.report.html

./csv2htm.pl -e -T -i /reports/clariion3_hostname.report.csv -o /reports/clariion3_hostname.report.html
And finally, the output looks like this:

LUN Name                                        FLASH    FC       SATA
P0_LUN101_0000_SPA_servername_filesystem_100G   24.32%   67.57%   8.11%
P0_LUN102_0000_SPA_servername_filesystem_100G   5.92%    58.77%   35.31%
P1_LUN103_0000_SPA_servername_filesystem_100G   7.00%    81.79%   11.20%
P1_LUN104_0000_SPA_servername_filesystem_100G   1.40%    77.20%   21.40%
P0_LUN200_0000_SPA_servername_filesystem_100G   5.77%    75.06%   19.17%
P0_LUN201_0000_SPA_servername_filesystem_100G   6.44%    71.21%   22.35%
P0_LUN202_0000_SPA_servername_filesystem_100G   4.55%    90.91%   4.55%
P0_LUN203_0000_SPA_servername_filesystem_100G   10.73%   80.76%   8.52%
P0_LUN204_0000_SPA_servername_filesystem_100G   8.62%    88.31%   3.08%
P0_LUN205_0000_SPA_servername_filesystem_100G   10.88%   82.65%   6.46%
P0_LUN206_0000_SPA_servername_filesystem_100G   7.00%    81.79%   11.20%
P0_LUN207_0000_SPA_servername_filesystem_100G   1.40%    77.20%   21.40%
P0_LUN208_0000_SPA_servername_filesystem_100G   5.77%    75.06%   19.17%

Reporting on Trespassed LUNs

 

All of our production clariions are configured with two large tiered storage pools, one for LUNs on SPA and one for LUNs on SPB.  When storage is created on a server, two identical LUNs are created (one in each pool) and are striped at the host level.  I do it that way to more evenly balance the load on the storage processors.

I’ve noticed that LUNs will occasionally trespass to the other SP.  In order to keep the SPs balanced the way I want them, I routinely check for trespassed LUNs and move them back to their default owner.  Our naming convention for LUNs includes the SP that the LUN was originally assigned to, as well as the pool ID, server name, filesystem/drive letter, last four digits of the serial number, and size.  Having all of this information in the LUN name makes for very easy reporting, and having the default SP in the LUN name is required for this script to work as written.

Here’s what our LUN names looks like:     P1_LUN100_SPA_0000_servername_filesystem_150G

To quickly check on the status of any mismatched LUNs every morning, I created a script that generates a daily report.  The script first creates output files that list all of the LUNs on each SP, then uses simple grep commands to output only the LUNs whose SP designation in the name does not match the current owner.  The csv output files are then parsed by the csv2htm.pl perl script, which converts the csv into easy-to-read HTML files that are automatically posted on our intranet web site.  The csv2htm.pl script is from http://www.jpsdomain.org/source/perl.html and is under a GNU General Public License.  Note that this script is designed to run on Unix; it can also be run with cygwin installed on a Windows server if you don’t have access to a Unix-based server.

Here’s the shell script (I have one for each clariion/VNX):

naviseccli -h clariion_hostname getlun -name -owner |grep -i name > /reports/sp/lunname.out

sleep 5

naviseccli -h clariion_hostname getlun -name -owner |grep -i current >  /reports/sp/currentsp.out

sleep 5

paste -d , /reports/sp/lunname.out /reports/sp/currentsp.out >  /reports/sp/clariion_hostname.spowner.csv

./csv2htm.pl -e -T -i /reports/sp/clariion_hostname.spowner.csv -o /reports/sp/clariion_hostname.spowner.html

#Determine SP mismatches between LUNs and SPs, output to separate files

cat /reports/sp/clariion_hostname.spowner.csv | grep 'SP B' > /reports/sp/clariion_hostname_spb.csv

grep SPA /reports/sp/clariion_hostname_spb.csv > /reports/sp/clariion_hostname_spb_mismatch.csv

cat /reports/sp/clariion_hostname.spowner.csv | grep 'SP A' > /reports/sp/clariion_hostname_spa.csv

grep SPB /reports/sp/clariion_hostname_spa.csv > /reports/sp/clariion_hostname_spa_mismatch.csv

#Convert csv output files to HTML for intranet site

./csv2htm.pl -e -d -T -i /reports/sp/clariion_hostname_spa_mismatch.csv -o /reports/sp/clariion_hostname_spa_mismatch.html

./csv2htm.pl -e -d -T -i /reports/sp/clariion_hostname_spb_mismatch.csv -o /reports/sp/clariion_hostname_spb_mismatch.html
The output files look like this (clariion_hostname_spa_mismatch.html from the script):

Name: P1_LUN100_SPA_0000_servername_filesystem1_150G      Current Owner: SPB

Name: P1_LUN101_SPA_0000_servername_filesystem2_250G      Current Owner: SPB

Name: P1_LUN102_SPA_0000_servername_filesystem3_350G      Current Owner: SPB

Name: P1_LUN103_SPA_0000_servername_filesystem4_450G      Current Owner: SPB

Name: P1_LUN104_SPA_0000_servername_filesystem5_550G      Current Owner: SPB
 The 0000 represents the last four digits of the serial number of the Clariion.

That’s it, a quick and easy way to report on trespassed LUNs in our environment.

How to scrub/zero out data on a decommissioned VNX or Clariion


Our audit team needed to ensure that we were properly scrubbing the old disks before sending our old Clariion back to EMC on a trade-in.  EMC of course offers scrubbing services, which run upwards of $4,000 for an array, but there is also a built-in command that will do the same job:

navicli -h <SP IP> zerodisk -messner B E D
B Bus
E Enclosure
D Disk

usage: zerodisk disk-names [start|stop|status|getzeromark]

sample: navicli -h 10.10.10.10 zerodisk -messner 1_1_12

This command writes zeros to the entire disk, making any data recovery from it impossible.  Add the command to a Windows batch file (or shell script) with an entry for every disk in your array, and you’ve got a quick and easy way to zero out all the disks.
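Here’s a minimal sketch of that idea as a shell loop (the same thing works as a plain list of commands in a batch file).  The disks.list file is hypothetical; it would simply contain one Bus_Enclosure_Disk entry per line, such as 1_0_4:

# start the zerodisk process on every disk listed in disks.list (hypothetical file)
for disk in `cat disks.list`
do
   navicli -h 10.10.10.10 zerodisk -messner $disk start
done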

So, once the disks are zeroed out, how do you prove to the audit department that the work was done? I searched everywhere and could not find any documentation from EMC on this command, which is no big surprise since you need the engineering mode switch (-messner) to run it.  Here were my observations after running it:

This is the zeromark status on 1_0_4 before running navicli -h 10.10.10.10 zerodisk -messner 1_0_4 start:

 Bus 1 Enclosure 0  Disk 4

 Zero Mark: 9223372036854775807

 This is the zeromark status on 1_0_4 after the zerodisk process is complete:

(I ran navicli -h 10.10.10.10 zerodisk -messner 1_0_4 getzeromark to get this status)

 Bus 1 Enclosure 0  Disk 4

Zero Mark: 69704

The 69704 number indicates that the disk has been successfully scrubbed.  Prior to running the command, all disks will have an extremely long zero mark (18+ digits); after the zerodisk command completes, the disks will return either 69704 or 69760 depending on the type of disk (FC/SATA).  That’s the best I could come up with to prove that the zeroing was successful.  Running the getzeromark option on all the disks before and after the zerodisk command should be sufficient to prove that the disks were scrubbed.
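For the audit trail, the zero marks for every disk can be captured into a dated file before and after the wipe.  A rough sketch, reusing the same hypothetical disks.list file from above:

# record the zero mark of every disk; run once before zeroing and once after it completes
for disk in `cat disks.list`
do
   navicli -h 10.10.10.10 zerodisk -messner $disk getzeromark >> zeromark.`date +%Y%m%d`.txt
done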

Strategies for implementing Multi-tiered FAST VP Storage Pools

After speaking to our local rep and attending many different classes at the most recent EMC World in Vegas, I came away with some good information and a very logical best practice for implementing multi-tiered FAST VP storage pools.

First and foremost, you have to use Flash.  High-RPM Fiber Channel drives are neither capacity efficient nor performance efficient; the data with the highest IO needs to be hosted on Flash drives.  The most effective split of drives in a storage pool is 5% Flash, 20% Fiber Channel, and 75% SATA.

Using this example, if you have an existing SAN with 167 15,000 RPM 600GB Fiber Channel Drives, you would replace them with 97 drives in the 5/20/75 blend to get the same capacity with much improved performance:

  • 25 200GB Flash Drives
  • 34 15K 600GB Fiber Channel Drives
  • 38 2TB SATA Drives
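A quick sanity check on that capacity math: 167 x 600GB is roughly 100TB of raw capacity.  The replacement blend gives 25 x 200GB = 5TB of Flash, 34 x 600GB = 20.4TB of Fiber Channel, and 38 x 2TB = 76TB of SATA, roughly 101TB in total across only 97 drives, which works out to approximately the 5/20/75 split by capacity.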

The ideal scenario is to implement FAST Cache along with FAST VP.  FAST Cache continuously ensures that the hottest data is served from Flash drives.  With FAST Cache, up to 80% of your data IO can come from cache (legacy DRAM cache served up only about 20%).

It can be a hard pill to swallow when you see how much the Flash drives cost, but their cost is negated by increased disk utilization and a reduction in the number of total drives and DAEs that you need to buy.  With all-FC drives, disk utilization is sacrificed to get the needed performance: very little of the capacity is used; you just buy tons of disks to get more spindles in the RAID groups for better performance.  Flash drives can achieve much higher utilization, reducing the effective cost.

After implementing this at my company I’ve seen dramatic performance improvements.  It’s an effective strategy that really works in the real world.

In addition to this, I’ve also been implementing storage pools in pairs, each sized identically.  The first pool is designated only for SP A, the second for SP B.  When I get a request for data storage, say for 1TB, I create a 500GB LUN in the first pool on SP A and a 500GB LUN in the second pool on SP B.  When the disks are presented to the host server, the server administrator then stripes the data across the two LUNs.  Using this method, I can better balance the load across the storage processors on the back end.

Reporting on Soft media errors

 

Ah, soft media errors.  The silent killer.  We had an issue with one of our Clariion LUNs that had many uncorrectable sector errors.  Prior to the LUN failure, there were hundreds of soft media errors reported in the Navisphere logs.  Why weren’t we alerted about them?  Beats me.  I created my own script to pull and parse the alert logs so I can manually check for these types of errors.

What exactly is a soft media error?  Soft media errors indicate that the SAN has identified a bad sector on the disk and is reconstructing the data from RAID parity in order to fulfill the read request.  They can indicate a failing disk.

To run a report that pulls only soft media errors from the SP log, put the following in a windows batch file:

naviseccli -h <SP IP Address> getlog >textfile.txt

for /f "tokens=1,2,3,4,5,6,7,8,9,10,11,12,13,14" %%i in ('findstr Soft textfile.txt') do (echo %%i %%j %%k %%l %%m %%n %%o %%p %%q %%r %%s %%t %%u %%v)  >>textfile_mediaerrors.txt

The text file output looks like this:

10/25/2010 19:40:17 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:22 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:22 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:27 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:27 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:33 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:33 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:38 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:38 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:44 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:44 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:49 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
 10/25/2010 19:40:49 Enclosure 6 Disk 7 (820) Soft Media Error [0x00] 0 5
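Since the same disk usually shows up over and over, a quick way to summarize the output is to count the errors per disk.  A minimal sketch using awk from cygwin or any Unix shell, assuming the output format shown above:

awk '/Soft Media Error/ {disk=$3" "$4" "$5" "$6; count[disk]++} END {for (d in count) print d": "count[d]" errors"}' textfile_mediaerrors.txt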

If you see lots of soft media errors, do yourself a favor and open a case with EMC.  Too many can lead to the failure of one of your LUNs.

The script can be automated to run and send an email with daily alerts, if you so choose.  I just run it manually about once a week for review.

Tiering reports for EMC’s FAST VP

Note: On a separate blog post, I shared a script to generate a report of the tiering status of all LUNs.

One of the items that EMC did not implement along with FAST VP is the ability to run a canned report on how your LUNs are allocated among the different tiers of storage.  While there is no canned report, it is possible to get this information from the CLI.

The naviseccli -h {SP IP or hostname} lun -list -tiers command fits the bill. It shows how a specific LUN is distributed across the different drive types.  I still need to come up with a script to pull out only the information that I want, but the info is definitely in the command’s output.

Here’s the sample output:

LOGICAL UNIT NUMBER 6
 Name:  LUN 6
 Tier Distribution:
 Flash:  13.83%
 FC:  86.17%

The storagepool report gives some good info as well.  Here’s an excerpt of what you see with the naviseccli -h {SP IP or hostname} storagepool -list -tiers command:

SPA

Tier Name:  Flash
 Raid Type:  r_5
 User Capacity (GBs):  1096.07
 Consumed Capacity (GBs):  987.06
 Available Capacity (GBs):  109.01
 Percent Subscribed:  90.05%
 Data Targeted for Higher Tier (GBs):  0.00
 Data Targeted for Lower Tier (GBs):  11.00

Tier Name:  FC
 Raid Type:  r_5
 User Capacity (GBs):  28981.77
 Consumed Capacity (GBs):  10592.65
 Available Capacity (GBs):  18389.12
 Percent Subscribed:  36.55%

Tier Name:  SATA
 Raid Type:  r_5
 User Capacity (GBs):  11004.67
 Consumed Capacity (GBs):  260.02
 Available Capacity (GBs):  10744.66
 Percent Subscribed:  2.36%
 Data Targeted for Higher Tier (GBs):  3.00
 Data Targeted for Lower Tier (GBs):  0.00
 Disks (Type):

SPB

Tier Name:  Flash
 Raid Type:  r_5
 User Capacity (GBs):  1096.07
 Consumed Capacity (GBs):  987.06
 Available Capacity (GBs):  109.01
 Percent Subscribed:  90.05%
 Data Targeted for Higher Tier (GBs):  0.00
 Data Targeted for Lower Tier (GBs):  25.00

Tier Name:  FC
 Raid Type:  r_5
 User Capacity (GBs):  28981.77
 Consumed Capacity (GBs):  10013.61
 Available Capacity (GBs):  18968.16
 Percent Subscribed:  34.55%
 Data Targeted for Higher Tier (GBs):  25.00
 Data Targeted for Lower Tier (GBs):  0.00

Tier Name:  SATA
 Raid Type:  r_5
 User Capacity (GBs):  11004.67
 Consumed Capacity (GBs):  341.02
 Available Capacity (GBs):  10663.65
 Percent Subscribed:  3.10%
 Data Targeted for Higher Tier (GBs):  20.00
 Data Targeted for Lower Tier (GBs):  0.00

Good stuff in there.   It’s on my to-do list to run these commands periodically, and then parse the output to filter out only what I want to see.  Once I get that done I’ll post the script here too.
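In the meantime, here’s a rough sketch of the kind of parsing I have in mind (not the finished script).  It flattens the per-tier capacity numbers into CSV; the awk patterns assume the field labels shown in the sample output above:

naviseccli -h clariion1_hostname storagepool -list -tiers > /reports/clariion1_hostname.tiers.txt

# pull tier name, capacities, and subscription percentage into one CSV row per tier
awk -F': *' '/Tier Name/ {tier=$2}
   /User Capacity/ {user=$2}
   /Consumed Capacity/ {cons=$2}
   /Available Capacity/ {avail=$2}
   /Percent Subscribed/ {print tier","user","cons","avail","$2}' /reports/clariion1_hostname.tiers.txt > /reports/clariion1_hostname.tiers.csv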

Note: I did create and post a script to generate a report of the tiering status of all LUNs.