Category Archives: EMC General

Best Practices for FAST Cache

I recently received a comment asking for more information on EMC’s FAST Cache, specifically about why increased CPU utilization was observed after a FAST Cache expansion. It’s likely due to the cache rebuilding after the expansion, and possibly to having FAST Cache enabled on LUNs that shouldn’t have it, such as those with heavily sequential I/O. It’s hard to pinpoint the exact cause of an issue like that without a thorough analysis of the array itself, however. I thought I’d do a quick write-up of EMC’s best practices for implementing FAST Cache and the caveats to consider when implementing it.

What is FAST Cache?

First, a quick overview of what it is. EMC’s FAST Cache uses a RAID set of EFD (flash) drives that sits between DRAM cache and the disks themselves, holding a large percentage of the most frequently accessed data on high-performance EFDs. It hits a price/performance sweet spot between DRAM and traditional spinning disk for cache, and can greatly increase array performance.

The theory behind FAST Cache is simple: the array’s storage is divided into 64KB blocks, the hits on those blocks are counted, and a cache page is created on the FAST Cache EFDs once a block has received three read (or write) hits. If FAST Cache fills up, the array looks for pages in the EFDs that can be coalesced into full stripe writes to the spinning disks and force-flushes them back out to those disks.

FAST Cache uses a “three strikes” algorithm.  If you are moving large amounts of data, the FAST Cache algorithm does not activate, which is by design, as cache does not help at all in large copy transactions.  Random hits on active blocks, however, will ultimately promote those blocks into FAST Cache.  This is where the 64KB granularity makes a difference.  Typical workload I/Os are 64KB or less, and there is a significant chance that even if a workload is performing 4KB reads and writes to different blocks, they will still hit the same 64KB FAST Cache block, resulting in the promotion of that data into FAST Cache.  Cool, right?  It works very well in practice.  With all that said, there are still plenty of implementation considerations for an ideal FAST Cache configuration.  Below is an overview of EMC’s best practices.

Best Practices for LUNs and Pools

  • Only use it where you need it. The FAST Cache driver has to track every I/O to calculate whether a block needs promotion to FAST Cache, which adds to SP CPU utilization.  As a best practice, you should disable FAST Cache for LUNs that won’t need it; cutting this overhead can improve overall performance.  Having a separate storage pool for LUNs that don’t need FAST Cache would be ideal (a CLI sketch for toggling FAST Cache per pool follows this list).

Disable FAST Cache for the following LUN types:

– MirrorView secondary mirrors and SnapView clone destination LUNs
– LUNs with small-block, highly sequential I/O, such as Oracle database logs and SnapSure dvols
– LUNs in the reserved LUN pool
– RecoverPoint journal LUNs

  • Analyze where you need it most.  Based on a workload analysis, I’d consider restricting the use of FAST Cache to the LUNs or pools that need it the most.  For every new block promoted into FAST Cache, the least recently accessed blocks are evicted to make room.  If your FAST Cache capacity is limited, even frequently accessed blocks may be evicted before they’re accessed again.
  • Upgrade to the latest OS release. On the VNX platform, upgrading to the latest FLARE or MCx release can greatly improve the performance of FAST Cache.  It’s been a few years now, but as an example, r32 recovers performance much faster after a FAST Cache drive failure than r31, and it automatically avoids promoting small-block sequential I/O to FAST Cache.  It’s always a good idea to run a current version of the code.
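
As a follow-up to the first bullet above, here is a minimal CLI sketch of how FAST Cache is typically toggled per storage pool with naviseccli. The syntax is hedged from memory, so verify it against the naviseccli reference for your FLARE/MCx release; the SP address and pool name are placeholders:

naviseccli -h <SP_IP> cache -fast -info
naviseccli -h <SP_IP> storagepool -modify -name "Pool_No_FASTCache" -fastcache off
naviseccli -h <SP_IP> storagepool -list -name "Pool_No_FASTCache" -fastcache

The first command reports the current FAST Cache configuration, the second disables FAST Cache for the named pool, and the third confirms the pool’s setting.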

Best Practices For VNX arrays with MCx:

  • Spread it out. Spread the drives as evenly as possible across the available backend busses.  Be careful, though, as you shouldn’t add more than 8 FAST Cache flash drives per bus including any unused flash drives for use as hot-spares.
  • Always use DAE 0. Try to use DAE 0 on each bus for flash drives, as it provides the lowest latency.

Best Practices for VNX and CX4 arrays with FLARE 30-32: 

  • CX4? No more than 4 per bus. If you’re still using an older CX4 series array, don’t use more than 4 FAST Cache drives per bus, and don’t put all of them on bus 0. If they are all on the same bus, they could completely saturate this bus with I/O.
  • Spread it out. Spread the FAST Cache drives over as many buses as possible. This would especially be an issue if the drives were all on bus 0, because it is used to access the vault drives.  Note that the VNX has six times the back-end bandwidth per bus compared to a CX, so it’s less of a concern.
  • Match the drive sizes. All the drives in FAST Cache must be of the same capacity; otherwise the workload on each drive would rise proportionately with its capacity.  In other words, a 200GB drive would carry double the workload of a 100GB drive.
  • VNX? Use enclosure 0. Put the EFD drives in the first DAE on any bus (i.e. Enclosure 0).  The I/O has to pass through the LCC of each DAE between the drive and the SP, and each extra LCC it passes through will add a small amount of latency. The latency would normally be negligible, but is significant for flash drives.  Note that on the CX4, all I/O has to pass through every LCC anyway.
  • Mind the order the disks are added.  The order the drives are added dictates which drives are primary & secondary. The first drive added is the primary for the first mirror, the next drive added is its secondary for the first mirror, the third drive is the primary for the second mirror, etc.
  • Location, Location, Location. It’s a more advanced configuration and requires the use of the CLI, but for the highest availability, place the primary and secondary of each FAST Cache RAID 1 pair on different buses.

 

 

 

 


XtremIO Manual Log File Collection Procedure

If you have a need to gather XtremIO logs for EMC to analyze and they are unable to connect via ESRS, there is a method to gather them manually.  Below are the steps on how to do it.

1. Log in to the XtremIO Management System (XMS) GUI interface with the ‘admin‘ user account.

2. Click on the ‘Administration‘ tab, which is on the top of the XtremIO Management System (XMS) GUI banner bar.

3. On the left side of the Administration window, choose the ‘CLI Terminal‘ option.

4. Once you have the CLI terminal up, enter the create-debug-info command at the ‘xmcli (admin)>‘ prompt.  This command generates a set of XtremIO dossier log files and may take a little while to complete.  Once it finishes and returns you to the ‘xmcli (admin)>’ prompt, a complete package of XtremIO dossier log files will be available for you to download.

Example:

xmcli (admin)> create-debug-info
The process may take a while. Please do not interrupt.
Debug info collected and could be accessed via http://<XMS IP Address>/XtremApp/DebugInfo/104dd1a0b9f56adf7f0921d2f154329a.tar.xz

Important Note: If you have more than one cluster managed by the XMS server, you will need to select the specific cluster.

xmcli (e012345)> show-clusters

Cluster-Name Index State  Gates-Open Conn-State Num-of-Vols Num-of-Internal-Volumes Vol-Size UD-SSD-Space Logical-Space-In-Use UD-SSD-Space-In-Use Total-Writes Total-Reads Stop-Reason Size-and-Capacity
XIO-0881     1     active True       connected  253         0                       60.550T  90.959T      19.990T              9.944T              442.703T     150.288T    none        4X20TB
XIO-0782     2     active True       connected  225         0                       63.115T  90.959T      20.993T              9.944T              207.608T     763.359T    none        4X20TB
XIO-0355     3     active True       connected  6           0                       2.395T   41.111T      1.175T               253.995G            6.251T       1.744T      none        2X40TB

xmcli (e012345)> create-debug-info cluster-id=3

5. Once the ‘create-debug-info‘ command completes, you can use a web browser to navigate to the HTTP address link that’s provided in the terminal session window.  After navigating to the link, you’ll be presented with a pop-up window asking you to save the log file package to your local machine.  Save the log file package to your local machine for later upload.

6. Attach the XtremIO dossier log file package you downloaded to the EMC Service Request (SR) you currently have open or are in the process of opening.  Use the ‘Attachments’ (the paperclip button) area located on the Service Request page for upload.

7. You can also view a historical listing of all XtremIO dossier log file packages that are currently available on your system. To view them, issue the following XtremIO CLI command: show-debug-info. A series of log file packages will be listed.  It’s possible EMC may request a historical log file package for baseline information when troubleshooting.  To download one, simply copy the HTTP link listed under the ‘Output-Path‘ header into your web browser’s address bar to start the download.
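
If you’d rather not use a browser, a command-line HTTP client can pull the package directly, assuming the XMS serves the file without additional authentication. The file name below is a placeholder; paste the exact link from your own Output-Path:

wget http://<XMS IP Address>/XtremApp/DebugInfo/<package-name>.tar.xz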

Example:

xmcli (tech)> show-debug-info
 Name  Index  System-Name   Index   Debug-Level   Start-Time                 Create-Time               Output-Path
        1      XtremIO-SVT   1       medium        Mon Aug 14 15:55:10 2017   Mon Aug 14 16:09:40 2017   http://<XMS IP Address>/XtremApp/DebugInfo/1aaf4b1acd88433e9aca5b022b5bc43f.tar.xz
        2      XtremIO-SVT   1       medium        Mon Aug 14 15:55:10 2017   Mon Aug 14 16:09:40 2017   http://<XMS IP Address>/XtremApp/DebugInfo/af5001f0f9e75fdd9c0784c3d742531f.tar.xz

That’s it! It’s a fairly straightforward process.

 

 

VNX NAS Files incorrectly report as Locked for Editing

When opening a shared Microsoft Office file, you may see the error message “File in Use, file_name is locked for editing by user_name“, when in fact no other user is currently using the file.

We had users who would view files in the Explorer preview pane, which created a lock on the file, and the lock would remain after the Explorer window was closed.  The next time the file was accessed it would report being locked even though no one had it open.  Below are some steps you can take to troubleshoot and resolve the issue.  Note that changing some of these parameters can have a performance impact, so make these changes at your own risk.  As background, oplocks (opportunistic locks) let clients lock files and locally cache information while preventing another user from changing the file, which increases performance for many file operations.

1. Disable Oplocks on the VNX

Disabling oplocks can affect client performance. It will increase the number of metadata requests that are sent to the server because when you use SMB with oplocks, the client caches the data that is locked to speed up access to frequently accessed files. When oplocks are disabled, the client does not cache data and all reads are made directly to the NAS server.

Syntax for disabling oplocks and verifying the change:

[nasadmin@VNX1 ~]$ server_mount vdm_file_system -o nooplock test_file_system01_fs /test_file_system01
vdm_file_system : done

[nasadmin@VNX1 ~]$ server_mount vdm_file_system | grep test_file_system01_fs
test_file_system01_fs on /test_file_system01 uxfs,perm,rw,noprefetch,nonotify,accesspolicy=NATIVE,nooplock

2. Disable caching on the Windows client

The Windows client setting controls the cache lifetime. As stated earlier, if caching is disabled on the Windows client then all reads go directly to the NAS server. To disable caching on the Windows client rather than disabling oplocks on the VNX Data Mover, the following three registry values need to be changed:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters\

– Directory cache: set DirectoryCacheLifetime to 0.
– File Not Found cache: set FileNotFoundCacheLifetime to 0.
– File information cache: set FileInfoCacheLifetime to 0.
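
If you’d rather script these registry changes than edit them by hand, something like the following from an elevated command prompt should work. This is a hedged sketch; the value names match the list above, and a logoff or reboot may be needed for the workstation service to pick up the new values:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" /v DirectoryCacheLifetime /t REG_DWORD /d 0 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" /v FileNotFoundCacheLifetime /t REG_DWORD /d 0 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" /v FileInfoCacheLifetime /t REG_DWORD /d 0 /f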

3. Apply a Microsoft hotfix

The Microsoft KB article http://support.microsoft.com/kb/942146 describes the problem and the fix in detail.  It directly addresses the issue with files locking from the preview pane.  It applies to all versions of Windows Vista and 7 as well as Windows Server 2008.

VNX Block and File Password Change Procedure

Below is the procedure for changing the passwords on a Unified VNX on both the block and file sides.

Please note that changing the global VNX administrator password can cause communication failures between the Control Station and the array. The issue is documented in emc261195 and is the reason I’m adding this post; I was researching how to avoid it. The article notes that in DART OS versions newer than v7.0.14 the synchronization is automated and the cached credentials are updated automatically; in DART OS v7.0.14 and prior you must do it manually on the active control station.

Whenever a change is made to the active Control Station, always verify that the standby control station configuration matches after a takeover.  Takeover is initiated by the standby control station; failover is initiated by the active control station. As an example, if the time zone is changed on the active control station, it is not part of the synchronization during the failover process. Time zone changes need to be configured separately on each control station and require a reboot to take effect. Unisphere will prompt you to do so as a reminder; however, on takeover/failover the newly promoted control station never reboots.

Block Side: Change the sysadmin global domain account

A) Updating global domain account password

1) Log into Unisphere with the sysadmin global account, using the control station IP

2) From the “All Systems” page select “Domains”

3) Select “Manage Global Users”

4) Highlight sysadmin and click on “Modify”

5) Change the password

B) Update Security on Control Station

1) Open a putty session to the primary control station and run the commands below. They should work as-is; however, the first two might require you to supply user ID/password info before they are accepted (add -user sysadmin -password "pswd" -scope 0 to the commands below).

/nas/sbin/naviseccli -h spa -AddUserSecurity -user sysadmin -scope 0

/nas/sbin/naviseccli -h spb -AddUserSecurity -user sysadmin -scope 0

nas_storage -modify id=1 -security
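
If you’re prompted for credentials on the first two commands, they can be run with the user, password, and scope supplied inline (the password below is a placeholder):

/nas/sbin/naviseccli -h spa -AddUserSecurity -user sysadmin -password <new_password> -scope 0
/nas/sbin/naviseccli -h spb -AddUserSecurity -user sysadmin -password <new_password> -scope 0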

 C) Verify the updated sysadmin password in the following locations:

1) Via Unisphere (Log in with the newly changed password)

2) Verify communication between the control station and storage processors on each array:

* Log in to the active control station via putty using the nasadmin local account

* Run the following commands:

/nas/sbin/navicli -h SPA getagent

nas_storage -check -all

The NAS storage check command should respond with “done”.
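
For reference, the check should come back looking something like this:

[nasadmin@VNX1 ~]$ nas_storage -check -all
done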

File Side: Change the nasadmin and root local accounts

A) Local accounts need to be modified on each array individually

1) Log into Unisphere with the sysadmin global account

2) Select the desired array

3) Click on “Settings” -> “Security” -> “Local Users for File”

4) Highlight nasadmin click on “Properties”

5) Change the password

6) Highlight root, click on Properties

7) Change the password

Note that the password for the local nasadmin and root accounts can also be changed from the CLI:

[root@fakevnxprompt ~]# passwd nasadmin
Changing password for user nasadmin. 
New UNIX password: <enter a password>
BAD PASSWORD: it is based on a dictionary word 
Retype new UNIX password: 
passwd: all authentication tokens updated successfully. 
[root@fakevnxprompt ~]#

B) Verify both passwords

1) Log in to the active control station via putty as nasadmin, verify your newly changed password

2) run the su command and verify your newly changed root password

C) Propagate changes to the standby control station

At this point the standby control station local account passwords have not yet been updated. It’s now time to test control station failover.  You can review one of my related prior blog posts on control station failover here.

1) While logged in to the active control station with root privileges, run this command:

/nas/sbin/cs_standby -failover

This will synchronize the control stations, reboot the active control station, and then make the standby control station active.
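
If you want to confirm which control station is active before and after the failover, the getreason command is handy. A hedged example of what the output typically looks like on a system with two control stations:

[nasadmin@VNX1 ~]$ /nas/sbin/getreason
10 - slot_0 primary control station
11 - slot_1 secondary control station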

Caveats: Please note the expected behavior listed below for the period when no control station is active (control station communication is out-of-band to production I/O):

* In-band production data will not be disrupted

* Data mover failover cannot occur

* Auto-extension of filesystems will not occur

* Scheduled checkpoints will not occur

* Replication sessions may be disrupted

2) Log in to the active control station (the previous standby control station)

3) Verify the new nasadmin password

4) su and verify the root password

5) Fail back to the original primary control station:

/nas/sbin/cs_standby -failover

 

 

 

Web interface disabled on Brocade switch

I ran into an issue where one of our Brocade switches was inaccessible via the web browser. The error below was displayed when connecting to the IP:

Interface disabled
This Interface (10.2.2.23) has been blocked by the administrator.

In order to resolve this, you’ll need to allow port 80 traffic on the switch.  It was disabled on mine.

First, log in to the switch and review the existing IP filters (look for port 80 set to deny):

switcho1:admin> ipfilter --show

Name: default_ipv4, Type: ipv4, State: active
Rule Source IP Protocol Dest Port Action
1 any tcp 22 permit
2 any tcp 23 deny
3 any tcp 897 permit
4 any tcp 898 permit
5 any tcp 111 permit
6 any tcp 80 deny
7 any tcp 443 permit
8 any udp 161 permit
9 any udp 111 permit
10 any udp 123 permit
11 any tcp 600 – 1023 permit
12 any udp 600 – 1023 permit

Next, clone the default policy, as you cannot make changes to the default policy.  Note that you can name the new policy anything you like; I chose to name it “Allow80”.

ipfilter --clone Allow80 -from default_ipv4

Delete the rule that denies port 80 (rule 6 in the example above):

ipfilter --delrule Allow80 -rule 6

Add a rule back in to permit it:

ipfilter --addrule Allow80 -rule 12 -sip any -dp 80 -proto tcp -act permit

Save it:

ipfilter --save Allow80

Activate it (this will change the default policy to a “defined” state):

ipfilter --activate Allow80
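
Finally, you can run the show command again to confirm that the Allow80 policy is active and that port 80 is now set to permit:

ipfilter --show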

 

That’s it… you should now be able to access your switch via the web browser.

VPLEX Unisphere Login hung at “Retrieving Meta-Volume Information”

I recently had an issue where I was unable to log in to the Unisphere GUI on the VPLEX; it would hang with the message “Retrieving Meta-Volume Information” after progressing about 30% on the progress bar.

This was caused by a hung Java process.  In order to resolve it, you must restart the management server. This will not cause any disruption to hosts connected to the VPLEX.

To do this, run the following command:

ManagementServer:/> sudo /etc/init.d/VPlexManagementConsole restart

If this hangs or does not complete, you will need to run the top command to identify the PID for the java service:

admin@service:~>top
Mem:   3920396k total,  2168748k used,  1751648k free,    29412k buffers
Swap:  8388604k total,    54972k used,  8333632k free,   527732k cached

  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
26993 service   20   0 2824m 1.4g  23m S     14 36.3  18:58.31 java
 4948 rabbitmq  20   0  122m  42m 1460 S      1  1.1  13118:32 beam.smp
    1 root      20   0 10540   48   36 S      0  0.0  12:34.13 init

Once you've identified the PID for the java service, you can kill the process with the kill command and then start the management console again. Using the PID from the example above:

ManagementServer:/> sudo kill -9 26993
ManagementServer:/> sudo /etc/init.d/VPlexManagementConsole start

Once the management server restarts, you should be able to log in to the Unisphere for VPLEX GUI again.

Default Passwords


Here is a collection of default passwords for EMC, HP, Cisco, Pure, VMware, TrendMicro and IBM hardware & software.

EMC Secure Remote Support (ESRS) Axeda Policy Manager Server:

  • Username: admin
  • Password: EMCPMAdm7n

EMC VNXe Unisphere (EMC VNXe Series Quick Start Guide, step 4):

  • Username: admin
  • Password: Password123#

EMC vVNX Unisphere:

  • Username: admin
  • Password: Password123#
    NB You must change the administrator password during this first login.

EMC CloudArray Appliance:

  • Username: admin
  • Password: password
    NB Upon first login you are prompted to change the password.

EMC CloudBoost Virtual Appliance:
https://<FQDN>:4444

  • Username: local\admin
  • Password: password
    NB You must immediately change the admin password.
    $ password <current_password> <new_password>

EMC Ionix Unified Infrastructure Manager/Provisioning (UIM/P):

  • Username: sysadmin
  • Password: sysadmin

EMC VNX Monitoring and Reporting:

  • Username: admin
  • Password: changeme

EMC RecoverPoint:

  • Username: admin
    Password: admin
  • Username: boxmgmt
    Password: boxmgmt
  • Username: security-admin
    Password: security-admin

EMC XtremIO:

XtremIO Management Server (XMS)

  • Username: xmsadmin
    password: 123456 (prior to v2.4)
    password: Xtrem10 (v2.4+)

XtremIO Management Secure Upload

  • Username: xmsupload
    Password: xmsupload

XtremIO Management Command Line Interface (XMCLI)

  • Username: tech
    password: 123456 (prior to v2.4)
    password: X10Tech! (v2.4+)

XtremIO Management Command Line Interface (XMCLI)

  • Username: admin
    password: 123456 (prior to v2.4)
    password: Xtrem10 (v2.4+)

XtremIO Graphical User Interface (XtremIO GUI)

  • Username: tech
    password: 123456 (prior to v2.4)
    password: X10Tech! (v2.4+)

XtremIO Graphical User Interface (XtremIO GUI)

  • Username: admin
    password: 123456 (prior to v2.4)
    password: Xtrem10 (v2.4+)

XtremIO Easy Installation Wizard (on storage controllers / nodes)

  • Username: xinstall
    Password: xiofast1

XtremIO Easy Installation Wizard (on XMS)

  • Username: xinstall
    Password: xiofast1

Basic Input/Output System (BIOS) for storage controllers / nodes

  • Password: emcbios

Basic Input/Output System (BIOS) for XMS

  • Password: emcbios

EMC ViPR Controller :
http://ViPR_virtual_ip (the ViPR public virtual IP address, also known as the network.vip)

  • Username: root
    Password: ChangeMe

EMC ViPR Controller Reporting vApp:
http://<hostname>:58080/APG/

  • Username: admin
    Password: changeme

EMC Solutions Integration Service:
https://<Solutions Integration Service IP Address>:5480

  • Username: root
    Password: emc

EMC VSI for VMware vSphere Web Client:
https://<Solutions Integration Service IP Address>:8443/vsi_usm/

  • Username: admin
  • Password: ChangeMe

 

Note:
After the Solutions Integration Service password is changed, it cannot be modified.
If the password is lost, you must redeploy the Solutions Integration Service and use the default login ID and password to log in.

EMC Avamar Backup Service

  • username: admin
  • password: changeme

openSSH key
ssh-agent bash
ssh-add ~admin/.ssh/admin_key
Password: P3t3rPan

Pure Storage Arrays

  • Username: pureuser
  • Password: pureuser

Cisco Integrated Management Controller (IMC) / CIMC / BMC:

  • Username: admin
  • Password: password

Cisco UCS Director:

  • Username: admin
  • Password: admin
  • Username: shelladmin
  • Password: changeme

Hewlett Packard P2000 StorageWorks MSA Array Systems:

  • Username: admin
  • Password: !admin (exclamation mark ! before admin)
  • Username: manage
  • Password: !manage (exclamation mark ! before manage)

IBM Security Access Manager Virtual Appliance:

  • Username: admin
  • Password: admin

VCE Vision:

  • Username: admin
  • Password: 7j@m4Qd+1L
  • Username: root
  • Password: V1rtu@1c3!

VMware vSphere Management Assistant (vMA):

  • Username: vi-admin
  • Password: vmware

VMware Data Recovery (VDR):

  • Username: root
  • Password: vmw@re (make sure you enter @ as Shift-2 as in US keyboard layout)

VMware vCenter Hyperic Server:
https://Server_Name_or_IP:5480/

  • Username: root
  • Password: hqadmin

https://Server_Name_or_IP:7080/

  • Username: hqadmin
  • Password: hqadmin

VMware vCenter Chargeback:
https://Server_Name_or_IP:8080/cbmui

  • Username: root
  • Password: vmware

VMware vCenter Server Appliance (VCSA) 5.5:
https://Server_Name_or_IP:5480

  • Username: root
  • Password: vmware

VMware vCenter Operations Manager (vCOPS):

Console access:

  • Username: root
  • Password: vmware

Manager:
https://Server_Name_or_IP

  • Username: admin
  • Password: admin

Administrator Panel:
https://Server_Name_or_IP/admin

  • Username: admin
  • Password: admin

Custom UI User Interface:
https://Server_Name_or_IP/vcops-custom

  • Username: admin
  • Password: admin

VMware vCenter Support Assistant:
http://Server_Name_or_IP

  • Username: root
  • Password: vmware

VMware vCenter / vRealize Infrastructure Navigator:
https://Server_Name_or_IP:5480

  • Username: root
  • Password: specified during OVA deployment

VMware ThinApp Factory:

  • Username: admin
  • Password: blank (no password)

VMware vSphere vCloud Director Appliance:

  • Username: root
  • Password: vmware

VMware vCenter Orchestrator :
https://Server_Name_or_IP:8281/vco – VMware vCenter Orchestrator
https://Server_Name_or_IP:8283 – VMware vCenter Orchestrator Configuration

  • Username: vmware
  • Password: vmware

VMware vCloud Connector Server (VCC) / Node (VCN):
https://Server_Name_or_IP:5480

  • Username: admin
  • Password: vmware
  • Username: root
  • Password: vmware

VMware vSphere Data Protection Appliance:

  • Username: root
  • Password: changeme

VMware HealthAnalyzer:

  • Username: root
  • Password: vmware

VMware vShield Manager:
https://Server_Name_or_IP

  • Username: admin
  • Password: default
    type enable to enter Privileged Mode, password is 'default' as well

Teradici PCoIP Management Console:

  • The default password is blank

Trend Micro Deep Security Virtual Appliance (DS VA):

  • Login: dsva
  • password: dsva

Citrix Merchandising Server Administrator Console:

  • User name: root
  • password: C1trix321

VMTurbo Operations Manager:

  • User name: administrator
  • password: administrator
    If DHCP is not enabled, configure a static address by logging in with these credentials:
  • User name: ipsetup
  • password: ipsetup
    Console access:
  • User name: root
  • password: vmturbo

EMC World 2015

I’m at EMC World in Las Vegas this week and it’s been fantastic so far.  I’m excited about the new 40TB XtremIO X-Bricks and how we might leverage them for our largest and most important 80TB Oracle database, about possible use cases for the virtual VNX in our small branch locations, and about all the other futures that I can’t publicly share because I’m under an NDA with EMC.  Truly exciting and innovative technology is coming from them.  VxBlock was also really impressive, although that’s not likely something my company will implement anytime soon.

I found out for the first time today that the excellent VNX Monitoring and Reporting application is now free for the VNX1 platform as well as VNX2.  If you would like a license for any of your VNX1 arrays, simply ask your local sales representative to submit a zero-dollar sales order for one.  We’re currently evaluating ViPR SRM as a replacement for our soon-to-be end-of-life Control Center install, but until then VNX MR is a fantastic tool that provides nearly the same performance data at no cost.  SRM adds much more functionality beyond VNX monitoring and reporting (monitoring SAN switches, for example), and I’d highly recommend doing a demo if you’re also still using Control Center.

We also implemented a VPLEX last year and it’s truly been a lifesaver and an amazing platform.  We currently have a VPLEX Local implementation in our primary data center, and it’s allowed us to migrate workloads from one array to another seamlessly with no disruption to applications.  I’m excited about the possibilities with RecoverPoint as well; I’m still learning about it.

If anyone else who’s at EMC World happens to read this, comment!  I’d love to hear your experiences and what you’re most excited about with EMC’s latest technology.

Gathering performance data on a virtual Windows server

When troubleshooting a potential storage-related performance problem on a virtual Windows server, it’s a bit more difficult to analyze because many virtual hosts share the same LUN for a datastore in ESX.  Using EMC’s Analyzer or Control Center Performance Manager only gives me statistics on specific disks or LUNs; I have no visibility into a specific virtual server with those tools.  When this situation arises, I use a Windows batch script to gather data with the typeperf command line utility for a specific time period and run it directly on the server.  Typically I’ll let it run for 24 hours and then analyze the data in Excel, where it’s easy to make charts and graphs to get a visual view of what’s going on.

Sometimes the most difficult thing to figure out is the correct syntax for the command and which parameters to use.  For reference, here is the command and its parameters:

Syntax:

Typeperf [Path [path ...]] [-cf FileName] [-f {csv|tsv|bin}] [-si interval] [-o FileName] [-q [object]] [-qx [object]] [-sc samples] [-config FileName] [-s computer_name]

Parameters:

-c { Path [ path ... ] | -cf   FileName } : Specifies the performance counter path to log. To list multiple counter paths, separate each command path by a space.
 -cf FileName : Specifies the file name of the file that contains the counter paths that you want to monitor, one per line.
 -f { csv | tsv | bin } : Specifies the output file format. File formats are csv (comma-delimited), tsv (tab-delimited), and bin (binary). Default format is csv.
 -si interval [ mm: ] ss   : Specifies the time between samples, in the [mm:] ss format. Default is one second.
 -o FileName   : Specifies the pathname of the output file. Defaults to stdout.
 -q [ object ] : Displays and queries available counters without instances. To display counters for one object, include the object name.
 -qx [ object ] : Displays and queries all available counters with instances. To display counters for one object, include the object name.
 -sc samples : Specifies the number of samples to collect. Default is to sample until you press CTRL+C.
 -config FileName : Specifies the pathname of the settings file that contains command line parameters.
 -s computer_name : Specifies the system to monitor if no server is specified in the counter path.
 /? : Displays help at the command prompt.
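
Before building the full collection script below, it can help to sanity-check a single counter interactively. A quick hedged example that samples one counter every 5 seconds, 12 times, and writes the output to a CSV file (the counter instance and output path are just placeholders):

typeperf "\LogicalDisk(C:)\Avg. Disk sec/Transfer" -si 5 -sc 12 -o c:\collection\sample.csv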

EMC’s Analyzer vs. Windows Perfmon Metrics

I tend to look at Response time, disk queue length, Total/Read/Write IO, and Service time first.   I dive into how to interpret many of the SAN performance metrics in my older post here. 

The counter names you’ll choose in Windows Performance Monitor don’t line up exactly with the names we commonly see in EMC’s tools, and in addition you can choose between ‘LogicalDisk’ and ‘PhysicalDisk’ objects when selecting the counters.

What is the difference between the Physical Disk vs. Logical Disk performance objects in Perfmon, and why monitor both? Their counters are calculated the same way but their scope is different. I generally use both “\LogicalDisk(*)\” and “\PhysicalDisk(*)\” when I run my perfmon script.

The Physical Disk performance object monitors disk drives on the computer. It identifies the instances representing the physical hardware, and the counters are the sum of the access to all partitions on the physical instance.

The Logical Disk Performance object monitors logical partitions. Performance monitor identifies logical disks by their drive letter or mount point. If a physical disk contains multiple partitions, this counter will report the values just for the partition selected and not for the entire disk. On the other hand, when using Dynamic Disks the logical volumes may span more than one physical disk, in this scenario the counter values will include the access to the logical disk in all the physical disks it spans.

Here are the performance monitor counters that I frequently use, and how they compare to EMC’s navisphere analyzer (or ECC):

“\LogicalDisk(*)\Avg. Disk Queue Length” – (Named the same as EMC) The average number of outstanding requests when the disk was busy
“\LogicalDisk(*)\%% Disk Time” – (No direct EMC equivalent) The “% Disk Time” counter is the “Avg. Disk Queue Length” counter multiplied by 100. It is the same value displayed in a different scale.
“\LogicalDisk(*)\Disk Transfers/sec” – Total Throughput (IO/sec) – the total number of individual disk IO requests completed over a period of one second.  We’ll use this value to help determine Disk Service Time.
“\LogicalDisk(*)\Disk Reads/sec” – Read Throughput (IO/sec)
“\LogicalDisk(*)\Disk Writes/sec” – Write Throughput (IO/sec)
“\LogicalDisk(*)\%% Idle Time” –  (No direct EMC equivalent) This counter provides a very precise measurement of how much time the disk remained in idle state, meaning all the requests from the operating system to the disk have been completed and there are zero pending requests. We’ll also use this to calculate disk service time.
“\LogicalDisk(*)\Avg. Disk sec/Transfer” – Response time (sec) – EMC uses milliseconds, windows uses seconds, so you’ll see 8ms represented as .008 in the results.
“\LogicalDisk(*)\Avg. Disk sec/Read” – Response times for read IO
“\LogicalDisk(*)\Avg. Disk sec/Write” – Response times for write IO

Disk Service Time is calculated with this formula: Disk Utilization = 100 - % Idle Time, and Disk Service Time = Disk Utilization / Disk Transfers/sec (express the utilization as a decimal fraction to get the service time in seconds).
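
As a quick worked example with made-up numbers: if a disk reports 90 for % Idle Time and 200 Disk Transfers/sec, utilization is 100 - 90 = 10% (0.10 as a fraction), and the service time is 0.10 / 200 = 0.0005 seconds, or 0.5 ms.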

Configuring the Script

This batch script collects all of the relevant data for disk activity.  After 24 hours, it will dump the data into a CSV file.  The length of time is controlled by the combination of the “-si” and “-sc” parameters.  To collect data in one-minute intervals for 24 hours, you’d set -si to 60 (collect data every 60 seconds) and -sc to 1440 (1440 samples at one minute each = 24 hours). To collect data every minute for 30 minutes, you’d enter “-si 60 -sc 30”.  This script assumes you have a local directory on the C: drive named ‘Collection’.

@echo off
cd c:\collection

@for /f "tokens=1,2,3,4 delims=/ " %%A in ('date /t') do @(set all=%%A%%B%%C%%D)
@for /f "tokens=1,2,3 delims=: " %%A in ('time /t') do @(set allm=%%A%%B%%C)

typeperf "\LogicalDisk(*)\Avg. Disk Queue Length" "\LogicalDisk(*)\%% Disk Time" "\LogicalDisk(*)\Disk Transfers/sec" ^
 "\LogicalDisk(*)\Disk Reads/sec" "\LogicalDisk(*)\Disk Writes/sec" "\LogicalDisk(*)\%% Idle Time" ^
 "\LogicalDisk(*)\Avg. Disk sec/Transfer" "\LogicalDisk(*)\Avg. Disk sec/Read" "\LogicalDisk(*)\Avg. Disk sec/Write" ^
 "\PhysicalDisk(*)\Avg. Disk Queue Length" "\PhysicalDisk(*)\%% Disk Time" "\PhysicalDisk(*)\Disk Transfers/sec" ^
 "\PhysicalDisk(*)\Disk Reads/sec" "\PhysicalDisk(*)\Disk Writes/sec" "\PhysicalDisk(*)\%% Idle Time" ^
 "\PhysicalDisk(*)\Avg. Disk sec/Transfer" "\PhysicalDisk(*)\Avg. Disk sec/Read" "\PhysicalDisk(*)\Avg. Disk sec/Write" ^
 -si 60 -sc 1440 -o PerfCounters-%All%-%Allm%.csv
 

Easy reporting on Data Domain using the autosupport log


I was looking for an easy (and free) way to do some daily reporting on our data domain hardware.  I was most interested in reporting on disk space, but decided to gather some other data as well.  The easiest method I found is to use the info collected in the autosupport log.  First, I’ll explain how to automatically gather the autosupport log from your data domain, and then move on to how to pull only the information you want from it for reporting purposes.

First, you need to enable ftp on the data domain and add the IP address of the FTP server you’re going to use to pull the file.  Here are the commands you need to run on your data domain:

adminaccess enable ftp
adminaccess add ftp 10.1.1.1 
 

Next, you’ll want to pull the files from the data domain.  I am using a windows FTP server to pull the autosupport files.  The windows batch script I use is scheduled to run once a day and is listed below.  I use the ftp command and call a text file with the specific IP and login info for each data domain box.

ftp -s:10.1.1.2.txt
ftp -s:10.1.1.3.txt
ftp -s:10.2.1.2.txt
ftp -s:10.2.1.3.txt
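
To run the batch file once a day I use the Windows task scheduler. A hedged example of registering it with schtasks (the task name, script path, and start time are placeholders for whatever you use):

schtasks /create /tn "DD_Autosupport_Pull" /tr "c:\scripts\dd_pull.bat" /sc daily /st 05:00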
 

The text files that the ftp script calls (that are named <ipaddress.txt>) look like this:

open 10.1.1.2
sysadmin
<password>
cd support
lcd e:\reports\dd\SITE1_DD_01
get autosupport
quit
 

Once you’ve downloaded the autosupport file, you can run scripts against it to pull out only the info you’re interested in.  I prefer to use unix shell scripts for parsing files because of the extra functionality over standard windows batch scripts.  I have Cygwin installed on my web server to run these shell scripts.

If you take a look at the autosupport log, you’ll notice that it’s very long and repeats similar strings in multiple areas, which makes using grep, awk, and sed a bit more challenging.  It took a bit of trial and error, but the scripts below will pull only the specific areas I wanted:  system alerts, file system compression, locked files, replication info, file distribution, and disk space information.

The first script below gathers disk space information using grep.  Spacing inside the quotes is very important here, as I was looking for specific unique strings within the autosupport log, and grepping for those unique strings produces the correct output.  In this example, I am gathering info for four Data Domain boxes. For those unfamiliar with Cygwin, you can access Windows drive letters by navigating to /cygdrive/<drive letter>; to view the root of the C: drive, you would type ‘cd /cygdrive/c’ from the CLI.  In this case, my Windows batch script that pulls the autosupport files places them in the e:\reports\dd folder, which is /cygdrive/e/reports/dd when using the Cygwin shell.

Here is the script:

# Site 1 Data Domain 01
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt

# Site 1 Data Domain 02
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt

# Site 2 Data Domain 01
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt

# Site 2 Data Domain 02
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt

# Copy the reports to the web server directory.
cp /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt /cygdrive/c/inetpub/wwwroot/
cp /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt /cygdrive/c/inetpub/wwwroot/
cp /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt /cygdrive/c/inetpub/wwwroot/
cp /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt /cygdrive/c/inetpub/wwwroot/
 

For the remaining reports, I used the ‘sed’ command rather than ‘grep’.  As an example, I’ll explain how I stripped out the information for the System Alerts section.  In order to strip it out, I use sed to start capturing text when it reaches ‘Current Alerts’ and stop when it reaches ‘There are’.   This same process is repeated to pull the information for the other reports as well (alerts, compression, locked files, replication, distribution).

This is what the Current Alerts section looks like in the autosupport log:

Current Alerts
--------------
Alert Id   Time                       Severity   Class               Object          Message                                                             
--------   ------------------------   --------   -----------------   -------------   ---------------------------------------------------------------------
774        Wed Dec 19 11:32:19 2012   CRITICAL   SystemMaintenance                   Core dump capability is now disabled due to lack of space in /ddvar.
807        Sun Jan 20 09:07:18 2013   WARNING    Replication         context=3       repl ctx 3: Sync-as-of time is more than 461 hours ago.             
808        Sun Jan 20 09:07:18 2013   WARNING    Replication         context=4       repl ctx 4: Sync-as-of time is more than 461 hours ago.             
809        Sun Jan 20 09:07:19 2013   WARNING    Replication         context=5       repl ctx 5: Sync-as-of time is more than 461 hours ago.              
814        Mon Jan 28 23:20:45 2013   CRITICAL   Filesystem          FilesysType=2   Space usage in Data Collection has exceeded 95% threshold.          
820        Wed Jan 30 15:52:40 2013   WARNING    Replication         context=2       repl ctx 2: Sync-as-of time is more than 148 hours ago.             
824        Mon Feb  4 12:36:39 2013   WARNING    Replication         context=10      repl ctx 10: Sync-as-of time is more than 24 hours ago.             
825        Wed Feb  6 02:13:51 2013   WARNING    Replication         context=19      repl ctx 19: Sync-as-of time is more than 24 hours ago.             
--------   ------------------------   --------   -----------------   -------------   ---------------------------------------------------------------------
There are 8 active alerts.
 

In this case, sed searches for ‘Current Alerts’ and copies all of the text until it reaches the string ‘There are’, at which point it stops and outputs the file.  The output files are written directly to the wwwroot folder on my internal reporting web page.  Each page encapsulates the output files in an iframe, so the web page is automatically updated every morning when the script runs.

Here is the second script that captures the remaining data:

#For Alerts
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_alerts.txt
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_alerts.txt
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_alerts.txt
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_alerts.txt

#For Filesystem Compression
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_filecomp.txt
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_filecomp.txt
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_filecomp.txt
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_filecomp.txt

#For Locked Files:
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_lockedfiles.txt
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_lockedfiles.txt
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_lockedfiles.txt
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_lockedfiles.txt

#Repl Transferred over 24 hrs
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_repl.txt
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_repl.txt
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_repl.txt
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_repl.txt

#File Distribution
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_filedist.txt
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_filedist.txt
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_filedist.txt
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_filedist.txt
 

When all is said and done, the report output looks like this:

Alerts:

Current Alerts
--------------
Alert Id   Time                       Severity   Class         Object          Message                                                   
--------   ------------------------   --------   -----------   -------------   ----------------------------------------------------------
1157       Wed Feb  5 12:23:19 2014   WARNING    Replication   context=3       Sync-as-of time is more than 24 hours ago.                
1161       Sun Feb  9 08:33:48 2014   CRITICAL   Filesystem    FilesysType=2   Space usage in Data Collection has exceeded 95% threshold.
--------   ------------------------   --------   -----------   -------------   ----------------------------------------------------------
There are 2 active alerts.

Filesystem Compression:

Filesys Compression
--------------                  
From: 2014-03-06 05:00 To: 2014-03-13 06:00

                  Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                     (GiB)       (GiB)        Factor       Factor          Factor
                                                                    (Reduction %)
---------------   --------   ---------   -----------   ----------   -------------
Currently Used:   495122.5    122948.5             -            -     4.0x (75.2)
Written:*                                                                        
  Last 7 days       9204.2      5656.6          1.1x         1.5x     1.6x (38.5)
  Last 24 hrs         74.1        36.4          1.0x         2.0x     2.0x (50.9)
---------------   --------   ---------   -----------   ----------   -------------
 * Does not include the effects of pre-comp file deletes/truncates
   since the last cleaning on 2014/03/11 02:07:50.

Replications:

Replication Data Transferred over 24hr
--------------------------------------
Directory/MTree Replication:
Date         Time         CTX   Pre-Comp (KB)   Pre-Comp (KB)           Replicated (KB)   Low-bw-   Sync-as-of      
                                      Written       Remaining    Pre-Comp       Network     optim   Time            
----------   --------   -----   -------------   -------------   -----------------------   -------   ----------------
2014/03/13   06:36:18       2     104,186,675      86,628,349   55,522,200   37,756,047      1.00   Thu Mar 13 05:53
                            3               0               0            0            0      0.00   Tue Feb  4 12:00
                            4               0               0            0        4,882      0.00   Thu Mar 13 06:00
                            8               0               0            0        5,120      0.00   Thu Mar 13 06:00
                           29               0               0            0        5,098      0.00   Thu Mar 13 06:00
                        (sum)     104,186,675                   55,522,200   37,771,148      1.00  

File Distribution:

File Distribution
-----------------
169,432 files in 8,691 directories

                          Count                         Space
               -----------------------------   --------------------------
         Age         Files       %    cumul%        GiB       %    cumul%
   ---------   -----------   -----   -------   --------   -----   -------
       1 day            13     0.0       0.0       74.4     0.0       0.0
      1 week           223     0.1       0.1     2133.7     0.4       0.4
     2 weeks         5,103     3.0       3.2    39121.2     7.8       8.3
     1 month        12,819     7.6      10.7    79336.5    15.9      24.1
    2 months        21,853    12.9      23.6   153873.8    30.8      54.9
    3 months         5,743     3.4      27.0    45154.6     9.0      64.0
    6 months         3,402     2.0      29.0    13674.3     2.7      66.7
      1 year        17,035    10.1      39.1    72937.7    14.6      81.3
    > 1 year       103,241    60.9     100.0    93461.7    18.7     100.0
   ---------   -----------   -----   -------   --------   -----   -------

                          Count                         Space
               -----------------------------   --------------------------
        Size         Files       %    cumul%        GiB       %    cumul%
   ---------   -----------   -----   -------   --------   -----   -------
       1 KiB        11,257     6.6       6.6        0.0     0.0       0.0
      10 KiB        44,396    26.2      32.8        0.3     0.0       0.0
     100 KiB        38,488    22.7      55.6        1.3     0.0       0.0
     500 KiB         9,652     5.7      61.3        2.1     0.0       0.0
       1 MiB        27,460    16.2      77.5       70.4     0.0       0.0
       5 MiB        12,136     7.2      84.6       27.3     0.0       0.0
      10 MiB         3,861     2.3      86.9       26.9     0.0       0.0
      50 MiB         4,367     2.6      89.5       96.6     0.0       0.0
     100 MiB           853     0.5      90.0       58.2     0.0       0.1
     500 MiB           861     0.5      90.5      201.0     0.0       0.1
       1 GiB           495     0.3      90.8      309.7     0.1       0.2
       5 GiB           567     0.3      91.1     1460.2     0.3       0.5
      10 GiB           336     0.2      91.3     2574.1     0.5       1.0
      50 GiB        14,691     8.7     100.0   494083.5    98.9      99.8
     100 GiB            11     0.0     100.0      683.3     0.1     100.0
     500 GiB             1     0.0     100.0      173.0     0.0     100.0
   > 500 GiB             0     0.0     100.0        0.0     0.0     100.0

Disk Space:

==========  SERVER USAGE   ==========
Active Tier:
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB
/data: pre-comp           -   245211.8           -      -               -
/data: post-comp    51484.9    31811.0     19673.9    62%            35.4
/ddvar                 78.7        7.9        66.8    11%               -

Comparing Dot Hill and EMC Auto Tiering

The problem with auto tiering in general is that much of the data that is “hot” is hot only for a brief moment in time. Workloads are not uniform across an entire data set, and if the data isn’t moved in real time it’s very likely that hot data will be served from capacity storage.  Ideally a storage system would be able to react to these changes in real time, but the overhead on the storage processors is generally too great.  The problem is somewhat mitigated by having a large cache (like EMC’s FAST Cache), but the ability to automatically tier data in real time would be ideal.

I recently compared Dot Hill’s auto tiering strategy to EMC’s, and found that Dot Hill takes a distinctive approach that looks promising.  Here’s a brief comparison between the two.

EMC

EMC greatly improved FAST VP on the VNX2.   While the VNX uses 1GB data slices, the VNX2 uses more granular 256MB data slices, which greatly improves efficiency.  EMC is also shipping MLC SSD instead of SLC SSD, which makes using SSDs in FAST VP pools much more affordable. (Note that SLC is still required for FAST Cache.)

How does the smaller data slice size improve efficiency?  As an example, assume that a 500MB contiguous range of hot data residing on the SAS tier needs to be moved to EFD. On the VNX1, an entire 1GB slice would be moved, so 1GB of data ends up on the EFD tier even though only 500MB of it is hot.  This is obviously inefficient, as 50% of the relocated data is cold.  With the VNX2’s 256MB data slice size, only about 512MB (two slices) would be moved, so essentially all of the relocated data is hot.  The VNX2 makes much more efficient use of the extreme performance and performance tiers.

EMC’s FAST VP auto tiering on the VNX1 is configured either manually or by creating a schedule.  The schedule can be set to run 24 hours a day to move data in near real time, but in practice that isn’t feasible; the overhead on the storage processors is simply too great, so we’ve configured it to run during off-peak hours.  On our busiest VNX1 we see the storage processors jump from ~50% utilization to ~75% utilization when the relocation job is running.  This may improve with the VNX2, but it’s been a problem on the CX series and the VNX1.


DotHill

According to Dot Hill, their automated tiering doesn’t look at every single I/O like the VNX or VNX2; it looks for trends in how data is accessed.  Their rep told me to think of it as examining every 10th I/O rather than every single I/O.  The idea is to allow the array to move data in real time without overloading the storage processors.  Dot Hill also moves data in 4MB slices (which is very efficient, as I explained earlier when discussing the VNX2) and will not move more than 80MB in a 5-second span (a maximum of 960MB per minute) to keep the CPU load down.

So, how does Dot Hill’s auto tiering actually work?  It uses scoring, scanning, and sorting, and each is a separate process that works in real time.  Scoring maintains a current page ranking on every I/O using a process that adds less than one microsecond of overhead; the algorithm looks at how frequently and how recently the data was accessed, with higher scores going to data that is accessed more frequently and more recently. Scanning for the high-scoring pages happens every 5 seconds and uses less than 1% of the CPU; the pages with the highest scores become candidates for promotion to SSD.  Sorting is the process that actually migrates the pages up or down based on their score.  As I mentioned earlier, no more than 80MB of data is moved during any 5-second sort to minimize the overall performance impact.

Summary

I haven’t used EMC’s new VNX2 or Dot Hill’s AssuredSAN myself, so I can’t offer any real world experience with either.  I think Dot Hill’s implementation looks very promising on paper, and I look forward to reading more customer experiences about it in the future.  They’ve been around a long time, but they only recently started offering their products directly to customers as they’ve primarily been an OEM storage manufacturer.  As I mentioned earlier, my experience with EMC’s FAST VP on the CX series and VNX1 shows that it consumes too many CPU cycles to run in real time, so we have always run it as an off-business-hours process. That’s exactly what Dot Hill’s implementation is trying to address.  We’ve made adjustments to the FAST VP relocation schedule based on monitoring our workload.  We also use FAST Cache, which at least partially solves the problem of suddenly hot data needing extra IO.  FAST Cache and FAST VP work very well together.  Overall I’ve been happy with EMC’s implementation, but it’s good to see another company taking a different approach that could be very competitive with EMC.

You can read more about Dot Hill’s auto tiering here:

http://www.dothill.com/wp-content/uploads/2012/08/RealStorWhitePaper8.14.12.pdf

You can read more about EMC’s VNX1 FAST VP Here:  

https://www.emc.com/collateral/software/white-papers/h8058-fast-vp-unified-storage-wp.pdf

You can read more about EMC’s VNX2 FAST VP Here:

https://www.emc.com/collateral/white-papers/h12208-vnx-multicore-fast-cache-wp.pdf

What is VPLEX?

We are looking at implementing a storage virtualization device and I started doing a bit of research on EMC’s product offering.  Below is a summary of some of the information I’ve gathered, including a description of what VPLEX does as well as some pros and cons of implementing it.  This is all info I’ve gathered by reading various blogs, looking at EMC documentation and talking to our local EMC reps.  I don’t have any first-hand experience with VPLEX yet.

What is VPLEX?

VPLEX at its core is a storage virtualization appliance. It sits between your arrays and hosts and virtualizes the presentation of storage arrays, including non-EMC arrays.  Instead of presenting storage to the host directly you present it to the VPLEX. You then configure that storage from within the VPLEX and then zone the VPLEX to the host.  Basically, you attach any storage to it, and like in-band virtualization devices, it virtualizes and abstracts them.

There are three VPLEX product offerings, Local, Metro, and Geo:

Local.  VPLEX Local manages multiple heterogeneous arrays from a single interface within a single data center location. VPLEX Local allows increased availability, simplified management, and improved utilization across multiple arrays.

Metro.  VPLEX Metro with AccessAnywhere enables active-active, block level access to data between two sites within synchronous distances.  Host application stability needs to be considered; depending on the application, it is recommended that round-trip latency for Metro be 5ms or less. The combination of virtual storage with VPLEX Metro and virtual servers allows for the transparent movement of VMs and storage across longer distances and improves utilization across heterogeneous arrays and multiple sites.

Geo.  VPLEX Geo with AccessAnywhere enables active-active, block level access to data between two sites within asynchronous distances. Geo improves the cost efficiency of resources and power.  It provides the same distributed device flexibility as Metro but extends the distance up to 50ms of network latency. 

EMC publishes a range of VPLEX white papers and documentation if you want to learn more about the product.

What are some advantages of using VPLEX? 

1. Extra Cache and Increased IO.  VPLEX has a large cache (64GB per node) that sits in between the host and the array. It offers additional read cache that can greatly improve read performance on databases, because that caching work is offloaded from the individual arrays.

2. Enhanced options for DR with RecoverPoint. The DR benefits are increased when integrating RecoverPoint with VPLEX Metro or Geo to replicate the data using real time replication. It includes a capacity based journal for very granular rollback capabilities (think of it as a DVR for the data center).  You can also use the native bandwidth reduction features (compression & deduplication) or disable them if you have WAN optimization devices installed like those from Riverbed.  If you want active/active read/write access to data across a large distance, VPLEX is your only option.  NetApp’s V-Series and HDS USPV can’t do it unless they are in the same data center. Here are a few more advantages:

  • DVR-like recovery to any point in time
  • Dynamic synchronous and asynchronous replication
  • Customized recovery point objectives that support any-to-any storage arrays
  • WAN bandwidth reduction of up to 90% of changed data
  • Non-disruptive DR testing

3. Non-disruptive data mobility & reduced maintenance costs. One of the biggest benefits of virtualizing storage is that you’ll never have to take downtime for a migration again. It can take months to migrate production systems, and without virtualization downtime is almost always required. Migration is also expensive: it takes a great deal of resources from multiple groups, as well as the cost of keeping the older array on the floor during the process. Overlapping maintenance costs are expensive too.  By shortening the migration timeframe, hardware maintenance costs will drop, saving money.   Maintenance can be a significant part of the storage TCO, especially if the arrays are older or are going to be used for a longer period of time.  Virtualization can be a great way to reduce those costs and improve the return on assets over time.

4. Flexibility based on application IO.  The ability to move and balance LUN I/O among multiple smaller arrays non-disruptively would allow you to balance workloads and increase your ability to respond to performance demands quickly.  Note that underlying LUNs can be aggregated or simply passed through the VPLEX.

5. Simplified Management and vendor neutrality.   Implementing VPLEX for all storage related provisioning tasks would reduce complexity with multiple vendor arrays.  It allows you to manage multiple heterogeneous arrays from a single interface.  It also makes zoning easier, as all hosts only need to be zoned to the VPLEX rather than to every array on the floor, which makes it faster and easier to provision new storage to a new host.

6. Increased leverage among vendors.  This advantage would be true with any virtualization device.  When controller based storage virtualization is employed, there is more flexibility to pit vendors against each other to get the best hardware, software and maintenance costs.  Older arrays could be commoditized, which could allow for increased leverage to negotiate the best rates.

7. Use older arrays for archiving. Data could be seamlessly demoted or promoted to different arrays based on an array’s age, its performance levels and its related maintenance costs.  Older arrays could be retained for capacity and be demoted to a lower tier of service, and even with the increased maintenance costs it could still save money.

8. Scale.  You can scale it out and add more nodes for more performance when needed.  With a VPLEX Metro configuration, you could configure VPLEX with up to 16 nodes in the cluster between the two sites.

What are some possible disadvantages of VPLEX?

1. Licensing Costs. VPLEX is not cheap.  Also, it can be licensed per frame on the VNX but must be licensed per TB on the CX series.  Your large, older CX arrays will cost you a lot more to license.

2. It’s one more device to manage.   The VPLEX is an appliance, and it’s one more thing (or things) that has to be managed and paid for.

3. Added complexity to infrastructure.  Depending on the configuration, there could be multiple VPLEX appliances at every site, adding considerable complexity to the environment.

4. Managing mixed workloads in virtual environments.  When heavy workloads are all mixed together on the same array there is no way to isolate them, and the ability to migrate a workload non-disruptively to another array is one of the reasons to implement a VPLEX.  In practice, however, those VMs may end up being moved to another array with the same storage limitations as the one they came from.  The VPLEX may simply be moving the problem to a different location rather than solving it.

5. Lack of advanced features. The VPLEX has no advanced storage features such as snapshots, deduplication, replication, or thin provisioning.  It relies on the underlying storage array for those types of features.  As an example, you may want to utilize block based deduplication with an HDS array by placing a NetApp V-Series in front of it and using NetApp’s dedupe to enable it.  That is only possible with a NetApp V-Series or HDS USP-V type device; the VPLEX can’t do it.

6. Write cache performance is not improved.  The VPLEX uses write-through caching, while its competitors’ storage virtualization devices use write-back caching. When there is a write I/O in a VPLEX environment the I/O is cached on the VPLEX, but it is passed all the way back to the virtualized storage array before an ack is sent back to the host.  The NetApp V-Series and HDS USPV will store the I/O in their own cache and immediately return an ack to the host; the I/Os are then flushed to the back end storage array using their respective write coalescing and cache flushing algorithms.  Because of that write-back behavior, a performance gain above and beyond that of the underlying storage arrays is possible due to the caching on those controllers.  With VPLEX’s write-through cache design there is no performance gain for write I/O beyond what the existing storage provides.

Gartner’s Market Share Analysis for Storage Vendors

Here’s an interesting market share analysis by Gartner that was published a few months ago:  http://www.gartner.com/technology/reprints.do?id=1-1GUZA31&ct=130703&st=sb.  It looks like EMC and NetApp rule the market, with EMC on top.  Below are the key findings copied from the linked article.

  • EMC and NetApp retained almost 80% of the market. They were separated by more than $2 billion in revenue from IBM and HP, their next two largest competitors.
  • No. 1 EMC grew its overall network-attached storage (NAS)/unified storage share to 47.9% (up from 41.7% in 2011), while No. 2 NetApp’s overall NAS/unified storage share dropped to 30.3% (down from 36% in 2011).
  • In the overall NAS/unified storage share ranking, the positions of the nine named vendors remained unchanged in 2012 (in order of share rank: EMC, NetApp, IBM, HP, Oracle, Netgear, Dell, Hitachi/Hitachi Data Systems and Fujitsu).
  • For the fifth consecutive year, iSCSI storage area network (SAN) revenue and Fibre Channel SAN revenue continued to gain proportionate share in the overall NAS/unified market.
  • The “pure NAS” market continues to grow at a much faster rate (15.9%) than the overall external controller-based (ECB) block-access market (2.3%), in large part due to the expanding NAS support of growing vertical applications and virtualization.

VNX vs. V7000

Like my previous post about the XIV, I did a little bit of research on how the IBM V7000 compares as an alternative to EMC’s VNX series (which I use now).  It’s only a 240 disk array but allows for 4 nodes in a cluster, so it comes close to matching the VNX 7500’s 1000 disk capability.  It’s an impressive array overall but does have a few shortcomings.  Here’s a brief overview list of some interesting facts I’ve read about, along with some of my opinions based on what I’ve read.  It’s by no means a comprehensive list. 

Good:

·         Additional Processing power.  Block IO can be spread out among 8 storage processors in a four node cluster.  You need to buy four frames to get 8 SPs and up to 960 disks.  Storage Pools can span clustered nodes.

·         NAS Archiving Support. NAS has a built in archiving option to easily move files to a different array or different disks based on date.  EMC does not have this feature built in to DART.

·         File Agent. There is a management agent for both block and file.  EMC does not have a NAS/file host agent; you have to open a session and log in to the Linux based OS with DART.

·         SDD multipathing agent is free.  IBM’s SDD multi-pathing agent (comparable to PowerPath) is free of charge.  EMC’s is not.

·         Nice Management GUI.  The GUI is extremely impressive and easy to use.

·         Granularity.  The EasyTier data chunk size is configurable;  EMC’s FAST VP is stuck at 1GB.

·         Virtualization.  The V7000 can virtualize other vendors’ storage, making it appear to be disks on the actual V7000.  Migrations from other arrays would be simple.

·         Real time Compression.  Real time compression that, according to IBM, actually increases performance when enabled.  EMC utilizes post process compression. 

Bad: 

·         No Backend Bus between nodes. Clustered nodes have no backend bus and must be zoned together.  Inter-node IO traverses the same fabric as hosts.  I don’t see this configuration as optimal; the IO between clustered nodes must travel across the core fabric.  All nodes plug in to the core switches and are zoned together to facilitate communication between them.  According to IBM this introduces a 0.06ms latency, but in my opinion that latency could increase based on IO contention from hosts.  You may see an improvement in performance and response time due to the extra cache and striping across more disks, but you would see that on the VNX as well by adding additional disks to a pool and increasing the size of FAST Cache, and all the disk IO would remain on the VNX’s faster backend bus.  The fact that the clustered configuration introduces any latency at all is a negative in my mind.  Yes, this is another “FUD” argument.

·         Lots of additional connectivity required.  Each node would use 4 FC ports on the core switches (16 ports on each fabric in a four node cluster).  That’s significantly more port utilization on the core switches than a VNX would use.

·         No 3 Tier support in storage pools.  IBM’s EasyTier supports only two tiers of disk.  For performance reasons I would want to use only SSD and SAS.  Eliminating NL-SAS from the pools would significantly increase the number of SAS drives I’d have to purchase to make up the difference in capacity.  EMC’s FAST VP of course supports 3 tiers in a pool.

·         NAS IO limitation.  On unified systems, NAS file systems can only be serviced by IOgroup0, which means they can only use the two SPs in the first node of the cluster.  File systems can be spread out among disks on other nodes, however.

 

VNX vs. XIV

I recently started researching and learning more about IBM’s XIV as an alternative to EMC’s VNX.  We already have many VNXs installed and I’m a very happy EMC customer, but checking out EMC’s competition is always a good thing.  I’m looking at other vendors for comparison as well, but I’m just focusing on the XIV for this post.  On the surface the XIV looks like a very impressive system.  It’s got a fully virtualized grid design that distributes data over all of the disks in the array, which maximizes performance by eliminating hotspots.  It also offers industry leading rebuild times for disk failures.  It’s almost like a drop-in storage appliance: you choose the size of the NL-SAS disks you want, choose a capacity, drop it in and go.  There’s no configuration to be done and an array can be up and running in just hours.  They are also very competitive on price.  While it really is an impressive array there are some important things to note when you compare it to a VNX.  Here’s a brief overview list of some interesting facts I’ve read about, along with some of my opinions based on what I’ve read.

Good:

1.       It’s got more CPU power.  The XIV has more CPUs and more overall processing power than the VNX.  Its performance scales very well with capacity, as each grid module adds an additional CPU and cache.

2.       Remote Monitoring.  There’s an app for that!  IBM has a very nice monitoring app available for iOS.  I was told, however, that IBM does not provide the iPad to run it. 🙂

3.       Replication/VMWare Integration.  VNX and XIV have similar capabilities with regard to block replication and VMWare integration.

4.       Granularity & Efficiency.  The XIV stores data in 1MB chunks evenly across all disks, so reading and writing is spread evenly on all disks (eliminating hotspots).  The VNX stores data in 1 GB chunks across all disks in a given storage pool.

5.       Cloning and Rebuild Speed.  Cloning and disk rebuilds are lightning fast on the XIV because of the RAID-X implementation.  All of the disks are used to rebuild one.

6.       Easy upgrades.  The XIV has a very fast, non-disruptive upgrade process for hardware and software.  The VNX has a non-disruptive upgrade process for FLARE (block), but not so much on DART (file).

Bad:

1.       No Data Integrity Checking.  The XIV is the only array that IBM offers that doesn’t have T10-DIF to protect against Data Corruption and has no persistent CRC written to the drives.  EMC has it across the board.

2.       It’s Block Only.  The XIV is not a unified system so you’d have to use a NAS Gateway if you want to use CIFS or NFS.

3.       It’s tierless storage with no real configuration choices.  The XIV doesn’t tier.  It has a large SSD cache (similar to the VNX’s FAST Cache) in front of up to 180 NL-SAS drives.  You have no choice on disk type, no choice on RAID type, and you must stay with the same drive size that you choose from the beginning for any expansions later on.  It eliminates the ability of storage administrators to manage or separate workloads based on application or business priority; to do that you’d need to buy multiple arrays.  The XIV is not a good choice if you have an application that requires extensive tuning or very aggressive/low latency response times.

4.       It’s an entirely NL-SAS array.  In my opinion you’d need a very high cache hit ratio to get the IO numbers that IBM claims on paper.  It feels to me like they’ve come up with a decent method to use NL-SAS disks for something they weren’t designed to do, but I’m not convinced it’s the best thing to do.  There is still a very good use case for having SAS and SSD drives used for persistent storage.

5.       There’s an increased Risk of Data Loss on the XIV vs the VNX.  All LUNs are spread across all of the drives in an XIV array, so a part of every LUN is on every single drive in the array.  When a drive is lost the remaining drives have to regenerate mirror copies of any data that was on the failed drive.  The probability of a second drive failure during the rebuild of the first is not zero, although it is very close to zero due to the XIV’s very fast rebuild times.  What happens if a second drive fails before the XIV array has completed rebuilding the first failed drive?  You lose ALL of the LUNs on that XIV array.  It’s a “FUD” argument, yes, but the possibility is real.  Note:  The first comment on this blog post states that this is not true, you can read the reply below.

6.       Limited Usable Capacity on one array.  In order to get up to the maximum capacity of 243TB on the XIV you’d need to fill it with 180 3TB drives.  Also, once you choose a drive size in the initial config you’re stuck with that size. The maximum raw capacity (using 3TB drives) for the VNX 5700 is 1485TB, the VNX 7500 is 2970TB.

First day at EMC World 2013

It’s been a great first day at EMC World 2013 so far.  I’ve been to three breakout sessions, two of which were very informative and useful. I focused my day today on learning more about best practices for the EMC technologies that I already use.  I’m not going to go into great detail now as I need to get to the next session. 🙂 The notes I made are specific to a Brocade switch implementation as that’s what I use. Here are some useful bullet points from a few of my sessions today.

Storage Area Networking Best Practices:

1.  Single Initiator Zoning.  Only put one initiator and the targets it will access in one zone.  I’ve already practiced this for many years, but it was interesting to hear the reasons for doing it that way.  It drastically reduces the number of server queries to the switch.  There’s a quick example of what this looks like on a Brocade switch after this list.

2.  Dynamic Interface Management.  Use Brocade port fencing.  It can prevent a SAN outage by giving you the ability to shut down a single host or port.

3.  Monitor for Congestion.  Congestion from one host can spread and cause problems in other operating environments.  Definitely enable Brocade bottleneck detection.

4.  Periodic SAN health checks.  Use the EMC SANity or Brocade SAN Health tools regularly.  If you’re reading this, you’re very likely already doing that. 🙂

5.  Monitor for bit errors.  These can lead to bb_credit loss.  The Brocade parameter portcfglongdistance should be set (the bb_credit recovery option), which will prevent the problem.  Bit errors could still be a problem on F Ports, however.

6.  Cable Hygiene.  Always clean your cables!  They are a major contributor to physical connectivity problems.

7.  Target Releases.  Always run a target release of Firmware.
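As a quick illustration of single initiator zoning (item 1 above), here’s roughly what it looks like from the Brocade FOS command line.  The alias names and WWPNs are made up for the example, so substitute your own and double-check the syntax against your FOS release:

alicreate "esxhost1_hba0", "10:00:00:00:c9:aa:bb:01"
alicreate "vnx_spa_port0", "50:06:01:60:3e:a0:cc:01"
alicreate "vnx_spb_port0", "50:06:01:68:3e:a0:cc:01"
zonecreate "z_esxhost1_hba0_vnx", "esxhost1_hba0; vnx_spa_port0; vnx_spb_port0"
cfgadd "FABRIC_A_CFG", "z_esxhost1_hba0_vnx"
cfgsave
cfgenable "FABRIC_A_CFG"

One initiator and only the targets it needs, in one zone.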

VNX NAS Configuration Best Practices:

1. For transactional NAS, always use the correct transfer size for your workload.  The latest VNX OE release supports up to 1MB.

2. Use Jumbo frames end to end.

3. 10G is available now. Don’t use 1G anymore!

4. Use link aggregation for redundancy and load balancing, and a Fail-Safe Network (FSN) for high availability.

5. Use AVM for typical NAS configurations, MVM for transactional NAS where you have high concurrency from a single NFS stream to a single NFS export. 

6. Use Direct Writes for apps that use concurrent IO or are asynchronous (Oracle, VMWare, etc.).

7. For NAS Pool LUNs, use thick LUNs that are all the same size, balance the SP designation for each one, use the same tiering policy for each one, use approximately 1 LUN for every 4 drives, and use thin enabled file systems to maximize the consumption of the highest tier.  There’s a rough provisioning example after this list.

8. Always enable FASTCache for NAS LUNs.  They benefit from it even more than block LUNs.
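Here’s a rough sketch of what the pool LUN layout from item 7 might look like with naviseccli: four thick LUNs of the same size with alternating default SP owners.  The pool name, sizes, and hostname are made up, and the exact flag names can vary by FLARE/OE release, so treat this as illustrative and verify the syntax in the naviseccli reference for your array before using it:

naviseccli -h clariion1a lun -create -capacity 500 -sq gb -poolName "NAS_Pool" -sp A -name NAS_LUN_1
naviseccli -h clariion1a lun -create -capacity 500 -sq gb -poolName "NAS_Pool" -sp B -name NAS_LUN_2
naviseccli -h clariion1a lun -create -capacity 500 -sq gb -poolName "NAS_Pool" -sp A -name NAS_LUN_3
naviseccli -h clariion1a lun -create -capacity 500 -sq gb -poolName "NAS_Pool" -sp B -name NAS_LUN_4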

 

Magic Quadrant for Storage Vendors

I was reading over the Gartner report today that dives into specifics on the strengths and weaknesses of the big players in the storage market.  The original Gartner article can be viewed here: http://www.gartner.com/technology/reprints.do?id=1-1EL3WXN&ct=130321&st=sb.

It came as no surprise to me that EMC was ranked as the market leader, with NetApp as #2, Hitachi #3, and IBM #4.  I’ve been very pleased with the performance and reliability of EMC products in my environment and their global support is top notch.  Below is the magic quadrant from the article.

Powerpath Install / Upgrade Issues

I recently had several issues when attempting to upgrade a Windows 2008 server from Powerpath v5.3 to v5.5 SP1. I uninstalled 5.3 using the windows utility, rebooted, then reinstalled v5.5 SP1. After the reboot, the server did not come back up. In order to get it to boot, the “last known good configuration” option had to be chosen. I opened an SR with EMC, and they determined that the uninstall process was not completing correctly.

To resolve the problem, you need to run the executable from the command line and add a few parameters. The name of the Powerpath install file will vary depending on the version you are installing, but the command looks like this:

EMCPower.Net32.signed.5.3.b310.exe /v"/L*v C:\logs\PPremove.log NO_REBOOT=1 PPREMOVE=ALL"

In this example, the c:\logs directory must exist before you run it. After running that command to uninstall powerpath and then reinstalling the new version, I no longer had the problem of the server not booting correctly.

After properly installing it, I continued to have a problem with Powerpath administrator not properly recognizing the devices. All of the devices showed up as “DEV ??”. I also saw “harddisk ??” when running powermt display dev=all. To resolve the problem I ran through the following steps:

1. Open Device Manager under Disks, and right-click the device drive that had a yellow ‘!’.
2. Choose “Update Driver Software”.
3. Click on “Browse my computer for driver software”.
4. Click on “Let me pick from a list of device drivers on my computer”.
5. In the next screen make sure the “Show compatible hardware” box is checked.
6. Under the Model list you should see the ‘PowerPath Devices’ driver. Highlight it and click next. This will install the PowerPath Driver. When it is done it will require a reboot. Once the server has come back online run another: ‘powermt display dev=all’ to see that the harddisk?? will have changed to harddisk## as expected.

Frequent 0x622 and 0x606 errors in the SP Event Logs

During some routine checking of the SP Event logs on our NS-40 I was noticing a large number of alerts. Every few seconds I was seeing these three alerts pop in:

0x60a Internal Information Only. A logical unit has been enabled
0x622 Background Verify Aborted
0x606 Unit Shutdown for trespass
 

After a bit of investigation, I narrowed down the cause to several large LUNs that had just been added to a new ESX host.  It turns out that the LUNs were still running the background zeroing process, and that’s what was causing all the alerts in the SP Log. When you create a new LUN and the disks have been previously used for other LUNs, the new LUN needs to be “zeroed” (filled with all zeros to clear data). This takes place in the background and it is part of the LUN initialization.  Once this background zeroing process completed on my new LUNs the alert messages stopped.  I was unaware of that process, so I did a bit of research on it.

LUNs are immediately available for use after a bind (using “Fastbind”), however all the operations associated with a bind can take a long time to finish.  The duration of a LUN bind is dependent on these things:

  • LUN’s bind time background verify priority (rate)
  • Size of the LUN being bound
  • Type of drives in the LUN’s RAID Group
  • Potential disabling of initial verify on bind
  • State of the Storage System (Idle or Load)
  • Position of the LUN on the hard disks of the RAID Group

From that list, priority, LUN size, drive type and verification selection have the greatest effect on duration.  You can calculate the approximate duration of the bind process with this formula:

Time = Bound LUN Capacity / Bind Rate

Here are the Average Bind Rates for FC and SATA disks:

Disk Type   ASAP Bind Rate   High Bind Rate   Medium (default) Bind Rate   Low Bind Rate
FC          83 MB/s          7.54 MB/s        5.02 MB/s                    4.02 MB/s
SATA        61.7 MB/s        7.47 MB/s        5.09 MB/s                    3.78 MB/s

If we were to calculate how many hours it would take to bind a 2000GB LUN on a five disk RAID5 group composed of SATA drives set to a medium (default) bind rate, here’s what the formula would look like:

Time = 2000 GB * 1024 MB/GB * (1 / 5.09 MB/s) * (1 / 3600 sec/hr) = 111.76 hours
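If you’d rather not do the arithmetic by hand, the same calculation drops easily into a couple of shell lines (the capacity and rate are just the example values above, and bc handles the floating point):

LUN_GB=2000
RATE_MBS=5.09
echo "scale=2; $LUN_GB * 1024 / $RATE_MBS / 3600" | bc

This prints 111.76, the estimated bind duration in hours.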

There is a detailed white paper that covers this topic from EMC called “The Effect of Priorities on LUN Management Operations” that you can view here:  http://www.emc.com/collateral/hardware/white-papers/h4153-influence-priorities-emc-clariion-lun-wp.pdf.  That’s where I gathered the information above.

Storage Performance Metrics

I often get requests from application owners to review storage performance stats.  I thought I’d give a quick overview of some of the things I look at, what the myriad of performance metrics in commonly used storage performance software tools actually mean, and how you might use some of them to investigate a performance problem.  Performance analysis is very much an art (not a science) and it’s sometimes difficult to pinpoint exact causes based on the mix of applications and workload on the array. Taking all of the metrics into account with a holistic view is needed to be successful. Performing data collection of application workloads over time is recommended because application workload characteristics will likely vary over time. If you have a major problem, I would always recommend opening a service ticket with your hardware vendor.

This post is just an overview of storage performance metrics and isn’t meant to dive in to every possible scenario from every angle. Dell EMC also publishes some excellent performance best practices guides that are worth reading.

I’ve used a variety of software tools in my tenure as a storage administrator:  EMC’s Performance Manager, Windows PerfMon, NetApp OnCommand Insight, SolarWinds SRM, ViPR SRM, and of course the ubiquitous Navisphere Analyzer.  All of them use basically the same metrics, so the following information will be useful regardless of which tool you use.

The first thing I do when reviewing a potential storage array performance problem is a quick look at the Storage Processors.  This will give you a good indication of the overall health of the array before you dive into the specific LUN (or LUNs) used by the application.

  • SP Cache Dirty Pages (%). These are pages in write cache that have received new data from hosts but have not yet been flushed to disk.  You should have a high percentage of dirty pages as it increases the chance of a read coming from cache or additional writes to the same block of data being absorbed by the cache. If an IO is served from cache the performance is better than if the data had to be retrieved from disk.  That’s why the default watermarks are usually around 60/80% or 70/90%.  You don’t want dirty pages to reach 100%, they should fluctuate between the high and low watermarks (which means the Cache is healthy).  Periodic spikes or drops outside the watermarks are ok, but consistently hitting 100% indicates that the write cache is overstressed.
  • SP Utilization (%). Check and see if either SP is running higher than about 75%.  If either is running that high, application response time will be increased.  Also, both will need to be under 50% for non-disruptive upgrades; we had to do a large scale migration of data from one SAN to another at one point in order to get an NDU accomplished.  You’ll also want to check for proper balance.  If one is much higher than the other, you should consider migrating LUNs from one SP owner to the other.  I check SP balance on all of our arrays on a daily basis.
  • SP Response time (ms). Make sure again that both SPs are even and that response time is acceptable. I like to see response times under 10ms.  If you see that one SP has high utilization and response time but the other SP doesn’t, look for LUNs owned by the busier SP that are using more array resources.  If both SPs have relatively similar throughput but one SP has much higher bandwidth, that could mean there is some large block IO occurring; looking at total IO on a per-LUN basis can help confirm it.
  • SP Port Queue Full Count. This represents the number of times that a front end port issued a QFULL response back to the hosts. If you are seeing QFULLs it could mean that the queue depth on the HBA is too large for the LUNs being accessed.  A CLARiiON/VNX front end port has a queue depth of 1600, which is the maximum number of simultaneous IOs that port can process.  Each LUN on the array has a maximum queue depth that is calculated using a formula based on the number of data disks in the RAID Group. For example, a port with 512 queues and a typical LUN queue depth of 32 can support up to 512 / 32 = 16 LUNs on 1 initiator (HBA), or 16 initiators (HBAs) with 1 LUN each, or any combination not exceeding that number. Configurations that exceed it are in danger of returning QFULL conditions. A QFULL condition signals that the target/storage port is unable to process more IO requests, and the initiator will need to throttle IO to the storage port. As a result, application response times will increase and IO activity will decrease.
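Here’s that fan-in arithmetic as a quick one-liner, using the 1600 queue depth of a VNX/CLARiiON front end port and the typical LUN queue depth of 32 from the example above:

PORT_QUEUE_DEPTH=1600
LUN_QUEUE_DEPTH=32
echo $(( PORT_QUEUE_DEPTH / LUN_QUEUE_DEPTH ))

This returns 50, the number of initiator/LUN combinations a single port can handle before QFULLs become a risk.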

The next thing I do is look at the specific LUNs that the application owner is asking about. The list below includes the basic performance metrics that I most often look at when investigating a performance problem.

  • Utilization (%) represents the fraction of an observation period during which a LUN has any outstanding requests. When the LUN becomes the bottleneck, the utilization will be at or close to 100%. However, since I/Os can get serviced by multiple disks an increase in workload might still result in a higher throughput.  Utilization by itself is not a very good indicator of the overall performance of the LUN; it needs to be factored in with several other things. For example, if you are writing to a LUN (100% writes) and the location of the data is in a small physical space on the LUN, it may be possible to get to 100% with write cache re-hits. This means that all writes are being serviced by the write cache, and since you are writing data to the same locations over and over you do not flush any of the data to the disks. This can cause your LUN Utilization to be 100% but there will actually be no IO to the disks. Utilization is heavily affected by caching, both read and write. The LUN can be very busy but may not have a problem. Use Utilization to help identify busy LUNs, then look at queuing and response times to see if there really is an issue.
  • Queue Length is the average number of requests within a polling interval that are outstanding to this LUN. A queue length of zero indicates an idle LUN. If three requests arrive at an idle LUN at the same time, only one of them can be served immediately; the other two must wait in the queue. That scenario would result in a queue length of 3.  My general guideline for “bad performance” on a LUN is a queue length greater than 2 for a single disk drive.
  • Average Busy Queue Length is the average number of outstanding requests when the LUN was busy. This does not include any idle time. This value should not exceed 2 times the number of spindles on a LUN. For example, if a LUN has 25 spindles, a value of 50 is acceptable. Since this queue length is counted only when the LUN is not idle, the value indicates the frequency variation (burst frequency) of incoming requests. The higher the value, the bigger the burst and the longer the average response time at this component. In contrast to this metric, the average queue length also includes idle periods when no requests are pending. If one request is outstanding 50% of the time and the LUN is idle the other 50%, the average busy queue length will be 1, while the average queue length will be ½.
  • Response Time (ms) is the average time, in milliseconds, that a request to this LUN is outstanding, including its waiting time. The higher the queue length for a LUN, the more requests are waiting in its queue, thus increasing the average response time of a single request. For a given workload, queue length and response time are directly proportional.  Keep in mind that cache re-hits bring down the average response time (and service times), whether they are reads or writes. LUN Response time is a good starting point for troubleshooting. It gives a good indicator of what the host system is experiencing. Usually if your LUN response time (Response time = queue length * service time) is good then the host performance is good. High response times don’t always mean that the CLARiiON is busy; they can also indicate that you’re having issues with your host or fabric.  We use the Brocade Health report on a regular basis to identify hosts that have an excessive amount of traffic, as well as running the EMC HEAT report on hosts that have reported issues (which can identify incorrect HBA drivers, a bad HBA, etc.).  These are my general guidelines for response time:
    Less than 10 ms: very good
    Between 10 – 20 ms: okay
    Between 20 – 50 ms: slow, needs attention
    Greater than 50 ms:  I/O bottleneck
  • Service Time (ms) represents the time, in milliseconds, that a request spent being serviced by a component. It does not include time waiting in a queue. Service time is mainly a characteristic of the system component. However, larger I/Os take longer and therefore usually result in lower throughput (IO/s) but better bandwidth (Mbytes/s). Service time is simply the time it takes to actually send the I/O request to the storage and get an answer back; I generally like to see service times below 20ms.
  • Total Throughput (IO/sec) is the average number of host requests that is passed through the LUN per second. This includes both read and write requests. Smaller requests usually result in a higher total throughput than larger requests.  Examining total throughput (along with %Utilization) is a good way to identify the busiest LUNs on the array.  There’s a quick pool sizing sketch after this list that uses the per-drive numbers below. In general, here are the IOPs limits by drive type:
RPM        Drive Type      IOPs
7,200      SATA,NL-SAS     ~80
10,000     SATA,NL-SAS     ~130
10,000     FC,SAS          ~140
15,000     FC,SAS          ~180
N/A        EFD             ~1500 (Read/Write, 60/40)
N/A        EFD             ~6000 (Read)
N/A        EFD             ~3000 (Write)
  • Write Throughput (I/O/sec) The average number of host write requests that is passed through the LUN per second. Smaller requests usually result in a higher write throughput than larger requests.  When troubleshooting specific LUNs, check the write IO size and see if the size is what you would expect for the application you are investigating. Extremely large IO sizes coupled with high IOPS may cause write cache contention.
  • Read Throughput (I/O/sec) The average number of host read requests that is passed through the LUN per second. Smaller requests usually result in a higher read throughput than larger requests.
  • Total Bandwidth (MB/s) The average amount of host data in Mbytes that is passed through the LUN per second. This includes both read and write requests. Larger requests usually result in a higher total bandwidth than smaller requests.
  • Read Bandwidth (MB/s) The average amount of host read data in Mbytes that is passed through the LUN per second. Larger requests usually result in a higher bandwidth than smaller requests.
  • Write Bandwidth (MB/s) The average amount of host write data in Mbytes that is passed through the LUN per second. Larger requests usually result in a higher bandwidth than smaller requests. Keep in mind that writes consume many more array resources than reads.
  • Read Size (KB) The average read request size in Kbytes seen by the LUN. This number indicates whether the overall read workload is oriented more toward throughput (I/Os per second) or bandwidth (Mbytes/second). For a finer distinction of I/O sizes, use an IO Size Distribution chart for this LUN.
  • Write Size (KB) The average write request size in Kbytes seen by the LUN. This number indicates whether the overall write workload is oriented more toward throughput (I/Os per second) or bandwidth (Mbytes/second). For a finer distinction of I/O sizes, use an IO Size Distribution chart for the LUNs.
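As promised above, here’s a back-of-the-envelope pool sizing sketch that uses the per-drive IOPS figures from the table.  The drive counts, the 70/30 read/write mix, and the RAID 5 write penalty of 4 are example assumptions, not measurements from a real array:

SAS15K_DRIVES=20; SAS15K_IOPS=180
NLSAS_DRIVES=15;  NLSAS_IOPS=80
READ_PCT=70        # 70% reads / 30% writes
WRITE_PENALTY=4    # RAID 5 back end IOs per host write
BACKEND=$(( SAS15K_DRIVES * SAS15K_IOPS + NLSAS_DRIVES * NLSAS_IOPS ))
FRONTEND=$(( BACKEND * 100 / (READ_PCT + (100 - READ_PCT) * WRITE_PENALTY) ))
echo "Back end capability: ~$BACKEND IOPS"
echo "Front end estimate: ~$FRONTEND host IOPS at ${READ_PCT}% reads on RAID 5"

With these example numbers the pool can do roughly 4800 back end IOPS, which works out to about 2500 host IOPS once the write penalty is factored in.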

Below is an explanation of additional performance metrics that I don’t use as frequently, but I’m including them for completeness.

  • Forced Flushes/s Number of times per second the cache had to flush pages to disk to free up space for incoming write requests. Forced flushes are a measure of how often write requests will have to wait for disk I/O rather than be satisfied by an empty slot in the write cache. In most well performing systems this should be zero most of the time. 
  • Full Stripe Writes/s Average number of write requests per second that spanned a whole stripe (all disks in a LUN). This metric is applicable only to LUNs that are part of a RAID5 or RAID3 group.
  • Used Prefetches (%) The percentage of prefetched data in the read cache that was read during the last polling interval.
  • Disk Crossing (%) Percentage of host requests that require I/O to at least two disks compared to the total number of host requests. A single disk crossing can involve more than two disk drives.
  • Disk Crossings/s Number of times per second that a request requires access to at least two disk drives. A single disk crossing can involve more than two disks.
  • Read Cache Hits/s Average number of read requests per second that were satisfied by either read or write cache without requiring any disk access. A read cache hit occurs when recently accessed data is re-referenced while it is still in the cache.
  • Read Cache Misses/s Average number of read requests per second that did require one or more disk accesses.
  • Reads From Write Cache/s Average number of read requests per second that were satisfied by write cache only. Reads from write cache occur when recently written data is read again while it is still in the write cache. This is a subset of read cache hits which includes requests satisfied by either the write or the read cache.
  • Reads From Read Cache/s Average number of read requests per second that were satisfied by the read cache only. Reads from read cache occur when data that has been recently read or prefetched is re-read while it is still in the read cache. This is a subset of read cache hits which includes requests satisfied by either the write or the read cache.
  • Read Cache Hit Ratio The fraction of read requests served from both read and write caches vs. the total number of read requests. A higher ratio indicates better read performance.
  • Write Cache Hits/s Average number of write requests per second that were satisfied by the write cache without  requiring any disk access. Write requests that are not write cache hits are referred to as write cache misses.
  • Write Cache Misses/s Average number of write requests per second that did require one or multiple disk accesses. Write requests that cause forced flushes or that bypass the write cache due to their size are examples of write cache misses.
  • Write Cache Rehits/s Average number of write requests per second that were satisfied by the write cache since they had been referenced before and not yet flushed to the disks. Write cache rehits occur when recently accessed data is referenced again while it is still in the write cache. This is a subset of Write Cache Hits.
  • Write Cache Hit Ratio The ratio of write requests that the write cache satisfied without requiring any disk access vs. the total number of write requests to this LUN. A higher ratio indicates better write performance.
  • Write Cache Rehit Ratio The ratio of write requests that the write cache satisfied since they have been referenced before and not yet flushed to the disks vs. the total number of write requests to this LUN. This is a measure of how often the write cache succeeded in eliminating a write operation to disk. While improving the rehit ratio is useful it is more beneficial to reduce the number of forced flushes.
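To tie a few of those cache counters together, the hit ratios are simply hits divided by total requests.  A quick example with made-up numbers (not from a real array):

READ_HITS=450; READ_MISSES=50
WRITE_HITS=900; WRITE_MISSES=100
echo "scale=2; $READ_HITS / ($READ_HITS + $READ_MISSES)" | bc
echo "scale=2; $WRITE_HITS / ($WRITE_HITS + $WRITE_MISSES)" | bc

Both print .90, i.e. a 90% read cache hit ratio and a 90% write cache hit ratio.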

EMC World 2012 – Thoughts on preparing for a VDI deployment

We’re in the process of testing the deployment of VDI now so I attended a session about preparing for a deployment.   As I mentioned in my previous post, I’m not an expert on any subject after attending a one hour session, but I thought I’d share my thoughts.

The most important take away from the session was mentioned near the beginning – for a truly successful deployment you must do a detailed desktop IO analysis.  It’s important to have a firm grasp on the amount of desktop IO that your company requires, and a detailed analysis is the only way to gather that info.   There are “rules of thumb” that can be followed (5 IOPs for light users, 10 IOPs for medium users, and 20 IOPs for heavy users), but you could easily end up either over-allocating or under-allocating without knowing the actual numbers.  Lakeside Software and Liquidware Labs were mentioned as vendors who provide software that does such an analysis, however I’ve never heard of either of them and can provide no information or feedback on their services. There’s a good free VDI calculator available at http://myvirtualcloud.net/?page_id=1076.  Once you have a good grasp of the amount of IO you’ll need to support, scaling for the 95th percentile should be your target.
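As a quick illustration of those rules of thumb, here’s the steady-state math for a made-up desktop mix.  The user counts are examples only; a real sizing should come from the desktop IO assessment, scaled to the 95th percentile:

LIGHT=300; MEDIUM=150; HEAVY=50
TOTAL=$(( LIGHT * 5 + MEDIUM * 10 + HEAVY * 20 ))
echo "$TOTAL IOPS steady state for $(( LIGHT + MEDIUM + HEAVY )) desktops"

For this mix the estimate works out to 4000 IOPS for 500 desktops.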

What’s the best way to prepare for VDI on a VNX array?  If you conclude from your analysis that your desktop environment is more read intensive, consider hosting it on storage pools that utilize FAST VP with EFD and FastCache.  Using VMWare’s View Storage Accelerator and EMC’s VFCache would also be of great benefit.  If your environment is more write intensive, focus on increasing the spindle count (even at the expense of wasted capacity).  All of the other items mentioned for read intensive environments still apply and would be beneficial.

Final thoughts:

– If you’re using Windows XP make sure disk alignment is set correctly.  If it’s not, you could see up to a 50% penalty in performance.

– Image optimization is very important.  Remove all unneeded services.

– Schedule your normal maintenance operations off hours.  Running patches and updates during business hours could cause the help desk phones to light up.

– Using NFS vs. block is a wash performance wise.  NFS is currently better for scaling up, however, as it allows for up to a 32 node cluster vs. only 8 for block (on linked clones).

– Desktops tend to be write heavy.  FAST VP is great!

EMC World 2012 – Thoughts on Isilon

I’m at EMC World in Las Vegas and I finished up my first full day at EMC World 2012 today.  There are about 15,000 attendees this year, which is much more than last year and it’s obvious. The crowds are huge and the Venetian is packed full.  Joe Tucci’s Keynote was amazing, the video screen behind Joe was longer than a football field and he took the time to point that out. 🙂  He went into detail about the past, present and future of IT and it was very interesting.

Many of the sessions I’m signed up for have non-disclosure agreements, so I can’t speak about some of the new things being announced or the sessions I’ve attended.  I’m trying to focus on learning about (and attending breakout sessions) about EMC technologies that we don’t currently use in my organization to broaden my scope of knowledge. There may be better solutions from EMC available than what the company I work for is currently using, and I want to learn about all the options available.

My first session today was about EMC’s Isilon product and I was excited to learn more about it. My only experience so far with EMC’s file based solutions is with legacy Celerra arrays and VNX File.  So, what’s the difference, and why would anyone choose to purchase an Isilon solution over simply adding a few data movers to their VNX? Why is Isilon better? Good Question. I attended an introductory level session but it was very informative.

I’m not going to pretend to be an expert after listening to a one hour session this morning, but here were my take-aways from it. Isilon, in a nutshell, is a much higher performance solution than VNX file. There are several different iterations of the platform available (S, X, and NL Series) all focused on specific customer needs.  One is for high transactional, IOPs intensive needs, another geared for capacity, and another geared for a balance (and a smaller budget). It uses the OneFS single filesystem (impressive by itself), which eliminates the standard abstraction layers of the filesystem, volume manager, and RAID types.  All of the disks use a single file system regardless of the total size.  The data is arranged in a fully symmetric cluster with data striped across all of the nodes.  The single, OneFS filesystem works regardless of the size of your filesystem – 18TB minimum all the way up to 15 PB.

Adding a new node to Isilon is seamless, it’s instantly added to the cluster (hence the term “Scale-out NAS” EMC has been touting throughout the conference).  You can add up to 144 nodes to a single Isilon array. It also features auto balancing, in that it will automatically rebalance data to the new node that was just added.  It can also remove data from a node and move it to a new one if you decide to decommission a node and replace it with a newer model. Need to replace that 4 year old Isilon node with the old, low capacity disks? No problem. Another interesting item to note is how data is stored across the nodes. Isilon does not use a standard RAID model at all, it distributes data across the disks based on how much protection you decide you need.  You can decide as an administrator how the data is protected, choosing to keep as many copies of data as you want (at the expense of total available storage). The more duplicate copies of data you want to keep, the less total storage you have available for production use.  One great benefit of Isilon vs. VNX file is that rebuilds are much faster, as traditional RAID groups are dependent on the total IO available to the single drive being rebuilt, while Isilon rebuilds are spanned across the entire system. It could mean the difference between a 12 hour single disk RAID5 rebuild vs. less than one hour on Isilon. Pretty cool stuff.

I only have experience with Celerra replicator, but it was also mentioned in the session that Isilon replications can go down to the specific folder level within a file system.  Very cool.  I can only do replications at the entire file system level on VNX file and Celerra right now. I don’t have any experience with that functionality yet, but it sounds very interesting.

There is a new upcoming version of the OneFS (called “Mavericks”) that will introduce even more new features, I’m not going to go into those as they may be part of the non-disclosure agreement.  Everything I’ve mentioned thus far is available currently.  Overall, I was very impressed with the Isilon architecture as compared to VNX file.  EMC claimed that they have the highest FS NAS throughput for any vendor with Isilon at 106GB/sec.  Again, very impressive.

I’ll make another update this week after attending a few more breakout sessions.  I’m also looking forward to learning more about Greenplum; the promise of improved performance through parallelism (using a scale-out architecture on standard hardware) is also very interesting to me.  If anyone else is at EMC World this week, please comment!

Cheers!

Problem with soft media errors on SSD drives and FastCache

4/25/2012 Update:  EMC has released a fix for this issue.  Call your account service representative and say you need to upgrade your NS-960 dart to 6.0.55.300 and flare to 4.30.000.5.524 plus drive firmware upgrade on all SSD drives to TC3Q.

Do you have FAST Cache enabled on your array?  Keep a close eye on your SP event logs for soft media errors on your SSD drives.  I just noticed over 2000 soft media errors on one of my FAST Cache enabled arrays, and found a technical advisory from EMC (emc282741) that describes this as a potentially critical problem.  I just opened a case with EMC for my array to be reviewed for a possible disk replacement.  In the event that a second disk drive in the same FAST Cache RAID group encounters soft media errors before the system automatically retires the first drive, a dual-faulted RAID group could occur.  This can result in storage pools going offline and becoming completely inaccessible to the attached hosts.  That’s basically a total SAN outage, not good.

Look for errors like the following in your SP event logs:

“Date Stamp”  “Time Stamp” Bus1 Enc1 Dsk0  820 Soft Media Error [Bad block]

EMC states in emc282741 that enhancements are targeted for Q1 2012 to address SSD media errors and dual hardware faults, but in the meantime, make sure you review the SP logs if you have CLARiiON or VNX arrays that are configured with SSD disk drives or are using FAST Cache.  If any instance of the “Soft Media Error” listed above is associated with any one of the solid state disk drives in your arrays, the array should be upgraded to at least FLARE Release 04.30.000.5.522 (for CX4 Series arrays) or Release 05.31.000.5.509 (for VNX Series arrays) and then start a Proactive Copy (PACO) to a hot spare and replace the drive as soon as possible.

In order to quickly review this on each of my arrays, I wrote the following script to update my intranet site with a report every morning:

naviseccli -h clariion1a getlog >clariion1a.txt
naviseccli -h clariion1b getlog >clariion1b.txt  
cat clariion1a.txt | grep -i 'soft media' >clariion1_softmedia_errors.csv
cat clariion1b.txt | grep -i 'soft media' >>clariion1_softmedia_errors.csv
./csv2htm.pl -e -T -i /home/scripts/clariion1_softmedia_errors.csv -o /<intranet_web_server>/clariion1_softmedia_errors.html
 

The script dumps the entire SP log from each SP into a text file, greps for only soft media errors in each file, then converts the output to HTML and writes it to my intranet web server.

 

Powerpath commands in AIX causing unexpected errors / initialization errors.

We recently had a problem with one of our AIX VIO servers not being able to run any PowerPath commands.  Any attempt to run a command would result in an unexpected error or initialization error.   After speaking to EMC about it, the root cause is usually either running out of space on the root filesystem or having the data and stack ulimit parameters set too low after adding a large number of new LUNs.   We are running AIX 6.1 on an IBM pSeries 550 with PowerPath 5.3 HF1.

Here are the errors that were popping up:

root@vioserver1:/script # powermt config
Unexpected error occured.

root@vioserver1:/script # powermt display dev=all
Initialization error.

root@vioserver1:/script # naviseccli -h <san_dns_name> lun -list -all
evp_enc.c(282): OpenSSL internal error, assertion failed: inl > 0
ksh: 503926 IOT/Abort trap(coredump)

Having too many LUNs caused the issue; we had recently added an additional 35 for a total of 70.  Increasing the data and stack parameters to ‘unlimited’ resolved the problem.

root@vioserver1:/script # ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     2097151
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited

Optimizing Java Memory for Navisphere / Unisphere

If you have a CLARiiON system with a large configuration in terms of disks, LUNs, initiator records, etc, you may experience a slowdown when managing the system with Navisphere or Unisphere.  If you increase the amount of memory that Java can use, you can significantly improve the response time when using the management console.

Here are the steps:

  1. Log in to the CLARiiON setup page (http://<clariion IP>/setup).  Go to Set Update Parameters > Update Interval.  Change it to 300.
  2. On the Management Server (or your local PC/laptop) go to Control Panel and launch the Java icon.
  3. Go to the Java tab and click view.
  4. Enter -Xmx128m under Java Runtime Parameter, which allocates 128MB for Java.  This number can be increased as you see fit, you may see better results with 512 or 1024.

Auto generating daily performance graphs with EMC Control Center / Performance Manager

This document describes the process I used to pull performance data using the ECC pmcli command line tool, parse the data to make it more usable with a graphing tool, and then use perl scripts to automatically generate graphs.

You must install Perl.  I use ActiveState Perl (Free Community Edition) (http://www.activestate.com/activeperl/downloads).

You must install Cygwin.  Link: http://www.cygwin.com/install.html. I generally choose all packages.

I use the following CPAN Perl modules:

Step 1:

Once you have the software set up, the first step is to use the ECC command line utility to extract the interval performance data that you’re interested in graphing.  Below is a sample PMCLI command line script that could be used for this purpose.

:Get the current date

For /f "tokens=2-4 delims=/" %%a in ('date /t') do (set date=%%c%%a%%b)

:Export the interval file for today’s date.

D:\ECC\Client.610\PerformanceManager\pmcli.exe -export -out D:\archive\interval.csv -type interval -class clariion -date %date% -id APM00324532111

:Copy all the export data to my cygwin home directory for processing later.

copy /y D:\archive\interval.csv C:\cygwin\home\<userid>

You can schedule the command script above to run using the Windows Task Scheduler.  I run it at 11:46 PM every night; data is collected on our SAN in 15-minute intervals, and that gives me a file that reports all the way up to the end of one calendar day.

Note that there are 95 data collection points from 00:15 to 23:45 every day if you collect data at 15-minute intervals.  The storage processor data resides in the last two lines of the output file.

Here is what the output file looks like:

EMC ControlCenter Performance manager generated file from: <path>

Data Collected for DiskStats

Data Collected for DiskStats - 0_0_0
                                          3/28/11 00:15    3/28/11 00:30    3/28/11 00:45    3/28/11 01:00
Number of Arrivals with Non Zero Queue    12               20               23               23
% Utilization                             30.2             33.3             40.4             60.3
Response Time                             1.8              3.3              5.4              7.8
Read Throughput IO per sec                80.6             13.33            90.4             10.3

There's great information in there, but the format makes it very hard to do anything meaningful with the data in an Excel chart.  If I want to chart only % Utilization, it's difficult because so many other counters with their own data surround it.  My next goal was to write a script that reformats the data into something far more usable and automatically creates a graph for one specific counter I'm interested in (like daily utilization numbers), which could then be emailed daily or auto-uploaded to an internal website.

Step 2:

Once the PMCLI data is exported, the next step is to use cygwin bash scripts to parse the csv file and pull out only the performance data that is needed.  Each SAN will need a separate script for each type of performance data.  I have four scripts configured to run based on the data that I want to monitor.  The scripts are located in my cygwin home directory.

The scripts I use:

  • Iostats.sh (for total IO throughput)
  • Queuestats.sh (for disk queue length)
  • Resptime.sh (for disk response time in ms)
  • Utilstats.sh (for % utilization)

Here is a sample shell script for parsing the CSV export file (iostats.sh):

#!/usr/bin/bash

# Pull only the timestamp line from the top of the CSV output file.  I'll paste it back in later.
grep -m 1 "/" interval.csv > timestamp.csv

# Pull out only the lines that begin with "Total Throughput IO per sec".
grep -i "^Total Throughput IO per sec" interval.csv >> stats.csv

# Pull out the disk/LUN title info for the first column.  I'll add this back in later.
grep -i "Data Collected for DiskStats -" interval.csv > diskstats.csv
grep -i "Data Collected for LUNStats -" interval.csv > lunstats.csv

# Create a column with the disk/LUN number.  I'll paste it into the first column later.
cat diskstats.csv lunstats.csv > data.csv

# Add the first column (disk/LUN) and combine it with the actual performance data columns.
paste data.csv stats.csv > combined.csv

# Combine the timestamp header with the combined file from the previous step to create the final
# file we'll use for the graph, then copy the CSV to an archive directory with the current date
# appended to the filename.
cat timestamp.csv combined.csv > iostats.csv
cp iostats.csv /cygdrive/e/SAN/csv_archive/iostats_archive_$(date +%y%m%d).csv

# Remove the temporary files created earlier in the script.  They're no longer needed.
rm timestamp.csv stats.csv diskstats.csv lunstats.csv data.csv combined.csv

# Strip the last two lines of the CSV (the Storage Processor data).  The resulting file is used for
# the "all disks" spreadsheet; we don't want the SP data to skew the graph.  This CSV file is also
# copied to the archive directory.
sed '$d' < iostats.csv > iostats2.csv
sed '$d' < iostats2.csv > iostats_disk.csv
rm iostats2.csv
cp iostats_disk.csv /cygdrive/e/SAN/csv_archive/iostats_disk_archive_$(date +%y%m%d).csv

Note: The shell script above can be run from the Windows Task Scheduler as long as you have cygwin installed.  Here's the syntax:

c:\cygwin\bin\bash.exe -l -c "/home/<username>/iostats.sh"
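
If you'd rather schedule it from inside cygwin, the same thing can be done with cron, assuming the cygwin cron package is installed and the cron service has been set up (cron-config).  A minimal sketch of the crontab entry:

# Run the parsing script every night at 11:50 PM (entry added with crontab -e)
50 23 * * * /home/<username>/iostats.sh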

After running the shell script above, the resulting CSV file contains only Total Throughput (IO per sec) data for each disk and LUN, covering 00:15 to 23:45 in 15-minute increments.  Once the cygwin scripts have run, we have CSV datasets that are ready to be turned into graphs.

The disk and LUN stats are combined in the same CSV file.  It's entirely possible to rewrite the script to keep only one or the other; I put both in there so it's easy to manually create a graph in Excel for either disk or LUN stats later if necessary.  The "all disks" graph doesn't look any different with both disk and LUN stats included; I tried it both ways, and they overlap in a way that makes the extra data indistinguishable in the image.
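
Since the first column of the combined file carries the "DiskStats" or "LUNStats" label from the paste step, one quick way to split them apart afterward is a simple grep; the output filenames here are just examples:

# Keep only the disk rows (the timestamp header survives because it contains neither label)
grep -v "LUNStats" iostats.csv > iostats_diskonly.csv

# Keep only the LUN rows
grep -v "DiskStats" iostats.csv > iostats_lunonly.csv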

The resulting data output after running the iostats.sh script is shown below.  I now have a nice, neat spreadsheet that lists the total throughput for each disk in the array for the entire day in 15-minute increments.  Having the data formatted this way makes it very easy to create charts, but I don't want to do that manually every day; I want the charts to be created automatically.

                                       3/28/11 00:15    3/28/11 00:30    3/28/11 00:45    3/28/11 01:00
Total Throughput IO per sec - 0_0_0    12               20               23               23
Total Throughput IO per sec - 0_0_1    30.12            33.23            40.4             60.23
Total Throughput IO per sec - 0_0_2    1.82             3.3              5.4              7.8
Total Throughput IO per sec - 0_0_3    80.62            13.33            90.4             10.3

Step 3:

Now I want to automatically create the graphs every day using a Perl script.  After the CSV files have been exported into a more usable format in the previous step, I use the GD::Graph library from CPAN (http://search.cpan.org/~mverb/GDGraph-1.43/Graph.pm) to auto-generate the graphs.

Below is a sample Perl script that will auto-generate a nice-looking graph based on the CSV output file from the previous step.

#!/usr/bin/perl

#Declare the libraries that will be used.

use strict;

use Text::ParseWords;

use GD::Graph::lines;

use Data::Dumper;

#Specify the csv file that will be used to create the graph

my $file = 'C:\cygwin\home\<username>\iostats_disk.csv';

#my $file  = $ARGV[0];

my ($output_file) = ($file =~/(.*)\./);

#Create the arrays for the data and the legends

my @data;

my @legends;

#parse csv, generate an error if it fails

open(my $fh, '<', $file) or die "Can't read csv file '$file' [$!]\n";

my $countlines = 0;

while (my $line = <$fh>) {

chomp $line;

my @fields = Text::ParseWords::parse_line(',', 0, $line);

# There are 95 fields generated to correspond to the 95 data collection points in each of the output files.

my @field = @fields[1..95];
push @data, \@field;

if ($countlines >= 1) {

push @legends, $fields[0];

}

$countlines++;

}

#The data and legend arrays will read 820 lines of the CSV file.  This number will change based on the number of disks in the SAN, and will be different depending on the SAN being reported on.  The legend info will read the first column of the spreadsheet and create a color box that corresponds to the graph line.  For the purpose of this graph, I won’t be using it because 820+ legend entries look like a mess on the screen.

splice @data, 1, -820;

splice @legends, 0, -820;

#Set Graphing Options

my $mygraph = GD::Graph::lines->new(1024, 768);

# There are many graph options that can be changed using the GD::Graph library.  Check the website (and google) for lots of examples.

$mygraph->set(

title => 'SP IO Utilization (00:15 - 23:45)',

y_label => 'IOs Per Second',

y_tick_number => 4,

values_vertical => 6,

show_values => 0,

x_label_skip => 3,

) or warn $mygraph->error;

# As I said earlier, because of the large number of legend entries for this type of graph, I change the legend to simply read "All Disks".  If you want the legend to show the correct entries and colors, use this line instead:  $mygraph->set_legend(@legends);

$mygraph->set_legend('All Disks');

#Plot the data

my $myimage = $mygraph->plot(\@data) or die $mygraph->error;

# Export the graph as a gif image.  The images are currently moved to the IIS folder (c:\inetpub\wwwroot) by one of the scripts.  They could also be emailed using a sendmail utility.

my $format = $mygraph->export_format;

open(IMG, ">$output_file.$format") or die $!;

binmode IMG;

print IMG $myimage->gif;

close IMG;

After this script runs, the resulting image file is saved in the cygwin home directory (the same directory the CSV file is located in).  One of the nightly scripts I run copies the image to our internal IIS server's image directory, and sendmail emails the graph to the SAN admin team.
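
For reference, a rough sketch of that nightly publish-and-notify step is below.  The wwwroot path, hostnames, and email addresses are placeholders, and it assumes a working sendmail (or sendmail-compatible) setup under cygwin, so adjust for your environment.

#!/usr/bin/bash
# Hypothetical nightly post-processing: publish the graph and mail a link to the team.

# Copy the generated graph to the IIS image directory
cp /home/<username>/iostats_disk.gif /cygdrive/c/inetpub/wwwroot/images/

# Send a link to the SAN admin team (sendmail -t reads the recipients from the headers)
/usr/sbin/sendmail -t <<EOF
To: san-admins@example.com
From: san-reports@example.com
Subject: Daily SAN throughput graph

Today's graph: http://intranet.example.com/images/iostats_disk.gif
EOF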

That’s it!  You now have lots of pretty graphs with which you can impress your management team. 🙂

Here is a sample graph that was generated with the Perl script:

Filesystem Alignment

You're likely to have seen the filesystem alignment check fail on most, if not all, of the EMC HEAT reports that you run on your Windows 2003 servers.  The starting offset for partition 1 should optimally be a multiple of 128 sectors.  So, what does that mean, and how do you fix it?

If you align the partition to 128 blocks (64KB, since each block is 512 bytes), you avoid crossing a track boundary and thereby issue the minimum number of IOs.  Issuing the minimum number of IOs sounds good, right? 🙂

Because NTFS reserves 31.5 KB of signature space, if a LUN has an element size of 64 KB with the default alignment offset of 0 (both are default Navisphere settings), a 64 KB write to that LUN results in a disk crossing even though it would seem to fit perfectly on the disk.  A disk crossing is also referred to as a split IO, because the read or write must be split into two or more segments.  In this case, 32.5 KB would be written to the first disk and 31.5 KB to the following disk, because the beginning of the stripe is offset by the 31.5 KB of signature space.  This problem can be avoided by providing the correct alignment offset.  Each alignment offset value represents one 512-byte block, so EMC recommends setting the alignment offset value to 63, because 63 times 512 bytes is 31.5 KB.

Checking your offset:

1. Launch System Information in Windows (msinfo32.exe).

2. Select Components -> Storage -> Disks.

3. Scroll to the bottom and you will see the partition starting offset information.  For the recommended 128-sector (64 KB) alignment, this number should be evenly divisible by 65,536 (128 x 512 bytes); if it's not, the partition is not properly aligned.  For example, the Windows 2003 default starting offset of 32,256 bytes (63 sectors) fails this test, while 65,536 passes.

Correcting your starting offset:

Launch diskpart:

C:\>diskpart

DISKPART> list disk

Two disks should be listed

DISKPART> select disk 1

This selects the second disk drive

DISKPART> list partition

This step should give a message “There are no partitions on this disk to show”.  This confirms a blank disk.

DISKPART> create partition primary align=64

That's it.  You now have a properly aligned partition.

A guide for troubleshooting CIFS issues on the Celerra

In my experience, every CIFS issue you may have will fall into 8 basic areas, the first five being the most common. Check all of these things and I can almost guarantee you will resolve your problem. 🙂  A quick command sketch for the first few checks follows the list.

1. CIFS Service. Check that the CIFS service is configured and running with server_cifs server_2.  If it isn't started, start it with: server_setup server_2 -Protocol cifs -option start

2. DNS. Check that your DNS server entries on the Celerra are correct, that you're pointing to at least two DNS servers, and that those servers are up and responding.

3. NTP. Make sure your NTP server entry is correct on the Celerra, and that the IP is reachable on the network and is actively providing NTP services.

4. User Mapping. Verify that your Windows-to-UNIX user mapping is working, whether you're using Usermapper, ntxmap, or Active Directory-based mapping; if the Data Mover can't resolve SIDs to UIDs and GIDs, CIFS access will fail in confusing ways.

5. Default Gateway. Double check your default gateway in the Celerra’s routing table. Get the network team involved if you’re not sure.

6. Interfaces. Make sure the interfaces are physically connected and properly configured.

7. Speed/Duplex. Make sure the speed and duplex settings on the Celerra match those of the switch port that the interfaces are plugged in to.

8. VLAN. Double check your VLAN settings on the interfaces, make sure it matches what is configured on the connected switch.
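
For the first few checks, here's a rough sketch of the Celerra Control Station commands I'd start with.  server_2 is just the usual Data Mover name, so substitute your own, and treat this as a starting point rather than a definitive checklist.

# Show the CIFS configuration and whether the CIFS service is started
server_cifs server_2

# Show the DNS domains and name servers configured on the Data Mover
server_dns server_2

# Show the Data Mover's current date and time (compare it against your NTP source)
server_date server_2

# List all network interfaces and their configuration
server_ifconfig server_2 -all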

The SAN Guy

I’ve been in IT for over 17 years now, and have a few tricks up my sleeve after all this time.  My most recent job transition in this field has been to a full time SAN Administrator.  It’s a job function I’ve performed for 8+ years, but for the past four years it’s been my exclusive responsibility.

I work for a global company with EMC SAN hardware deployed in many countries, and I'm responsible for the management and administration of all of it, primarily CLARiiON and Celerra hardware.  Because EMC doesn't always provide all of the tools you need, sometimes administrators like us have to get a bit creative to get the job done. 😉

I'll be posting tips and tricks I've picked up about EMC SAN administration here, along with some simple "work smarter, not harder" tips for everyday tasks, and maybe some general info as well.