
VPLEX initiator paths dropped

We recently ran into an SP bug check on one of our VNX arrays, and after the SP came back up several of the initiator paths to the VPLEX did not recover.  We were also seeing IO timeouts.  This is a known bug that is triggered by an SP reboot and is fixed in Patch 1 for GeoSynchrony 5.3.  EMC has released a script that provides a workaround until the patch can be applied: https://download.emc.com/downloads/DL56253_VPLEX_VNX_SCRIPT.zip.zip

The following pre-conditions must occur during a VNX NDU to see this issue on VPLEX:
1] During a VNX NDU, SPA goes down.
2] At this point, IO time-outs start occurring on the IT nexuses pertaining to SPA.
3] The IO time-outs cause the VPLEX SCSI layer to send LU Reset TMFs. These LU Reset TMFs time out as well.

You can review ETA 000193541 on EMC’s support site for more information.  It’s a critical bug and I’d suggest patching as soon as possible.

 


VPLEX Health Check

This is a brief post to share the CLI commands and sample output for a quick VPLEX health check.  Our VPLEX had a dial-home event, and below are the commands that EMC ran to verify that it was healthy.  Here is the dial-home event that was generated:

SymptomCode: 0x8a266032
SymptomCode: 0x8a34601a
Category: Status
Severity: Error
Status: Failed
Component: CLUSTER
ComponentID: director-1-1-A
SubComponent: stdf
CallHome: Yes
FirstTime: 2014-11-14T11:20:11.008Z
LastTime: 2014-11-14T11:20:11.008Z
CDATA: Compare and Write cache transaction submit failed, status 1 [Versions:MS{D30.60.0.3.0, D30.0.0.112, D30.60.0.3}, Director{6.1.202.1.0}, ClusterWitnessServer{unknown}] RCA: The attempt to start a cache transaction for a Scsi Compare and Write command failed. Remedy: Contact EMC Customer Support.

Description: The processing of a Scsi Compare and Write command could not complete.
ClusterID: cluster-1

Based on that error, the commands below were run to make sure the cluster was healthy.

This is the general health check command:

VPlexcli:/> health-check
 Product Version: 5.3.0.00.00.10
 Product Type: Local
 Hardware Type: VS2
 Cluster Size: 2 engines
 Cluster TLA:
 cluster-1: FNM00141800023
 
 Clusters:
 ---------
 Cluster    Cluster  Oper   Health  Connected  Expelled  Local-com
 Name       ID       State  State
 ---------  -------  -----  ------  ---------  --------  ---------
 cluster-1  1        ok     ok      True       False     ok
 
 Meta Data:
 ----------
 Cluster    Volume                           Volume       Oper   Health  Active
 Name       Name                             Type         State  State
 ---------  -------------------------------  -----------  -----  ------  ------
 cluster-1  c1_meta_backup_2014Nov21_100107  meta-volume  ok     ok      False
 cluster-1  c1_meta_backup_2014Nov20_100107  meta-volume  ok     ok      False
 cluster-1  c1_meta                          meta-volume  ok     ok      True
 
 Director Firmware Uptime:
 -------------------------
 Director Firmware Uptime
 -------------- ------------------------------------------
 director-1-1-A 147 days, 16 hours, 15 minutes, 29 seconds
 director-1-1-B 147 days, 15 hours, 58 minutes, 3 seconds
 director-1-2-A 147 days, 15 hours, 52 minutes, 15 seconds
 director-1-2-B 147 days, 15 hours, 53 minutes, 37 seconds
 
 Director OS Uptime:
 -------------------
 Director OS Uptime
 -------------- ---------------------------
 director-1-1-A 12:49pm up 147 days 16:09
 director-1-1-B 12:49pm up 147 days 16:09
 director-1-2-A 12:49pm up 147 days 16:09
 director-1-2-B 12:49pm up 147 days 16:09
 
 Inter-director Management Connectivity:
 ---------------------------------------
 Director        Checking  Connectivity
                 Enabled
 --------------  --------  ------------
 director-1-1-A  Yes       Healthy
 director-1-1-B  Yes       Healthy
 director-1-2-A  Yes       Healthy
 director-1-2-B  Yes       Healthy
 
 Front End:
 ----------
 Cluster    Total    Unhealthy  Total       Total  Total     Total
 Name       Storage  Storage    Registered  Ports  Exported  ITLs
            Views    Views      Initiators         Volumes
 ---------  -------  ---------  ----------  -----  --------  -----
 cluster-1  56       0          299         16     353       9802
 
 Storage:
 --------
 Cluster    Total    Unhealthy  Total    Unhealthy  Total  Unhealthy  No     Not visible  With
 Name       Storage  Storage    Virtual  Virtual    Dist   Dist       Dual   from         Unsupported
            Volumes  Volumes    Volumes  Volumes    Devs   Devs       Paths  All Dirs     # of Paths
 ---------  -------  ---------  -------  ---------  -----  ---------  -----  -----------  -----------
 cluster-1  203      0          199      0          0      0          0      0            0
 
 Consistency Groups:
 -------------------
 Cluster    Total        Unhealthy    Total         Unhealthy
 Name       Synchronous  Synchronous  Asynchronous  Asynchronous
            Groups       Groups       Groups        Groups
 ---------  -----------  -----------  ------------  ------------
 cluster-1  0            0            0             0
 
 Cluster Witness:
 ----------------
 Cluster Witness is not configured

This command checks the status of the cluster:

VPlexcli:/> cluster status
Cluster cluster-1
operational-status: ok
transitioning-indications:
transitioning-progress:
health-state: ok
health-indications:
local-com: ok

This command checks the state of the storage volumes:

VPlexcli:/> storage-volume summary
Storage-Volume Summary (no tier)
----------------------  --------------------

Health    out-of-date      0
          storage-volumes  203
          unhealthy        0

Vendor    DGC              203

Use       meta-data        4
          used             199

Capacity  total            310T
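
If you capture output like this to a text file, it is easy to script a quick sanity check. Below is a minimal sketch in Python (not an EMC tool) that parses saved 'cluster status' output and warns on any checked field that is not reporting "ok". The filename and the list of fields to check are assumptions for illustration.

import sys

# Minimal sketch (not an EMC tool): parse "cluster status" output captured to
# a text file from a VPlexcli session and warn on any checked field whose
# value is present but not "ok". The default filename and the field list are
# assumptions for illustration.

FIELDS_TO_CHECK = ("operational-status", "health-state", "local-com")

def check_cluster_status(path):
    problems = []
    with open(path) as f:
        for line in f:
            key, sep, value = line.strip().partition(":")
            if not sep:
                continue  # not a "key: value" line
            key, value = key.strip(), value.strip()
            if key in FIELDS_TO_CHECK and value and value != "ok":
                problems.append((key, value))
    return problems

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "cluster_status.txt"
    issues = check_cluster_status(path)
    if issues:
        for key, value in issues:
            print(f"WARNING: {key} = {value}")
    else:
        print("All checked fields report 'ok' (or are blank).")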

Matching LUNs and UIDs when presenting VPLEX LUNs to Unix hosts

Our naming convention for LUNs includes the pool ID, LUN number, owning SP, last four digits of the array’s serial number, server name, filesystem/drive letter, and size (in GB). Having all of this information in the LUN name makes reporting and identification of LUNs on a server very easy.  This is what our LUN names look like: P1_LUN100_SPA_0000_servername_filesystem_150G
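
To make the convention concrete, here is a tiny Python sketch that assembles a LUN name using the field order from the example above (pool, LUN number, owning SP, serial digits, server, filesystem, size). The helper and its arguments are made up for illustration.

# Tiny sketch of the naming convention above. The helper name and arguments
# are hypothetical; only the field order of the example name is assumed.
def build_lun_name(pool, lun, sp, serial_last4, server, filesystem, size_gb):
    return f"P{pool}_LUN{lun}_{sp}_{serial_last4}_{server}_{filesystem}_{size_gb}G"

print(build_lun_name(1, 100, "SPA", "0000", "servername", "filesystem", 150))
# -> P1_LUN100_SPA_0000_servername_filesystem_150G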

Typically, when presenting a new LUN to our AIX administration team for a new server build, they would assign the LUNs to specific volume groups based on the LUN names. The command ‘powermt display dev=hdiskpower#’ always includes the name and intended volume group for the LUN, making it easy for our admins to identify a LUN’s purpose.  Now that we are presenting LUNs through our VPLEX, running powermt display on the server shows the UID for the LUN rather than the name.  Below is a sample of the output.

root@VIOserver1:/ # powermt display dev=all
Pseudo name=hdiskpower0
VPLEX ID=FNM00141800023
Logical device ID=6000144000000010704759ADDF2487A6 (this would usually be displayed as a LUN name)
state=alive; policy=ADaptive; queued-IOs=0
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --   -- Stats ---
###  HW Path        I/O Paths          Interf.    Mode     State   Q-IOs  Errors
==============================================================================
  1  fscsi1         hdisk8             CL1-0B     active   alive       0       0
  1  fscsi1         hdisk6             CL1-0F     active   alive       0       0
  0  fscsi0         hdisk4             CL1-0D     active   alive       0       0
  0  fscsi0         hdisk2             CL1-07     active   alive       0       0

Pseudo name=hdiskpower1
VPLEX ID=FNM00141800023
Logical device ID=6000144000000010704759ADDF2487A1 (this would usually be displayed as a LUN name)
state=alive; policy=ADaptive; queued-IOs=0
==============================================================================
--------------- Host ---------------   - Stor -   -- I/O Path --   -- Stats ---
###  HW Path        I/O Paths          Interf.    Mode     State   Q-IOs  Errors
==============================================================================
  1  fscsi1         hdisk9             CL1-0B     active   alive       0       0
  1  fscsi1         hdisk7             CL1-0F     active   alive       0       0
  0  fscsi0         hdisk5             CL1-0D     active   alive       0       0
  0  fscsi0         hdisk3             CL1-07     active   alive       0       0

In order to easily match the UIDs with the LUN names, an extra step needs to be taken in the VPLEX CLI. Log in to the VPLEX with a terminal emulator, and once you’re logged in, run the ‘vplexcli’ command. That takes you to a shell where additional commands can be entered.

login as: admin
Using keyboard-interactive authentication.
Password:
Last login: Fri Sep 19 13:35:28 2014 from 10.16.4.128
admin@service:~> vplexcli
Trying ::1...
Connected to localhost.
Escape character is '^]'.

Enter User Name: admin

Password:

VPlexcli:/>

Once you’re in, run the ls -t command with the additional options listed below. Replace STORAGE_VIEW_NAME with the actual name of the storage view that you want a list of LUNs from.

VPlexcli:/> ls -t /clusters/cluster-1/exports/storage-views/STORAGE_VIEW_NAME::virtual-volumes

The output looks like this:

/clusters/cluster-1/exports/storage-views/st1pvio12a-b:
Name             Value
---------------  --------------------------------------------------------------------------------------------------
virtual-volumes  [(0,P1_LUN411_7872_SPB_VIOServer1_VIO_10G,VPD83T3:6000144000000010704759addf2487a6,10G),
                  (1,P0_LUN111_7872_SPA_VIOServer1_VIO_10G,VPD83T3:6000144000000010704759addf2487a1,10G)]

Now you can easily see which disk UID is tied to which LUN name.

If you would like to get a list of every storage view and every LUN:UID mapping, you can replace the storage view name with an asterisk (*).

VPlexcli:/> ls -t /clusters/cluster-1/exports/storage-views/*::virtual-volumes

The resulting report will show a complete list of LUNs, grouped by storage view:

/clusters/cluster-1/exports/storage-views/VIOServer1:
Name             Value
---------------  --------------------------------------------------------------------------------------------------
virtual-volumes  [(0,P1_LUN421_9322_SPB_

/clusters/cluster-1/exports/storage-views/VIOServer2:
Name             Value
---------------  --------------------------------------------------------------------------------------------------
virtual-volumes  [(0,P1_LUN421_9322_SPB_VIOServer2_root_75G,VPD83T3:6000144000000010704759addf248ad9,75G),
                  (1,R2_LUN125_9322_SPB_VIOServer2_redo2_12G,VPD83T3:6000144000000010704759addf248b09,12G),
                  (2,R2_LUN124_9322_SPA_VIOServer2_redo1_12G,VPD83T3:6000144000000010704759addf248b04,12G),
                  (3,P3_LUN906_9322_SPB_VIOServer2_oraarc_250G,VPD83T3:6000144000000010704759addf248aff,250G),
                  (4,P2_LUN706_9322_SPA_VIOServer2_oraarc_250G,VPD83T3:6000144000000010704759addf248afa,250G)]

/clusters/cluster-1/exports/storage-views/VIOServer2:
Name             Value
---------------  --------------------------------------------------------------------------------------------------
virtual-volumes  [(1,R2_LUN1025_9322_SPB_VIOServer2_redo2_12G,VPD83T3:6000144000000010704759addf248b09,12G),
                  (2,R2_LUN1024_9322_SPA_VIOServer2_redo1_12G,VPD83T3:6000144000000010704759addf248b04,12G),
                  (3,P3_LUN906_9322_SPB_VIOServer2_ora1_250G,VPD83T3:6000144000000010704759addf248aff,250G),
                  (4,P2_LUN706_9322_SPA_VIOServer2_ora2_250G,VPD83T3:6000144000000010704759addf248afa,250G)]

/clusters/cluster-1/exports/storage-views/VIOServer3:
Name             Value
---------------  --------------------------------------------------------------------------------------------------
virtual-volumes  [(0,P0_LUN101_3432_SPA_VIOServer3_root_75G,VPD83T3:6000144000000010704759addf248a0a,75G),
                  (1,P0_LUN130_3432_SPA_VIOServer3_redo1_25G,VPD83T3:6000144000000010704759addf248a0f,25G),

Our VPLEX has only been installed for a few months and our team is still learning.  There may be a better way to do this, but it’s all I’ve been able to figure out so far.
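
If you end up doing this matching often, it can be scripted. Below is a minimal Python sketch (not an EMC tool) that joins saved 'powermt display dev=all' output with saved 'ls -t ...::virtual-volumes' output on the device UID and prints each hdiskpower device next to its VPLEX volume name. The filenames are assumptions for illustration; note that powermt prints the UID in upper case while the VPlexcli output is lower case, so the comparison is done case-insensitively.

import re

# Minimal sketch (not an EMC tool): join powermt output with VPlexcli
# virtual-volumes output on the device UID. The input filenames are
# assumptions for illustration.

def parse_powermt(path):
    """Return {uid (lower case): hdiskpower pseudo name} from powermt output."""
    mapping = {}
    pseudo = None
    with open(path) as f:
        for line in f:
            m = re.search(r"Pseudo name=(\S+)", line)
            if m:
                pseudo = m.group(1)
            m = re.search(r"Logical device ID=([0-9A-Fa-f]{32})", line)
            if m and pseudo:
                mapping[m.group(1).lower()] = pseudo
    return mapping

def parse_vplex_views(path):
    """Return {uid (lower case): volume name} from 'ls -t ...::virtual-volumes' output."""
    with open(path) as f:
        text = f.read()
    pattern = r"\(\d+,([^,]+),VPD83T3:([0-9A-Fa-f]+),[^)]*\)"
    return {uid.lower(): name for name, uid in re.findall(pattern, text)}

if __name__ == "__main__":
    disks = parse_powermt("powermt.txt")            # saved "powermt display dev=all" output
    volumes = parse_vplex_views("vplex_views.txt")  # saved "ls -t ...::virtual-volumes" output
    for uid, pseudo in sorted(disks.items()):
        print(f"{pseudo:15} {uid}  {volumes.get(uid, 'NOT FOUND IN VPLEX OUTPUT')}")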

What is VPLEX?

We are looking at implementing a storage virtualization device and I started doing a bit of research on EMC’s product offering.  Below is a summary of some of the information I’ve gathered, including a description of what VPLEX does as well as some pros and cons of implementing it.  This is all info I’ve gathered by reading various blogs, looking at EMC documentation and talking to our local EMC reps.  I don’t have any first-hand experience with VPLEX yet.

What is VPLEX?

VPLEX at its core is a storage virtualization appliance. It sits between your arrays and hosts and virtualizes the presentation of storage arrays, including non-EMC arrays.  Instead of presenting storage directly to the host, you present it to the VPLEX, configure that storage from within the VPLEX, and then zone the VPLEX to the host.  Like other in-band virtualization devices, it virtualizes and abstracts whatever storage you attach to it.

There are three VPLEX product offerings: Local, Metro, and Geo.

Local.  VPLEX Local manages multiple heterogeneous arrays from a single interface within a single data center location. It provides increased availability, simplified management, and improved utilization across multiple arrays.

Metro.  VPLEX Metro with AccessAnywhere enables active-active, block-level access to data between two sites within synchronous distances.  Host application stability needs to be considered; depending on the application, round-trip latency of 5 ms or less is generally recommended for Metro. The combination of virtual storage with VPLEX Metro and virtual servers allows for the transparent movement of VMs and storage across longer distances and improves utilization across heterogeneous arrays and multiple sites.

Geo.  VPLEX Geo with AccessAnywhere enables active-active, block-level access to data between two sites at asynchronous distances. Geo improves the cost efficiency of resources and power.  It provides the same distributed-device flexibility as Metro but extends the distance to up to 50 ms of network latency.

You can learn more about the product in EMC’s VPLEX documentation and white papers.

What are some advantages of using VPLEX? 

1. Extra Cache and Increased IO.  VPLEX has a large cache (64GB per node) that sits between the host and the array. It adds read cache that can greatly improve read performance on databases, because read hits are served from the VPLEX and offloaded from the individual arrays.

2. Enhanced options for DR with RecoverPoint. The DR benefits are increased when integrating RecoverPoint with VPLEX Metro or Geo to replicate the data in real time. It includes a capacity-based journal for very granular rollback capabilities (think of it as a DVR for the data center).  You can also use the native bandwidth reduction features (compression & deduplication) or disable them if you have WAN optimization devices installed, like those from Riverbed.  If you want active/active read/write access to data across a large distance, VPLEX is your only option; NetApp’s V-Series and HDS USPV can’t do it unless they are in the same data center. Here are a few more advantages:

  • DVR-like recovery to any point in time
  • Dynamic synchronous and asynchronous replication
  • Customized recovery point objectives that support any-to-any storage arrays
  • WAN bandwidth reduction of up to 90% of changed data
  • Non-disruptive DR testing

3. Non-disruptive data mobility & reduced maintenance costs. One of the biggest benefits of virtualizing storage is that you’ll never have to take downtime for a migration again. It can take months to migrate production systems, and without virtualization downtime is almost always required. Migration is also expensive: it takes a great deal of resources from multiple groups, plus the cost of keeping the older array on the floor during the process. Overlapping maintenance costs are expensive too.  By shortening the migration timeframe, hardware maintenance costs will drop, saving money.  Maintenance can be a significant part of the storage TCO, especially if the arrays are older or are going to be used for a longer period of time.  Virtualization can be a great way to reduce those costs and improve the return on assets over time.

4. Flexibility based on application IO.  The ability to move and balance LUN I/O among multiple smaller arrays non-disruptively allows you to balance workloads and respond to performance demands quickly.  Note that underlying LUNs can be aggregated or simply passed through the VPLEX.

5. Simplified Management and vendor neutrality.  Implementing VPLEX for all storage-related provisioning tasks reduces the complexity of managing arrays from multiple vendors.  It allows you to manage multiple heterogeneous arrays from a single interface.  It also simplifies zoning, since hosts only need to be zoned to the VPLEX rather than to every array on the floor, which makes it faster and easier to provision new storage to a new host.

6. Increased leverage among vendors.  This advantage would be true with any virtualization device.  When controller-based storage virtualization is employed, there is more flexibility to pit vendors against each other to get the best hardware, software and maintenance costs.  Older arrays could be commoditized, which could allow for increased leverage to negotiate the best rates.

7. Use older arrays for Archiving. Data could be seamlessly demoted or promoted to different arrays based on an array’s age, its performance levels and its related maintenance costs.  Older arrays could be retained for capacity and demoted to a lower tier of service, and even with the increased maintenance costs it could still save money.

8. Scale.  You can scale it out and add more directors for more performance when needed.  With a VPLEX Metro configuration, you can have up to 16 directors across the two sites.

What are some possible disadvantages of VPLEX?

1. Licensing Costs. VPLEX is not cheap.  Also, it can be licensed per frame on VNX but must be licensed per TB on the CX series.  Your large, older CX arrays will cost a lot more to license.

2. It’s one more device to manage.   The VPLEX is an appliance, and it’s one more thing (or things) that has to be managed and paid for.

3. Added complexity to infrastructure.  Depending on the configuration, there could be multiple VPLEX appliances at every site, adding considerable complexity to the environment.

4. Managing mixed workloads in virtual environments.  When heavy workloads are all mixed together on the same array there is no way to isolate them, and the ability to migrate a workload non-disruptively to another array is one of the reasons to implement a VPLEX.  In practice, however, those VMs may end up being moved to another array with the same storage limitations as the one they came from.  The VPLEX may simply be moving the problem to a different location rather than solving it.

5. Lack of advanced features. The VPLEX has no advanced storage features such as snapshots, deduplication, replication, or thin provisioning.  It relies on the underlying storage array for those types of features.  As an example, you may want block-based deduplication on an HDS array by placing a NetApp V-Series in front of it and using NetApp’s dedupe. That is only possible with a NetApp V-Series or HDS USP-V type device; the VPLEX can’t do it.

6. Write cache performance is not improved.  The VPLEX uses write-through caching, while its competitors’ storage virtualization devices use write-back caching. When there is a write I/O in a VPLEX environment, the I/O is cached on the VPLEX, but it is also passed all the way back to the virtualized storage array before an ack is sent to the host.  The NetApp V-Series and HDS USPV store the I/O in their own cache and immediately return an ack to the host; the I/Os are then flushed to the back-end storage array using their respective write coalescing and cache flushing algorithms.  Because of that write-back behavior, write performance can exceed that of the underlying storage arrays thanks to the caching on those controllers.  With VPLEX’s write-through cache design, there is no write performance gain beyond what the existing storage provides.
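
To make the caching difference concrete, here is a toy Python sketch (not VPLEX or array code) that models when the ack is returned in each design: the write-through path waits for a simulated back-end write before acknowledging, while the write-back path acknowledges as soon as the data is cached and defers the flush. The latency numbers are invented purely for illustration.

import time

# Toy model of the difference described above; the latencies are invented.
BACKEND_WRITE_LATENCY = 0.005   # pretend the back-end array takes 5 ms per write
CACHE_WRITE_LATENCY = 0.0005    # pretend the appliance cache takes 0.5 ms

def write_through(block):
    """Ack only after the back-end array has the write (write-through)."""
    start = time.perf_counter()
    time.sleep(CACHE_WRITE_LATENCY)    # cache the write on the appliance
    time.sleep(BACKEND_WRITE_LATENCY)  # wait for the array before acking
    return time.perf_counter() - start

def write_back(block, flush_queue):
    """Ack as soon as the write is in the appliance cache; flush later (write-back)."""
    start = time.perf_counter()
    time.sleep(CACHE_WRITE_LATENCY)    # cache the write and ack immediately
    flush_queue.append(block)          # flushed to the array asynchronously
    return time.perf_counter() - start

if __name__ == "__main__":
    flush_queue = []
    wt = sum(write_through(i) for i in range(20)) / 20
    wb = sum(write_back(i, flush_queue) for i in range(20)) / 20
    print(f"average ack latency: write-through {wt * 1000:.2f} ms, write-back {wb * 1000:.2f} ms")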