
Brocade Switch Type Matrix

I recently performed an inventory of all of our Brocade switches and stumbled upon this list of switch types that allows you to identify the Brocade model number.  Simply go to http://<switch_address>/SwitchInfo.html, search for “switchType” in the report, and compare that number to the table below to identify your model.

switchType   Model               Description
12           3900                2 Gb 32-port switch
16           3200                2 Gb 8-port value line switch
21           24000               2 Gb 128-port core fabric switch
26           3850                2 Gb 16-port switch with switch limit
27           3250                2 Gb 8-port switch with switch limit
29           4012                2 Gb 12-port Blade Server SAN I/O Module
34           200E                2 Gb 16-port switch with switch limit
37           4020                2 Gb 20-port Blade Server SAN I/O Module
43           4024                4 Gb 24-port Blade Server SAN I/O Module
44           4900                4 Gb 64-port switch
45           4016                2 Gb 16-port Blade Server SAN I/O Module
51           4018                2 Gb 16/18-port Blade Server SAN I/O Module
61           4424                2 Gb 24-port Blade Server SAN I/O Module
62           DCX                 8 Gb 798-port core fabric backbone
64           5300                8 Gb 80-port switch
66           5100                8 Gb 40-port switch
67           Encryption Switch   8 Gb 16-port encryption switch
70           5410                8 Gb 12-port Blade Server SAN I/O Module
71           300                 8 Gb 16-port switch
72           5480                8 Gb 24-port Blade Server SAN I/O Module
73           5470                8 Gb 20-port Blade Server SAN I/O Module
75           M5424               8 Gb 24-port Blade Server SAN I/O Module
77           DCX-4S              8 Gb 192-port core fabric backbone
83           7800                8 Gb 16-FC ports, 6 GbE ports extension switch
86           5450                8 Gb 26-port Blade Server SAN I/O Module
87           5460                8 Gb 26-port Blade Server SAN I/O Module
92           VA-40FC             8 Gb 40-port switch
109          6510                16 Gb 48-port switch
117          6547                16 Gb 48-port Blade Server SAN I/O Module
118          6505                16 Gb 24-port switch
120          DCX 8510-8          16 Gb 512-port core fabric backbone
121          DCX 8510-4          16 Gb 256-port core fabric backbone
124          5430                8 Gb 16-port Blade Server SAN I/O Module
125          5431                8 Gb 16-port stackable switch module
129          6548                16 Gb 28-port Blade Server SAN I/O Module
130          M6505               16 Gb 24-port Blade Server SAN I/O Module
133          6520                16 Gb 96-port switch
134          5432                8 Gb 24-port Blade Server SAN I/O Module
148          7840                16 Gb 24-FC ports, 16 10GbE ports, 2 40GbE ports extension switch
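If you would rather not click through the web report on every switch, the same value can usually be pulled over the CLI.  Below is a minimal sketch that loops through a list of switches and greps the switchType line out of switchshow output; it assumes SSH access as the admin user and a switches.txt file (one hostname or IP per line), so adjust it for your environment.

# Pull the switchType value from each Brocade switch listed in switches.txt
# and print it next to the switch name for comparison against the table above.
while read -r switch; do
    echo -n "$switch: "
    ssh admin@"$switch" "switchshow" | grep -i "switchType"
done < switches.txt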


Magic Quadrant for Storage Vendors

I was reading over the Gartner report today that dives into specifics on the strengths and weaknesses of the big players in the storage market.  The original Gartner article can be viewed here: http://www.gartner.com/technology/reprints.do?id=1-1EL3WXN&ct=130321&st=sb.

It came as no surprise to me that EMC was ranked as the market leader, with NetApp as #2, Hitachi #3, and IBM #4.  I've been very pleased with the performance and reliability of EMC products in my environment and their global support is top notch.  Below is the magic quadrant from the article.

Storage Performance Metrics

I often get requests from application owners to review performance stats.  I thought I'd give a quick overview of some of the things I look at, what the myriad of performance metrics in Navisphere Analyzer and ECC Performance Manager mean, and how you might use some of them to investigate a performance problem.  Performance analysis is very much an art (not a science), and it's sometimes difficult to pinpoint exact causes given the mix of applications and workloads on the array.  To be successful you need to take all of the metrics into account with a holistic view.  Collecting data on application workloads over time is also recommended, because workload characteristics will likely vary over time.  If you have a major problem, I would always recommend opening an SR with EMC.

This post is just an overview of SAN performance metrics and isn't meant to dive in to every possible scenario from every angle.  EMC already publishes excellent performance best practices guides that cover these topics in far more depth.

Because we have EMC’s Performance Manager tool installed in our environment, I always go to that tool first rather than Navisphere Analyzer.  Both use the same metrics, so the following information will be useful regardless of which method you use.

The first thing I do is look at the Storage Processors.  This will give you a good indication of the overall health of the array before you dive into the specific LUN (or LUNs) used by the application.

  • SP Cache Dirty Pages (%). These are pages in write cache that have received new data from hosts but have not yet been flushed to disk.  You want a high percentage of dirty pages because it increases the chance of a read coming from cache, or of additional writes to the same block of data being absorbed by the cache. If an IO is served from cache the performance is better than if the data has to be retrieved from disk.  That's why the default watermarks are usually around 60/80% or 70/90%.  You don't want dirty pages to reach 100%; they should fluctuate between the high and low watermarks (which means the cache is healthy).  Periodic spikes or drops outside the watermarks are ok, but consistently hitting 100% indicates that the write cache is overstressed.
  • SP Utilization (%). Check and see if either SP is running higher than about 75%.  If either is running that high, application response time will increase.  Also, both will need to be under 50% for non-disruptive upgrades; we had to do a large scale migration of data from one SAN to another at one point in order to get an NDU accomplished.  You'll also want to check for proper balance.  If one is much higher than the other, you should consider migrating LUNs from one SP owner to the other.  I check SP balance on all of our arrays on a daily basis.
  • SP Response time (ms). Make sure again that both SPs are even and that response time is acceptable; I like to see response times under 10ms.  If one SP has high utilization and response time but the other doesn't, look for LUNs owned by the busier SP that are using more array resources.  If both SPs have relatively similar throughput but one SP has much higher bandwidth, that could mean that some large block IO is occurring; looking at total IO on a per-LUN basis can help confirm it.
  • SP Port Queue Full Count. This represents the number of times that a front end port issued a QFULL response back to the hosts. If you are seeing QFULLs it could mean that the queue depth on the HBA is too large for the LUNs being accessed.  A Clariion/VNX front end port has a queue depth of 1600, which is the maximum number of simultaneous IOs that port can process.  Each LUN on the array has a maximum queue depth that is calculated using a formula based on the number of data disks in the RAID group. For example, a port with 512 queues and a typical LUN queue depth of 32 can support up to 512 / 32 = 16 LUNs on 1 initiator (HBA), or 16 initiators (HBAs) with 1 LUN each, or any combination not exceeding that number. Configurations that exceed this number are in danger of returning QFULL conditions. A QFULL condition signals that the target/storage port is unable to process more IO requests, so the initiator will need to throttle IO to the storage port; as a result, application response times will increase and IO activity will decrease.  A quick sizing sketch follows this list.
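The front end port calculation above is easy to script as a quick sanity check on queue depth sizing.  Here's a minimal sketch; the port queue depth, host LUN queue depth, and host count are example values only, so substitute the numbers from your own environment.

# Rough front end port queue sizing check (example values only)
PORT_QUEUE=1600     # front end port queue depth
LUN_QUEUE=32        # queue depth per LUN on the host HBA
HOSTS=8             # number of initiators zoned to the port
echo "LUN/initiator combinations the port can support: $((PORT_QUEUE / LUN_QUEUE))"
echo "LUNs per host before risking QFULL:              $((PORT_QUEUE / LUN_QUEUE / HOSTS))"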

The next thing I do is look at the specific LUNs that the application owner is asking about. The list below includes the basic performance metrics that I most often look at when investigating a performance problem.

  • Utilization (%) represents the fraction of an observation period during which a LUN has any outstanding requests. When the LUN becomes the bottleneck, the utilization will be at or close to 100%. However, since I/Os can get serviced by multiple disks, an increase in workload might still result in higher throughput.  Utilization by itself is not a very good indicator of the overall performance of the LUN; it needs to be considered along with several other metrics. For example, if you are writing to a LUN (100% writes) and the location of the data is in a small physical space on the LUN, it may be possible to reach 100% with write cache re-hits. This means that all writes are being serviced by the write cache, and since you are writing data to the same locations over and over, none of the data is flushed to the disks. This can cause LUN utilization to be 100% even though there is actually no IO to the disks. Utilization is heavily affected by caching, both read and write; the LUN can be very busy but may not have a problem. Use utilization to help identify busy LUNs, then look at queuing and response times to see if there really is an issue.
  • Queue Length is the average number of requests within a polling interval that are outstanding to this LUN. A queue length of zero indicates an idle LUN. If three requests arrive at an idle LUN at the same time, only one of them can be served immediately; the other two must wait in the queue. That scenario would result in a queue length of 3.  My general guideline for “bad performance” on a LUN is a queue length greater than 2 for a single disk drive.
  • Average Busy Queue Length is the average number of outstanding requests when the LUN was busy. This does not include any idle time. This value should not exceed 2 times the number of spindles on a LUN. For example, if a LUN has 25 spindles, a value of 50 is acceptable. Since this queue length is counted only when the LUN is not idle, the value indicates the frequency variation (burst frequency) of incoming requests. The higher the value, the bigger the burst and the longer the average response time at this component. In contrast to this metric, the average queue length does also include idle periods when no requests are pending. If you have 50% of the time just one outstanding request, and the other 50% the LUN is idle, the average busy queue length will be 1. The average queue length however, will be ½.
  • Response Time (ms) is the average time, in milliseconds, that a request to this LUN is outstanding, including its waiting time. The higher the queue length for a LUN, the more requests are waiting in its queue, thus increasing the average response time of a single request. For a given workload, queue length and response time are directly proportional.  Keep in mind that cache re-hits bring down the average response time (and service time), whether they are reads or writes. LUN response time is a good starting point for troubleshooting; it gives a good indication of what the host system is experiencing. Usually if your LUN response time (response time = queue length * service time) is good then host performance is good. High response times don't always mean that the CLARiiON is busy; they can also indicate issues with your host or fabric.  We use the Brocade health report on a regular basis to identify hosts that have an excessive amount of traffic, as well as running the EMC HEAT report on hosts that have reported issues (which can identify incorrect HBA drivers, a bad HBA, etc.).  These are my general guidelines for response time:
    Less than 10 ms: very good
    Between 10 – 20 ms: okay
    Between 20 – 50 ms: slow, needs attention
    Greater than 50 ms:  I/O bottleneck
  • Service Time (ms) represents the time, in milliseconds, a request spent being serviced by a component. It does not include time waiting in a queue. Service time is mainly a characteristic of the system component; however, larger I/Os take longer and therefore usually result in lower throughput (IO/s) but better bandwidth (Mbytes/s). Put simply, service time is the time it takes to actually send the I/O request to the storage and get an answer back. In general, I like to see service times below 20ms.
  • Total Throughput (IO/sec) is the average number of host requests passed through the LUN per second, including both read and write requests. Smaller requests usually result in a higher total throughput than larger requests.  Examining total throughput (along with %Utilization) is a good way to identify the busiest LUNs on the array. In general, here are the IOPS limits by drive type (a rough capacity estimate using these numbers is sketched after this list):
RPM        Drive Type      IOPs
7,200      SATA,NL-SAS     ~80
10,000     SATA,NL-SAS     ~130
10,000     FC,SAS          ~140
15,000     FC,SAS          ~180
N/A        EFD             ~1500 (Read/Write, 60/40)
N/A        EFD             ~6000 (Read)
N/A        EFD             ~3000 (Write)
  • Write Throughput (I/O/sec) The average number of host write requests that is passed through the LUN per second. Smaller requests usually result in a higher write throughput than larger requests.  When troubleshooting specific LUNs, check the write IO size and see if the size is what you would expect for the application you are investigating. Extremely large IO sizes coupled with high IOPS may cause write cache contention.
  • Read Throughput (I/O/sec) The average number of host read requests that is passed through the LUN per second. Smaller requests usually result in a higher read throughput than larger requests.
  • Total Bandwidth (MB/s) The average amount of host data in Mbytes that is passed through the LUN per second. This includes both read and write requests. Larger requests usually result in a higher total bandwidth than smaller requests.
  • Read Bandwidth (MB/s) The average amount of host read data in Mbytes that is passed through the LUN per second. Larger requests usually result in a higher bandwidth than smaller requests.
  • Write Bandwidth (MB/s) The average amount of host write data in Mbytes that is passed through the LUN per second. Larger requests usually result in a higher bandwidth than smaller requests. Keep in mind that writes consume many more array resources than reads.
  • Read Size (KB) The average read request size in Kbytes seen by the LUN. This number indicates whether the overall read workload is oriented more toward throughput (I/Os per second) or bandwidth (Mbytes/second). For a finer distinction of I/O sizes, use an IO Size Distribution chart for this LUN.
  • Write Size (KB) The average write request size in Kbytes seen by the LUN. This number indicates whether the overall write workload is oriented more toward throughput (I/Os per second) or bandwidth (Mbytes/second). For a finer distinction of I/O sizes, use an IO Size Distribution chart for the LUNs.
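To put the throughput numbers above in context, I find it helpful to make a rough estimate of what the physical disks behind a LUN should be able to deliver and compare that to the observed Total Throughput.  This is only a back-of-the-envelope sketch (it ignores cache and RAID write penalties), and the drive count, per-drive IOPS, and observed value below are placeholders for your own configuration.

# Rough estimate of the raw IOPS capability behind a LUN, ignoring cache
# and RAID write penalty.  Per-drive IOPS comes from the drive type table above.
DRIVES=15            # number of drives backing the LUN's RAID group
IOPS_PER_DRIVE=180   # 15k FC/SAS from the drive type table
OBSERVED=2200        # Total Throughput (IO/sec) reported by Analyzer
echo "Estimated raw capability:  $((DRIVES * IOPS_PER_DRIVE)) IOPS"
echo "Observed total throughput: $OBSERVED IOPS"

If the observed number is consistently near or above the raw estimate and response times are poor, the LUN is probably disk-bound rather than suffering from a host or fabric issue.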

Below is an explanation of additional performance metrics that I don’t use as frequently, but I’m including them for completeness.

  • Forced Flushes/s Number of times per second the cache had to flush pages to disk to free up space for incoming write requests. Forced flushes are a measure of how often write requests will have to wait for disk I/O rather than be satisfied by an empty slot in the write cache. In most well performing systems this should be zero most of the time. 
  • Full Stripe Writes/s Average number of write requests per second that spanned a whole stripe (all disks in a LUN). This metric is applicable only to LUNs that are part of a RAID5 or RAID3 group.
  • Used Prefetches (%) The percentage of prefetched data in the read cache that was read during the last polling interval.
  • Disk Crossing (%) Percentage of host requests that require I/O to at least two disks compared to the total number of host requests. A single disk crossing can involve more than two disk drives.
  • Disk Crossings/s Number of times per second that a request requires access to at least two disk drives. A single disk crossing can involve more than two disks.
  • Read Cache Hits/s Average number of read requests per second that were satisfied by either read or write cache without requiring any disk access. A read cache hit occurs when recently accessed data is re-referenced while it is still in the cache.
  • Read Cache Misses/s Average number of read requests per second that did require one or more disk accesses.
  • Reads From Write Cache/s Average number of read requests per second that were satisfied by write cache only. Reads from write cache occur when recently written data is read again while it is still in the write cache. This is a subset of read cache hits which includes requests satisfied by either the write or the read cache.
  • Reads From Read Cache/s Average number of read requests per second that were satisfied by the read cache only. Reads from read cache occur when data that has been recently read or prefetched is re-read while it is still in the read cache. This is a subset of read cache hits which includes requests satisfied by either the write or the read cache.
  • Read Cache Hit Ratio The fraction of read requests served from both read and write caches vs. the total number of read requests. A higher ratio indicates better read performance.
  • Write Cache Hits/s Average number of write requests per second that were satisfied by the write cache without  requiring any disk access. Write requests that are not write cache hits are referred to as write cache misses.
  • Write Cache Misses/s Average number of write requests per second that did require one or multiple disk accesses. Write requests that cause forced flushes or that bypass the write cache due to their size are examples of write cache misses.
  • Write Cache Rehits/s Average number of write requests per second that were satisfied by the write cache since they had been referenced before and not yet flushed to the disks. Write cache rehits occur when recently accessed data is referenced again while it is still in the write cache. This is a subset of Write Cache Hits.
  • Write Cache Hit Ratio The ratio of write requests that the write cache satisfied without requiring any disk access vs. the total number of write requests to this LUN. A higher ratio indicates better write performance.
  • Write Cache Rehit Ratio The ratio of write requests that the write cache satisfied because they had been referenced before and not yet flushed to the disks vs. the total number of write requests to this LUN. This is a measure of how often the write cache succeeded in eliminating a write operation to disk. While improving the rehit ratio is useful, it is more beneficial to reduce the number of forced flushes.  A quick example of calculating the cache hit ratios is sketched after this list.
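Analyzer and Performance Manager report the cache hit ratios directly, but if you only have the per-second hit and miss counters handy, the math is simple.  A minimal sketch with example counter values:

# Cache hit ratios from the per-second hit/miss counters (example values only)
READ_HITS=850;  READ_MISSES=150
WRITE_HITS=900; WRITE_MISSES=100
echo "Read cache hit ratio:  $(( 100 * READ_HITS / (READ_HITS + READ_MISSES) ))%"
echo "Write cache hit ratio: $(( 100 * WRITE_HITS / (WRITE_HITS + WRITE_MISSES) ))%"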

What’s new in FLARE 32?

We recently installed a new VNX5700 at the end of July and EMC was on-site to install it the day after the general release of FLARE 32. We had planned our pool configuration around this release, so the timing couldn’t have been more perfect. After running the new release for about a month now it’s proven to be rock solid, and to date no critical security patches have been released that we needed to apply.

The most notable new feature for us was the addition of mixed RAID types within the same pool.  We can finally use RAID6 for the large NL-SAS drives in the pool and not have to make the entire pool RAID6.  There also are several new performance enhancements that should make a big impact, including load balancing within a tier and pool rebalancing.

Below is an overview of the new features in the initial release (05.32.000.5.006).

· VNX Block to Unified Services Plug-n-play: This feature allows customers to perform Block to Unified upgrades.

· Support for In-Family upgrades: This feature allows for the In-Family Data In Place (DIP) conversion kits that are available for the VNX OE for File 7.1 and VNX OE for Block 05.32 release. In-family DIP conversions across the VNX family will re-use the installed SAS DAEs with associated drives and SLICs.

· Windows 2008 R2: Branch Cache Support: This feature provides support for the branch cache feature in Windows 2008 R2. This feature allows a Windows client to cache a file locally in one of the branch office servers and then use the cached copy whenever the same file is being requested. The VNX for File CIFS Server operates as the Central Office Content Server in support of the Branch Cache feature.

· VAAI for NFS: Snaps of snaps: Supports snaps of snaps of VMDK files on NFS file systems to at least 1 level of depth (source to snap to snap). Though the functionality will initially only be available through the VAAI for NFS interface, it may in the future be exposed through the VSI plug-in as well. This feature allows VMware View 5.0 to use VAAI for NFS on a VNX system. It also requires that file systems be enabled during creation for “VMware VAAI nested clone support”, and the installation of the EMCNasPlugin-1-0.10.zip on the ESX servers.

· Load balance within a tier: This feature allows for redistribution of slices across all drives in a given tier of a pool to improve performance. This also includes proactive load balancing, such that slices are relocated from one RAID group in a tier to another, based on activity level, to balance the load within the tier. These relocations will occur within the user-defined tiering relocation window.

· Improve block compression performance: This feature provides for improved block compression performance. This includes increasing speed of compression operations, decreasing storage system impact, and decreasing host response time impact.

· Deeper File Compression Algorithm: This feature provides an option to utilize a deeper compression algorithm for greater capacity savings when using file-level compression. This option can be leveraged by 3rd party application servers such as the FileMover Appliance Server, based on data types (i.e. per metadata definitions) that are best suited for the deeper compression algorithm.

· Rebalance when adding drives to Pools: This feature provides for the redistribution of slices across all drives in a newly expanded pool to improve performance.

· Conversion of DLU to TLU and back, when enabling Block Compression: This feature provides an internal mechanism (not user-invoked), when enabling Compression on a thick pool LUN, that would result in an in-place conversion from Thick (“direct”) pool LUNs to Thin Pool LUNs, rather than performing a migration. Additionally, for LUNs that were originally Thick and then converted to Thin, it provides an internal mechanism, upon disabling compression, to convert the LUNs back to Thick, without requiring a user-invoked migration.

· Mixed RAID Types in Pools: This feature allows a user to define RAID types per storage tier in pool, rather than requiring a single RAID type across all drives in a single pool.

· Improved TLU performance, no worse than 115% of a FLU: This feature provides improved TLU performance. This includes decreasing host response time impact and potentially decreasing storage system impact.

· Distinguished Compression Capacity Savings: This feature provides a display of the capacity savings due to compression. This display will inform the user of the savings due to compression, distinct from the savings due to thin provisioning. The benefit for the user is that he can determine the incremental benefit of using compression. This is necessary because there is currently a performance impact when using Compression, so users need to be able to make a cost/benefit analysis.

· Additional Tiering Policies: This feature provides additional tiering options in pools: namely, “Start High, then Auto-tier”. When the user selects this policy for a given LUN, the initial allocation is on highest tier, and subsequent tiering is based on activity.

· Additional RAID Options in Pools: This feature provides 2 additional RAID options in pools, for better efficiency: 8+1 for RAID 5, and 14+2 for RAID 6. These options will be available for new pools. These options will be available in both the GUI and the CLI.

· E-Trace Enhancements: Top files per fs and other stats: This feature allows the customer to identify the top files in a file system or quota tree. The files can be identified by pathnames instead of ids.

· Support VNX Snapshots in Unisphere Quality of Service Manager: This feature provides Unisphere Quality of Service Manager (UQM) support for both the source LUN and the snapshot LUN introduced by the VNX Snapshots feature.

· Support new VNX Features in Unisphere Analyzer: This feature provides support for all new VNX features in Unisphere Analyzer, including but not limited to VNX Snapshots and 64-bit counters.

· Unified Network Services: This feature provides several enhancements that will improve user experience. The various enhancements delivered by UNS in VNX OE for File 7.1 and VNX OE for Block 05.32 release are as follows:

· Continuous Monitoring: This feature provides the ability to monitor the vital statistics of a system continuously (out-of-box) and then take appropriate action when specific conditions are detected. The user can specify multiple statistical counters to be monitored – the default counters that will be monitored are CPU utilization, memory utilization and NFS IO latency on all file systems. The conditions when an event would be raised can also be specified by the user in terms of a threshold value and time interval during which the threshold will need to be exceeded for each statistical counter being monitored. When an event is raised, the system can perform any number of actions – possible choices are log the event, start detailed correlated statistics collection for a specified time period, send email or send a SNMP trap.

· Unisphere customization for VNX: This feature provides the addition of custom references within Unisphere Software (VNX) via editable source files for product documentation and packaging, custom badges and nameplates.

· VNX Snapshots: This feature provides VNX Snapshots (a.k.a write-in-place pointer-based snapshots) that in their initial release will support Block LUNs only and require pool-based LUNs. VNX Snapshots will support File Systems in a later release. The LUNs referred to below are pool-based LUNs (thick LUNs and Thin LUNs.)

· NDMP V4 IPv6 extension: This feature provides support for the Symantec, NetApp, and EMC authored and approved NDMP v4 IPv6 extension in order to back up using NDMP 2-way and 3-way in IPv6 networked environments.

· NDMP Access Time: This feature provides the last access time (atime). In prior releases, this was not retained during an NDMPCopy and was thus set to the time of migration. So, after a migration, the customer lost the ability to archive “cold” or “inactive” data. This feature adds an optional NDMP variable (RETAIN_ATIME=y/n, the default being ‘n’) which if set includes the atime in the NDMP data stream so that it can be restored properly on the destination.

· SRDF Interoperability for Control Station: This feature provides SRDF (Symmetrix Remote Data Facility) with the ability to manage the failover process between local and remote VNX Gateways. VNX Gateway needs a way to give up control of the failover and failback to an external entity and to suppress the initiation of these processes from within the Gateway.

 

ProSphere 1.6 Updates

ProSphere 1.6 was released this week, and it looks like EMC was listening!  Several of the updates are features that I specifically requested when I gave my feedback to EMC at EMC World.  I'm sure it's just a coincidence, but it's good to finally see some valuable improvements that make this product that much closer to being useful in my company's environment.  The most important items I wanted to see were the ability to export performance data to a csv file and improved documentation on the REST API.  Both of those things were included with this release.  I haven't looked yet to see if the performance exports can be run from a command line (a requirement for it to be useful to me for scripting).  The REST API documentation was created in the form of a help file.  It can be downloaded and run from an internal web server as well, which is what I did.

Here are the new features in v1.6:

Alerting

ProSphere can now receive Brocade alerts for monitoring and analysis. These alerts can be forwarded through SNMP traps.

Consolidation of alerts from external sources is now extended to include:

• Brocade alerts (BNA and CMCNE element managers)

• The following additional Symmetrix Management Console (SMC) alerts:
– Device Status
– Device Pool Status
– Thin Device Allocation
– Director Status
– Port Status
– Disk Status
– SMC Environmental Alert

Capacity

– Support for Federated Tiered Storage (FTS) has been added, allowing ProSphere to identify LUNs that have been presented from external storage and logically positioned behind the VMAX 10K, 20K and 40K.

– Service Levels are now based on the Fully Automated Storage Tier (FAST) policies defined in Symmetrix arrays. ProSphere reports on how much capacity is available for each Service Level, and how much is being consumed by each host in the environment.

Serviceability

– Users can now export ProSphere reports for performance and capacity statistics in CSV format.

Unisphere for VMAX 1.0 compatibility

– ProSphere now supports the new Unisphere for VMAX as well as Single Sign On and Launch-in-Context to the management console of the Unisphere for VMAX element manager. ProSphere, in conjunction with Unisphere for VMAX, will have the same capabilities as Symmetrix Management Console and Symmetrix Performance Analyzer.

Unisphere support

– In this release, you can launch Unisphere (CLARiiON, VNX, and Celerra) from ProSphere, but without the benefits of Single Sign On and Launch-in-Context.

Undocumented Celerra / VNX File commands


The .server_config command is undocumented by EMC; I assume they don't want customers messing with it. Use these commands at your own risk. 🙂

Below is a list of some of those undocumented commands; most are meant for viewing performance stats. I've had EMC support use the fcp command during a support call in the past.  When using the command for fcp stats, I believe you need to run the 'reset' command first, as it enables the collection of statistics.

There are likely other parameters that can be used with .server_config but I haven’t discovered them yet.
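If you want to capture several of these counters at once for later review, the commands below can be wrapped in a simple loop and run from the control station.  This is just a sketch; the data mover names, the list of stats, and the output directory are all assumptions to adjust for your environment.

# Collect a handful of printstats counters from each data mover
# and save them with a date stamp for later review.
for dm in server_2 server_3; do
    for stat in tcpstat filewrite fcp; do
        .server_config $dm -v "printstats $stat full" > /home/nasadmin/scripts/${dm}_${stat}_$(date +%Y%m%d).txt
    done
done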

TCP Stats:

To view TCP info:
.server_config server_x -v "printstats tcpstat"
.server_config server_x -v "printstats tcpstat full"
.server_config server_x -v "printstats tcpstat reset"

Sample Output (truncated):
TCP stats :
connections initiated 8698
connections accepted 1039308
connections established 1047987
connections dropped 524
embryonic connections dropped 3629
conn. closed (includes drops) 1051582
segs where we tried to get rtt 8759756
times we succeeded 11650825
delayed acks sent 537525
conn. dropped in rxmt timeout 0
retransmit timeouts 823

SCSI Stats:

To view SCSI IO info:
.server_config server_x -v "printstats scsi"
.server_config server_x -v "printstats scsi reset"

Sample Output:
This output needs to be in a fixed width font to view properly.  I can’t seem to adjust the font, so I’ve attempted to add spaces to align it.
Ctlr:  IO-pending  Max-IO  IO-total   Idle(ms)    Busy(ms)   Busy(%)
0:     0           53      44925729   122348758   19159954   13%
1:     0           1       1          141508682   0          0%
2:     0           1       1          141508682   0          0%
3:     0           1       1          141508682   0          0%
4:     0           1       1          141508682   0          0%

File Stats:

.server_config server_x -v "printstats filewrite"
.server_config server_x -v "printstats filewrite full"
.server_config server_x -v "printstats filewrite reset"

Sample output (Full Output):
13108 writes of 1 blocks in 52105250 usec, ave 3975 usec
26 writes of 2 blocks in 256359 usec, ave 9859 usec
6 writes of 3 blocks in 18954 usec, ave 3159 usec
2 writes of 4 blocks in 2800 usec, ave 1400 usec
4 writes of 13 blocks in 6284 usec, ave 1571 usec
4 writes of 18 blocks in 7839 usec, ave 1959 usec
total 13310 blocks in 52397489 usec, ave 3936 usec

FCP Stats:

To view FCP stats, useful for checking SP balance:
.server_config server_x -v "printstats fcp"
.server_config server_x -v "printstats fcp full"
.server_config server_x -v "printstats fcp reset"

Sample Output (Truncated):
This output needs to be in a fixed width font to view properly.  I can’t seem to adjust the font, so I’ve attempted to add spaces to align it.
Total I/O Cmds: +0%------25%-------50%-------75%-----100%+ Total 0
FCP HBA 0 |                                             | 0%  0
FCP HBA 1 |                                             | 0%  0
FCP HBA 2 |                                             | 0%  0
FCP HBA 3 |                                             | 0%  0
# Read Cmds:    +0%------25%-------50%-------75%-----100%+ Total 0
FCP HBA 0 |                                             | 0%  0
FCP HBA 1 |                                             | 0%  0
FCP HBA 2 |                                             | 0%  0
FCP HBA 3 | XXXXXXXXXXX                                  | 25% 0
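Since the fcp counters only start accumulating after a reset (as noted above), I generally reset them, wait for some IO to pass, and then pull the full table.  Below is a minimal sketch of that workflow; the five minute wait is arbitrary.

# Enable fcp statistics collection, let the counters accumulate, then dump the table
.server_config server_x -v "printstats fcp reset"
sleep 300     # wait for the counters to accumulate some IO
.server_config server_x -v "printstats fcp full"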

Usage:

‘fcp’ options are:       bind …, flags, locate, nsshow, portreset=n, rediscover=n
rescan, reset, show, status=n, topology, version

‘fcp bind’ options are:  clear=n, read, rebind, restore=n, show
showbackup=n, write

Description:

Commands for ‘fcp’ operations:
fcp bind <cmd> ……… Further fibre channel binding commands
fcp flags ………….. Show online flags info
fcp locate …………. Show ScsiBus and port info
fcp nsshow …………. Show nameserver info
fcp portreset=n …….. Reset fibre port n
fcp rediscover=n ……. Force fabric discovery process on port n
Bounces the link, but does not reset the port
fcp rescan …………. Force a rescan of all LUNS
fcp reset ………….. Reset all fibre ports
fcp show …………… Show fibre info
fcp status=n ……….. Show link status for port n
fcp status=n clear ….. Clear link status for port n and then Show
fcp topology ……….. Show fabric topology info
fcp version ………… Show firmware, driver and BIOS version

Commands for ‘fcp bind’ operations:
fcp bind clear=n ……. Clear the binding table in slot n
fcp bind read ………. Read the binding table
fcp bind rebind …….. Force the binding thread to run
fcp bind restore=n ….. Restore the binding table in slot n
fcp bind show ………. Show binding table info
fcp bind showbackup=n .. Show Backup binding table info in slot n
fcp bind write ……… Write the binding table

NDMP Stats:

To Check NDMP Status:
.server_config server_x -v "printstats vbb show"

CIFS Stats:

This will output a CIFS report, including all servers, DCs, IPs, interfaces, MAC addresses, and more.

.server_config server_x -v "cifs"

Sample Output:

1327007227: SMB: 6: 256 Cifs threads started
1327007227: SMB: 6: Security mode = NT
1327007227: SMB: 6: Max protocol = SMB2
1327007227: SMB: 6: I18N mode = UNICODE
1327007227: SMB: 6: Home Directory Shares DISABLED
1327007227: SMB: 6: Usermapper auto broadcast enabled
1327007227: SMB: 6:
1327007227: SMB: 6: Usermapper[0] = [127.0.0.1] state:active (auto discovered)
1327007227: SMB: 6:
1327007227: SMB: 6: Default WINS servers = 172.168.1.5
1327007227: SMB: 6: Enabled interfaces: (All interfaces are enabled)
1327007227: SMB: 6:
1327007227: SMB: 6: Disabled interfaces: (No interface disabled)
1327007227: SMB: 6:
1327007227: SMB: 6: Unused Interface(s):
1327007227: SMB: 6:  if=172-168-1-84 l=172.168.1.84 b=172.168.1.255 mac=0:60:48:1c:46:96
1327007227: SMB: 6:  if=172-168-1-82 l=172.168.1.82 b=172.168.1.255 mac=0:60:48:1c:10:5d
1327007227: SMB: 6:  if=172-168-1-81 l=172.168.1.81 b=172.168.1.255 mac=0:60:48:1c:46:97
1327007227: SMB: 6:
1327007227: SMB: 6:
1327007227: SMB: 6: DOMAIN DOMAIN_NAME FQDN=DOMAIN_NAME.net SITE=STL-Colo RC=24
1327007227: SMB: 6:  SID=S-1-5-15-7c531fd3-6b6745cb-ff77ddb-ffffffff
1327007227: SMB: 6:  DC=DCAD01(172.168.1.5) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD02(172.168.29.8) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD03(172.168.30.8) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD04(172.168.28.15) ref=2 time=0 ms
1327007227: SMB: 6: >DC=SERVERDCAD01(172.168.1.122) ref=334 time=1 ms (Closest Site)
1327007227: SMB: 6: >DC=SERVERDCAD02(172.168.1.121) ref=273 time=1 ms (Closest Site)
1327007227: SMB: 6:
1327007227: SMB: 6: CIFS Server SERVERFILESEMC[DOMAIN_NAME] RC=603
1327007227: UFS: 7: inc ino blk cache count: nInoAllocs 361: inoBlk 0x0219f2a308
1327007227: SMB: 6:  Full computer name=SERVERFILESEMC.DOMAIN_NAME.net realm=DOMAIN_NAME.NET
1327007227: SMB: 6:  Comment=’EMC-SNAS:T6.0.41.3′
1327007227: SMB: 6:  if=172-168-1-161 l=172.168.1.161 b=172.168.1.255 mac=0:60:48:1c:46:9c
1327007227: SMB: 6:   FQDN=SERVERFILESEMC.DOMAIN_NAME.net (Updated to DNS)
1327007227: SMB: 6:  Password change interval: 0 minutes
1327007227: SMB: 6:  Last password change: Fri Jan  7 19:25:30 2011 GMT
1327007227: SMB: 6:  Password versions: 2, 2
1327007227: SMB: 6:
1327007227: SMB: 6: CIFS Server SERVERBKUPEMC[DOMAIN_NAME] RC=2 (local users supported)
1327007227: SMB: 6:  Full computer name=SERVERbkupEMC.DOMAIN_NAME.net realm=DOMAIN_NAME.NET
1327007227: SMB: 6:  Comment=’EMC-SNAS:T6.0.41.3′
1327007227: SMB: 6:  if=172-168-1-90 l=172.168.1.90 b=172.168.1.255 mac=0:60:48:1c:10:54
1327007227: SMB: 6:   FQDN=SERVERbkupEMC.DOMAIN_NAME.net (Updated to DNS)
1327007227: SMB: 6:  Password change interval: 0 minutes
1327007227: SMB: 6:  Last password change: Thu Sep 30 16:23:50 2010 GMT
1327007227: SMB: 6:  Password versions: 2
1327007227: SMB: 6:
 

Domain Controller Commands:

These commands are useful for troubleshooting a Windows domain controller connection issue on the control station.  Use these commands along with checking the normal server log (server_log server_2) to troubleshoot that type of problem.

To view the current domain controllers visible on the data mover:

.server_config server_2 -v "pdc dump"

Sample Output (Truncated):

1327006571: SMB: 6: Dump DC for dom='<domain_name>’ OrdNum=0
1327006571: SMB: 6: Domain=<domain_name> Next trusted domains update in 476 seconds1327006571: SMB: 6:  oldestDC:DomCnt=1,179531 Time=Sat Oct 15 15:32:14 2011
1327006571: SMB: 6:  Trusted domain info from DC='<Windows_DC_Servername>’ (423 seconds ago)
1327006571: SMB: 6:   Trusted domain:<domain_name>.net [<Domain_Name>]
   GUID:00000000-0000-0000-0000-000000000000
1327006571: SMB: 6:    Flags=0x20 Ix=0 Type=0x2 Attr=0x0
1327006571: SMB: 6:    SID=S-1-5-15-d1d612b1-87382668-9ba5ebc0
1327006571: SMB: 6:    DC=’-‘
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x547,1355
1327006571: SMB: 6:   Trusted domain: <Domain_Name>
1327006571: SMB: 6:    Flags=0x22 Ix=0 Type=0x1 Attr=0x1000000
1327006571: SMB: 6:    SID=S-1-5-15-76854ac0-4c527104-321d5138
1327006571: SMB: 6:    DC=’\\<Windows_DC_Servername>’
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x0,0
1327006571: SMB: 6:   Trusted domain:<domain_name>.net [<domain_name>]
1327006571: SMB: 6:    Flags=0x20 Ix=0 Type=0x2 Attr=0x0
1327006571: SMB: 6:    SID=S-1-5-15-88d60754-f3ed4f9d-b3f2cbc4
1327006571: SMB: 6:    DC=’-‘
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x547,1355
DC=DC0x0067a82c18 <Windows_DC_Servername>[<domain_name>](10.3.0.5) ref=2 time(getdc187)=0 ms LastUpdt=Thu Jan 19 20:45:14 2012
    Pid=1000 Tid=0000 Uid=0000
    Cnx=UNSUCCESSFUL,DC state Unknown
    logon=Unknown 0 SecureChannel(s):
    Capa=0x0 Nego=0x0000000000,L=0 Chal=0x0000000000,L=0,W2kFlags=0x0
    refCount=2 newElectedDC=0x0000000000 forceInvalid=0
    Discovered from: WINS

To enable or disable a domain controller on the data mover:

.server_config server_2 -v "pdc enable=<ip_address>"  Enable a domain controller

.server_config server_2 -v "pdc disable=<ip_address>"  Disable a domain controller

MemInfo:

.server_config server_2 -v "meminfo"

Sample Output (truncated):

CPU=0
3552907011 calls to malloc, 3540029263 to free, 61954 to realloc
Size     In Use       Free      Total nallocs nfrees
16       3738        870       4608   161720370   161716632
32      18039      17289      35328   1698256206   1698238167
64       6128       3088       9216   559872733   559866605
128       6438      42138      48576   255263288   255256850
256       8682      19510      28192   286944797   286936115
512       1507       2221       3728   357926514   357925007
1024       2947       9813      12760   101064888   101061941
2048       1086        198       1284    5063873    5062787
4096         26        138        164    4854969    4854943
8192        820         11        831   19562870   19562050
16384         23         10         33       5676       5653
32768          6          1          7        101         95
65536         12          0         12         12          0
524288          1          0          1          1          0
Total Used     Total Free    Total Used + Free
all sizes   18797440   23596160   42393600

MemOwners:

.server_config server_2 -v "help memowners"

Usage:
memowners [dump | showmap | set … ]

Description:
memowners [dump] – prints memory owner description table
memowners showmap – prints a memory usage map
memowners memfrag [chunksize=#] – counts free chunks of given size
memowners set priority=# tag=# – changes dump priority for a given tag
memowners set priority=# label=’string’ – changes dump priority for a given label
The priority value can be set to 0 (lowest) to 7 (highest).

Sample Output (truncated):

1408979513: KERNEL: 6: Memory_Owner dump.
nTotal Frames 1703936 Registered = 75,  maxOwners = 128
1408979513: KERNEL: 6:   0 (   0 frames) No owner, Dump priority 6
1408979513: KERNEL: 6:   1 (3386 frames) Free list, Dump priority 0
1408979513: KERNEL: 6:   2 (40244 frames) malloc heap, Dump priority 6
1408979513: KERNEL: 6:   3 (6656 frames) physMemOwner, Dump priority 7
1408979513: KERNEL: 6:   4 (36091 frames) Reserved Mem based on E820, Dump priority 0
1408979513: KERNEL: 6:   5 (96248 frames) Address gap based on E820, Dump priority 0
1408979513: KERNEL: 6:   6 (   0 frames) Rmode isr vectors, Dump priority 7

Reporting on the state of VNX auto-tiering

 

To go along with my previous post (reporting on LUN tier distribution), I also include information on the same intranet page about the current state of the auto-tiering job.  We run auto-tiering from 10PM to 6AM to avoid moving data during business hours or during our normal evening backup window.

Sometimes the auto-tiering job gets very backed up and would theoretically never finish in the time slot that we have for data movement.  I like to keep tabs on the amount of data that needs to move up or down and the amount of time the array estimates until its completion.  If needed, I will sometimes modify the schedule to run 24 hours a day over the weekend and change it back early on Monday morning.  Unfortunately, EMC did not design the auto-tiering scheduler to allow for different time windows on different days, so it's a manual process.

This is a relatively simple, one line CLI command, but it provides very useful info and it’s convenient to add it to a daily report to see it at a glance.

I run this script at 6AM every day, immediately following the end of the window for data to move:

naviseccli -h clariion1_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion1_hostname.autotier.txt

naviseccli -h clariion2_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion2_hostname.autotier.txt

naviseccli -h clariion3_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion3_hostname.autotier.txt

naviseccli -h clariion4_hostname autotiering -info -state -rate -schedule -opStatus > c:\inetpub\wwwroot\clariion4_hostname.autotier.txt

 ....
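If you have more than a few arrays, the same command can be wrapped in a loop rather than repeated for each one.  Below is a minimal sketch for a Linux/Unix host running naviseccli; the array names and output directory are placeholders (I run the Windows equivalent and write to c:\inetpub\wwwroot as shown above).

# Pull the auto-tiering status from each array and drop the output on the web server
for array in clariion1_hostname clariion2_hostname clariion3_hostname clariion4_hostname; do
    naviseccli -h $array autotiering -info -state -rate -schedule -opStatus > /var/www/html/${array}.autotier.txt
done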
The output for each individual Clariion looks like this:
Auto-Tiering State: Enabled
Relocation Rate: Medium

Schedule Name: Default Schedule
Schedule State: Enabled
Default Schedule: Yes
Schedule Days: Sun Mon Tue Wed Thu Fri Sat
Schedule Start Time: 22:00
Schedule Stop Time: 6:00
Schedule Duration: 8 hours
Storage Pools: Clariion1_SPB, Clariion2_SPA

Storage Pool Name: Clariion2_SPA
Storage Pool ID: 0
Relocation Start Time: 12/05/11 22:00
Relocation Stop Time: 12/06/11 6:00
Relocation Status: Inactive
Relocation Type: Scheduled
Relocation Rate: Medium
Data to Move Up (GBs): 1854.11
Data to Move Down (GBs): 909.06
Data Movement Completed (GBs): 2316.00
Estimated Time to Complete: 9 hours, 12 minutes
Schedule Duration Remaining: None

Storage Pool Name: Clariion1_SPB
Storage Pool ID: 1
Relocation Start Time: 12/05/11 22:00
Relocation Stop Time: 12/06/11 6:00
Relocation Status: Inactive
Relocation Type: Scheduled
Relocation Rate: Medium
Data to Move Up (GBs): 1757.11
Data to Move Down (GBs): 878.05
Data Movement Completed (GBs): 1726.00
Estimated Time to Complete: 11 hours, 42 minutes
Schedule Duration Remaining: None
 
 

Celerra replication monitoring script

This script allows me to quickly monitor and verify the status of my replication jobs every morning.  It will generate a csv file with six columns: file system name, interconnect, estimated completion time, current transfer size, current transfer size remaining, and current write speed.

I recently added two more remote offices to our replication topology and I like to keep a daily tab on how much longer they have to complete the initial seeding, and it will also alert me to any other jobs that are running too long and might need my attention.

Step 1:

Log in to your Celerra and create a directory for the script.  I created a subdirectory called “scripts” under /home/nasadmin.

Create a text file named ‘replfs.list’ that contains a list of your replicated file systems.  You can cut and paste the list out of Unisphere.

The contents of the file should look something like this:

Filesystem01
Filesystem02
Filesystem03
Filesystem04
Filesystem05
 Step 2:

Copy and paste all of the code into a text editor and modify it for your needs (the complete code is at the bottom of this post).  I’ll go through each section here with an explanation.

1: The first section will create a text file ($fs.dat) for each filesystem in the replfs.list file you made earlier.

for fs in `cat replfs.list`
         do
         nas_replicate -info $fs | egrep 'Celerra|Name|Current|Estimated' > $fs.dat
         done
 The output will look like this:
Name                                        = Filesystem_01
Source Current Data Port            = 57471
Current Transfer Size (KB)          = 232173216
Current Transfer Remain (KB)     = 230877216
Estimated Completion Time        = Thu Nov 24 06:06:07 EST 2011
Current Transfer is Full Copy      = Yes
Current Transfer Rate (KB/s)       = 160
Current Read Rate (KB/s)           = 774
Current Write Rate (KB/s)           = 3120
 2: The second section will create a blank csv file with the appropriate column headers:
echo 'Name,System,Estimated Completion Time,Current Transfer Size (KB),Current Transfer Remain (KB),Write Speed (KB)' > replreport.csv

3: The third section will parse all of the output files created by the first section, pulling out only the data that we’re interested in.  It places it in columns in the csv file.

         for fs in `cat replfs.list`

         do

         echo $fs","`grep Celerra $fs.dat | awk '{print $5}'`","`grep -i Estimated $fs.dat |awk '{print $5,$6,$7,$8,$9,$10}'`","`grep -i Size $fs.dat |awk '{print $6}'`","`grep -i Remain $fs.dat |awk '{print $6}'`","`grep -i Write $fs.dat |awk '{print $6}'` >> replreport.csv

        done
 If you're not familiar with awk, I'll give a brief explanation here.  When you grep for a certain line in the output, awk allows you to print only one field (word) from that line.

For example, if you want the output of “Yes” put into a column in the csv file, but the output code line looks like “Current Transfer is Full Copy      = Yes”, then you could pull out only the “Yes” by typing in the following:

 nas_replicate -info Filesystem01 | grep  Full | awk '{print $7}'

Because the word ‘Yes’ is the 7th item in the line, the output would only contain the word Yes.

4: The final section will send an email with the csv output file attached.

uuencode replreport.csv replreport.csv | mail -s "Replication Status Report" user@domain.com

Step 3:

Copy and paste the modified code into a script file and save it.  I have mine saved in the /home/nasadmin/scripts folder. Once the file is created, make it executable by changing the permissions with chmod 755 scriptfile.sh (or chmod +x scriptfile.sh).

Step 4:

You can now add the file to crontab to run automatically.  Add it to cron by typing crontab -e; to view your crontab entries, type crontab -l.  For details on how to add cron entries, do a google search, as there is a wealth of info available on your options.
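For example, a crontab entry like the one below would run the script every morning at 6AM; the path assumes you saved it as scriptfile.sh in the /home/nasadmin/scripts folder from Step 3.

# Run the replication report script daily at 6AM
0 6 * * * /home/nasadmin/scripts/scriptfile.sh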

Script Code:

for fs in `cat replfs.list`

         do

         nas_replicate -info $fs | egrep 'Celerra|Name|Current|Estimated' > $fs.dat

        done

 echo 'Name,System,Estimated Completion Time,Current Transfer Size (KB),Current Transfer Remain (KB),Write Speed (KB)' > replreport.csv

         for fs in `cat replfs.list`

         do

         echo $fs","`grep Celerra $fs.dat | awk '{print $5}'`","`grep -i Estimated $fs.dat |awk '{print $5,$6,$7,$8,$9,$10}'`","`grep -i Size $fs.dat |awk '{print $6}'`","`grep -i Remain $fs.dat |awk '{print $6}'`","`grep -i Write $fs.dat |awk '{print $6}'` >> replreport.csv

         done

 uuencode replreport.csv replreport.csv | mail -s "Replication Status Report" user@domain.com
 The final output of the script generates a report that looks like the sample below.  Filesystems that have all zeros and no estimated completion time are caught up and not currently performing a data synchronization.
Name System Estimated Completion Time Current Transfer Size (KB) Current Transfer Remain (KB) Write Speed (KB)
SA2Users_03 SA2VNX5500 0 0 0
SA2Users_02 SA2VNX5500 Wed Dec 16 01:16:04 EST 2011 211708152 41788152 2982
SA2Users_01 SA2VNX5500 Wed Dec 16 18:53:32 EST 2011 229431488 59655488 3425
SA2CommonFiles_04 SA2VNX5500 0 0 0
SA2CommonFiles_03 SA2VNX5500 Wed Dec 16 10:35:06 EST 2011 232173216 53853216 3105
SA2CommonFiles_02 SA2VNX5500 Mon Dec 14 15:46:33 EST 2011 56343592 12807592 2365
SA2commonFiles_01 SA2VNX5500 0 0 0

Adding/Removing modules from a datamover

I recently had an issue where a brand new datamover installed by EMC would not allow me to make it a standby for the existing datamovers.  It turns out that the hardware (specifically the number of FC and ethernet interfaces) must match PRECISELY; the number of ports and the slots the modules are installed in have to match across all datamovers.

The new datamover that was installed had an extra 4-port ethernet module in it.  Below is the procedure I used to remove the module, including all the commands to take the datamover down, reconfigure it, and bring it back up successfully.  Removing the extra module solved the problem; the config then matched the other datamovers and allowed me to configure it as a standby.

First, log in to the CLI on the control station with root privileges.  Next, just run the commands below in order.

Turn off connecthome and emails to avoid false alarms.
 /nas/sbin/nas_connecthome -service stop
 /nas/bin/nas_emailuser -modify -enabled no
 /nas/bin/nas_emailuser -info

Copy and paste this to save, it will list the current datamover config.
 nas_server -i -a

Run this to shut the datamover down.  Run getreason to verify when it’s down.
 server_cpu server_<x> -halt now
 /nasmcd/sbin/getreason

Remove/replace the module now.

Power the datamover back on.
 /nasmcd/sbin/t2reset pwron -s <slot number>

Watch getreason for status
 /nasmcd/sbin/getreason
(Wait for it to reboot and say ‘Hardware Misconfigured’)

Once it is in a ‘misconfigured’ state, run setup_slot to configure it:
 /nasmcd/sbin/setup_slot -i 4

Run this command to view the current hardware config, verify that your change was made:
 server_sysconfig server_4 -p

Restart connecthome and email services.
 /nas/sbin/nas_connecthome -service start -clear
 /nas/sbin/nas_connecthome -i
 /nas/bin/nas_emailuser -modify -enabled yes
 /nas/bin/nas_emailuser -info

That's it!  Your datamover has been updated and reconfigured.

Use the CLI to determine replication job throughput

This handy command will allow you to determine exactly how much bandwidth you are using for your Celerra replication jobs.

Run this command first, it will generate a file with the stats for all of your replication jobs:

nas_replicate -info -all > /tmp/rep.out

Run this command next:

grep "Current Transfer Rate" /tmp/rep.out |grep -v "= 0"

The output looks like this:

Current Transfer Rate (KB/s)   = 196
 Current Transfer Rate (KB/s)   = 104
 Current Transfer Rate (KB/s)   = 91
 Current Transfer Rate (KB/s)   = 90
 Current Transfer Rate (KB/s)   = 91
 Current Transfer Rate (KB/s)   = 88
 Current Transfer Rate (KB/s)   = 94
 Current Transfer Rate (KB/s)   = 89
 Current Transfer Rate (KB/s)   = 112
 Current Transfer Rate (KB/s)   = 108
 Current Transfer Rate (KB/s)   = 91
 Current Transfer Rate (KB/s)   = 117
 Current Transfer Rate (KB/s)   = 118
 Current Transfer Rate (KB/s)   = 119
 Current Transfer Rate (KB/s)   = 112
 Current Transfer Rate (KB/s)   = 27
 Current Transfer Rate (KB/s)   = 136
 Current Transfer Rate (KB/s)   = 117
 Current Transfer Rate (KB/s)   = 242
 Current Transfer Rate (KB/s)   = 77
 Current Transfer Rate (KB/s)   = 218
 Current Transfer Rate (KB/s)   = 285
 Current Transfer Rate (KB/s)   = 287
 Current Transfer Rate (KB/s)   = 184
 Current Transfer Rate (KB/s)   = 224
 Current Transfer Rate (KB/s)   = 82
 Current Transfer Rate (KB/s)   = 324
 Current Transfer Rate (KB/s)   = 210
 Current Transfer Rate (KB/s)   = 328
 Current Transfer Rate (KB/s)   = 156
 Current Transfer Rate (KB/s)   = 156

Each line represents the throughput for one of your replication jobs.  Adding all of those numbers up will give you the amount of bandwidth you are consuming.  In this case, I’m using about 4.56MB/s on my 100MB link.
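You can also have awk do the addition for you instead of adding the numbers up by hand.  This one-liner just sums the last field of every non-zero line from the same file:

grep "Current Transfer Rate" /tmp/rep.out | grep -v "= 0" | awk '{sum += $NF} END {print sum " KB/s total"}'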

This same technique can of course be applied to any part of the output file.  If you want to know the estimated completion date of each of your replication jobs, you’d run this command against the rep.out file:

grep "Estimated Completion Time" /tmp/rep.out

That will give you a list of dates, like this:

Estimated Completion Time      = Fri Jul 15 02:12:53 EDT 2011
 Estimated Completion Time      = Fri Jul 15 08:06:33 EDT 2011
 Estimated Completion Time      = Mon Jul 18 18:35:37 EDT 2011
 Estimated Completion Time      = Wed Jul 13 15:24:03 EDT 2011
 Estimated Completion Time      = Sun Jul 24 05:35:35 EDT 2011
 Estimated Completion Time      = Tue Jul 19 16:35:25 EDT 2011
 Estimated Completion Time      = Fri Jul 15 12:10:25 EDT 2011
 Estimated Completion Time      = Sun Jul 17 16:47:31 EDT 2011
 Estimated Completion Time      = Tue Aug 30 00:30:54 EDT 2011
 Estimated Completion Time      = Sun Jul 31 03:23:08 EDT 2011
 Estimated Completion Time      = Thu Jul 14 08:12:25 EDT 2011
 Estimated Completion Time      = Thu Jul 14 20:01:55 EDT 2011
 Estimated Completion Time      = Sun Jul 31 05:19:26 EDT 2011
 Estimated Completion Time      = Thu Jul 14 17:12:41 EDT 2011

Very useful stuff. 🙂

 

Use the CLI to quickly determine the size of your Celerra checkpoint filesystems

Need to quickly figure out which checkpoint filesystems are taking up all of your precious savvol space?  Run the CLI command below.  Filling up the savvol storage pool can cause all kinds of problems besides failing checkpoints.  It can also cause filesystem replication jobs to fail.

To view it on the screen:

nas_fs -query:IsRoot==False:TypeNumeric==1 -format:'%s\n%q' -fields:Name,Checkpoints -query:TypeNumeric==7 -format:'   %40s : %5d : %s\n' -fields:Name,ID,Size

To save it in a file:

nas_fs -query:IsRoot==False:TypeNumeric==1 -format:'%s\n%q' -fields:Name,Checkpoints -query:TypeNumeric==7 -format:'   %40s : %5d : %s\n' -fields:Name,ID,Size > checkpoints.txt

vi checkpoints.txt   (to view the file)

Here’s a sample of the output:

UserFilesystem_01
ckpt_ckpt_UserFilesystem_01_monthly_001 :   836 : 220000
ckpt_ckpt_UserFilesystem_01_monthly_002 :   649 : 220000

UserFilesystem_02
ckpt_ckpt_UserFilesystem_02_monthly_001 :   836 : 80000
ckpt_ckpt_UserFilesystem_02_monthly_002 :   649 : 80000

The numbers are in MB.
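If you want a quick total rather than eyeballing the list, awk can sum the size column from the saved file.  This one-liner assumes the colon-separated format shown above:

grep " : " checkpoints.txt | awk -F":" '{total += $3} END {print total " MB of savvol space used by checkpoints"}'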

 

Useful Celerra / VNX File Commands


Here is a list of VNX OE for File and Celerra commands I keep at my desk for reference.  I have another post that references some additional undocumented commands here.

NAS Commands:

nas_disk   -list    Lists the disk table
nas_checkup     Runs a system health check.
nas_pool   -size -all    Lists available space on each defined storage pool
nas_replicate  -info -all | grep <fs>  Info about each filesystem's replication status, grep to view just one.
nas_replicate  -list    A list of all current replications
nas_server  -list     Lists all datamovers. 1=primary,4=standby,6=rdf (remote data facility)
<watch> /nas/sbin/getreason   Shows current status of each datamover. 5=up, 0=down or rebooting
nas_fs      Creates, deletes, extends, modifies, and lists filesystems.
nas_config     Control station configuration (requires root login)
nas_version     View current nas revision
nas_ckpt_schedule    Manage  checkpoint schedule
nas_storage -list   List the attached backend storage systems (with ID’s)
nas_storage -failback id=<x>    Fail back failed over SP’s or disks
nas_server  -vdm <vdm_name> -setstate loaded      Loads a VDM
nas_server  -vdm <vdm_name> -setstate mounted    Unloads a VDM
/nas/sbin/t2reset pwron -s   This command will power on a data mover that has been shut down.  This was user submitted in the comments on this post.

Several nas_<x> commands can be run with an additional database query option for reporting purposes.  Please view my blog post about it here for more information.

Server commands:

server_cpu server_<x> -r now   Reboots a datamover
server_ping <IP>    ping any IP from the control station
server_ifconfig server_2 -all   View all configured interfaces
server_route server_2 {-list,flush,add,delete}   Routing table commands
server_mount     Mount a filesystem
server_export     Export a filesystem
server_stats     Provides realtime stats for a datamover, many different options.
server_sysconfig    Modifies hardware config of the data movers.
server_devconfig    Configures devices on the data movers.
server_sysstat     Shows current Memory, CPU, and thread utilization
server_log server_2    Shows current log
vi /nas/jserver/logs/system_log   Java System log
vi /var/log/messages    System Messages
server_ifconfig server_2 <interface_name> up  Bring up a specific interface
server_ifconfig server_2 <interface_name> down Take a specific interface down
server_date     Sets system time and NTP server settings
server_date <server_X> timesvc start ntp <time_server_IP_address>  Starts NTP on a data mover
server_date <server_X> timesvc stats ntp    To view the status of NTP.
server_date <server_X> timesvc update ntp    Forces an update of NTP
server_file     FTP equivalent command
server_dns     Configure DNS
server_cifssupport    Support services for CIFS users

To create a single checkpoint:
nas_ckpt_schedule -create <ckpt_fs_name> -filesystem <fs_name> -recurrence once

To create a Read/Write copy of a single checkpoint:
fs_ckpt <ckpt_fs_name> -name <r/w_ckpt_fs_name> -Create -readonly n 

To export a Read/Write checkpoint copy to a CIFS Share:
server_export [vdm] -P cifs -name [filesystem]_ckpt1 -option netbios=[cifserver] [filesystem]_ckpt1_writeable1

To view HBA Statistics:
.server_config server_2 -v "printstats fcp reset"  Toggles the service on/off
.server_config server_2 -v "printstats fcp full"     View the stats table (must wait a while for some stats to collect before viewing)

To Join/Unjoin a CIFS Server from the domain:
server_cifs server_2 -Join compname=SERVERNAME,domain=DOMAIN.COM,admin=ADMINID
server_cifs server_2 -Unjoin compname=SERVERNAME,domain=DOMAIN.COM,admin=ADMINID

To view the current domain controllers visible on the data mover:
.server_config server_2 -v "pdc dump"

To enable or disable a domain controller on the data mover:
.server_config server_2 -v "pdc enable=<ip_address>"  Enable a domain controller
.server_config server_2 -v "pdc disable=<ip_address>"  Disable a domain controller

To stop and start the CIFS service:
server_setup server_2 -P cifs -o stop   Stop CIFS Service
server_setup server_2 -P cifs -o start  Start CIFS Service

To stop, start or check the status of the iSCSI service:
server_iscsi server_2 -service -start     Start iSCSI service
server_iscsi server_2 -service -stop      Stop iSCSI service
server_iscsi server_2 -service -status  Check the status of the iSCSI service

To enable/disable NDMP Logging:
Turn it on:
.server_config server_x "logsys set severity NDMP=LOG_DBG2"
.server_config server_x "logsys set severity PAX=LOG_DBG2"
Turn it off:
.server_config server_x "logsys set severity NDMP=LOG_ERR"
.server_config server_x "logsys set severity PAX=LOG_ERR"

For gathering performance statistics:
server_netstat server_x -i               Interface statistics
server_sysconfig server_x -v         Lists virtual devices
server_sysconfig server_x -v -i vdevice_name  Informational stats on the virtual device
server_netstat server_x -s -a tcp  Retransmissions
server_nfsstat server_x                    NFS SRTs
server_nfsstat server_x -zero        Reset NFS stats

Filesystem specific commands:

fs_ckpt      Manage Checkpoints
fs_dhsm     Manage File Mover
fs_group     Manage filesystem groups

Complete List of  “nas_”  Commands:

This is just for reference, you can easily pull up this list from a Celerra by typing nas_ and hitting the tab key.

nas_acl
nas_ckpt_schedule
nas_dbtable
nas_emailuser
nas_inventory
nas_pool
nas_slice
nas_task
nas_automountmap
nas_cmd
nas_devicegroup
nas_event
nas_license
nas_quotas
nas_stats
nas_version
nas_cel
nas_copy
nas_disk
nas_fs
nas_logviewer
nas_replicate
nas_storage
nas_volume
nas_checkup
nas_cs
nas_diskmark
nas_fsck
nas_message
nas_server
nas_symm
nas_xml

Complete list of  “server_”  Commands:

This is just for reference, you can easily pull up this list from a Celerra by typing server_ and hitting the tab key.

server_archive
server_cifssupport
server_file
server_log
server_name
server_ping6
server_sysconfig
server_vtlu
server_arp
server_cpu
server_ftp
server_mgr
server_netstat
server_rip
server_sysstat
server_cdms
server_date
server_http
server_mount
server_nfs
server_route
server_tftp
server_cepp
server_dbms
server_ifconfig
server_mountpoint
server_nfsstat
server_security
server_umount
server_certificate
server_devconfig
server_ip
server_mpfs
server_nis
server_setup
server_uptime
server_checkup
server_df
server_iscsi
server_mpfsstat
server_param
server_snmpd
server_usermapper
server_cifs
server_dns
server_kerberos
server_mt
server_pax
server_standby
server_version
server_cifsstat
server_export
server_ldap
server_muxconfig
server_ping
server_stats
server_viruschk

Complete list of  “fs_” Commands:

This is just for reference, you can easily pull up this list from a Celerra by typing fs_ and hitting the tab key.

fs_ckpt
fs_dedupe
fs_dhsm
fs_group
fs_rdf
fs_timefinder