Frequent 0x622 and 0x606 errors in the SP Event Logs

During some routine checking of the SP Event logs on our NS-40 I was noticing a large number of alerts. Every few seconds I was seeing these three alerts pop in:

0x60a Internal Information Only. A logical unit has been enabled
0x622 Background Verify Aborted
0x606 Unit Shutdown for trespass
 

After a bit of investigation, I narrowed down the cause to several large LUNs that had just been added to a new ESX host.  It turns out that the LUNs were still running the background zeroing process, and that’s what was causing all the alerts in the SP Log. When you create a new LUN and the disks have been previously used for other LUNs, the new LUN needs to be “zeroed” (filled with all zeros to clear data). This takes place in the background and it is part of the LUN initialization.  Once this background zeroing process completed on my new LUNs the alert messages stopped.  I was unaware of that process, so I did a bit of research on it.

LUNs are immediately available for use after a bind (using “Fastbind”), however all the operations associated with a bind can take a long time to finish.  The duration of a LUN bind is dependent on these things:

  • LUN’s bind time background verify priority (rate)
  • Size of the LUN being bound
  • Type of drives in the LUN’s RAID Group
  • Potential disabling of initial verify on bind
  • State of the Storage System (Idle or Load)
  • Position of the LUN on the hard disks of the RAID Group

From that list, priority, LUN size, drive type and verification selection all have the greatest effect on duration.  You can calculate the approximate duration of the bind process with this formula:

Time = Bound LUN Capacity * Bind Rate

Here are the Average Bind Rates for FC and SATA disks:

Disk Type ASAP Bind Rate High Bind Rate Medium (default) Bind Rate Low Bind Rate
FC 83 MB/s 7.54 MB/s 5.02 MB/s 4.02 MB/s
SATA 61.7 MB/s 7.47 MB/s 5.09 MB/s 3.78 MB/s

If we were to calculate how many hours it would take to bind a 2000GB LUN on a five disk RAID5 group composed of SATA drives set to a medium (default) bind rate, here’s what the formula would look like:

Time = 2000 GB * ((1/5.09 MB/s) * 1024 MB->GB * (1/3600 sec->hrs) = 111.76 Hours.

There is a detailed white paper that covers this topic from EMC called “The Effect of Priorities on LUN Management Operations” that you can view here:  http://www.emc.com/collateral/hardware/white-papers/h4153-influence-priorities-emc-clariion-lun-wp.pdf.  That’s where I gathered the information above.

Advertisements