Powerpath commands in AIX causing unexpected errors / initialization errors.

We recently had a problem with one of our AIX VIO servers not being able to run any powerpath commands.  Any attempt to run a command would result in an unexpected error or initialization error.   After speaking to EMC about it, the root cause is usually either running out of space on the root filesystem or having the data and stack ulimit paramenters set too low after adding a large number of new LUNs.   We are running AIX 6.1 on an IBM pSeries 550 with PowerPath 5.3 HF1.

Here are the errors that were popping up:

root@vioserver1:/script # powermt config
Unexpected error occured.

root@vioserver1:/script # powermt display dev=all
Initialization error.

root@vioserver1:/script # naviseccli -h <san_dns_name> lun -list -all
evp_enc.c(282): OpenSSL internal error, assertion failed: inl > 0
ksh: 503926 IOT/Abort trap(coredump)

Having too many LUNs caused the issue,  we had recently added an additional 35 for a total of  70.  Increasing the data and stack parameters to ‘unlimited’ resolved the problem.

root@vioserver1:/script # ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     2097151
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited

Advertisements

How to scrub/zero out data on a decommissioned VNX or Clariion

datawipe

Our audit team needed to ensure that we were properly scrubbing the old disks before sending our old Clariion back to EMC on a trade in.  EMC of course offers scrubbing services that run upwards of $4000 for an array.  They also have a built in command that will do the same job:

navicli -h <SP IP> zerodisk -messner B E D
B Bus
E Enclosure
D Disk

usage: zerodisk disk-names [start|stop|status|getzeromark]

sample: navicli -h 10.10.10.10 zerodisk -messner 1_1_12

This command will write all zero’s to the disk, making any data recovery from the disk impossible.  Add this command to a windows batch file for every disk in your array, and you’ve got a quick and easy way to zero out all the disks.

So, once the disks are zeroed out, how do you prove to the audit department that the work was done? I searched everywhere and could not find any documentation from emc on this command, which is no big surprise since you need the engineering mode switch (-messner) to run it.  Here were my observations after running it:

This is the zeromark status on 1_0_4 before running navicli -h 10.10.10.10 zerodisk -messner 1_0_4 start:

 Bus 1 Enclosure 0  Disk 4

 Zero Mark: 9223372036854775807

 This is the zeromark status on 1_0_4 after the zerodisk process is complete:

(I ran navicli -h 10.10.10.10 zerodisk -messner 1_0_4 getzeromark to get this status)

 Bus 1 Enclosure 0  Disk 4

Zero Mark: 69704

 The 69704 number indicates that the disk has been successfully scrubbed.  Prior to running the command, all disks will have an extremely long zero mark (18+ digits), after the zerodisk command completes the disks will return either a 69704 or 69760 depending on the type of disk (FC/SATA).  That’s be best I could come up with to prove that the zeroing was successful.  Running the getzeromark option on all the disks before and after the zerodisk command should be sufficient to prove that the disks were scrubbed.