Alerting on VNX File SP Failover

You may see a Celerra alert whose description states “Disk dx has been trespassed” or “Storage Processor is ready to restore”. This means that one or more Celerra LUNs are not being accessed through their default SP. It can happen during a DART upgrade or other maintenance, or simply because of a temporary loss of connectivity to the LUN's default SP. I wanted the storage team to get an email whenever an SP failover occurs, so I wrote a script that sends an alert when it detects one. It runs directly on the Control Station and can be scheduled as often as you like with cron.
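For the cron scheduling, an entry along these lines in the nasadmin user's crontab (edit it with `crontab -e`) should work. This is a sketch with some assumptions. The script path `/scripts/health/sp_failover_check.sh` is hypothetical, since the post doesn't name the file. The script must also be executable (`chmod +x`). The 15-minute interval is only an example. Stdout is discarded because the script echoes a status line on every run, and cron would otherwise mail that to the account owner each time. Stderr is left alone so genuine errors still get reported.

```
# Hypothetical path; check for an SP failover every 15 minutes
*/15 * * * * /scripts/health/sp_failover_check.sh > /dev/null
```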

Here’s the script:

#!/bin/bash

HOST=$(hostname)

# Check and display the status of the NAS back-end storage; the output shows SP/disk failover info.
# We include this output in the alert email if needed.
# 2>&1 captures stderr as well, so error text such as "Error 5017" ends up in the email whichever stream it is written to.

/nas/bin/nas_storage -check -all > /scripts/health/backendcheck.txt 2>&1

#  [nasadmin@celerra]$ nas_storage -check -all
#  Discovering storage (may take several minutes)
#  Error 5017: storage health check failed
#  CK900052700319 SPA is failed over
#  CK900052700319 d6 is failed over

# Show detailed info, keeping only the failover lines.

/nas/bin/nas_storage -info -all | grep failed_over > /scripts/health/failovercheck.txt

# The command above results in this output:
#   failed_over = <True/False>
#   failed_over = <True/False>
#   failed_over = <True/False>

# The first entry is the value for the array, the second is SPA, and the third is SPB.

# Pull the True/False value for the entire array (the third field on the first line of output),
# then read it back into STATUS. This has to happen after the file is written; reading it
# earlier would act on the previous run's result.

awk 'NR==1 {print $3}' /scripts/health/failovercheck.txt > /scripts/health/arrayFOstatus.txt
STATUS=$(cat /scripts/health/arrayFOstatus.txt)

# Now check the value read from 'arrayFOstatus.txt' (STATUS). If it's 'True', send an email notification that an SP has failed over.

# In addition to sending an email, you could also run the 'nas_storage -failback id=1' command to fail back automatically.

if [ "$STATUS" = "True" ]; then
   mail -s "SP Failover on $HOST" username@domain.com < /scripts/health/backendcheck.txt

   # Optionally fail it back as well; our team decided to alert only and fail back manually.
   #/nas/bin/nas_storage -failback id=1

   echo "Value is True"
elif [ "$STATUS" = "False" ]; then
   echo "Value is False"
else
   # An empty or unexpected value most likely means the nas_storage query itself failed.
   echo "Unexpected failover status: '$STATUS'" >&2
   exit 1
fi
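The script above acts only on the array-level value. If you also want to know which SP is affected, a small helper like the one below could turn the three `failed_over` lines into a one-line summary for the email. It's a sketch. `summarize_failover` is a hypothetical name. It also assumes the output order described in the script comments: array, then SPA, then SPB.

```shell
#!/bin/bash
# Hypothetical helper: summarize the failed_over values from `nas_storage -info -all`.
# Assumes the lines appear in the order array, SPA, SPB (as described in the script comments).
summarize_failover() {
  awk '
    /failed_over/ {
      n++
      if (n == 1) label = "array"
      else if (n == 2) label = "SPA"
      else if (n == 3) label = "SPB"
      else label = "entry" n
      # Field 3 is the True/False value in "failed_over = True"
      out = out (n > 1 ? " " : "") label "=" $3
    }
    END { print out }
  '
}

# Usage on the Control Station:
#   /nas/bin/nas_storage -info -all | summarize_failover
```

Given the three values `True`, `True`, and `False`, it prints `array=True SPA=True SPB=False`.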

If a failover is detected, you can manually fail it back with the following commands:

Determine/Confirm the ID number:

[nasadmin@celerra]$ nas_storage -list
 id   acl    name                     serial_number
 1    0      CK900052700319 CK900052700319
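Instead of hard-coding `id=1`, you could read the ID from this listing. The sketch below assumes two things. The ID is the first column. The first line is the header, as shown above. `storage_ids` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the storage system id(s) from `nas_storage -list` output.
# Skips the header row and keeps only numeric first columns.
storage_ids() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage on the Control Station:
#   /nas/bin/nas_storage -list | storage_ids
```

For the listing above, `/nas/bin/nas_storage -list | storage_ids` should print `1`.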

Fail it back (will fail back Celerra/VNX File LUNs only):

[nasadmin@celerra]$ nas_storage -failback id=1
id  = 1
serial_number   = CK900052700319
name  = CK900052700319
acl  = 0
done
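Your team might prefer automatic failback, as in the commented-out line in the script. In that case, a guard like the following keeps the failback from running unless a failover was actually reported. This is a sketch rather than a tested procedure. `failback_if_failed_over` and the `DRY_RUN` variable are hypothetical. The guard defaults to dry-run mode, so it only prints the `nas_storage -failback` command until you set `DRY_RUN=0`.

```shell
#!/bin/bash
# Hypothetical guard around automatic failback.
# $1: the True/False array status extracted by the alert script
# $2: storage system id (defaults to 1, as in the example above)
# DRY_RUN defaults to 1, which only prints the command instead of running it.
failback_if_failed_over() {
  local status=$1 id=${2:-1}
  local cmd=(/nas/bin/nas_storage -failback "id=$id")
  if [ "$status" != "True" ]; then
    echo "No failover reported; nothing to do."
    return 0
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "Would run: ${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

In the alert script, it would be called as `failback_if_failed_over "$STATUS" 1` after the email is sent.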