Category Archives: Control Center / ProSphere

Troubleshooting NAS Discovery issues on EMC Ionix Control Center (ECC)

We recently converted two of our existing VNX arrays to unified systems, and I was attempting to add our newly added NAS to Control Center.  I went through the normal assisted discovery using the ‘NAS Container’ option.  Unfortunately, I got an error in the discovery results window.  Here’s the error I saw:

SESSION_ACTION: Discover  [4]  MO Type = NasContainer
Container_IP=10.10.10.4 | Container_Port=443 | Container_Username=root | Container_Password=****** | Container_Type=Celerra
  command status = finished, errors
  objects found = 6  agents responding = 2
  completed in 231 seconds
  action begins at: Wed Nov 06 09:48:05 CST 2013
  action ends at: Wed Nov 06 09:51:56 CST 2013
 
Reported objects:
[1]  NasContainer=10.10.10.4
 
Reported agent errors:
[1]  ADAResult: (9) Celerra@10.10.10.4 : SSH communication failed – Please verify emcplink settings
    Responding agent: NAS Agent @ eccagtserver.rgare.net
[2]  ADAResult: (9) Celerra@10.10.10.4: nas_version returned invalid response
    Responding agent: NAS Agent @ eccinfserver.rgare.net
Responding agents:
[1]  NAS Agent @ eccagtserver.rgare.net
[2]  NAS Agent @ eccinfserver.rgare.net

 

It looks like there’s an ssh setting that’s incorrect.  Being unfamiliar with the emcplink utility, I did a bit of research on how to configure it properly, and I will go through what needs to be done.

Before diving in to using and configuring emcplink, here are some simple troubleshooting steps you should run through first:

   – Verify that the NAS agent is installed and active.  You can view all of the running agents by clicking on the gear icon on the lower right hand side (in the status bar).  Scroll through the agents and make sure the agent is active.

   – Verify that the Java Process is running on the Control Station.

         * Log in to the Control Station and type the following command:

                      ps -aex |grep java

          * If it’s running, you will see lines similar to the following:

                      21927 ? S 0:15 /usr/java/bin/java -server …..

                      22200 ? S 0:00 /usr/java/bin/java -server …..

 – Make sure that the ssh daemon is running (I’m assuming you’re using ssh for remote connectivity):

           * Log in to the Control Station and type the following command:

                       ps -aex |grep sshd

           * If it’s running, you will see a line similar to the following:

                       882 ?      Ss     0:00 /usr/sbin/sshd

 – Verify that the Celerra (or VNX) data mover is connected to an array:

                       nas_storage -list

 – Verify that the user name and password you’re using during the assisted discovery work. Try logging on with that ID/password directly.
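One wrinkle with the ps | grep checks above is that grep can match its own command line and report a process that isn’t really there. A small helper like the one below avoids that. It’s only a sketch; the function name proc_running is made up for illustration.

```shell
#!/bin/bash
# proc_running PATTERN
# Reads a process listing on stdin and succeeds if any line matches PATTERN,
# ignoring lines that belong to a grep command.
proc_running() {
    grep -v 'grep' | grep -q -- "$1"
}

# Usage on the Control Station:
#   ps -aex | proc_running '/usr/sbin/sshd' && echo "sshd is running"
#   ps -aex | proc_running 'java -server'   && echo "java is running"
```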

Here are the troubleshooting steps I took, and some more info about emcplink:

What is emcplink? It’s a utility that allows you to specify security policies for secure shell (SSH) client authentication, which is required for the Storage Agent for NAS to discover NAS containers.

The highest ssh security level (full security) requires that users manually run emcplink in order to provide a username and password for ssh authentication and to manually accept an ssh key returned from emcplink to discover the NAS container.  After it’s accepted, the key is stored on the NAS Agent host. If the key changes, you must manually run emcplink again and accept the changed key before you can rediscover the NAS container. If your environment does not require full ssh security, use emcplink to set lower security levels that will automatically accept new or changed keys without requiring you to manually enter ssh usernames, passwords, and keys.

The emcplink command is a command line utility. To run emcplink, first open a command prompt window.  Change to the <install_root>/exec/CNN610 directory on the host where Storage Agent for NAS resides, where <install_root> is the ControlCenter infrastructure install directory.

If your installation uses SSH version 2, update your agent configuration so emcplink uses SSH version 2 when handling SSH keys; the default is version 1. Note that SSH version 2 is not backward compatible with version 1. If you switch to SSH version 2, you must run emcplink again to rediscover all NAS containers that were previously discovered with SSH version 1.

If you want to update your install to version 2, follow these steps (I did this during my troubleshooting):

1. Stop Storage Agent for NAS using the ControlCenter Console.

2. Edit the following file:

      <install_root>/exec/CNN610/cnn.ini

3. In the [ssh] section of cnn.ini, change version = 1 to version = 2

4. Save and exit cnn.ini.

5. Restart Storage Agent for NAS.
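If you’d rather script the edit in steps 2 through 4, something like the following should work from a bash shell (Cygwin on a Windows agent host, for example). Treat it as a sketch: it assumes cnn.ini has an [ssh] section header on its own line with a version = 1 entry beneath it, as step 3 describes, and the function name set_ssh_version is made up for illustration. You still need to stop the agent first and restart it afterwards.

```shell
#!/bin/bash
# set_ssh_version FILE
# Within the [ssh] section only, change "version = 1" to "version = 2".
# A backup copy is saved as FILE.bak before the file is rewritten.
set_ssh_version() {
    local ini="$1"
    cp -p "$ini" "$ini.bak" || return 1
    # The address range limits the substitution to the lines from [ssh]
    # up to the next section header.
    sed '/^\[ssh\]/,/^\[/ s/^\([[:space:]]*version[[:space:]]*=[[:space:]]*\)1\([[:space:]]*\)$/\12\2/' \
        "$ini.bak" > "$ini"
}

# Usage (the path is a placeholder for your install root):
#   set_ssh_version "<install_root>/exec/CNN610/cnn.ini"
```

Version entries in any other section of the file are left alone, and the .bak copy gives you a quick way back if the agent doesn’t start cleanly.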

The next step is to enable the policy that you need for your environment. The default policy is EMC_SSH_KEY_SECURITY_FULL.  You can add more than one policy.  If added policies contradict one another, the most recently added policy takes effect.

Enter the following command to add or remove an SSH security policy:

       emcplink -setpolicy [+|-]policy_name

       Example:  emcplink -setpolicy +EMC_SSH_KEY_SECURITY_FULL

In my case, I first disabled the default policy, then enabled the policy that I wanted. Here are the commands I ran:

       emcplink -setpolicy -EMC_SSH_KEY_SECURITY_FULL

       emcplink -setpolicy +EMC_SSH_KEY_SECURITY_ALLOW_NEW

After running it, I used the ‘getpolicy’ option to verify what the current active policy was:

       emcplink -getpolicy

The output looks like this:

          Policy Is:
              EMC_SSH_KEY_SECURITY_ALLOW_NEW
 

Here are the policy options you can choose from:

EMC_SSH_KEY_SECURITY_FULL (default)

Do not automatically accept any new or changed NAS container keys. They must be accepted manually, using emcplink (refer to emcplink -interactive). Provides the same functionality that plink (no longer valid) provided in previous ControlCenter versions, when manual user name/password/key entry was required.

EMC_SSH_KEY_SECURITY_ALLOW_NEW

Accept new keys, but not changed keys. SSH authentication occurs automatically for initial discovery, and also for subsequent discoveries as long as the NAS Container key does not change. If a key is changed, discovery is attempted via telnet.

EMC_SSH_KEY_SECURITY_ALLOW_CHANGE

Accept changed keys, but not new keys. When a Celerra is initially discovered, SSH authentication occurs manually. If the key is changed for subsequent discoveries of that NAS Container, SSH authentication occurs automatically.

EMC_SSH_KEY_SECURITY_NONE

Accept both new and changed keys. SSH authentication occurs automatically at initial discovery and all subsequent discoveries, regardless of whether NAS container keys are changed.

After I verified that the policy I wanted was active, I manually entered the SSH security information for each array I wanted to add, in order to accept the NAS container keys. Run the following command for each array to add the key to the server cache (the optional -2 tells emcplink to use SSH version 2):

         emcplink -ssh -interactive -2 -pw password username@Array_IP_address

Note that SSH version 2 is not backward compatible with version 1. If you switch to SSH version 2, you must run emcplink again to rediscover all NAS containers that were previously discovered with SSH version 1.
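If you have more than a couple of arrays, it can help to generate the command lines up front and then run them one at a time, since each one still needs the key accepted by hand. Here is a minimal sketch from a bash shell; the function name print_emcplink_cmds is made up, and the password is deliberately left as a placeholder so it never ends up in a script file.

```shell
#!/bin/bash
# print_emcplink_cmds USER IP...
# Prints one emcplink command line per array, using the same options as the
# command shown above. Nothing is executed.
print_emcplink_cmds() {
    local user="$1"; shift
    local ip
    for ip in "$@"; do
        echo "emcplink -ssh -interactive -2 -pw <password> ${user}@${ip}"
    done
}

# Usage (the second address is an example only):
#   print_emcplink_cmds root 10.10.10.4 10.10.10.5
```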

That’s it! Once I accepted all of the ssh keys and re-ran the discoveries, the new arrays were discovered just fine.

 


Upgrading EMC Ionix Control Center from Update Bundle 12 (UB12) to Update Bundle 14 (UB14)

Below are the steps I just took to upgrade our ECC environment from UB12 to UB14. There are quite a few steps, and EMC has this process documented very well. Every install file and patch has its own readme file with detailed installation instructions, along with additional info about special considerations depending on your environment. If you’re about to begin this upgrade, you can use these steps as a general guide, but I highly recommend reading all of EMC’s documentation as well. There are general notes, warnings, and information in their documentation that did not apply to me but may apply to you. From beginning to end, the upgrade process took me about seven hours to complete.

Prerequisites for the repository server:

    • Oracle 10g OCPU Update must be installed
    • Operating system locale settings must be set to English
    • Minimum 13GB of free disk space
    • Disable IOPath (ECC_IOPath.bat -i to check, ECC_IOPath.bat -d to disable)
    • You must back up the repository for ECC and StorageScope
    • ControlCenter Environment Checker must be run
    • All infrastructure services must be stopped
    • The following additional services must be stopped: AntiVirus, COM+, Server management services, vmware tools, Distributed Transaction Coordinator, Terminal Services, and WMI.

1. Download all of the required files. I’m running ECC on two 32 bit Windows servers (one repository server and one application server), so all of the download links I provided below are for windows.

a. Download the UB14 Update Bundle (32 bit is CC_4979.zip, 64 bit is CC_5013.zip):

https://download.emc.com/downloads/DL44408_ControlCenter_Update_Bundle_14_(64-bit)_software_download.zip

https://download.emc.com/downloads/DL44407_ControlCenter_Update_Bundle_14_(32-bit)_software_download.zip

b. Download the latest cumulative Patch (currently CC_5046.zip):

https://download.emc.com/downloads/DL47699_Cumulative_6.1_Update_Bundle_14_Storage_Agent_for_HDS_Patch_5046_Download.zip

c. Download the latest Ionix ControlCenter 6.1 Backup and Restore Utility:

https://download.emc.com/downloads/DL37276_Ionix_ControlCenter_6.1_Backup_and_Restore_Utility_.zip

d. Download the latest Ionix ControlCenter Environment Checker (Currently version 1.0.5.3):

https://download.emc.com/downloads/DL37274_Ionix_ControlCenter_Environment_Checker_1.0.5.3.exe

e. Download the latest Oracle 10g OCPU Patch here (CC_5030.zip):

https://download.emc.com/downloads/DL46584_ControlCenter_6.1_OCPU_(5030)_Download.zip

f. Download the latest Unisphere Host Client/Agent:

https://download.emc.com/downloads/DL30840_Unisphere__Client_(Windows)_1.2.26.1.0094.exe

g. Download the latest Navisphere CLI (Current release is 7.32.25.1.63):

https://download.emc.com/downloads/DL30859_Navisphere_CLI_(Windows_-_all_supported_32_&_64-bit_versions)_7.32.25.1.63.exe

h. I’d recommend reviewing the UB14 Read Me file to identify any specific warnings that may be relevant to your specific installation. The Read Me file for Update Bundle 14 can be downloaded here:

https://support.emc.com/docu44826_Ionix_ControlCenter_6.1_Update_Bundle_14_Read_Me.pdf?language=en_US

2. Log in to the repository server. Run the ECC Backup and Restore Utility. Because our ECC servers are virtualized, I also took a snapshot from the vSphere console so I could revert if needed. Below is an overview of the steps I took; review the readme file from EMC for more detail. Note that I shut down our ECC application server for the duration of the upgrade on the repository server.

a. Run the patch to install the files

b. Go to the ECC install directory, HF4938 folder, note timestamp on regutil610.exe

c. Check that the ECC/tools/util/regutil610.exe file has the same timestamp

d. Run ECC/HF4938/backup.bat (this also sets all ECC services to manual)

e. Reboot hosts, verify ECC services are not running

f. Manually back up the ECC folder to another location

3. Run Environment checker on the repository server.

a. Make sure that all checks have passed before proceeding with the upgrade.

4. Check ECC 6.1 compatibility matrix, make sure everything is up to date and compatible

a. Here’s a link to the compatibility matrix: https://support.emc.com/docu31657_Ionix-ControlCenter-6.1-Support-Matrix.pdf?language=en_US

b. Note that UB14 requires Solutions Enabler 7.5, 7.5.1 or 7.6.

c. Host Agent must be at 1.2.x and CLI must be at 7.32.x if you’re running the latest VNX hardware.

5. Run Oracle 10g OCPU Patch (CC_5030) on the repository server if required. This step took quite a bit of time to complete. I’d expect 60-90 minutes for this step.

a. Make sure all ECC application services are stopped.

b. Run %ECC_INSTALL_ROOT%\Repository\admin\Ramb_scripts\ramb_hotback.bat

c. Run %ECC_INSTALL_ROOT%\Repository\admin\Ramb_scripts\ramb_export_db.bat

d. Run %emcstsoh%\admin\emcsts_scripts\emcsts_coldback.bat

e. Run %emcstsoh%\admin\emcsts_scripts\emcsts_export_db.bat

f. Copy the contents of the %ECC_INSTALL_ROOT% to a backup location of your choice

g. Extract the SQL_PATCH_5030.zip to a local drive on the ECC Server.

h. Verify repository by running the following: %ECC_INSTALL_ROOT%\Repository\admin\Ramb_scripts\ramb_recomp_invalid.bat

i. Verify the storage scope repository by running the following: %ECC_INSTALL_ROOT%\Repository\admin\emcsts_scripts\emcsts_recomp_invalid.bat

j. If the prior two scripts ran successfully, proceed. If not, call EMC.

k. Run the prereboot.bat script from the SQL_PATCH_5030 directory.

l. Reboot

m. Run the postreboot.bat script from the SQL_PATCH_5030 directory.

n. At the end of the postreboot script, it will ask if you want to revert all the services back to their original state. Respond with Y.

o. Verify successful installation by running %ECC_INSTALL_ROOT%\Repository\admin\emcsts_scripts\emcsts_hotfixstatus.bat

6. Update the Repository server with the latest Host agent and Navisphere CLI:

a. Unisphere Host Agent 1.2.25.1.0163: https://download.emc.com/downloads/DL30839_Unisphere_Host_Agent_(Windows_-_all_supported_32_&_64-bit_versions)__1.2.25.1.0163.exe

b. Navisphere CLI 7.32.25.1.63: https://download.emc.com/downloads/DL30859_Navisphere_CLI_(Windows_-_all_supported_32_&_64-bit_versions)_7.32.25.1.63.exe

c. A reboot is not required after installing the Agent and CLI. Make sure to preserve your security settings when asked in the installation.

7. Install a newer version of Solutions Enabler, if required. As mentioned earlier, UB14 requires version 7.5, 7.5.1 or 7.6.

a. Link to the MRLK Control Center supported version of SE (v7.5.0 – CC_5014): https://download.emc.com/downloads/DL44373_Solutions_Enabler_7.5_MRLK.zip

b. There is a readme pdf file included in the download with details on installation.

c. Stop the EMC storapid service.

d. Stop the EMC Control Center Server and Store Services.

e. Do NOT stop the OracleECCREP_HOMETNSListener or OracleServiceRAMBDB services.

f. Run the install.

g. Restart the services.

8. Prepare for UB14 Installation. Now all the prerequisite installs are complete on my system. The next step is to verify that the correct services are stopped and begin the UB14 upgrade. We have everything running on one repository server, your environment could be different.

a. On the repository server, stop the following services, set them to Manual, and reboot:

i. EMC ControlCenter API Server

ii. EMC ControlCenter Key Management Server

iii. EMC ControlCenter Master Agent

iv. EMC ControlCenter Repository

v. EMC ControlCenter Server

vi. EMC ControlCenter Store

vii. EMC ControlCenter STS MSA Web Server

viii. EMC ControlCenter Web Server

ix. EMC StorageScope Repository

x. EMC StorageScope Server

b. The following services must be stopped: AntiVirus, COM+, Server management services, vmware tools, Distributed Transaction Coordinator, and WMI.

c. Verify the following services are running before beginning the UB14 installation:

i. OracleECCREP_HOMETNSListener

ii. OracleServiceRAMBDB

iii. OracleServiceEMCSTSDB

9. Begin the UB14 Installation (CC_4979 for 32bit, CC_5013 for 64bit)

a. Execute the Patch61014383_4979_x86.exe patch (or the 64 bit patch if you’re on a 64bit server).

b. I recommend reading the UB14 readme file. If you encounter any errors, a few common install issues are listed.

c. This step takes a very long time, count on at least 3-4 hours. I also have StorageScope which added additional time.

d. After the patch install completes, change all the services back to their original ‘automatic’ state.

e. Reboot the repository server

10. Install the latest cumulative Update Bundle 14 patch next (The most current right now is CC_5046).

a. The current version of the UB14 patch can be downloaded here:

https://download.emc.com/downloads/DL47699_Cumulative_6.1_Update_Bundle_14_Storage_Agent_for_HDS_Patch_5046_Download.zip

b. Stop the EMC Control Center Store and Server Services

c. Stop the following Services:

i. EMC ControlCenter API Server

ii. EMC ControlCenter Key Management Server

iii. EMC ControlCenter Master Agent

iv. EMC ControlCenter Repository

v. EMC ControlCenter Server

vi. EMC ControlCenter Store

vii. EMC ControlCenter STS MSA Web Server

viii. EMC ControlCenter Web Server

ix. EMC StorageScope Repository

x. EMC StorageScope Server

d. Run Patch 61014615_5046_hds.exe

e. Restart all services that were stopped earlier in steps b and c.

f. Apply patch to agents:

i. Start the ECC Console

ii. Right click on the repository server object

iii. Select Agents → Apply Patch

iv. If the task fails, restart the master agent from the console

11. Update the application server

a. Install the latest Unisphere Host Agent 1.2.25.1.0163: https://download.emc.com/downloads/DL30839_Unisphere_Host_Agent_(Windows_-_all_supported_32_&_64-bit_versions)__1.2.25.1.0163.exe

b. Install the latest Navisphere CLI 7.32.25.1.63: https://download.emc.com/downloads/DL30859_Navisphere_CLI_(Windows_-_all_supported_32_&_64-bit_versions)_7.32.25.1.63.exe

c. Update the ECC Console by navigating to https://<repository_server>:30002/webinstall

i. Click Installation

ii. Click Console Patch 6.1.0.14.383

d. Install Solutions Enabler on the application server (not the MRLK version)

i. Windows 32 bit download

https://download.emc.com/downloads/DL43505_se7500-WINDOWS-x86.exe.exe

ii. Windows 64 bit download:

https://download.emc.com/downloads/DL43504_se7500-WINDOWS-x64.exe.exe

iii. Stop existing ECC and Solutions Enabler services

iv. Launch Install

e. Apply Agent patches from within ECC Console

i. Start the ECC Console

ii. Right click on the application server object

iii. Select Agents → Apply Patch

iv. If the task fails, restart the master agent from the console

12. Verify that WLA Archive collection (performance data) is collecting properly.

a. Go to your WLAArchives folder

b. Go to the Clariion\<serial number>\interval subfolder

c. Sort the files by date. If new files are being written then it is working.

13. Verify that all agents are running and are at the correct patch level

a. Launch the Control Center console

b. Click on the small gear icon on the lower right side of the window, which will launch the agents view on the right side of the screen

c. Verify that all agents are running and are patched to 6.1.0.14.383. Any that require updates can be updated from this screen: right click, choose Agents, and install the patch.

14. Remove backup directories (optional). The list is on pages 16 and 17 of the UB14 readme file.

That’s it! You’re done.

Note: 24 hours after the completion of the upgrade I noticed that WLA archive data collection wasn’t working for some of the arrays. I deleted the arrays that weren’t working and rediscovered them, which resolved the problem. Be aware that deleting the arrays removes all of their historical data from StorageScope.
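To spot arrays that have stopped collecting without sorting each folder by hand, the check from step 12 can be scripted. The sketch below assumes a bash shell with find available on the ECC server (Cygwin, for example) and the Clariion\<serial number>\interval folder layout described in step 12. The function name stale_wla_dirs and the 120-minute default are made up for illustration.

```shell
#!/bin/bash
# stale_wla_dirs ROOT [MINUTES]
# Lists each array folder under ROOT whose "interval" subfolder contains no
# file modified within the last MINUTES minutes (default 120).
stale_wla_dirs() {
    local root="$1" minutes="${2:-120}" dir
    for dir in "$root"/*/; do
        [ -d "${dir}interval" ] || continue
        if [ -z "$(find "${dir}interval" -type f -mmin "-$minutes" -print | head -n 1)" ]; then
            basename "$dir"
        fi
    done
}

# Usage (the path is a placeholder for your WLAArchives location):
#   stale_wla_dirs "/cygdrive/c/<path_to>/WLAArchives/Clariion" 120
```

Any serial number it prints is an array with no recent interval files, which makes it a candidate for a closer look before you resort to deleting and rediscovering it.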

What’s new in EMC Control Center Update Bundle 14 (UB14)?

I’m in the process right now of upgrading our ECC installation from UB12 to UB14.  I’ll be making another post soon with the steps I took to complete the upgrade.  Below is a list of all the new features in UB14.

  • Added support for Oracle 11g.
  • Added Oracle 64-bit support in ControlCenter for 64-bit operating systems.
  • Added support for VMAX 1.5 and CKD devices.
  • Added support for Solutions Enabler 7.5.
  • Added support to discover and display new Federated Tiered Storage (FTS).
  • Added support for EMC CentraStar® 4.2.2.
  • Added support for Java 1.7 update 7.
  • Added support for HDS WLA by using JRE 1.7.
  • Added support for the availability of pool compression information in STS query builder while generating custom reports.
  • Added support for Mixed RAID types in VNX storage pool.
  • Added a new memory check during installation to verify if 4GB of RAM is available to install/upgrade UB14 in an all-in-one setup.
  • Added a new memory check during installation to verify if 3GB of RAM is available to install/upgrade UB14 in a distributed setup.
  • Added a new disk space check during installation to verify if minimum disk space is available for UB14 upgrade in both all-in-one and distributed setups and notify users if the requirement is not met.
  • Added support to export/import database to/from an external drive.
  • Added new free disk space availability alert.
  • Removed support for Symmetrix, SDM, and CLARiiON agents on HP-UX and AIX operating systems.
  • Removed support for Microsoft Internet Explorer version 6.0.
  • Added support to discover or rediscover Managed Objects through NAS, CLARiiON, Symmetrix, and Host Agents using DefaultHiddenDCP feature.
  • Added support to change the fully qualified domain name of non-infrastructure hosts or platforms.
  • Added support to remove virtual machines marked as Deleted on the console automatically using a scheduled job that runs every day.
  • Added support to reserve connections for WLA communications.
  • Added support to optimize the mechanism of sending MOLIST for Brocade switches.
  • Added support to optimize performance statistics collection for WLA.
  • Added support to optimize provider calls for switch and fabric discovery.
  • Added support to optimize logging and error handling in agent.
  • Added support to start processing opcode only after the agent INITX initialization.
  • Added virtual provisioning support for CLARiiON FLARE 28.
  • Added support to edit MessageReplyTimeout value in Gateway agent ini file.
  • Improved error messages in console for action.
  • Improved Store log to simplify root cause determination of issues.
  • Reduced Integration Gateway agent crashes.

Automating VNX Storage Processor Percent Utilization Alerts

Note:  The original post describes a method that requires EMC Control Center and Performance Manager.  That tool has been deprecated by EMC in favor of ViPR SRM.  There is still a method you can use to gather CPU information for use in bash scripts. I don’t have script examples that use this command, but if anyone needs help, send me a comment and I’ll help. The Navisphere CLI command to get busy/idle ticks for the storage processors is naviseccli -h <SP_IP_address> getcontrol -cbt, where <SP_IP_address> is the storage processor you want to query.

The output looks like this:

Controller busy ticks: 1639432
Controller idle ticks: 1773844

The SP utilization statistics outputted are an average of the utilization across all the cores of the SP’s processors since the last reset. To get the actual point-in-time SP CPU utilization from this output requires a calculation. You need to poll twice, create a delta for the individual counters by subtracting the earlier value from the later, and apply this formula:

Utilization = Busy Ticks / (Busy Ticks + Idle Ticks)
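As a rough sketch of that calculation in bash: the two helper functions below (get_ticks and sp_util are made-up names) parse the output format shown above and apply the formula to the deltas between two polls. It assumes the counters only increase between polls. The result is an integer percentage, since bash arithmetic is integer only.

```shell
#!/bin/bash
# get_ticks KIND
# Reads "getcontrol -cbt" output on stdin and prints the "busy" or "idle"
# tick counter, depending on KIND.
get_ticks() {
    awk -F': *' -v kind="$1" '
        tolower($1) == ("controller " kind " ticks") { gsub(/[^0-9]/, "", $2); print $2 }
    '
}

# sp_util BUSY1 IDLE1 BUSY2 IDLE2
# Prints the percent utilization between the first and second poll.
sp_util() {
    local dbusy=$(( $3 - $1 ))
    local didle=$(( $4 - $2 ))
    local total=$(( dbusy + didle ))
    if [ "$total" -le 0 ]; then
        echo "no tick delta (counters reset, or polls too close together)" >&2
        return 1
    fi
    echo $(( 100 * dbusy / total ))
}

# Usage (the SP address and the 60-second interval are placeholders):
#   s1=$(naviseccli -h <SP_IP_address> getcontrol -cbt); sleep 60
#   s2=$(naviseccli -h <SP_IP_address> getcontrol -cbt)
#   sp_util "$(echo "$s1" | get_ticks busy)" "$(echo "$s1" | get_ticks idle)" \
#           "$(echo "$s2" | get_ticks busy)" "$(echo "$s2" | get_ticks idle)"
```

For example, if busy ticks go from 1000 to 1250 and idle ticks go from 3000 to 3750 between polls, the deltas are 250 and 750, so utilization over that interval is 250 / 1000, or 25%.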

What follows is the original method I posted that requires EMC Control Center.

I was tasked with coming up with a way to get email alerts whenever our SP utilization breaks a certain threshold.  Since none of the monitoring tools that we own will do that right now, I had to come up with a way using custom scripts.  This is my 2nd post on the same subject; I removed my post from yesterday as it didn’t work as I intended.  This time I used EMC’s Performance Manager rather than pulling data from the SP with the Navisphere CLI.

First, I’m running all of my bash scripts on a Windows server using cygwin.  These should run fine on any linux box as well, however.  Because I don’t have a native sendmail configuration set up on the Windows server, I’m using the control station on the Celerra to actually do the comparison of the utilization numbers in the text files and then email out an alert.  The Celerra control station automatically pulls the files via FTP from the Windows server every 30 minutes and sends out an email alert if the numbers cross the threshold.  A description of each script and the schedule is below.

Windows Server:

Export.cmd:

This first windows batch script runs an export (with pmcli) from EMC Performance Manager that does a dump of all the performance stats for the current day.

For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set date=%%c%%a%%b)

C:\ECC\Client.610\PerformanceManager\pmcli.exe -export -out c:\cygwin\home\scripts\sputil\0999_interval.csv -type interval -class clariion -date %date% -id APM00400500999

Data.cmd:

This cygwin/bash script manipulates the file export from above and ultimately creates two single text files (one for SPA and one for SPB) with a single numerical value of the most recent SP Utilization.  There are a few extra steps at the beginning of the script that are irrelevant to the SP utilization, they’re there for other purposes.

#This will pull only the timestamp line from the top

grep -m 1 "/" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/timestamp.csv

# This will pull out only the "disk utilization" line.

grep -i "^% Utilization" /home/scripts/sputil/0999_interval.csv >> /home/scripts/sputil/stats.csv

# This will pull out the disk/LUN title info for the first column

grep -i "Data Collected for DiskStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/diskstats.csv

grep -i "Data Collected for LUNStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/lunstats.csv

# This will create a column with the disk/LUN number

cat /home/scripts/sputil/diskstats.csv /home/scripts/sputil/lunstats.csv > /home/scripts/sputil/data.csv

# This combines the disk/LUN column with the data column

paste /home/scripts/sputil/data.csv /home/scripts/sputil/stats.csv > /home/scripts/sputil/combined.csv

cp /home/scripts/sputil/combined.csv /home/scripts/sputil/utilstats.csv
 

#  This removes all the temporary files
rm /home/scripts/sputil/timestamp.csv
rm /home/scripts/sputil/stats.csv
rm /home/scripts/sputil/diskstats.csv
rm /home/scripts/sputil/lunstats.csv
rm /home/scripts/sputil/data.csv
rm /home/scripts/sputil/combined.csv

# This next line strips the file of all but the last two rows, which are SP Utilization.

# The 1 looks at the first character in the row, the D specifies "starts with D", then deletes rows meeting those conditions.

awk -v FS="" -v OFS="" '$1 != "D"' < /home/scripts/sputil/utilstats.csv > /home/scripts/sputil/sputil.csv

#This pulls the values from the last column, which would be the most recent.

awk -F, '{print $(NF-1)}' < /home/scripts/sputil/sputil.csv > /home/scripts/sputil/sp_util.csv

#pull 1st line (SPA) into separate file

sed -n 1,1p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spAutil.txt

#pull 2nd line (SPB) into separate file

sed -n 2,2p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spButil.txt

#The spAutil.txt/spButil.txt files now contain only a single numerical value, which would be the most recent %utilization from the Control Center/Performance Manager dump file.

#Copy files to web server root directory

cp /home/scripts/sputil/*.txt /cygdrive/c/inetpub/wwwroot

Celerra Control Station:

CelerraArray:/home/nasadmin/sputil/ftpsp.sh

The script below connects to the windows server and grabs the current SP utilization text files via FTP every 30 minutes (via a cron job).

#!/bin/bash
cd /home/nasadmin/sputil
ftp windows_server.domain.net <<SCRIPT
get spAutil.txt
get spButil.txt
quit
SCRIPT
CelerraArray:/home/nasadmin/sputil/spcheck.sh

This script does the comparison check to see if the SP utilization is over our threshold. If it is, it sends an email alert that includes the %Utilization number in the subject line of the email. To change the threshold setting, you’d need to change the THRESHOLD=<XX> line in the script.  The line containing printf “%2.0f” converts the floating point value to an integer, as bash scripts don’t recognize floating point values.

#!/bin/bash

# Read the latest SPB utilization and round it to an integer, since bash
# integer tests do not accept floating point values.
SPB_RAW=$(cat /home/nasadmin/sputil/spButil.txt)
SPB=$(printf "%2.0f" "$SPB_RAW")

echo $SPB
THRESHOLD=50
if [ $SPB -eq 0 ] && [ $THRESHOLD -eq 0 ]
then
        echo "Both are zero"
elif [ $SPB -eq $THRESHOLD ]
then
        echo "Both values are equal"
elif [ $SPB -gt $THRESHOLD ]
then
        echo "SPB is greater than the threshold.  Sending alert"
        # uuencode takes the input file followed by the name to give the attachment.
        uuencode /home/nasadmin/sputil/spButil.txt spButil.txt | mail -s "<array_name> SPB Utilization Alert: $SPB % above threshold of $THRESHOLD %" notify@domain.com
else
        echo "$SPB is less than $THRESHOLD"
fi
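The script above only looks at SPB. One way to cover both storage processors without duplicating the whole script is to move the rounding and comparison into a small function. This is a sketch rather than a drop-in replacement; the function name over_threshold is made up for illustration.

```shell
#!/bin/bash
# over_threshold VALUE THRESHOLD
# Succeeds when VALUE, rounded to the nearest integer, is greater than
# THRESHOLD. VALUE may be a floating point number such as "63.27".
over_threshold() {
    local rounded
    rounded=$(printf '%.0f' "$1") || return 2
    [ "$rounded" -gt "$2" ]
}

# Usage covering both storage processors:
#   for sp in A B; do
#       val=$(cat "/home/nasadmin/sputil/sp${sp}util.txt")
#       if over_threshold "$val" 50; then
#           echo "SP$sp utilization is $val%, above the threshold of 50%"
#       fi
#   done
```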

CelerraArray Crontab schedule:

The FTP script is currently set to pull SP utilization files.  Run “crontab -e” to edit the scheduler.  I’ve got the alert script set to run at the top of the hour and half past the hour, and the updated SP files from the web server are FTP’d in a few minutes prior.

[nasadmin@CelerraArray sputil]$ crontab -l
58,28 * * * * /home/nasadmin/sputil/ftpsp.sh
0,30 * * * * /home/nasadmin/sputil/spcheck.sh
Overall Scheduling:

Windows Server:

Performance Manager Dump runs 15 minutes past the hour (exports data)
Data script runs at 20 minutes past the hour (processes data to get SP Utilization)

Celerra Server:

FTP script pulls new SP utilization text files at 28 minutes past the hour
Alert script runs at 30 minutes past the hour

The cycle then repeats at minute 45, minute 50, minute 58, and minute 0.

 

ProSphere 1.6 Updates

ProSphere 1.6 was released this week, and it looks like EMC was listening!  Several of the updates are features that I specifically requested when I gave my feedback to EMC at EMC World.  I’m sure it’s just a coincidence, but it’s good to finally see some valuable improvements that make this product that much closer to being useful in my company’s environment.  The most important items I wanted to see were the ability to export performance data to a csv file and improved documentation on the REST API.  Both of those things were included with this release.  I haven’t looked yet to see if the performance exports can be run from a command line (a requirement for it to be useful to me for scripting).  The REST API documentation was created in the form of a help file.  It can be downloaded and run from an internal web server as well, which is what I did.

Here are the new features in v1.6:

Alerting

ProSphere can now receive Brocade alerts for monitoring and analysis. These alerts can be forwarded through SNMP traps.

Consolidation of alerts from external sources is now extended to include:

• Brocade alerts (BNA and CMCNE element managers)

• The following additional Symmetrix Management Console (SMC) alerts:
– Device Status
– Device Pool Status
– Thin Device Allocation
– Director Status
– Port Status
– Disk Status
– SMC Environmental Alert

Capacity

– Support for Federated Tiered Storage (FTS) has been added, allowing ProSphere to identify LUNs that have been presented from external storage logically, positioned behind the Unisphere for VMAX 10K, 20K and 40K.

– Service Levels are now based on the Fully Automated Storage Tier (FAST) policies defined in Symmetrix arrays. ProSphere reports on how much capacity is available for each Service Level, and how much is being consumed by each host in the environment.

Serviceability

– Users can now export ProSphere reports for performance and capacity statistics in CSV format.

Unisphere for VMAX 1.0 compatibility

– ProSphere now supports the new Unisphere for VMAX as well as Single Sign On and Launch-in-Context to the management console of the Unisphere for VMAX element manager. ProSphere, in conjunction with Unisphere for VMAX, will have the same capabilities as Symmetrix Management Console and Symmetrix Performance Analyzer.

Unisphere support

– In this release, you can launch Unisphere (CLARiiON, VNX, and Celerra) from ProSphere, but without the benefits of Single Sign On and Launch-in-Context.

Performance Data Collection/Discovery issues in ProSphere 1.5.0

I was an early adopter of ProSphere 1.0; it was deployed at all of our data centers globally within a few weeks of its release.  I gave up on 1.0.3, as the syncing between instances didn’t work and EMC told me that it wouldn’t be fixed until the next major release.  Well, that next major release was 1.5, so I jumped back in when it was released in early March 2012.

My biggest frustration initially was that performance data didn’t seem to be collecting for any of the arrays.  I was able to discover all of the arrays but there wasn’t any detailed information available for any of them.  No LUN detail, no performance data.  Why?  Well, it seems ProSphere data collection is extremely dependent on a full path discovery, from host to switch to array.  Simply discovering the arrays by themselves isn’t sufficient.  Unless at least one host is seeing the complete path, the performance collection on the array is not triggered.

With that said, my next step was to get everything properly discovered.  Below is an overview of what I did to get everything discovered and performance data collection working.

1 – Switch Discovery.

Because an SMI-S agent is required to discover arrays and switches, you’ll need a separate server to run the SMI-S agents. I’m using a Windows 2008 server.  If you want to keep data collection separated between geographical locations, you’ll need to install separate instances of ProSphere at each site and have separate SMI-S agent servers at each site.  The instances can then be synchronized together in a federated configuration (in Admin | System | Synchronize ProSphere Applications).

We use Brocade switches so I initially downloaded and installed the Brocade SMI-S agent.  It can be downloaded directly from Brocade here:  http://www.brocade.com/services-support/drivers-downloads/smi-agent/index.page.  I installed 120.9.0 and had some issues with discoveries.  EMC later told me that I needed to use 120.11.0 or later, which didn’t seem to be available on Brocade’s website.  When I spoke to an EMC rep about the Brocade SMI-S agent version issue, they recommended that I use EMC’s software instead.  Either should work, however.  You can use the SMI-S agent that’s included with Connectrix Manager Converged Network Edition (CMCNE).  The product itself requires a license, but you do not need a license to use only the SMI-S agent.  After installation, launch “C:\CMCNE 11.1.4\bin\smc.bat” and click on the Configure SMI Agent button to add the IPs of your switches.  The one issue I ran into with this was user security.  Only one userid and password can be used across all switches, so you may need to create a new id/password across all of your switches.  I had to do that and spent about half a day finishing that up.  Once you add the switches in, use the IP of the host that the agent is installed on as your target for switch discovery in ProSphere.  The default userid and password is administrator / password.

Make sure that port 5988 is open on the server you’re running this agent on.  If it is Windows Server 2008, disable the Windows firewall or add an exception for ports 5988 and 5989 as well as the SMI-S processes ECOM and SLPD.exe.
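A quick way to confirm the ports are reachable from another machine is a plain TCP connection test.  This is only a sketch, not anything that ships with ProSphere or the SMI-S agent: the function names are my own, and a successful connect only proves that something is listening on the port, not that the SMI-S provider is healthy.

```python
import socket

# Ports named above: 5988 (CIM-XML over HTTP) and 5989 (CIM-XML over HTTPS).
SMIS_PORTS = (5988, 5989)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_smis_ports(host, ports=SMIS_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (closed, filtered, or host down)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_smis_ports("smis-agent.example.com") (a placeholder hostname) from the ProSphere side of the network returns a small dictionary with one True/False entry per port.  If a port shows False, the firewall exception is the first thing I would recheck.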

2 – EMC Array Discovery

I had initially downloaded and installed the Solutions Enabler vApp thinking that it would work for my Clariion & VNX discoveries.  I was told later (after opening an SR) that it does not provide SMI-S services.  EMC has their own SMI-S agent that will need to be installed on a separate server, as it will use the same ports (5988/5989) as the Brocade agent (or CMCNE).  It can be downloaded here:  http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=servicesDownloadsTemplatePg&internalId=0b014066800251b8&_irrt=true, or by navigating in Powerlink to Home > Support > Software Downloads and Licensing > Downloads S > SMI-S Provider.

Once EMC’s SMI-S agent is installed you’ll need to add your arrays to it.  Open a command prompt and navigate to C:\Program Files\EMC\ECIM\ECOM\bin, and launch testsmiprovider.   When it prompts, choose “localhost”, “port 5988”, and use admin / #1Password as the login credentials.  Once logged in, you can use the “addsys” command to add the IP’s of your arrays.

Just like before, make sure that port 5988 is open on the server you’re running this agent on, and disable the Windows firewall or add an exception for ports 5988 and 5989.  You’ll again use the IP of the host that the agent is installed on as your target for array discovery.

3 – Host Discoveries

Host discoveries can be done directly without an agent.  You can use the root password for UNIX or ESX and any AD account in windows that has local administrator rights on each server.  Of course you can also set up specialized service accounts with the appropriate rights based on your company’s security regulations.

4 – Enable Path Data Collection

In order to see specific information about LUNs on the array, you will need to enable Path Performance Collection for each host.  If the host isn’t discovered and performance collection isn’t enabled, you won’t see any LUN information when looking at arrays.  To enable it, go to Discovery | Objects list | Hosts from the ProSphere console and click on the “On | Off” slider button to turn it on for each host.

5 – Verify Full path connectivity

Once all of the discoveries are complete, you can verify full path connectivity for an array by going to Discovery | Objects list | Arrays, clicking on any array, and looking at the map.  If there is a cloud representing a switch with a red line to the array, you’re seeing the path.  You can use the same method for a host: if you go to Discovery | Objects list | Hosts and click on a host, you should see the host, the switch fabric, and the array on the map.  If you don’t see that full path you won’t get any data collected.

Comments and Thoughts

You can go directly to EMC’s community forum for general support and information here:   https://community.emc.com/community/support/prosphere.

After using ProSphere 1.5.0  for a little while now, I must say it’s definitely NOT Control Center.  It isn’t quite as advanced or full featured, but I don’t think it’s supposed to be.  It’s supposed to be an easy to deploy tool to get basic, useful information quickly.

I use the pmcli.exe command line tool in ECC extensively for custom data exports and reporting, and ProSphere does not provide a similar tool.  EMC does have an API built in to ProSphere that can be used to pull information over http (for example, to get a host list, type https://<prosphere_app_server>/srm/hosts).  I haven’t done too much research into that feature yet.  Version 1.5 added API support for array capacity, performance, and topology data.  You can read about it more in the white paper titled “ProSphere Integration: An overview of the REST API’s and information model” (h8893), which should be available on powerlink.
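For scripting, the same URL can be requested from Python instead of a browser.  The sketch below only builds the host-list URL quoted above and issues a plain GET.  Authentication is left out because I haven’t covered how ProSphere expects credentials, the response is returned as raw text because I’m not assuming anything about its format, and the function names are my own.

```python
import ssl
import urllib.request


def hosts_url(app_server):
    """Build the host-list URL quoted in the post for a given ProSphere server."""
    return "https://{}/srm/hosts".format(app_server)


def fetch(url, verify_tls=True, timeout=30):
    """GET a URL and return the response body as text.

    No credentials are sent; add whatever authentication your ProSphere
    instance requires (not covered here).
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: the appliance presents a self-signed certificate.
        # Only skip verification on a trusted internal network.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Something like fetch(hosts_url("prosphere01.example.com"), verify_tls=False) would be the starting point, with the hostname swapped for your own application server.  Expect an authentication error until credentials are added.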

My initial upgrade from 1.0.3 to 1.5.0 did not go smoothly; I had about a 50% success rate across all of my installations.  In most of the failures the upgrade itself completed but services wouldn’t start afterwards, and in one case the update web page simply stayed blank and would not let me run the upgrade to begin with.  Beware of upgrades if you want to preserve existing historical data.  I ended up deleting the vApp and starting over for most of my deployments.

I’ve only recently been able to get all of my discoveries completed.  I feel that the ProSphere documentation is somewhat lacking; I found myself wanting/needing more detail in many areas.  Most of my time has been spent doing trial and error testing with a little help from EMC support after I opened an SR.  I’ll give a more detailed post in the future about actually using ProSphere in the real world once I’ve had more time to use it.

Other items to note:

-ProSphere does not provide the same level of detail for historical data that you get in StorageScope, nor does it give the same amount of detail as Performance Manager.  It’s meant more for a quick “at a glance” view.

-ProSphere does not include the root password in the documentation; customers aren’t necessarily supposed to log in to the console.  I’m sure with a call to support you could obtain it.  Having the ability to at least start and stop services would be useful, as I had an issue with one of my upgrades where services wouldn’t start.  You can view the status of the services on any ProSphere server by navigating to https://prosphere_app_server/cgi-bin/mon.cgi?command=query_opstatus_full.

-ProSphere doesn’t gather the same level of detail about hosts and storage as ECC, but that’s the price you pay for agentless discovery.  Agents are needed for more detailed information.

How to troubleshoot EMC Control Center WLA Archive issues

We’re running EMC Control Center 6.1 UB12, and we use it primarily for its robust performance data collection and reporting capabilities.  Performance Manager is a great tool and I use it frequently.

Over the years I’ve had occasional issues with the WLA Archives not collecting performance data and I’ve had to open service requests to get it fixed.  Now that I’ve been doing this for a while, I’ve collected enough info to troubleshoot this issue and correct it without EMC’s assistance in most cases.

Check your ..\WLAArchives\Archives directory and look under the Clariion (or Celerra) folder, then the folder with your array’s serial number, then the interval folder.  This is where the “*.ttp” (text) and “*.btp” (binary) performance data files are stored for Performance Manager.  Sort by date.  If there isn’t a new file that’s been written in the last few hours data is not being collected.
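If you’d rather not eyeball the folder, the same check can be scripted.  This is a minimal sketch: the function names are mine, the four hour threshold is just my reading of “the last few hours”, and you need to pass in the full path to the interval folder since it differs per installation.

```python
import os
import time

# File types Performance Manager reads, as described above.
ARCHIVE_EXTENSIONS = (".ttp", ".btp")


def newest_archive(interval_dir):
    """Return (path, mtime) of the most recently modified archive file, or None."""
    newest = None
    for root, _dirs, files in os.walk(interval_dir):
        for name in files:
            if name.lower().endswith(ARCHIVE_EXTENSIONS):
                path = os.path.join(root, name)
                mtime = os.path.getmtime(path)
                if newest is None or mtime > newest[1]:
                    newest = (path, mtime)
    return newest


def is_collecting(interval_dir, max_age_hours=4, now=None):
    """True if at least one archive file was written within max_age_hours."""
    found = newest_archive(interval_dir)
    if found is None:
        return False
    now = time.time() if now is None else now
    return (now - found[1]) <= max_age_hours * 3600
```

Run is_collecting() against each array’s interval folder; a False result means either no archive files exist or the newest one is older than the threshold, which is the same symptom described above.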

Here are the basic items I generally review when data isn’t being collected for an array:

  1. Log in to every array in Unisphere, go to system properties, and on the ‘General’ tab make sure statistics logging is enabled.  I’ve found that if you don’t have an analyzer license on your array and start the 7 day data collection for a “naz” file, after the 7 days is up the stats logging option will be disabled.  You’ll have to go back in and re-enable it after the 7 day collection is complete.  If stats logging isn’t enabled on the array the WLA data collection will fail.
  2. If you recently changed the password on your Clariion domain account, make sure that naviseccli is updated properly for security access to all of your arrays (use the “addusersecurity” CLI option) and perform a rediscovery of all your arrays as well from within the ECC console.  There is no way from within the ECC console to update the password on an array; you must go through the discovery process again for all of them.
  3.  Verify the agents are running.  In the ECC console, click on the gears icon in the lower right hand corner.  It will create a window that shows the status of all the agents, including the WLA Archiver.  If WLA isn’t started, you can start it by right clicking on any array, choosing Agents, then start.  Check the WLAArchives  directories again (after waiting about an hour) and see if it’s collecting data again.

If those basic steps don’t work, checking the logs may point you in the right direction:

  1.  Review the Clariion agent logs for errors.  You’re not looking for anything specific here, just do a search for “error”, “unreachable” or for the specific IP’s of your arrays and see if there is anything obvious wrong. 
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL.log
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Bx.log.gz
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL.ini
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Err.log
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Bx_Err.log
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Discovery.log.gz
 

Here’s an example of an error I found in one case:

            MGL 14:10:18 C P I 2536   (29.94 MB) [MGLAgent::ProcessAlert] => Processing SP
            Unreachable alert. MO = APM00100600999, Context = Clariion, Category = SP
            Element = Unreachable
 

  2.  Review the WLA Agent logs.  Again, just search for errors and see if there is anything obvious that’s wrong.

            %ECC_INSTALL_ROOT%\exec\ENW610\ENW.log
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW_Bx.log.gz
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW.ini
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW_Err.log
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW_Bx_Err.log
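
The log searches above can be scripted so the plain and gzipped logs are handled the same way.  This is a sketch under a few assumptions: the function names are mine, the logs are treated as UTF-8 text with undecodable bytes replaced, and the search terms are simply the ones suggested above plus whatever array IPs you pass in.

```python
import gzip
import re

# Terms suggested above; matching is case-insensitive.
PATTERN = re.compile(r"error|unreachable", re.IGNORECASE)


def open_log(path):
    """Open a plain or gzip-compressed log file as text."""
    if path.lower().endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def scan_log(path, extra_terms=()):
    """Yield (line_number, line) for lines that mention an error term.

    extra_terms is for things like the IP addresses of your arrays.
    """
    terms = [term.lower() for term in extra_terms]
    with open_log(path) as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if PATTERN.search(line) or any(term in lowered for term in terms):
                yield number, line.rstrip("\n")
```

Loop over the MGL and ENW log paths listed above and print whatever scan_log() yields.  Log entries that wrap across lines, like the SP Unreachable example, will only show the line containing the matching word, so open the file at the reported line number for the full context.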
 

If the logs don’t show anything obvious, here are the steps I take to restart everything.  This has worked on several occasions for me.

  1. From the Control Center console, stop all agents on the ECC Agent server.  Do this by right clicking on the agent server (in the left pane), choose agents and stop.  Follow the prompts from there.
  2. Log in to the ECC Agent server console and stop the master agent.  You can do this in Computer Management | Services, stop the service titled “EMC ControlCenter Master Agent”.
  3. From the Control Center console, stop all agents on the Infrastructure server.  Do this by right clicking on the Infrastructure server (in the left pane), choose agents and stop.  Follow the prompts from there.
  4. Verify that all services have stopped properly.
  5. From the ECC Agent server console, go to C:\Windows\ECC\ and delete all .comfile and .lck files.
  6. Restart all agents on the Infrastructure server.
  7. Restart the Master Agent on the Agent server.
  8. Restart all other services on the Agent server.
  9. Verify that all services have restarted properly.
  10. Wait at least an hour and check to see if the WLA Archive files are being written.
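
The file cleanup in step 5 is the one part of this procedure I’d be comfortable scripting.  The sketch below is my own helper, not an EMC tool.  It defaults to a dry run that only lists the matching files, and it should only be run for real after the agents and the Master Agent have been stopped as described in the earlier steps.

```python
import glob
import os

# File types named in step 5.
STALE_PATTERNS = ("*.comfile", "*.lck")


def find_stale_files(directory):
    """Return a sorted list of .comfile and .lck files directly inside directory."""
    found = []
    for pattern in STALE_PATTERNS:
        found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found)


def remove_stale_files(directory, dry_run=True):
    """List the stale files, deleting them only when dry_run is False."""
    paths = find_stale_files(directory)
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths
```

Calling remove_stale_files(r"C:\Windows\ECC") prints nothing and deletes nothing; it just returns the list so you can review it.  Pass dry_run=False once you’re satisfied with what it found.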

If none of these steps resolve your problem and you don’t see any errors in the logs, it’s time to open an SR with EMC.  I’ve found the EMC staff  that supports ECC to be very knowledgeable and helpful.