Performance Data Collection/Discovery issues in ProSphere 1.5.0

I was an early adopter of ProSphere 1.0, and it was deployed at all of our data centers globally within a few weeks of its release.  I gave up on 1.0.3 because the syncing between instances didn’t work and EMC told me that it wouldn’t be fixed until the next major release.  That next major release was 1.5, so I jumped back in when it was released in early March 2012.

My biggest frustration initially was that performance data didn’t seem to be collecting for any of the arrays.  I was able to discover all of the arrays but there wasn’t any detailed information available for any of them.  No LUN detail, no performance data.  Why?  Well, it seems ProSphere data collection is extremely dependent on a full path discovery, from host to switch to array.  Simply discovering the arrays by themselves isn’t sufficient.  Unless at least one host sees the complete path, performance collection on the array is not triggered.

With that said, my next step was to get everything properly discovered.  Below is an overview of what I did to get everything discovered and performance data collection working.

1 – Switch Discovery.

Because an SMI-S agent is required to discover arrays and switches, you’ll need a separate server to run the SMI-S agents. I’m using a Windows 2008 server.  If you want to keep data collection separated between geographical locations, you’ll need to install separate instances of ProSphere at each site and have separate SMI-S agent servers at each site.  The instances can then be synchronized together in a federated configuration (in Admin | System | Synchronize ProSphere Applications).

We use Brocade switches, so I initially downloaded and installed the Brocade SMI-S agent.  It can be downloaded directly from Brocade here:  http://www.brocade.com/services-support/drivers-downloads/smi-agent/index.page.  I installed 120.9.0 and had some issues with discoveries.  EMC later told me that I needed to use 120.11.0 or later, which didn’t seem to be available on Brocade’s website.  When I spoke to an EMC rep about the Brocade SMI-S agent version issue, they recommended that I use EMC’s software instead.  Either should work, however.  You can use the SMI-S agent that’s included with Connectrix Manager Converged Network Edition (CMCNE).  The product itself requires a license, but you do not need a license to use only the SMI-S agent.  After installation, launch “C:\CMCNE 11.1.4\bin\smc.bat” and click on the Configure SMI Agent button to add the IPs of your switches.  The one issue I ran into with this was user security.  Only one userid and password can be used across all switches, so you may need to create a new id/password on all of your switches.  I had to do that, and it took me about half a day.  Once you add the switches, use the IP of the host that the agent is installed on as your target for switch discovery in ProSphere.  The default userid and password is administrator / password.

Make sure that port 5988 is open on the server you’re running this agent on.  If it is Windows 2K8, disable the Windows firewall or add an exception for ports 5988 and 5989 as well as the SMI-S processes ECOM and SLPD.exe.
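A quick way to confirm the firewall change took effect is to try a TCP connection to both ports from another machine.  The following is only a minimal sketch: the function names are my own, and the host name is whatever you pass on the command line.  It tells you whether something is listening and reachable on the port, not whether the SMI-S agent behind it is healthy.

```python
import socket
import sys

SMIS_PORTS = (5988, 5989)  # the two ports named in the text above


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_smis_ports(host, ports=SMIS_PORTS):
    """Map each port to True (reachable) or False (blocked or not listening)."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ports.py <smi-s agent server>
    for port, ok in check_smis_ports(sys.argv[1]).items():
        print(f"port {port}: {'open' if ok else 'closed or filtered'}")
```

Run it from the ProSphere side of the network if you can, since that is the path the discovery actually uses.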

2 – EMC Array Discovery

I had initially downloaded and installed the Solutions Enabler vApp thinking that it would work for my Clariion & VNX discoveries.  I was told later (after opening an SR) that it does not provide SMI-S services.  EMC has their own SMI-S agent that will need to be installed on a separate server, as it uses the same ports (5988/5989) as the Brocade agent (or CMCNE).  It can be downloaded here:  http://powerlink.emc.com/km/appmanager/km/secureDesktop?_nfpb=true&_pageLabel=servicesDownloadsTemplatePg&internalId=0b014066800251b8&_irrt=true, or by navigating in Powerlink to Home > Support > Software Downloads and Licensing > Downloads S > SMI-S Provider.

Once EMC’s SMI-S agent is installed you’ll need to add your arrays to it.  Open a command prompt, navigate to C:\Program Files\EMC\ECIM\ECOM\bin, and launch testsmiprovider.  When it prompts, choose “localhost”, “port 5988”, and use admin / #1Password as the login credentials.  Once logged in, you can use the “addsys” command to add the IPs of your arrays.

Just like before, make sure that port 5988 is open on the server you’re running this agent on and disable the Windows firewall or add an exception for ports 5988 and 5989.  You’ll again use the IP of the host that the agent is installed on as your target for array discovery.

3 – Host Discoveries

Host discoveries can be done directly without an agent.  You can use the root password for UNIX or ESX and any AD account in Windows that has local administrator rights on each server.  Of course, you can also set up specialized service accounts with the appropriate rights based on your company’s security regulations.

4 – Enable Path Data Collection

In order to see specific information about LUNs on the array, you will need to enable Path Performance Collection for each host.  If the host isn’t discovered and performance collection isn’t enabled, you won’t see any LUN information when looking at arrays.  To enable it, go to Discovery | Objects list | Hosts from the ProSphere console and click on the “On | Off” slider button to turn it on for each host.

5 – Verify Full path connectivity

Once all of the discoveries are complete, you can verify full path connectivity for an array by going to Discovery | Objects list | Arrays, clicking on any array, and looking at the map.  If there is a cloud representing a switch with a red line to the array, you’re seeing the path.  You can use the same method for a host: if you go to Discovery | Objects list | Hosts and click on a host, you should see the host, the switch fabric, and the array on the map.  If you don’t see that full path, you won’t get any data collected.

Comments and Thoughts

You can go directly to EMC’s community forum for general support and information here:   https://community.emc.com/community/support/prosphere.

After using ProSphere 1.5.0  for a little while now, I must say it’s definitely NOT Control Center.  It isn’t quite as advanced or full featured, but I don’t think it’s supposed to be.  It’s supposed to be an easy to deploy tool to get basic, useful information quickly.

I use the pmcli.exe command line tool in ECC extensively for custom data exports and reporting, and ProSphere does not provide a similar tool.  EMC does have an API built in to ProSphere that can be used to pull information over HTTP (for example, to get a host list, browse to https://<prosphere_app_server>/srm/hosts).  I haven’t done too much research into that feature yet.  Version 1.5 added API support for array capacity, performance, and topology data.  You can read more about it in the white paper titled “ProSphere Integration: An overview of the REST API’s and information model” (h8893), which should be available on Powerlink.
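As a rough sketch of pulling that host list from a script rather than a browser, something like the following could be a starting point.  The /srm/hosts path comes from the example above; everything else is an assumption on my part.  In particular, I haven’t confirmed how the API authenticates (HTTP basic auth is a guess) or what format the response takes, so check the white paper before relying on it.

```python
import base64
import ssl
import urllib.request


def srm_url(app_server, resource="hosts"):
    """Build a URL of the form https://<prosphere_app_server>/srm/<resource>."""
    return f"https://{app_server}/srm/{resource.strip('/')}"


def fetch(url, user, password, verify_tls=True):
    """GET the URL and return the response body as text.

    ASSUMPTION: HTTP basic auth. The real API may authenticate differently.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for appliances with self-signed certificates on a trusted network.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch(srm_url("your_app_server"), "user", "password") would return the raw response body, which you would then need to parse according to the documented information model.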

My initial upgrade from 1.0.3 to 1.5.0 did not go smoothly; I had about a 50% success rate across all of my installations.  In some cases the upgrade itself would complete but services wouldn’t start afterwards, and in one case the update web page simply stayed blank and would not let me run the upgrade to begin with.  Beware of upgrades if you want to preserve existing historical data.  I ended up deleting the vApp and starting over for most of my deployments.

I’ve only recently been able to get all of my discoveries completed.  I feel that the ProSphere documentation is somewhat lacking, and I found myself wanting/needing more detail in many areas.  Most of my time has been spent doing trial and error testing with a little help from EMC support after I opened an SR.  I’ll give a more detailed post in the future about actually using ProSphere in the real world once I’ve had more time to use it.

Other items to note:

-ProSphere does not provide the same level of detail for historical data that you get in StorageScope, nor does it give the same amount of detail as Performance Manager.  It’s meant more for a quick “at a glance” view.

-ProSphere does not include the root password in the documentation, since customers aren’t necessarily supposed to log in to the console.  I’m sure with a call to support you could obtain it.  Having the ability to at least start and stop services would be useful, as I had an issue with one of my upgrades where services wouldn’t start.  You can view the status of the services on any ProSphere server by navigating to https://prosphere_app_server/cgi-bin/mon.cgi?command=query_opstatus_full.

-ProSphere doesn’t gather the same level of detail about hosts and storage as ECC, but that’s the price you pay for agentless discovery.  Agents are needed for more detailed information.

How to troubleshoot EMC Control Center WLA Archive issues

We’re running EMC Control Center 6.1 UB12, and we use it primarily for its robust performance data collection and reporting capabilities.  Performance Manager is a great tool and I use it frequently.

Over the years I’ve had occasional issues with the WLA Archives not collecting performance data and I’ve had to open service requests to get it fixed.  Now that I’ve been doing this for a while, I’ve collected enough info to troubleshoot this issue and correct it without EMC’s assistance in most cases.

Check your ..\WLAArchives\Archives directory and look under the Clariion (or Celerra) folder, then the folder with your array’s serial number, then the interval folder.  This is where the “*.ttp” (text) and “*.btp” (binary) performance data files are stored for Performance Manager.  Sort by date.  If no new file has been written in the last few hours, data is not being collected.
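If you have a lot of arrays, the “sort by date” check is easy to script.  Here is a minimal sketch that looks in one interval folder for the newest *.ttp or *.btp file and reports whether it is recent.  The function names and the three hour default are my own choices (the text above only says “a few hours”), so adjust the threshold to your collection interval.

```python
import time
from pathlib import Path


def newest_archive(interval_dir, patterns=("*.ttp", "*.btp")):
    """Return (path, mtime) of the most recently modified archive file, or None."""
    files = [p for pattern in patterns for p in Path(interval_dir).glob(pattern)]
    if not files:
        return None
    newest = max(files, key=lambda p: p.stat().st_mtime)
    return newest, newest.stat().st_mtime


def is_collecting(interval_dir, max_age_hours=3.0, now=None):
    """True if some archive file was written within the last max_age_hours."""
    found = newest_archive(interval_dir)
    if found is None:
        return False
    now = time.time() if now is None else now
    return (now - found[1]) <= max_age_hours * 3600
```

Point is_collecting() at each array’s interval folder; any folder that returns False is a candidate for the troubleshooting steps that follow.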

Here are the basic items I generally review when data isn’t being collected for an array:

  1. Log in to every array in Unisphere, go to system properties, and on the ‘General’ tab make sure statistics logging is enabled.  I’ve found that if you don’t have an analyzer license on your array and start the 7 day data collection for a “naz” file, after the 7 days is up the stats logging option will be disabled.  You’ll have to go back in and re-enable it after the 7 day collection is complete.  If stats logging isn’t enabled on the array the WLA data collection will fail.
  2. If you recently changed the password on your Clariion domain account, make sure that naviseccli is updated properly for security access to all of your arrays (use the “addusersecurity” CLI option) and perform a rediscovery of all your arrays from within the ECC console.  There is no way from within the ECC console to update the password on an array; you must go through the discovery process again for all of them.
  3.  Verify the agents are running.  In the ECC console, click on the gears icon in the lower right hand corner.  It will create a window that shows the status of all the agents, including the WLA Archiver.  If WLA isn’t started, you can start it by right clicking on any array, choosing Agents, then start.  Check the WLAArchives  directories again (after waiting about an hour) and see if it’s collecting data again.

If those basic steps don’t work, checking the logs may point you in the right direction:

  1.  Review the Clariion agent logs for errors.  You’re not looking for anything specific here, just do a search for “error”, “unreachable” or for the specific IP’s of your arrays and see if there is anything obvious wrong. 
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL.log
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Bx.log.gz
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL.ini
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Err.log
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Bx_Err.log
            %ECC_INSTALL_ROOT%\exec\MGL610\MGL_Discovery.log.gz
 

Here’s an example of an error I found in one case:

            MGL 14:10:18 C P I 2536   (29.94 MB) [MGLAgent::ProcessAlert] => Processing SP
            Unreachable alert. MO = APM00100600999, Context = Clariion, Category = SP
            Element = Unreachable
 

  2.  Review the WLA Agent logs.  Again, just search for errors and see if there is anything obvious that’s wrong.

            %ECC_INSTALL_ROOT%\exec\ENW610\ENW.log
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW_Bx.log.gz
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW.ini
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW_Err.log
            %ECC_INSTALL_ROOT%\exec\ENW610\ENW_Bx_Err.log
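Searching those logs by hand gets tedious, especially the gzipped ones.  Below is a minimal sketch that scans every log file in one agent directory (MGL610 or ENW610) for a list of terms, case-insensitively.  The function names and the “*.log*” file pattern are my own; I am also assuming the logs are plain text that can be read as UTF-8 with bad bytes replaced.

```python
import gzip
import re
from pathlib import Path


def open_log(path):
    """Open a plain or gzip-compressed log as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def search_logs(log_dir, terms=("error", "unreachable"), glob="*.log*"):
    """Yield (file name, line number, line) for lines containing any term."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    for path in sorted(Path(log_dir).glob(glob)):
        with open_log(path) as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    yield path.name, number, line.rstrip()
```

Add the IPs of your arrays to the terms tuple to catch lines that mention a specific array without using the word “error”.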
 

If the logs don’t show anything obvious, here are the steps I take to restart everything.  This has worked on several occasions for me.

  1. From the Control Center console, stop all agents on the ECC Agent server.  Do this by right clicking on the agent server (in the left pane), choose agents and stop.  Follow the prompts from there.
  2. Log in to the ECC Agent server console and stop the master agent.  You can do this in Computer Management | Services, stop the service titled “EMC ControlCenter Master Agent”.
  3. From the Control Center console, stop all agents on the Infrastructure server.  Do this by right clicking on the Infrastructure server (in the left pane), choose agents and stop.  Follow the prompts from there.
  4. Verify that all services have stopped properly.
  5. From the ECC Agent server console, go to C:\Windows\ECC\ and delete all .comfile and .lck files.
  6. Restart all agents on the Infrastructure server.
  7. Restart the Master Agent on the Agent server.
  8. Restart all other services on the Agent server.
  9. Verify that all services have restarted properly.
  10. Wait at least an hour and check to see if the WLA Archive files are being written.
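Step 5 is the one I would be most careful with, so here is a minimal sketch that lists the .comfile and .lck files first and only deletes them when explicitly asked.  The function names are hypothetical, and I am assuming the files sit directly in the ECC folder rather than in subfolders.  Only run the delete after the agents and the master agent have been stopped.

```python
from pathlib import Path


def find_stale_files(ecc_dir, suffixes=(".comfile", ".lck")):
    """Return files directly under ecc_dir whose suffix matches, sorted by name."""
    return sorted(
        p for p in Path(ecc_dir).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def clean_stale_files(ecc_dir, delete=False):
    """List stale files; remove them only when delete=True (dry run by default)."""
    stale = find_stale_files(ecc_dir)
    for path in stale:
        if delete:
            path.unlink()
    return stale
```

Calling clean_stale_files(r"C:\Windows\ECC") returns the list without touching anything, which makes it easy to review what would be removed before passing delete=True.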

If none of these steps resolve your problem and you don’t see any errors in the logs, it’s time to open an SR with EMC.  I’ve found the EMC staff  that supports ECC to be very knowledgeable and helpful.