
What is EMC’s CAVA / Common Event Enabler?

I was recently asked to do a bit of research on EMC’s CAVA product, as we are looking for antivirus solutions for our CIFS-based shares.  I found very little information in general Google searches about exactly what CAVA is and what it does, so I thought I’d share some of the information that I did find after a bit of research and talking to my local EMC rep.

Basically, CAVA is a service that runs on the Celerra (or VNX) data mover in conjunction with a Windows server running a 3rd party anti-virus engine (along with EMC’s CAVA API agent) to handle the conversation.  It only facilitates the communication to an existing AV server; EMC doesn’t provide the actual AV software.  It supports Symantec, McAfee, eTrust, Sophos, Kaspersky, and Trend Micro.  In a nutshell, CAVA employs three key components:  software on the data mover (VC Client), software on a Windows AV server (CAVA), and your 3rd party AV engine on that Windows server.

CAVA used to stand for “Celerra Anti Virus Agent”, but was changed to “Common AntiVirus Agent”.  Quite convenient that they could re-use the “C” without changing the acronym, right? The product is now officially known as “Common Event Enabler for Windows” by EMC and the package includes CEPA, or the EMC Common Event Publishing Agent, and CAVA, the aforementioned Common Antivirus Agent.  For this post I’m focusing on the Antivirus agent.

CAVA is a fairly straightforward install; however, if implemented incorrectly it can adversely affect your performance. It’s important to know how it scans your files, and essential to know how to troubleshoot it and do performance monitoring.  There is definitely a performance hit when using CAVA.

When are files scanned for a virus? 

Each time the Celerra receives a file, it is locked for read access first, and a request is sent to the AV server (or servers) to scan the file.  The Celerra sends the UNC path name to the Windows server and waits for verification that the file is not infected.  Once that verification is complete, the file is made available for user access.

CAVA will scan a file in the following instances: 

  •          The first time a file is read following the initial implementation of CAVA, or following an update to the virus definitions
  •          Creating, modifying, or moving a file
  •          When restoring a file (or files) from backup
  •          When renaming a file with a different file extension
  •          Whenever an administrator performs a full file system scan (with the server_viruschk command) 
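
For reference, a full scan can be kicked off and monitored from the control station with server_viruschk.  This is only a sketch from memory (the file system name is a placeholder and the flags can vary by DART release, so check the man page on your system):

Check the current virus checker status on the data mover:
 server_viruschk server_2

Start, monitor, or cancel a full scan of a file system (fs01 is a hypothetical name):
 server_viruschk server_2 -fsscan fs01 -create
 server_viruschk server_2 -fsscan fs01 -list
 server_viruschk server_2 -fsscan fs01 -delete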

What are the features of CAVA? 

  •          Automatic Virus Definition Updates. Files opened after the update will be re-scanned.
  •          CAVA Calculator (a free sizing tool to assist in implementation)
  •          User notifications on virus detection, configurable by administrators to be sent as notifications to the client, event log entries, or both.
  •          Scan on read can be enabled
  •          Event reporting and configuration 

What are some implementation considerations? 

  •          EMC recommends that an MPFS client system not be configured as the AV server system.
  •          CAVA doesn’t support a data mover CIFS server using share level access.
  •          Always update the viruschecker.conf file to avoid scanning temp files; it can be modified with the Celerra AV Management snap-in (sample entries follow this list).
  •          It’s CIFS only. There is no support for NFS or FTP.  If those protocols are used to open, modify, or move files the files will not be scanned.
  •          You must check for compatibility with your installed 3rd party AV software.
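
As an example of the temp file exclusion mentioned above, viruschecker.conf uses an include mask plus an exclusion list.  The entries below are only a sketch from memory; parameter names and formats should be verified against the CAVA documentation for your release:

masks=*.*
excl=*.tmp:*.temp:~$*
addr=10.x.x.x:10.x.x.y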

How is it licensed, and how much does it cost?

CAVA is licensed per array; on the VNX series it is included in the Security and Compliance Suite.   Pricing will vary of course, but it’s not very expensive relative to the cost of the array.  It should be in the range of thousands rather than tens of thousands of dollars.

 


Celerra data mover performance and port configuration

I had a request to review my experience with data mover performance and port configuration on our production Celerras.  When I started supporting our Celerras I had no experience at all, so my current configuration is the result of trial and error troubleshooting and tackling performance problems as they appeared.

To keep this simple, I’ll review my configuration for a Celerra with only one primary data mover and one standby.  There really is no specific configuration needed on your standby data mover, just remember to perfectly match all active network ports on both primary and standby, so in the event of a failover the port configuration matches between the two.

Our primary data mover has two Ethernet modules with four ports each (for a total of eight ports).  I’ll map out how each port is configured and then explain why I did it that way.

  •          cge-1-0:  Failsafe config for primary CIFS (paired with cge-1-1), assigned to the ‘CIFS1’ prod file server.
  •          cge-1-1:  Failsafe config for primary CIFS (paired with cge-1-0), assigned to the ‘CIFS1’ prod file server.
  •          cge-1-2:  Interface configured for backup traffic, assigned to the ‘CIFSBACKUP1’ server, VLAN 1.
  •          cge-1-3:  Interface configured for backup traffic, assigned to the ‘CIFSBACKUP2’ server, VLAN 1.
  •          cge-2-0:  Interface configured for backup traffic, assigned to the ‘CIFSBACKUP3’ server, VLAN 2.
  •          cge-2-1:  Interface configured for backup traffic, assigned to the ‘CIFSBACKUP4’ server, VLAN 2.
  •          cge-2-2:  Interface configured for replication traffic, assigned to the replication interconnect.
  •          cge-2-3:  Interface configured for replication traffic, assigned to the replication interconnect.

Primary CIFS Server – You do have a choice in this case to use either link aggregation or a fail safe network configuration.  Fail safe is an active/passive configuration.  If one port fails the other will take over.  I chose a fail safe configuration for several reasons, but there are good reasons to choose aggregation as well.  I chose fail safe primarily due to the ease of configuration, as there was no need for me to get the network team involved to make changes to our production switch (fail safe is configured only on the Celerra side), and our CIFS server performance requirements don’t necessitate two active links.  If you need the extra bandwidth, definitely go for aggregation.
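
For what it’s worth, here is roughly how the fail safe device and its interface are created from the control station.  Treat it as a sketch: the device, interface, and IP values are placeholders, and the exact -option string should be checked against the server_sysconfig man page for your code level.

Create the fail safe network device from the two CIFS ports:
 server_sysconfig server_2 -virtual -name fsn0 -create fsn -option "primary=cge-1-0 device=cge-1-0,cge-1-1"

Create the production CIFS interface on top of it:
 server_ifconfig server_2 -create -Device fsn0 -name cifs1_int -protocol IP 10.0.0.10 255.255.255.0 10.0.0.255

Verify the virtual device:
 server_sysconfig server_2 -virtual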

I originally set up the fail safe network in an emergency situation, as the single interface to our prod CIFS server went down and could not be brought back online.  EMC’s answer was to reboot the data mover.  That fixed it, but it’s not such a good solution during the middle of a business day.

Backup Interfaces – We were having issues with our backups exceeding our backup window.  In order to increase backup performance, I created four additional CIFS servers, all sharing the same file systems as production.  Our backup administrator splits the load on the four backup interfaces between multiple media servers and tape libraries (on different VLANs), so backup traffic does not consume any bandwidth on the production interface that users need to access the CIFS shares.  This configuration definitely improved our backup performance.
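
As a hedged sketch, this is roughly what one of those backup CIFS servers looks like when it’s bound to its own interface (interface names, domain, and IPs are placeholders, and the actual domain join is a separate server_cifs -Join step that needs domain admin credentials):

Create a dedicated backup interface on its own port:
 server_ifconfig server_2 -create -Device cge-1-2 -name backup1_int -protocol IP 10.1.1.10 255.255.255.0 10.1.1.255

Create the backup CIFS server bound to that interface:
 server_cifs server_2 -add compname=CIFSBACKUP1,domain=corp.example.com,interface=backup1_int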

Replication – All of our production file systems are replicated to another Celerra in a different country for disaster recovery purposes.   Because of the huge amount of data that needs to be replicated, I created two interfaces specifically for replication traffic.  Just like the backup interfaces, this separates replication traffic from the production CIFS server interface.  Even with the separate interfaces, I have still imposed a bandwidth limitation (no more than 50MB/s) in the interconnect configuration, as I need to share the same 100MB WAN link with our Data Domain for replication.
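
The throttle itself lives on the Replicator interconnect.  The commands below are a sketch from memory; the -bandwidth schedule format (a day/time window followed by a throughput figure) is an assumption on my part and should be verified against the Replicator manual for your code level.

List the interconnects and note the name or id of the one you want to throttle:
 nas_cel -interconnect -list

Apply a bandwidth schedule to the interconnect (name and limit are placeholders):
 nas_cel -interconnect -modify <interconnect_name> -bandwidth MoTuWeThFrSaSu00:00-24:00/<limit>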

This configuration has proven to be very effective for me.  Our links never hit 100% utilization and I rarely get complaints about CIFS server performance.  The only real performance related troubleshooting I’ve had to do on our production CIFS servers has been related to file system deduplication; I’ve disabled it on certain file systems that see a high amount of activity.

Other thoughts about Celerra configuration:

  1. We recently added a third data mover to the Celerra in our HQ data center because of the file system limit on a single data mover: you can only have 2048 total filesystems per data mover.  We hit that limit due to the number of checkpoints that we keep for operational file restores.  If you make a checkpoint of one filesystem twice a day and keep a month’s worth, that’s roughly 60 checkpoints plus the production filesystem itself (61 filesystems) counted against the 2048 limit, which adds up quickly if you have a CIFS server filled with dozens of small shares.  I simply added another CIFS server and all new shares are now created on the new CIFS server.  The names and locations of the shares are transparent to all of our users as all file shares are presented with DFS links, so there were no major changes required for our Active Directory/Windows administrators.
  2. Use the Celerra monitor to keep an eye on CPU and memory usage throughout the day.  Once you launch it from Unisphere, it runs independently of your Unisphere session (Unisphere can be closed) and has a very small memory footprint on your laptop/PC.
  3. Always create your CIFS servers on VDMs, especially if you are replicating data for disaster recovery.   VDMs are designed specifically for Windows environments, allow for easy migration between data movers, and allow for easy recreation of a CIFS server and its shares in a replication/DR scenario.  They store all the information for local groups, shares, security credentials, audit logs, and home directory info.  If you need to recreate a CIFS server from scratch, you’ll need to re-do all of those things from scratch as well.  Always use VDMs!  (A sketch of creating a CIFS server on a VDM follows this list.)
  4. Write scripts for monitoring purposes.  I have only one running on my Celerras now; it emails me a report of the status of all replication jobs in the morning.  Of course, you can put any valid command into a bash script (adding a mailx command to email you the results), stick it in crontab, and away you go.
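
Two quick sketches related to items 3 and 4 above.  First, creating a CIFS server on a VDM instead of directly on the physical data mover (the VDM name, server name, domain, and interface are placeholders, and the domain join itself is a separate step):

 nas_server -name vdm_prod -type vdm -create server_2 -setstate loaded
 server_cifs vdm_prod -add compname=CIFS1,domain=corp.example.com,interface=cifs1_int

Second, a bare-bones version of the replication report script from item 4.  The nas_replicate syntax shown is for newer code levels (older Celerras use nas_replicate -info -all), and the paths and recipient are assumptions:

 #!/bin/bash
 # Hypothetical morning report: mail the replication job status to the storage admin.
 REPORT=/tmp/replication_report.txt
 /nas/bin/nas_replicate -list > "$REPORT" 2>&1
 mailx -s "Celerra replication status" nasadmin@example.com < "$REPORT"

Add it to the nasadmin crontab to run every morning at 6:00:
 0 6 * * * /home/nasadmin/replication_report.sh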

Adding/Removing modules from a datamover

I recently had an issue where a brand new datamover installed by EMC would not allow me to make it a standby for the existing datamovers.  It turns out that the hardware (specifically the number of FC and Ethernet interfaces) must match PRECISELY: the number of ports and the slots the modules are installed in have to match across all datamovers.

The new datamover that was installed had an extra 4-port Ethernet module in it.  Below is the procedure I used to remove the module, including all the commands to take the datamover down, reconfigure it, and bring it back up successfully.  Removing the extra module solved the problem; the new datamover then matched the config of the others and could be configured as a standby.

First, log in to the CLI on the control station with root privileges.  Next, just run the commands below in order.

Turn off connecthome and emails to avoid false alarms.
 /nas/sbin/nas_connecthome -service stop
 /nas/bin/nas_emailuser -modify -enabled no
 /nas/bin/nas_emailuser -info

Copy and paste the output somewhere safe; this lists the current datamover config.
 nas_server -i -a

Run this to shut the datamover down.  Run getreason to verify when it’s down.
 server_cpu server_<x> -halt now
 /nasmcd/sbin/getreason

Remove/replace the module now.

Power the datamover back on.
 /nasmcd/sbin/t2reset pwron -s <slot number>

Watch getreason for status
 /nasmcd/sbin/getreason
(Wait for it to reboot and say ‘Hardware Misconfigured’)

Once it is in a ‘misconfigured’ state, run setup_slot to configure it:
 /nasmcd/sbin/setup_slot -i 4

Run this command to view the current hardware config, verify that your change was made:
 server_sysconfig server_4 -p

Restart connecthome and email services.
 /nas/sbin/nas_connecthome -service start -clear
 /nas/sbin/nas_connecthome -i
 /nas/bin/nas_emailuser -modify -enabled yes
 /nas/bin/nas_emailuser -info

That’s it!  Your datamover has been updated and reconfigured.
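
Once the hardware matches, making the new datamover a standby is a one-liner from the control station.  A sketch, assuming server_2 is the primary and server_5 is the new datamover (your server numbers will differ):

 server_standby server_2 -create mover=server_5 -policy auto

Verify the standby assignment:
 nas_server -i -a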

DM Interconnect failure with Celerra Replicator

We just installed a new VNX 5500 a few weeks ago in the UK, and I initially set up a VDM replication job between it and its replication partner, an NS-960 in Canada.  The setup went fine with no errors, and replication of the VDM completed successfully every day up until yesterday, when I noticed that the status on the main replications screen said “network communication has been lost”.   I am able to use the server_ping command to ping the data mover/replication interface from the UK to Canada, so network connectivity appears to be OK.

I was attempting to set up new replication jobs for the filesystems on this VDM, and the background tasks to create the replication jobs are stuck at “Establishing communication with secondary side for Create task” with a status of “Incomplete”.

I went to the DM interconnect next to validate that it was working, and the validation test failed with the following message: “Validate Data Mover Interconnect server_2:<SAN_name>. The following interfaces cannot connect: source interface=10.x.x.x destination interface=10.x.x.x, Message_ID=13160415446: Authentication failed for DIC communication.”

So, why is the DM Interconnect failing?   It was working fine for several weeks!

My next trip was to the server log (>server_log server_2) where I spotted another issue.  Hundreds of entries that looked just like these:

2011-07-07 16:32:07: CMD: 6: CmdReplicatev2ReversePri::startSecondary dicSt 16 cmdSt 214
2011-07-07 16:32:10: CIC: 3: <DicXmlSyncMsgService> Sending Cmd to 10.x.x.x failed (16=Bad authentication)
2011-07-07 16:32:10: CMD: 3: DicXmlSyncRequest::sendMessage sendCmd failed:16
2011-07-07 16:32:12: CIC: 3: <DicXmlSyncMsgService> Sending Cmd to 10.x.x.x failed (16=Bad authentication)
2011-07-07 16:32:12: CMD: 3: DicXmlSyncRequest::sendMessage sendCmd failed:16

Bad authentication? Hmmm.  There is something amiss with the trusted relationship between the VNX and the NS-960.  I did a quick read of EMC’s VNX replication manual (yep, rtfm!) and found the command to update the interconnect, nas_cel.

First, run nas_cel -list to view all of your interconnects, noting the ID number of the one you’re having difficulty with.

[nasadmin@<name> ~]$ nas_cel -list
id    name        owner  mount_dev  channel  net_path    CMU
0     <name_1>    0                          10.x.x.x    APM007039002350000
2     <name_2>    0                          10.x.x.x    APM001052420000000
4     <name_3>    0                          10.x.x.x    APM009015016510000
5     <name_4>    0                          10.x.x.x    APM000827205690000

In this case, I was having trouble with <name_3>, which is ID 4.

Run this command next:  nas_cel -update id=4.   After that command completed, my interconnect immediately started working and I was able to create new replication jobs.
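
If you hit the same issue, it’s also worth re-running the interconnect validation from the CLI afterwards to confirm the trust is back in place.  A sketch, with the interconnect name as a placeholder (the exact argument format for -validate may differ by code level):

 nas_cel -interconnect -validate <interconnect_name>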