Celerra data mover performance and port configuration

I had a request to review my experience with data mover performance and port configuration on our production Celerras.  When I started supporting our Celerras I had no experience at all, so my current configuration is the result of trial-and-error troubleshooting, tackling performance problems as they appeared.

To keep this simple, I’ll review my configuration for a Celerra with only one primary data mover and one standby.  There really is no specific configuration needed on the standby data mover; just remember to match all active network ports exactly on both the primary and the standby, so that in the event of a failover the port configuration is identical on the two.

Our primary data mover has two Ethernet modules with four ports each (for a total of eight ports).  I’ll map out how each port is configured and then explain why I did it that way.

cge1-0             Fail-safe network config for primary CIFS (paired with cge1-1), assigned to ‘CIFS1’ prod file server.

cge1-1             Fail-safe network config for primary CIFS (paired with cge1-0), assigned to ‘CIFS1’ prod file server.

cge1-2             Interface configured for backup traffic, assigned to ‘CIFSBACKUP1’ server, VLAN 1.

cge1-3             Interface configured for backup traffic, assigned to ‘CIFSBACKUP2’ server, VLAN 1.

cge2-0             Interface configured for backup traffic, assigned to ‘CIFSBACKUP3’ server, VLAN 2.

cge2-1             Interface configured for backup traffic, assigned to ‘CIFSBACKUP4’ server, VLAN 2.

cge2-2             Interface configured for replication traffic, assigned to the replication interconnect.

cge2-3             Interface configured for replication traffic, assigned to the replication interconnect.

Primary CIFS Server – You have a choice here between link aggregation and a fail-safe network (FSN) configuration.  Fail-safe is an active/passive configuration: if one port fails, the other takes over.  I chose fail-safe for several reasons, though there are good reasons to choose aggregation as well.  The main one was ease of configuration; fail-safe is configured only on the Celerra side, so there was no need to get the network team involved to make changes to our production switch, and our CIFS server performance requirements don’t necessitate two active links.  If you need the extra bandwidth, definitely go for aggregation.
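For reference, this is roughly what the fail-safe setup looks like from the Control Station.  It’s a minimal sketch, not my exact commands: the FSN device name, interface name, and IP details are placeholders, and server_2 is just the usual name for the first physical data mover.

    # Build a fail-safe network device from the two CIFS ports in the map above
    # (fsn0 is a placeholder name; the devices are the ones listed in the port map).
    server_sysconfig server_2 -virtual -name fsn0 -create fsn -option "device=cge1-0,cge1-1"

    # Put the production CIFS interface on the fail-safe device
    # (interface name, IP, mask, and broadcast are placeholders).
    server_ifconfig server_2 -create -Device fsn0 -name cifs1_prod -protocol IP 10.1.1.10 255.255.255.0 10.1.1.255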

I originally set up the fail safe network in an emergency situation, as the single interface to our prod CIFS server went down and could not be brought back online.  EMC’s answer was to reboot the data mover.  That fixed it, but it’s not such a good solution during the middle of a business day.

Backup Interfaces – We were having issues with our backups exceeding our backup window.  To increase backup performance, I created four additional CIFS servers, all sharing the same file systems as production.  Our backup administrator splits the load across the four backup interfaces between multiple media servers and tape libraries (on different VLANs), so backup traffic does not consume any bandwidth on the production interface that users need to access the CIFS shares.  This configuration definitely improved our backup performance.
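Each backup CIFS server is just an additional CIFS server bound to its own interface.  A rough sketch of one of them follows; the interface name, IP details, domain, and admin account are placeholders, and whether you tag the VLAN on the Celerra or at the switch depends on your network setup.

    # Interface for backup traffic on cge1-2 (IP/mask/broadcast are placeholders).
    # Include vlan=1 only if you tag the VLAN on the Celerra side rather than at the switch.
    server_ifconfig server_2 -create -Device cge1-2 -name bkup1 -protocol IP 10.1.2.10 255.255.255.0 10.1.2.255 vlan=1

    # Create and join a dedicated backup CIFS server on that interface.
    # Use the VDM name instead of server_2 if the file systems live on a VDM.
    server_cifs server_2 -add compname=CIFSBACKUP1,domain=example.local,interface=bkup1
    server_cifs server_2 -Join compname=CIFSBACKUP1,domain=example.local,admin=administrator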

Replication – All of our production file systems are replicated to another Celerra in a different country for disaster recovery purposes.   Because of the huge amount of data that needs to be replicated, I created two interfaces specifically for replication traffic.  Just like the backup interfaces, this keeps replication traffic off the production CIFS server interface.  Even with the separate interfaces, I still impose a bandwidth limit (no more than 50MB/s) in the interconnect configuration, as replication has to share the same 100MB WAN link with our Data Domain.
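The interconnect and its throttle are both handled by nas_cel.  The sketch below uses placeholder names and IP addresses, and the trailing /50000 throttle value is an assumption on my part; check the bandwidth schedule syntax and units (KB/s on the releases I’ve seen) against the Replicator documentation for your DART version.

    # Create the replication interconnect over the two replication interfaces
    # (interconnect name, remote system name, and all IPs are placeholders).
    nas_cel -interconnect -create prod_to_dr -source_server server_2 \
      -destination_system celerra_dr -destination_server server_2 \
      -source_interfaces ip=10.1.3.10,ip=10.1.3.11 \
      -destination_interfaces ip=10.2.3.10,ip=10.2.3.11 \
      -bandwidth MoTuWeThFrSaSu00:00-24:00/50000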

This configuration has proven to be very effective for me.  Our links never hit 100% utilization and I rarely get complaints about CIFS server performance.  The only real performance-related troubleshooting I’ve had to do on our production CIFS servers has been related to file system deduplication; I’ve disabled it on certain file systems that see a high amount of activity.
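Disabling deduplication per file system is quick.  A sketch follows, with a placeholder file system name; the fs_dedupe options can vary a bit between DART releases, so verify them on yours.

    # Check deduplication state, then turn it off for a busy file system
    # ('busy_fs01' is a placeholder file system name).
    fs_dedupe -list
    fs_dedupe -modify busy_fs01 -state off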

Other thoughts about Celerra configuration:

  1. We recently added a third data mover to the Celerra in our HQ data center because of the file system limit on a single data mover.  You can only have 2048 total file systems on one data mover, and we hit that limit due to the number of checkpoints that we keep for operational file restores.  If you make a checkpoint of one file system twice a day for a month, that is 60 checkpoints plus the file system itself, or 61 file systems counted against the 2048 total, which adds up quickly if you have a CIFS server filled with dozens of small shares.  I simply added another CIFS server (on the new data mover) and all new shares are now created there.  The names and locations of the shares are transparent to our users because all file shares are presented through DFS links, so there were no major changes required for our Active Directory/Windows administrators.
  2. Use the Celerra monitor to keep an eye on CPU and memory usage throughout the day.  Once you launch it from Unisphere, it runs independently of your Unisphere session (Unisphere can be closed) and has a very small memory footprint on your laptop/PC.
  3. Always create your CIFS servers on VDMs, especially if you are replicating data for disaster recovery.   VDMs are designed specifically for Windows environments; they allow for easy migration between data movers and easy recreation of a CIFS server and its shares in a replication/DR scenario.  They store all the information for local groups, shares, security credentials, audit logs, and home directory info.  Without a VDM, if you ever need to recreate a CIFS server from scratch you’ll have to redo all of those things from scratch as well.  Always use VDMs!  (A sample creation command appears after this list.)
  4. Write scripts for monitoring purposes.  I have only one running on my Celerras now, which emails me a report of the status of all replication jobs in the morning.  Of course, you can put any valid command into a bash script (adding a mailx command to email you the results), stick it in crontab, and away you go; a rough sketch of that kind of script also follows the list.
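On the VDM point (item 3 above), creating one is a single command.  This is a hedged sketch with placeholder names; add a pool= option if you want to choose the storage pool explicitly.

    # Create a VDM on the primary data mover and load it ('VDM01' is a placeholder name).
    nas_server -name VDM01 -type vdm -create server_2 -setstate loaded

    # CIFS servers are then created against the VDM instead of the physical mover, e.g.:
    server_cifs VDM01 -add compname=CIFS1,domain=example.local,interface=cifs1_prod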
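And for the monitoring script in item 4, here is the general shape of what I mean.  It’s a minimal sketch, not my exact script; the recipient address, script path, and report file are placeholders.

    #!/bin/bash
    # Email a morning report of replication status (run from the Control Station).
    REPORT=/tmp/replication_report.txt
    /nas/bin/nas_replicate -info -all > "$REPORT" 2>&1
    mailx -s "Celerra replication status - $(hostname)" nas-admins@example.com < "$REPORT"

    # Sample crontab entry to run it at 7:00 every morning:
    # 0 7 * * * /home/nasadmin/scripts/replication_report.sh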

2 thoughts on “Celerra data mover performance and port configuration”

  1. Hi,

    I have only 1 data mover (well, 1 active and 1 standby). So, 4 Ethernet ports in total.
    EMC told me my bandwidth is 4 Gb… is it true?

    At the moment I have 1 trunk, which is defined as 1 LACP trunk over the 4 ports. Potential problem: the backups and data share the same bandwidth (and backups never finish on Sunday…).

    So, I plan to remove that trunk and recreate 1 trunk (LACP over 2 ports) for the production data, and another trunk (LACP over 2 ports) for the backups.

    Do you think this will effectively reduce the impact of the backups on the prod data?

    Instead of creating an LACP trunk for the backups, I could also create 2 simple interfaces. This would allow me to create 2 CIFS servers for the backups… would this be efficient in terms of performance?

    Thanks!

    Jeff

    1. Hi Jeff,

      I think it’s a best practice to separate production traffic from backups to avoid backup jobs causing performance issues for users who are trying to access their data while a backup or restore job is running. Some of our jobs run during business hours, so it was especially important in my case. Your idea of creating two LACP trunks would be a good choice for redundancy, performance, and isolating backup traffic, and is probably what I would go with in your case. I used multiple backup CIFS servers because some of our media servers are on different subnets, and during testing I saw improved performance with that configuration. If that doesn’t apply to you, I think having just one backup CIFS server over a two-port trunk would be fine.
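      If it helps, trunk creation on the Celerra side is done with server_sysconfig. A rough sketch is below; the device names are placeholders for whatever your four ports are actually called, and the matching switch ports still need an LACP/port-channel configuration from your network team.

          # Two 2-port LACP trunks: one for production, one for backups
          # (cge0..cge3 are placeholder device names).
          server_sysconfig server_2 -virtual -name trk_prod -create trk -option "device=cge0,cge1 protocol=lacp"
          server_sysconfig server_2 -virtual -name trk_bkup -create trk -option "device=cge2,cge3 protocol=lacp"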

      Thanks,

      Steve
