Errors when creating new replication jobs

I was attempting to create a new replication job on one of our VNX5500’s and was receiving several errors when selecting our DR NS-960 as the ‘destination celerra network server’.

It was displaying the following errors at the top of the window:

– “Query VDMs All.  Cannot access any Data Mover on the remote system, <celerra_name>”. The error details directed me to check that all the Data Moverss are accessible, that the time difference between the source and destination doesn’t exceed 10 min, and that the passphrase matches.  I confirmed that all of those were fine.

– “Query Storage Pools All.  Remote command failed:\nremote celerra – <celerra_name>\nremote exit status =0\nremote error = 0\nremote message = HTTP Error 500: Internal Server Error”.  The error details on this message say to search powerlink, not a very useful description.

– “There are no destination pools available”.  The details on this error say to check available space on the destination storage pool.  There is 3.5TB available in the pool I want to use on the destination side, so that wasn’t the issue either.

All existing replication jobs were still running fine so I knew there was not a network connectivity problem.  I reviewed the following items as well:

– I was able to validate all of the interconnects successfully, that wasn’t the issue.

– I ran nas_cel -update on the interconnects on both sides and received no errors, but it made no difference.

– I checked the server logs and didn’t see any errors relating to replication.

Not knowing where to look next, I opened an SR with EMC.  As it turns out, it was a security issue.

About a month ago an EMC CE accidently deleted our global security accounts during a service call.  I had recreated all of the deleted accounts and didn’t think there would be any further issues.  Logging in with the re-created nasadmin account after the accidental deletion was the root cause of the problem.  Here’s why:

The clariion global user account is tied to a local user account on the control station in /etc/passwd. When nasadmin was recreated on the domain, it attempted to create the nasadmin account on the control station as well.  Because the account already existed as a local account on the control station, it created a local account named ‘nasadmin1‘ instead, which is what caused the problem.  The two nasadmin accounts were no longer synchronized between the Celerra and the Clariion domain, so when logging in with the global nasadmin account you were no longer tied to the local nasadmin account on the control station.  Deleting all nasadmin accounts from the global domain and from the local /etc/passwd on the Celerra, and then recreating nasadmin in the domain solves the problem.  Because the issue was related only to the nasadmin account in this case, I could have also solved the problem by simply creating a new global account (with administrator priviliges) and using that to create the replication job.  I tested that as well and it worked fine.

Advertisements

2 thoughts on “Errors when creating new replication jobs”

  1. Thank you for sharing your vnx experiences this is one of my favourite blogs and I have about 500 rss feeds!

    Keep up the great work!

    Packetboy

Leave a Reply