Tag Archives: problem

Errors when creating new replication jobs

I was attempting to create a new replication job on one of our VNX5500’s and was receiving several errors when selecting our DR NS-960 as the ‘destination celerra network server’.

It was displaying the following errors at the top of the window:

– “Query VDMs All.  Cannot access any Data Mover on the remote system, <celerra_name>”. The error details directed me to check that all the Data Moverss are accessible, that the time difference between the source and destination doesn’t exceed 10 min, and that the passphrase matches.  I confirmed that all of those were fine.

– “Query Storage Pools All.  Remote command failed:\nremote celerra – <celerra_name>\nremote exit status =0\nremote error = 0\nremote message = HTTP Error 500: Internal Server Error”.  The error details on this message say to search powerlink, not a very useful description.

– “There are no destination pools available”.  The details on this error say to check available space on the destination storage pool.  There is 3.5TB available in the pool I want to use on the destination side, so that wasn’t the issue either.

All existing replication jobs were still running fine so I knew there was not a network connectivity problem.  I reviewed the following items as well:

– I was able to validate all of the interconnects successfully, that wasn’t the issue.

– I ran nas_cel -update on the interconnects on both sides and received no errors, but it made no difference.

– I checked the server logs and didn’t see any errors relating to replication.

Not knowing where to look next, I opened an SR with EMC.  As it turns out, it was a security issue.

About a month ago an EMC CE accidently deleted our global security accounts during a service call.  I had recreated all of the deleted accounts and didn’t think there would be any further issues.  Logging in with the re-created nasadmin account after the accidental deletion was the root cause of the problem.  Here’s why:

The clariion global user account is tied to a local user account on the control station in /etc/passwd. When nasadmin was recreated on the domain, it attempted to create the nasadmin account on the control station as well.  Because the account already existed as a local account on the control station, it created a local account named ‘nasadmin1‘ instead, which is what caused the problem.  The two nasadmin accounts were no longer synchronized between the Celerra and the Clariion domain, so when logging in with the global nasadmin account you were no longer tied to the local nasadmin account on the control station.  Deleting all nasadmin accounts from the global domain and from the local /etc/passwd on the Celerra, and then recreating nasadmin in the domain solves the problem.  Because the issue was related only to the nasadmin account in this case, I could have also solved the problem by simply creating a new global account (with administrator priviliges) and using that to create the replication job.  I tested that as well and it worked fine.

Advertisements

VMWare/ESX can’t write to a Celerra Read/Write NFS mounted datastore

I had just created serveral new Celerra NFS mounted datastores for our ESX administrator.  When he tried to create new VM hosts using the new datastores, he would get this error:   Call “FileManager.MakeDirectory” for object “FileManager” on vCenter Server “servername.company.com” failed.

Searching for that error message on powerlink, the VMWare forums, and general google searches didn’t bring back any easy answers or solutions.  It looked like ESX was unable to write to the NFS mount for some reason, even though it was mounted as Read/Write.  I also had the ESX hosts added to the R/W access permissions for the NFS export.

After much digging and experimentation, I did resolve the problem.  Here’s what you have to check:

1. The VMKernel IP must be in the root hosts permissions on the NFS export.   I put in the IP of the ESX server along with the VMKernel IP.

2. The NFS export must be mounted with the no_root_squash option.  By default, the root user with UID 0 is not given access to an NFS volume, mounting the export with no_root_squash allows the root user access.  The VMkernal must be able to access the NFS volume with UID 0.

I first set up the exports and permissions settings in the GUI, then went to the CLI to add the mount options.
command:  server_mount server_2 -option rw,uncached,sync,no_root_squash <sharename> /<sharename>

3. From within the ESX Console/Virtual Center, the Firewall settings should be updated to add the NFS Client.   Go to ‘Configuration’ | ‘Security Profile’ | ‘Properties’ | Click the NFS Client checkbox.

4. One other important item to note when adding NFS mounted datastores is the default limit of 8 in ESX.  You can increase the limit by going to ‘Configuration’ | ‘Advanced Settings’ | ‘NFS’ in the left column | Scroll to ‘NFS.MaxVolumes’ on the left, increase the number up to 64.  If you try to add a new datastore above the NFS.MaxVolumes limit, you will get the same error in red at the top of this post.

That’s it.  Adding the VMKernel IP to the root permissions, mounting with no_root_squash, and adding the NFS Client to ESX resolved the problem.