
VMware/ESX can’t write to a Celerra Read/Write NFS mounted datastore

I had just created several new Celerra NFS mounted datastores for our ESX administrator.  When he tried to create new VMs on the new datastores, he would get this error:  Call “FileManager.MakeDirectory” for object “FileManager” on vCenter Server “servername.company.com” failed.

Searching for that error message on Powerlink, the VMware forums, and Google didn’t bring back any easy answers or solutions.  It looked like ESX was unable to write to the NFS mount for some reason, even though it was mounted as Read/Write.  I also had the ESX hosts added to the R/W access permissions for the NFS export.

After much digging and experimentation, I did resolve the problem.  Here’s what you have to check:

1. The VMkernel IP must be in the Root Hosts permissions on the NFS export.  I added the IP of the ESX server along with the VMkernel IP.
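If you’d rather set this from the CLI, a server_export command along these lines should do it (the IPs and file system name are placeholders for your own values, and I’m assuming the export lives on server_2):

command:  server_export server_2 -Protocol nfs -option rw=<ESX_IP>:<VMkernel_IP>,root=<ESX_IP>:<VMkernel_IP>,access=<ESX_IP>:<VMkernel_IP> /<sharename>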

2. The NFS export must be mounted with the no_root_squash option.  By default, the root user (UID 0) is not given access to an NFS volume; mounting the export with no_root_squash allows root access.  The VMkernel must be able to access the NFS volume as UID 0.

I first set up the export and its permissions in the GUI, then went to the CLI to add the mount options.
command:  server_mount server_2 -option rw,uncached,sync,no_root_squash <sharename> /<sharename>
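To confirm the options took effect, you can list the mounts on the Data Mover with no other arguments and check the options shown for the file system:

command:  server_mount server_2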

3. From within the ESX console in vCenter, the firewall settings should be updated to allow the NFS Client.  Go to ‘Configuration’ | ‘Security Profile’ | ‘Properties’ | check the NFS Client checkbox.
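If you’re working from the service console on a classic ESX host instead of the vSphere client, I believe the same rule can be enabled with esxcfg-firewall (verify the service name on your build before relying on it):

command:  esxcfg-firewall -e nfsClient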

4. One other important item to note when adding NFS mounted datastores is the default limit of 8 NFS datastores per ESX host.  You can increase the limit by going to ‘Configuration’ | ‘Advanced Settings’ | ‘NFS’ in the left column, then scrolling to ‘NFS.MaxVolumes’ and increasing the number, up to a maximum of 64.  If you try to add a new datastore above the NFS.MaxVolumes limit, you will get the same error quoted at the top of this post.
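The same limit can also be raised from the service console with esxcfg-advcfg, roughly like this (64 is assumed here to match the GUI maximum):

command:  esxcfg-advcfg -s 64 /NFS/MaxVolumes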

That’s it.  Adding the VMkernel IP to the Root Hosts permissions, mounting with no_root_squash, and allowing the NFS Client through the ESX firewall resolved the problem.


VNX Root Replication Checkpoints

Where did all my savvol space go?  I noticed last week that some of my Celerra replication jobs had stalled and were not sending any new data to the replication partner.  I then noticed that the storage pool designated for checkpoints was at 100%.  Not good. Based on the number of file system checkpoints that we perform, it didn’t seem possible that the pool could be filled up already.  I opened a case with EMC to help out.

I learned something new after opening this call: every time you create a replication job, a new checkpoint is created for that job and stored in the savvol.  You can view these in Unisphere by changing the “select a type” filter to “all checkpoints including replication”.  You’ll notice checkpoints named something like root_rep_ckpt_483_72715_1 in the list; they all begin with root_rep.  After working the case with the EMC engineer for a little while, he helped me determine that one of my replication jobs had a root_rep_ckpt that was 1.5TB in size.

Removing that checkpoint would immediately solve the problem, but there was one major drawback.  Deleting the root_rep checkpoint first requires deleting the replication job entirely, which means re-creating it from scratch.  The entire file system would have to be copied over our WAN link and resynchronized with the replication partner Celerra.  That didn’t make me happy, but there was no choice.  At least the problem was solved.
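For anyone in the same spot, the replication session itself is removed with nas_replicate.  The session name below is a placeholder, and I’m assuming -mode both so that both sides get cleaned up while the interconnect is still available:

nas_replicate -list

nas_replicate -delete <session name> -mode both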

Here are a couple of tips for you if you’re experiencing a similar issue.

You can verify which storage pool the root_rep checkpoints are using by running nas_fs -info against the checkpoint from the command line and looking for the ‘pool=’ field.

nas_fs -list | grep root_rep  (the first column in the output is the ID# for the next command)

nas_fs -info id=<id from above>

You can also see the replication checkpoints and IDs for a particular file system with this command:

fs_ckpt <production file system> -list -all

You can check the size of a root_rep checkpoint from the command line directly with this command:

/nas/sbin/rootnas_fs -size root_rep_ckpt_883_72715_1