
Using Microsoft’s BranchCache with the Celerra & VNX

We recently moved the data from several of our small offices to a central regional data center. In one case, a site that formerly had its own Celerra for file access was now accessing NAS through a WAN link.  As expected, access to files was much slower.  After doing some research on how to speed up file access for users at the branch office locations, I came across Microsoft's BranchCache feature.

BranchCache is a WAN bandwidth optimization technology and is available on Windows 7 & 8, Server 2008 R2 and Server 2012. When BranchCache is enabled, it creates a cache of the content from the file server locally within the branch office. A client from the same network can then access the file very quickly from cache instead of having to download it across the WAN again. It's a great feature for optimizing local link utilization, increasing the responsiveness of applications, and reducing WAN bandwidth consumption.

BranchCache can operate in either Hosted Cache mode or Distributed Cache mode. In Hosted Cache mode, a Windows server is configured to store the cached files. In Distributed Cache mode (appropriate for smaller sites), local clients in the office keep a copy of the content and make it available to other clients that access the same files.

Once your Windows administrator has BranchCache configured on a server (or it's been enabled on the local client PCs), enabling it on the Celerra/VNX side is very simple. Log in to the CLI and su to gain root credentials. Then type in the following command:

server_cifs server_2 -smbhash -service enable

If you'd like to enable BranchCache auditing so the Windows server administrators can see audit info in the Windows event logs, type in this command:

server_cifs server_2 -smbhash -audit enable

After that, you will need to restart the CIFS service on the data mover. Here are the commands to stop and start CIFS:

server_setup server_2 -P cifs -o stop
server_setup server_2 -P cifs -o start

To confirm that BranchCache was successfully enabled, type the following command:

server_cifs server_2 -smbhash -info

The output should look like this:

server_2:
Current smbhash parameters:
—————————
Enabled : Yes
Started : Yes

Finally, to enable hash support on each individual CIFS share that you want to use with BranchCache clients, use the command syntax below.  Note that EMC's "Configuring BranchCache" document I link to at the bottom of this post does not contain this next command (a major oversight); it was pulled from page 58 of EMC's Configuring CIFS guide.

# server_export <datamover_name> -name <fs_name> -option <netbios_name_of_CIFS_Server>,type=hash /<fs_name>
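
For example (all of the names below are hypothetical), enabling hash support for a file system fs01 shared from a CIFS server named NASCIFS01 on server_2 would look like this:

# server_export server_2 -name fs01 -option NASCIFS01,type=hash /fs01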

EMC has a detailed document on how to configure BranchCache, including all the steps you’d need to take to configure the server and PC clients. If you have an EMC support account you can download it here:

https://support.emc.com/docu42265_Configuring-BranchCache-V2-on-VNX.pdf?language=en_US

There is also a lengthy section on BranchCache in EMC’s CIFS Management for VNX guide here:

https://mydocs.emc.com/VNXDocs/CIFS.pdf.

Here is the link to Microsoft’s TechNet guide for BranchCache:

http://technet.microsoft.com/en-us/library/hh831696.aspx.


What is EMC’s CAVA / Common Event Enabler?

I was recently asked to do a bit of research on EMC’s CAVA product, as we are looking for AntiVirus solutions for our CIFS based shares.  I found very little info with general google searches about exactly what CAVA is and what it does, so I thought I’d share some of the information that I did find after a bit of research and talking to my local EMC rep. 

Basically, CAVA is a service that runs on the Celerra (or VNX) data mover in conjunction with a Windows server running a 3rd party anti-virus engine (along with EMC's CAVA API agent) to handle the conversation.  It only facilitates the communication to an existing AV server; EMC doesn't provide the actual AV software.  It supports Symantec, McAfee, eTrust, Sophos, Kaspersky, and Trend Micro.  In a nutshell, CAVA employs three key components: software on the data mover (the VC client), software on a Windows AV server (CAVA), and your 3rd party AV engine on a Windows server.

CAVA used to stand for “Celerra Anti Virus Agent”, but was changed to “Common AntiVirus Agent”.  Quite convenient that they could re-use the “C” without changing the acronym, right? The product is now officially known as “Common Event Enabler for Windows” by EMC and the package includes CEPA, or the EMC Common Event Publishing Agent, and CAVA, the aforementioned Common Antivirus Agent.  For this post I’m focusing on the Antivirus agent.

CAVA is a fairly straightforward install; however, if implemented incorrectly it can adversely affect your performance. It's important to know how it scans your files and essential to know how to troubleshoot it and do performance monitoring.  There is definitely a performance hit when using CAVA.

When are files scanned for a virus? 

Each time the Celerra receives a file, it will be locked for read access first, at which time a request is sent to the AV server (or servers) to scan the file.  The Celerra will send the UNC path name to the Windows server and wait for verification that the file is not infected.  Once that verification is complete, the file is made available for user access.

CAVA will scan a file in the following instances: 

  •          CAVA will scan files for a virus the first time that a file is read, subsequent to the initial implementation of CAVA and any updates to virus definitions.
  •          Creating, modifying, or moving a file
  •          When restoring a file (or files) from backup
  •          When renaming a file with a different file extension
  •          Whenever an administrator performs a full file system scan (with the server_viruschk command) 

What are the features of CAVA? 

  •          Automatic Virus Definition Updates. Files opened after the update will be re-scanned.
  •          CAVA Calculator (a free sizing tool to assist in implementation)
  •          User Notifications on Virus detection, configurable by administrators to be sent as notifications to the client, event log entries, or both.
  •          Scan on read can be enabled
  •          Event reporting and configuration 

What are some implementation considerations? 

  •          EMC recommends that an MPFS client system not be configured as the AV server system.
  •          CAVA doesn’t support a data mover CIFS server using share level access.
  •          Always update the viruschecker.conf file to avoid scanning temp files. It can be modified with the Celerra AV Management Snap-In.
  •          It’s CIFS only. There is no support for NFS or FTP.  If those protocols are used to open, modify, or move files the files will not be scanned.
  •          You must check for compatibility with your installed 3rd party AV software.

How is it licensed, and how much does it cost?

CAVA is licensed per array, on the VNX series it is in the Security and Compliance Suite.   Pricing will vary of course, but it’s not very expensive relative to the cost of the array.  It should be in the range of thousands rather than tens of thousands of dollars.

 

Reporting on Celerra / VNX NAS Pool capacity with a bash script

I recently created a script that I run on all of our Celerras and VNXs that reports on NAS pool size.   The output from each array is then converted to HTML and combined on a single intranet page to provide a quick at-a-glance view of our global NAS capacity and disk space consumption.  I made another post that shows how to create a block storage pool report as well:  http://emcsan.wordpress.com/2013/08/09/reporting-on-celerravnx-block-storage-pool-capacity-with-a-bash-script/

The default command unfortunately outputs in Megabytes with no option to change to GB or TB.  This script performs the MB to GB conversion and adds a comma as the numerical separator (what we use in the USA) to make the output much more readable.

First, identify the ID number for each of your NAS pools.  You’ll need to insert the ID numbers into the script itself.

[nasadmin@celerra]$ nas_pool -list
 id      inuse   acl     name                      storage system
 10      y       0       NAS_Pool0_SPA             AKM00111000000
 18      y       0       NAS_Pool1_SPB             AKM00111000000

Note that the default output of the command that provides the size of each pool is in a very hard to read format.  I wanted to clean it up to make it easier to read on our reporting page.  Here's the default output:

[nasadmin@celerra]$ nas_pool -size -all
id           = 10
name         = NAS_Pool0_SPA
used_mb      = 3437536
avail_mb     = 658459
total_mb     = 4095995
potential_mb = 0
id           = 18
name         = NAS_Pool1_SPB
used_mb      = 2697600
avail_mb     = 374396
total_mb     = 3071996
potential_mb = 1023998

My script changes the output to look like the example below.
Name (Site)   ; Total GB ; Used GB  ; Avail GB
 NAS_Pool0_SPA ; 4,000    ; 3,356.97 ; 643.03
 NAS_Pool1_SPB ; 3,000    ; 2,634.38 ; 365.62

In this example there are two NAS pools and this script is set up to report on both.  It could easily be expanded or reduced depending on the number of pools on your array. The variable names I used include the pool ID number from the output above, which should be changed to match your IDs.  You'll also need to update the 'id=' portion of each command to match your pool IDs.

Here’s the script:

#!/bin/bash

NAS_DB="/nas"
export NAS_DB

# Set the Locale to English/US, used for adding the comma as a separator in a cron job
export LC_NUMERIC="en_US.UTF-8"
TODAY=$(date)

 

# Gather Pool Name, Used MB, Available MB, and Total MB for First Pool

# Set variable to pull the Name of the pool from the output of 'nas_pool -size'.
name18=`/nas/bin/nas_pool -size id=18 | /bin/grep name | /bin/awk '{print $3}'`

# Set variable to pull the Used MB of the pool from the output of 'nas_pool -size'.
usedmb18=`/nas/bin/nas_pool -size id=18 | /bin/grep used_mb | /bin/awk '{print $3}'`

# Set variable to pull the Available MB of the pool from the output of 'nas_pool -size'.
availmb18=`/nas/bin/nas_pool -size id=18 | /bin/grep avail_mb | /bin/awk '{print $3}'`
# Set variable to pull the Total MB of the pool from the output of 'nas_pool -size'.

totalmb18=`/nas/bin/nas_pool -size id=18 | /bin/grep total_mb | /bin/awk '{print $3}'`

# Convert MB to GB, Add Comma as separator in output

# Remove '...b' variables if you don't want commas as a separator

# Convert Used MB to Used GB
usedgb18=`/bin/echo $usedmb18/1024 | /usr/bin/bc -l | /bin/sed 's/^\./0./;s/0*$//;s/0*$//;s/\.$//'`

# Add comma separator
usedgb18b=`/usr/bin/printf "%'.2f\n" "$usedgb18" | /bin/sed 's/\.00$// ; s/\(\.[1-9]\)0$/\1/'`

# Convert Available MB to Available GB
availgb18=`/bin/echo $availmb18/1024 | /usr/bin/bc -l | /bin/sed 's/^\./0./;s/0*$//;s/0*$//;s/\.$//'`

# Add comma separator
availgb18b=`/usr/bin/printf "%'.2f\n" "$availgb18" | /bin/sed 's/\.00$// ; s/\(\.[1-9]\)0$/\1/'`

# Convert Total MB to Total GB
totalgb18=`/bin/echo $totalmb18/1024 | /usr/bin/bc -l | /bin/sed 's/^\./0./;s/0*$//;s/0*$//;s/\.$//'`

# Add comma separator
totalgb18b=`/usr/bin/printf "%'.2f\n" "$totalgb18" | /bin/sed 's/\.00$// ; s/\(\.[1-9]\)0$/\1/'`

# Gather Pool Name, Used MB, Available MB, and Total MB for Second Pool

# Set variable to pull the Name of the pool from the output of 'nas_pool -size'.
name10=`/nas/bin/nas_pool -size id=10 | /bin/grep name | /bin/awk '{print $3}'`

# Set variable to pull the Used MB of the pool from the output of 'nas_pool -size'.
usedmb10=`/nas/bin/nas_pool -size id=10 | /bin/grep used_mb | /bin/awk '{print $3}'`

# Set variable to pull the Available MB of the pool from the output of 'nas_pool -size'.
availmb10=`/nas/bin/nas_pool -size id=10 | /bin/grep avail_mb | /bin/awk '{print $3}'`

# Set variable to pull the Total MB of the pool from the output of 'nas_pool -size'.
totalmb10=`/nas/bin/nas_pool -size id=10 | /bin/grep total_mb | /bin/awk '{print $3}'`
 
# Convert MB to GB, Add Comma as separator in output

# Remove '...b' variables if you don't want commas as a separator
 
# Convert Used MB to Used GB
usedgb10=`/bin/echo $usedmb10/1024 | /usr/bin/bc -l | /bin/sed 's/^\./0./;s/0*$//;s/0*$//;s/\.$//'`

# Add comma separator
usedgb10b=`/usr/bin/printf "%'.2f\n" "$usedgb10" | /bin/sed 's/\.00$// ; s/\(\.[1-9]\)0$/\1/'`

# Convert Available MB to Available GB
availgb10=`/bin/echo $availmb10/1024 | /usr/bin/bc -l | /bin/sed 's/^\./0./;s/0*$//;s/0*$//;s/\.$//'`

# Add comma separator
availgb10b=`/usr/bin/printf "%'.2f\n" "$availgb10" | /bin/sed 's/\.00$// ; s/\(\.[1-9]\)0$/\1/'`

# Convert Total MB to Total GB
totalgb10=`/bin/echo $totalmb10/1024 | /usr/bin/bc -l | /bin/sed 's/^\./0./;s/0*$//;s/0*$//;s/\.$//'`

# Add comma separator
totalgb10b=`/usr/bin/printf "%'.2f\n" "$totalgb10" | /bin/sed 's/\.00$// ; s/\(\.[1-9]\)0$/\1/'`

# Create Output File

# If you don't want the comma separator in the output file, substitute the variable without the 'b' at the end.

# I use the semicolon rather than the comma as a separator due to the fact that I'm using the comma as a numerical separator.

# The comma could be substituted here if desired.

/bin/echo $TODAY > /scripts/NasPool.txt
/bin/echo "Name" ";" "Total GB" ";" "Used GB" ";" "Avail GB" >> /scripts/NasPool.txt
/bin/echo $name18 ";" $totalgb18b ";" $usedgb18b ";" $availgb18b >> /scripts/NasPool.txt
/bin/echo $name10 ";" $totalgb10b ";" $usedgb10b ";" $availgb10b >> /scripts/NasPool.txt

Here's what the output looks like:
Wed Jul 17 23:56:29 JST 2013
 Name (Site) ; Total GB ; Used GB ; Avail GB
 NAS_Pool0_SPA ; 4,000 ; 3,356.97 ; 643.03
 NAS_Pool1_SPB ; 3,000 ; 2,634.38 ; 365.62

I use a cron job to schedule the report daily and copy it to our internal web server.  I then run the csv2htm.pl Perl script (from http://www.jpsdomain.org/source/perl.html) to convert it to an HTML output file to add to our intranet report page.

Note that I had to modify the csv2htm.pl command to accommodate the use of a semicolon instead of the default comma in a CSV file.  Here is the command I use to do the conversion:

./csv2htm.pl -e -T -D ";" -i /reports/NasPool.txt -o /reports/NasPool.html

Below is what the output looks like after running the HTML conversion tool.

[Screenshot: NASPool HTML report]
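
If your array has more than a couple of NAS pools, the same report can be generated with a loop instead of duplicating each block of the script above. The sketch below assumes pool IDs 10 and 18 as in my example; substitute your own IDs and output path.

#!/bin/bash

NAS_DB="/nas"
export NAS_DB
export LC_NUMERIC="en_US.UTF-8"
OUT=/scripts/NasPool.txt

date > "$OUT"
echo "Name ; Total GB ; Used GB ; Avail GB" >> "$OUT"

# Loop over the pool IDs reported by 'nas_pool -list'
for id in 10 18; do
  # Capture the 'nas_pool -size' output once per pool
  info=`/nas/bin/nas_pool -size id=$id`
  name=`echo "$info" | /bin/awk '/^name/ {print $3}'`
  usedmb=`echo "$info" | /bin/awk '/used_mb/ {print $3}'`
  availmb=`echo "$info" | /bin/awk '/avail_mb/ {print $3}'`
  totalmb=`echo "$info" | /bin/awk '/total_mb/ {print $3}'`
  # Convert MB to GB and print with a comma separator and two decimals
  /usr/bin/printf "%s ; %'.2f ; %'.2f ; %'.2f\n" "$name" \
    `echo "$totalmb/1024" | /usr/bin/bc -l` \
    `echo "$usedmb/1024" | /usr/bin/bc -l` \
    `echo "$availmb/1024" | /usr/bin/bc -l` >> "$OUT"
done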

Using the database query option on Celerra & VNX File commands


EMC has added a hidden query option on some of the nas commands that allows you to directly query the NAS database. I've tested the '-query' option on the nas_fs, nas_server, nas_slice and nas_disk commands, however it may be available on other commands as well. It's a powerful command with a lot of options, so I took some time to play around with it today. The EMC documentation on this option in their command line reference guide is very sparse and I haven't done much experimentation with it yet, but I thought I'd share what I've found so far.

You can view all of the possible query tags by typing in the commands below. There are dozens of tags available for query. The output list for all the commands is large enough that it’s not practical to add it into this blog post.

nas_fs -query:tags
nas_server -query:tags
nas_slice -query:tags
nas_disk -query:tags
 

Here’s a snippet of the tags available for nas_fs (the first seven):

Supported Query Tags for nas_fs:
Tag           Synopsis
———————————————————–
ACL         The Access Control List of the item
ACLCHKInProgress         Is ACLCHK running on this fs?
ATime         The UTC date of the item’s table last access
AccessPolicyTranslationErrorMessage         The error message of the MPD translation error, if any.
AccessPolicyTranslationPercentComplete         The percentage of translation that has been completed by the Access Policy translation thread for this item.
AccessPolicyTranslationState         The state of the Access Policy translation thread for this item.
AutoExtend         Auto extend True/False
 

The basic syntax for a query looks like this:

nas_fs -query:inuse==y:type=uxfs:IsRoot=False -Fields:Name,Id,StoragePoolName,Size,SizeValues -format:'%s,%s,%s,%d,%d\n'

In the above example, we are running the query based on three options. We want the file system to be in use, be of the type uxfs, and not be root. In the fields parameter, we want the file system's name, ID, storage pool name, size, and size values. Because our output has five fields, and we want each file system to have its own line in the output, we add five formatting options separated by commas (for a CSV-type output), followed by a '\n' to create a newline after each file system's information has been output.

Here are the valid query operators:

= pattern match
== exact match
=- not less than
=+ not more than
=* any
=^ not having pattern
=^= not an exact match
=^- is less than
=^+ is greater than
=^* not any (none)

The format option is not well documented. The only parameters I've used for the format option are q and s. From what I've seen from testing the options, the tag used in the '-fields' parameter is either simple or complex. A complex tag must be formatted with q, and a simple tag must be formatted with s. Adding the '\n' to the format option adds a newline to the output. If you use the wrong format parameter, it will display an error like this: "Error 2110: Invalid query syntax: the tag ('[Tag Option]') corresponding to this format specifier ("%q") is not a complex tag".

-format:’%s’ : Simple Formatting
-format:’%q’ : Complex Formatting

Below are some examples of using the query option. I will add more useful examples in the future after I’ve had more time to dive in to it.

To get the file system ID:

nas_fs -query:Name==[file system name] -format:'%s\n' -fields:VolumeID

List all file systems with their sizes:

nas_fs -query:inuse==y:type=uxfs:IsRoot=False -Fields:Name,Id,StoragePoolName,Size,SizeValues -format:'%s,%s,%s,%d,%d\n'

List all file system quotas on the array:

nas_fs -query:\* -fields:ID,TreeQuotas -format:'%s:\n%q#\n' -query:\* -fields:FileSystem,Path,BlockHardLimit,BlockSoftLimit,BlockGracePeriod,BlockUsage -format:'%s : %s : %s : %s : %s : %s\n'

List all Checkpoint file systems:

nas_fs -query:inuse=y:type=ckpt:isroot=false -fields:ServersNumeric,Id,Name,SizeValues -format:'%s,%s,%s,%s\n'

List all the Server_params for a specific server_X:

nas_server -query:Name=server_2 -format:'%q' -fields:ParamTable -query:Name== -fields:ChangeEffective,ConfiguredDefaultValue,ConfiguredValue,Current,CurrentValue,Default,Facility,IsRebootRequired,IsVisible,Name,Type,Description -format:'%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s\n'

List the current Wins/DNSDomain Config:

nas_server -query:Name=server -fields:Name,DefaultWINS,DNSDomain -format:"%s,%s,%s\n"

For a detailed file system deduplication report:

nas_fs -query:inuse=y:isroot=false:type=uxfs -fields:Deduplications -format:'%q' -query:RdeState=On -fields:Name,FsCapacity,SpaceSaved,SpaceReducedDataSize,DedupeRate,UnreducedDataSize,TimeOfLastScan -format:'%s,%s,%s,%s,%s,%s,%s\n'
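
The query output also pipes cleanly into standard tools. As a quick sketch (this assumes the Size field is reported in MB; verify the units on your own array), the following totals the size of all in-use uxfs file systems:

nas_fs -query:inuse==y:type=uxfs:IsRoot=False -fields:Name,Size -format:'%s,%d\n' | awk -F',' '{total += $2} END {printf "%d file systems, %d MB total\n", NR, total}'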

Disk space reporting on sub folders on a VNX File CIFS shared file system

I recently had a comment on a different post asking how to report on the size of multiple folders on a single file system, so I thought I'd share the method I use to create reports and alerts on the space that sub folders on file systems consume. You can get to all of the file systems from the control station by navigating to /nas/quota/slot_<x>, where slot_<x> refers to the data mover; the file systems on server_2 would be in the slot_2 folder.  Because we have access to those folders, we can simply run the standard Unix 'du' command to get the amount of used disk space for each sub folder on any file system.

Running this command:

sudo du -h /nas/quota/slot_2/File_System/Sub_Folder

Will give an output that looks like this:

24K     /nas/quota/slot_2/File_System/Sub_Folder/1
0       /nas/quota/slot_2/File_System/Sub_Folder/2
16K     /nas/quota/slot_2/File_System/Sub_Folder

Each sub folder of the file system named "File_System" is listed with the space each sub folder uses. You'll notice that I used the sudo command to run the du command. Unfortunately you need root access in order to run the du command on the /nas/quota directory, so you'll have to add whatever account you log in to the Celerra with to the /etc/sudoers file. If you don't, you'll get the error "Sorry, user nasadmin is not allowed to execute '/usr/bin/du -h -s /nas/quota/slot_2/File_System' as root on ". I generally log in to the Celerra as nasadmin, so I added that account to sudoers. To modify the file, su to root first and then (using vi) add the following to the very bottom of /etc/sudoers (substitute nasadmin for the username you will be using):

nasadmin ALL=/usr/bin/du, /usr/bin/, /sbin/, /usr/sbin, /nas/bin, /nas/bin/, /nas/sbin, /nas/sbin/, /bin, /bin/
nasadmin ALL=/nas/bin/server_df

Once that is complete, you'll have access to run the command.  You can now write a script that will report on specific folders for you. I have two scripts to share: one sends out a daily email report, and the other sends an email only if the folder sizes have crossed a certain threshold of space utilization. I have them both scheduled as cron jobs on the Celerra itself (a sample crontab is sketched below).  It should be easy to modify these scripts for anyone's environment.
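
For reference, the cron entries themselves are nothing special. The lines below are only an example with made-up script names and times; add them with crontab -e under the account you added to sudoers.

# Daily folder size report at 8:00 AM (script names and paths are examples)
0 8 * * * /home/nasadmin/scripts/folder_size_report.sh
# Hourly threshold check
0 * * * * /home/nasadmin/scripts/folder_size_alert.sh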

The following script will generate a report of folder and sub folder sizes for any given file system. In this example, I am reporting on three specific subfolders on a file system (Production, Development, and Test). I also use the grep and awk options at the beginning to pull out the percent full from the df command for the entire file system and include that in the subject line of the email.

#!/bin/bash
 NAS_DB="/nas"
 export NAS_DB
 TODAY=$(date)
 HOST=$(hostname)

PERCENT=`sudo df -h /nas/quota/slot_2/File_System | grep server_2 | awk '{print $5}'`

echo "File_System Folder Size Report Date: $TODAY Host:$HOST" > /home/nasadmin/report/fs_report.txt
 echo " " >> /home/nasadmin/report/fs_report.txt
 echo "Production" >> /home/nasadmin/report/fs_report.txt
 echo " " >> /home/nasadmin/report/fs_report.txt

sudo du -h -S /nas/quota/slot_2/File_System/Prod >> /home/nasadmin/report/fs_report.txt

echo " " >> /home/nasadmin/report/fs_report.txt
 echo "Development" >> /home/nasadmin/report/fs_report.txt
 echo " " >> /home/nasadmin/report/fs_report.txt

sudo du -h -S /nas/quota/slot_2/File_System/Dev >> /home/nasadmin/report/fs_report.txt

echo " " >> /home/nasadmin/report/fs_report.txt
 echo "Test" >> /home/nasadmin/report/fs_report.txt
 echo " " >> /home/nasadmin/report/fs_report.txt

sudo du -h -S /nas/quota/slot_2/File_System/Test >> /home/nasadmin/report/fs_report.txt

echo " " >> /home/nasadmin/report/fs_report.txt
 echo "% Remaining on File_System Filesystem" >> /home/nasadmin/report/fs_report.txt
 echo " " >> /home/nasadmin/report/fs_report.txt

sudo df -h /nas/quota/slot_2/File_System/ >> /home/nasadmin/report/fs_report.txt

echo $PERCENT

mail -s "Folder Size Report ($PERCENT In Use)" youremail@domain.com < /home/nasadmin/report/fs_report.txt

Below is what the output of that script looks like. It is included in the body of the email as plain text.

File_System Folder Size Report
Date: Fri Jun 28 08:00:02 CDT 2013
Host:<celerra_name>

Production

 24K     /nas/quota/slot_2/File_System/Production/1
 0       /nas/quota/slot_2/File_System/Production/2
 16K     /nas/quota/slot_2/File_System/Production

Development

 8.0K    /nas/quota/slot_2/File_System/Development/1
 108G    /nas/quota/slot_2/File_System/Development/2
 0          /nas/quota/slot_2/File_System/Development/3
 16K     /nas/quota/slot_2/File_System/Development

Test

 0       /nas/quota/slot_2/File_System/Test/1
 422G    /nas/quota/slot_2/File_System/Test/2
 0       /nas/quota/slot_2/File_System/Test/3
 16K     /nas/quota/slot_2/File_System/Test

% Remaining on File_System Filesystem

Filesystem Size Used Avail Use% Mounted on
 server_2:/ 2.5T 529G 1.9T 22% /nasmcd/quota/slot_2

The following script will only send an email if the folder size has crossed a defined threshold. Simply change the path to the folder you want to report on and change the value of the 'threshold' variable in the script to whatever you want it to be; I have mine set at 95%.

#!/bin/bash

#Get the % Disk Utilization from the df command
 percent=`sudo df -h /nas/quota/slot_2/File_System | grep server_2 | awk '{print $5}'`

#Strip the % Sign from the output value
 percentvalue=`echo $percent | awk -F"%" '{print $1}'`

#Set the critical threshold that will trigger an email
 threshold=95

#compare the threshold value to the reported value, send an email if needed
 if [ $percentvalue -eq 0 ] && [ $threshold -eq 0 ]
 then
  echo "Both are zero"
 elif [ $percentvalue -eq $threshold ]
 then
  echo "Both Values are equal"
 elif [ $percentvalue -gt $threshold ]
 then
 mail -s "File_System Critical Disk Space Alert: $percentvalue% utilization is above the threshold of $threshold%" youremail@domain.com < /home/nasadmin/report/fs_report.txt

else
  echo "$percentvalue is less than $threshold"
 fi

First day at EMC World 2013

It's been a great first day at EMC World 2013 so far.  I've been to three breakout sessions, two of which were very informative and useful. I focused my day today on learning more about best practices for the EMC technologies that I already use.  I'm not going to go into great detail now as I need to get to the next session. 🙂 The notes I made are specific to a Brocade switch implementation as that's what I use. Here are some useful bullet points from a few of my sessions today.

Storage Area Networking Best Practices:

1.  Single Initiator Zoning.  Only put one initiator and the targets it will access in one zone.  I’ve already practiced this for many years, but it was interesting to hear the reasons for doing it that way.  It drastically reduces the number of server queries to the switch.

2.  Dynamic Interface Management.  Use Brocade port fencing.  It can prevent a SAN outage by giving you the ability to shut down a single host or port.

3.  Monitor for Congestion.  Congestion from one host can spread and cause problems in other operating environments.  Definitely enable Brocade bottleneck detection.

4.  Periodic SAN health checks.  Use the EMC SANity or Brocade SAN Health tools regularly.  If you're reading this, you're very likely already doing that. 🙂

5.  Monitor for bit errors.  These can lead to bb_credit loss.  The Brocade parameter portcfglongdistance should be set (bb_credit recovery option), which will prevent the problem.  Bit errors could still be a problem on F Ports, however.

6.  Cable Hygiene.  Always clean your cables!  They are a major contributor to physical connectivity problems.

7.  Target Releases.  Always run a target release of Firmware.

VNX NAS Configuration Best Practices:

1. For transactional NAS, always use the correct transfer size for your workload.  The latest VNX OE release supports up to 1MB.

2. Use Jumbo frames end to end.

3. 10G is available now. Don’t use 1G anymore!

4. Use link aggregation for redundancy and load balancing, failsafe for high availability.

5. Use AVM for typical NAS configurations, MVM for transactional NAS where you have high concurrency from a single NFS stream to a single NFS export. 

6. Use Direct Writes for apps that use concurrent IO or are asynchronous (Oracle, VMWare, etc.).

7. For NAS Pool LUNs, use thick LUNs that are all the same size, balance the SP designation for each one, use the same tiering policy for each one, use approximately 1 LUN for every 4 drives, and use thin enabled file systems to maximize the consumption of the highest tier.

8. Always enable FASTCache for NAS LUNs.  They benefit from it even more than block LUNs.

 

Relocating Celerra Checkpoints

On our NS-960 Celerra we have multiple storage pools defined, one of which is specifically designated for checkpoint storage. Of the 100 or so file systems that we run a checkpoint schedule on, about 2/3 of them were incorrectly writing their checkpoints to the production file system pool rather than the defined checkpoint pool, and the production pool was starting to fill up. I started to research how you could change where checkpoints are stored. Unfortunately you can't actually relocate existing checkpoints; you need to start over.

In order to change where checkpoints are stored, you need to stop and delete any running replications (which automatically store root replication checkpoints) and delete all current checkpoints for the specific file system you’re working on. Depending on your checkpoint schedule and how often it runs you may want to pause it temporarily. Having to stop and delete remote replications was painful for me as I compete for bandwidth with our data domain backups to our DR site. Because of that I’ve been working on these one at a time.

Once you’ve deleted the relevant checkpoints and replications, you can choose where to store the new checkpoints by creating a single checkpoint first before making a schedule. From the GUI, go to Data Protection | Snapshots | Checkpoints tab, and click create. If a checkpoint does not exist for the file system, it will give you a choice of which pool you’d like to store it in. Below is what you’ll see in the popup window.

——————————————————————————–
Choose Data Mover [Drop Down List ▼]
Production File System [Drop Down List ▼]
Writeable Checkpoint [Checkbox]:
Data Movers: server_2
Checkpoint Name: [Fill in the Blank]

Configure Checkpoint Storage:
There are no checkpoints currently on this file system. Please specify how to allocate
storage for this and future checkpoints of this file system.

Create from:
*Storage Pool [Option Box]
*Meta Volume [Option Box]

——————————————————————————–
Current Storage System: CLARiiON CX4-960 APM00900900999
Storage Pool: [Drop Down List ▼]    <—*Select the appropriate Storage Pool Here*
Storage Capacity (MB): [Fill in the Blank]

Auto Extend Configuration:
High Water Mark: [Drop Down List for Percentage ▼]

Maximum Storage Capacity (MB): [Fill in the Blank]

——————————————————————————–

Close requests fail on data mover when using Riverbed Steelhead appliance

We recently had a problem with one of our corporate applications having file close requests fail, resulting in 200,000+ open files on our production data mover.  This was causing numerous issues within the application.  We determined that the problem was a result of our Riverbed Steelhead appliance requiring a certain level of DART code in order to properly close the files.  The Steelhead appliance would fail when attempting to optimize SMBv2 connections.

Because a DART code upgrade is required to resolve the problem, the only temporary fix is to reboot the data mover.  I wrote a quick script on the Celerra to grab the number of open files, write it to a text file, and publish to our internal web server.  The command to check how many open files are on the data mover is below.

This command provides all the detailed information:

/nas/bin/server_stats server_2 -monitor cifs-std -interval 1 -count 1

server_2    CIFS     CIFS     CIFS    CIFS Avg   CIFS     CIFS    CIFS Avg     CIFS       CIFS
Timestamp   Total    Read     Read      Read     Write    Write    Write       Share      Open
            Ops/s    Ops/s    KiB/s   Size KiB   Ops/s    KiB/s   Size KiB  Connections   Files
11:15:36     3379      905     9584        11        9      272        30         1856     4915   

server_2    CIFS     CIFS     CIFS    CIFS Avg   CIFS     CIFS    CIFS Avg     CIFS       CIFS
Summary     Total    Read     Read      Read     Write    Write    Write       Share      Open
            Ops/s    Ops/s    KiB/s   Size KiB   Ops/s    KiB/s   Size KiB  Connections   Files
Minimum      3379      905     9584        11        9      272        30         1856     4915   
Average      3379      905     9584        11        9      272        30         1856     4915   
Maximum      3379      905     9584        11        9      272        30         1856     4915   

Adding a grep for Maximum and using awk to grab only the last column, this command will output only the number of open files, rather than the large output above:

/nas/bin/server_stats server_2 -monitor cifs-std -interval 1 -count 1 | grep Maximum | awk '{print $10}'

The output of that command would simply be ‘4915’ based on the sample full output I used above.
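
The quick script mentioned earlier isn't included in this post, but a minimal sketch of the idea looks like the following; the report path and the web server destination are assumptions, so adjust them for your environment.

#!/bin/bash
# Sketch only: log the open file count and copy it to an internal web server.
OPENFILES=`/nas/bin/server_stats server_2 -monitor cifs-std -interval 1 -count 1 | grep Maximum | awk '{print $10}'`
echo "`date` open CIFS files on server_2: $OPENFILES" >> /home/nasadmin/report/openfiles.txt
scp /home/nasadmin/report/openfiles.txt webserver:/var/www/html/openfiles.txt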

The solution number from Riverbed’s knowledgebase is S16257.  Your DART code needs to be at least 6.0.60.2 or 7.0.52.  You will also see in your steelhead logs a message similar to the one below indicating that the close request has failed for a particular file:

Sep 1 18:19:52 steelhead port[9444]: [smb2cfe.WARN] 58556726 {10.0.0.72:59207 10.0.0.72:445} Close failed for fid: 888819cd-z496-7fa2-2735-0000ffffffff with ntstatus: NT_STATUS_INVALID_PARAMETER

Collecting info on active shares, clients, protocols, and authentication on the VNX

I had a comment in one of my Celerra/VNX posts asking for more specific info on listing and counting shares, clients, protocol and authentication information, as well as virus scan information.  I knew the answers to most of those questions; however, I'd need to pull out the Celerra documentation from EMC for the virus scan info.  I thought it might be more useful to put this information into a new post rather than simply replying to a comment on an old post.

How to list and count shares/exports (by protocol):

You can use the server_export command to list and count how many shares you have.

server_export server_2 -Protocol cifs -list -all:  This command will give you a list of all CIFS Shares across all of your CIFS servers.  It will count a file system twice if it’s shared on more than one CIFS server.

server_export server_2 -Protocol cifs -list -all | grep <cifs_server_name>:  This will give you a list of all CIFS shares on a specific CIFS server.

server_export server_2 -Protocol cifs -list -all | grep <cifs_server_name> | wc:  This will give you the number of CIFS shares on a specific CIFS server.  The "wc" command will output three numbers; the first number listed is the number of shares.

server_export server_2 -Protocol nfs -list -all:  This command will give you a list of all NFS exports.  It's just like the previous commands; you can add "| wc" at the end to get a count.
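
If you only want the counts, using wc -l instead of wc returns just the line count, so both totals can be printed in one pass:

echo "CIFS shares: `server_export server_2 -Protocol cifs -list -all | wc -l`"
echo "NFS exports: `server_export server_2 -Protocol nfs -list -all | wc -l`"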

How to list client connections by OS type:

To obtain information for the number of client connections you have by OS type, you’d have to use the server_cifs audit command.

To get a full list of every active connection by client type, use this command:

server_cifs server_2 -option audit,full | grep "Client("

The output would look like this:

|||| AUDIT Ctx=0x022a6bdc08, ref=2, W2K Client(10.0.5.161) Port=49863/445
|||| AUDIT Ctx=0x01f18efc08, ref=2, XP Client(10.0.5.42) Port=3890/445
|||| AUDIT Ctx=0x02193a2408, ref=2, W2K Client(10.0.5.61) Port=59027/445
|||| AUDIT Ctx=0x01b89c2808, ref=2, Fedora6 Client(10.0.5.194) Port=17130/445
|||| AUDIT Ctx=0x0203ae3008, ref=2, Fedora6 Client(10.0.5.52) Port=55731/445
...

In this case, if I wanted to count only the number of Fedora6 clients, I'd use the command server_cifs server_2 -option audit,full | grep "Fedora6 Client".  I could then add "| wc" at the end to get a count.
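
To break the connection count down by every client OS type at once rather than grepping for each type individually, the audit lines can be parsed with a one-liner like the one below (a sketch that assumes the 'ref=N, <OS> Client(' layout shown in the sample output above):

server_cifs server_2 -option audit,full | grep "Client(" | sed 's/.*ref=[0-9]*, \(.*\) Client(.*/\1/' | sort | uniq -c | sort -rn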

To do a full audit report:

The command server_cifs server_2 -option audit,full will do a full, detailed audit report and should capture just about anything else you’d need.  Every connection will have a detailed audit included in the report. Based on that output, it would be easy to run the command with a grep statement to pull only the information out that you need to create custom reports.

Below is a subset of what the output looks like from that command:

|||| AUDIT Ctx=0x0177206407, ref=2, W2K8 Client(10.0.0.1) Port=65340/445
||| CIFSSERVER1[DOMAIN] on if=<interface_name>
||| CurrentDC 0x0169fee808=<Domain_Controller>
||| Proto=SMB2.10, Arch=Win2K, RemBufsz=0xffff, LocBufsz=0xffff, popupMsg=1
||| Client GUID=c2de9f99-1945-11e2-a512-005056af0
||| SMB2 credits: Granted=31, Max=500
||| 0 FNN in FNNlist NbUsr=1 NbCnx=1
||| Uid=0x1 NTcred(0x0125fc9408 RC=2 KERBEROS Capa=0x2) 'DOMAIN\Username'
|| Cnxp(0x0230414c08), Name=FileSystem1, cUid=0x1 Tid=0x1, Ref=1, Aborted=0
| readOnly=0, umask=22, opened files/dirs=0
| Absolute path of the share=\Filesystem1
| NTFSExtInfo: shareFS:fsid=49, rc=830, listFS:nb=1 [fsid=49,rc=830]

|||| AUDIT Ctx=0x0210f89007, ref=2, W2K8 Client(10.0.0.1) Port=51607/445
||| CIFSSERVER1[DOMAIN] on <interface_name>
||| CurrentDC 0x01f79aa008=<Domain_Controller>
||| Proto=SMB2.10, Arch=Win2K8, RemBufsz=0xffff, LocBufsz=0xffff, popupMsg=1
||| Client GUID=5b410977-bace-11e1-953b-005056af0
||| SMB2 credits: Granted=31, Max=500
||| 0 FNN in FNNlist NbUsr=1 NbCnx=1
||| Uid=0x1 NTcred(0x01c2367408 RC=2 KERBEROS Capa=0x2) 'DOMAIN\Username'
|| Cnxp(0x0195d2a408), Name=Filesystem2, cUid=0x1 Tid=0x1, Ref=1, Aborted=0
| readOnly=0, umask=22, opened files/dirs=0
| Absolute path of the share=\Filesystem2
| NTFSExtInfo: shareFS:fsid=4214, rc=19, listFS:nb=0

|||| AUDIT Ctx=0x006aae8408, ref=2, XP Client(10.0.0.99) Port=1258/445
 ||| CIFSSERVER1[DOMAIN] on if=<interface_name>
 ||| CurrentDC 0x01f79aa008=<Domain_Controller>
 ||| Proto=NT1, Arch=Win2K, RemBufsz=0xffff, LocBufsz=0xffff, popupMsg=1
 ||| 0 FNN in FNNlist NbUsr=1 NbCnx=1
 ||| Uid=0x3f NTcred(0x01ccebc008 RC=2 KERBEROS Capa=0x2) 'DOMAIN\Username'
 || Cnxp(0x019edd7408), Name=Filesystem8, cUid=0x3f Tid=0x3f, Ref=1, Aborted=0
 | readOnly=0, umask=22, opened files/dirs=3
 | Absolute path of the share=\Filesystem8
 | NTFSExtInfo: shareFS:fsid=35, rc=43, listFS:nb=0
 | Fid=2901, FNN=0x0012d46b40(FREE,0x0000000000,0), FOF=0x0000000000  DIR=\Directory1
 |    Notify commands received:
 |    Event=0x17, wt=1, curSize=0x0, maxSize=0x20, buffer=0x0000000000
 |    Tid=0x3f, Pid=0x2310, Mid=0x3bca, Uid=0x3f, size=0x20
 | Fid=3335, FNN=0x00193baaf0(FREE,0x0000000000,0), FOF=0x0000000000  DIR=\Directory1\Subdirectory1
 |    Notify commands received:
 |    Event=0x17, wt=0, curSize=0x0, maxSize=0x20, buffer=0x0000000000
 |    Tid=0x3f, Pid=0x2310, Mid=0xe200, Uid=0x3f, size=0x20
 | Fid=3683, FNN=0x00290471c0(FREE,0x0000000000,0), FOF=0x0000000000  DIR=\Directory1\Subdirectory1\Subdirectory2
 |    Notify commands received:
 |    Event=0x17, wt=0, curSize=0x0, maxSize=0x20, buffer=0x0000000000
 |    Tid=0x3f, Pid=0x2310, Mid=0x3987, Uid=0x3f, size=0x20

Adding a Celerra to a Clariion storage domain from the CLI

If you're having trouble joining your Celerra to the storage domain from Unisphere, there is an EMC service workaround for joining it from the Navisphere CLI. When I attempted it from Unisphere, it appeared to work and allowed me to join, but the Celerra never actually showed up on the domain list.  Below is a workaround for the problem that worked for me. Run these commands from the Control Station.

Run this first:

curl -kv "https://<Celerra_CS_IP>/cgi-bin/set_incomingmaster?master=<Clariion_SPA_DomainMaster_IP>,"

Next, run the following navicli command to the domain master in order to add the Celerra Control Station to the storage domain:

/nas/sbin/naviseccli -h 10.3.215.73 -user <userid> -password <password> -scope 0 domain -add 10.32.12.10

After a successful Join the /nas/http/domain folder should be populated with the domain_list, domain_master, and domain_users files.

Run this command to verify:

ls -l /nas/http/domain

You should see this output:

-rw-r--r-- 1 apache apache 552 Aug  8  2011 domain_list
-rw-r--r-- 1 apache apache  78 Feb 15  2011 domain_master
-rw-r--r-- 1 apache apache 249 Oct  5  2011 domain_users
 

You can also check the domain list to make sure that an entry has been made.

Run this command to verify:

/nas/sbin/naviseccli -h <Clariion_SPA_DomainMaster_IP> domain -list

You should then see a list of all domain members.  The output will look like this:

Node:                     <DNS Name of Celerra>
IP Address:           <Celerra_CS_IP>
Name:                    <DNS Name of Celerra>
Port:                        80
Secure Port:          443
IP Address:           <Celerra_CS_IP>
Name:                    <DNS Name of Celerra>
Port:                        80
Secure Port:          443

Making a case for file archiving

We’ve been investigating options for archiving unstructured (file based) data that resides on our Celerra for a while now. There are many options available, but before looking into a specific solution I was asked to generate a report that showed exactly how much of the data has been accessed by users for the last 60 days and for the last 12 months.  As I don’t have permissions to the shared folders from my workstation I started looking into ways to run the report directly from the Celerra control station.  The method I used will also work on VNX File.

After a little bit of digging I discovered that you can access all of the file systems from the control station by navigating to /nas/quota/slot_<x>.  The slot_2 folder would be for the server_2 data mover, slot_3 would be for server_3, etc.  With full access to all the file systems, I simply needed to write a script that scanned each folder and counted the number of files that had been modified within a certain time window.

I always use Excel for scripts I know are going to be long.  I copy the file system list from Unisphere, then put the necessary commands in different columns, and end it with a concatenate formula that pulls it all together.  If you put echo -n in A1, "Users_A," in B1, and >/home/nasadmin/scripts/Users_A.dat in C1, you'd just need to type the formula "=CONCATENATE(A1,B1,C1)" into cell D1.  D1 would then contain echo -n "Users_A," > /home/nasadmin/scripts/Users_A.dat. It's a simple and efficient way to make long scripts very quickly.

In this case, the script needed four different sections.  All four of these sections I'm about to go over were copied into a single shell script and saved in my /home/nasadmin/scripts directory.  After creating the .sh file, I always do a chmod +x and chmod 777 on the file.  Be prepared for this to take a very long time to run.  It of course depends on the number of file systems on your array, but for me this script took about 23 hours to complete.

First, I create a text file for each file system that contains the name of the filesystem (and a comma) which is used later to populate the first column of the final csv output.  It’s of course repeated for each file system.

echo -n "Users_A," > home/nasadmin/scripts/Users_A.dat
echo -n "Users_B," > home/nasadmin/scripts/Users_B.dat

... <continued for each filesystem>

Second, I use the 'find' command to walk each directory tree and count the number of files that have not been modified within the reporting window.  The example below uses -mtime +365 for the 12-month report; substitute +60 for the 60-day report.  The output is written to another text file that will be used in the csv output file later.

find /nas/quota/slot_2/Users_A -mtime +365 | wc -l > /home/nasadmin/scripts/Users_A_wc.dat

find /nas/quota/slot_2/Users_B -mtime +365 | wc -l > /home/nasadmin/scripts/Users_B_wc.dat

... <continued for each filesystem>

Third, I want to count the total number of files in each file system.  A third text file is written with that number, again for the final combined report that's generated at the end.

find /nas/quota/slot_2/Users_A | wc -l > /home/nasadmin/scripts/Users_A_total.dat

find /nas/quota/slot_2/Users_B | wc -l > /home/nasadmin/scripts/Users_B_total.dat

... <continued for each filesystem>

Finally, each file is combined into the final report.  The output will show each filesystem with two columns, Total Files & Files Accessed 60+ Days Ago.  You can then easily update the report in Excel and add columns that show files accessed in the last 60 days, the percentage of files accessed in the last 60 days, etc., with some simple math.  (comma.dat contains just a comma and is used as the separator between the two counts.)

cat /home/nasadmin/scripts/Users_A.dat /home/nasadmin/scripts/Users_A_wc.dat /home/nasadmin/scripts/comma.dat /home/nasadmin/scripts/Users_A_total.dat | tr -d "\n" > /home/nasadmin/scripts/fsoutput.csv
echo " " >> /home/nasadmin/scripts/fsoutput.csv

cat /home/nasadmin/scripts/Users_B.dat /home/nasadmin/scripts/Users_B_wc.dat /home/nasadmin/scripts/comma.dat /home/nasadmin/scripts/Users_B_total.dat | tr -d "\n" >> /home/nasadmin/scripts/fsoutput.csv
echo " " >> /home/nasadmin/scripts/fsoutput.csv

... <continued for each filesystem>
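
As an alternative to generating all of those lines in Excel, a single loop can produce the same per-filesystem counts and write the CSV directly. This is only a sketch with example filesystem names and paths; adjust the list and the -mtime window (+60 or +365) for the report you need.

#!/bin/bash
# Sketch only: filesystem names and paths below are examples.
OUT=/home/nasadmin/scripts/fsoutput.csv
> "$OUT"
for fs in Users_A Users_B Users_C; do
  total=`find /nas/quota/slot_2/$fs | wc -l`
  old=`find /nas/quota/slot_2/$fs -mtime +365 | wc -l`
  # Filesystem name, files not modified in the window, total files
  echo "$fs,$old,$total" >> "$OUT"
done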

My final output looks like this:

             Total Files   Accessed 60+ days ago   Accessed in last 60 days   % Accessed in last 60 days
Users_A      827,057       734,848                 92,209                     11.15
Users_B      61,975        54,727                  7,248                      11.70
Users_C      150,166       132,457                 17,709                     11.79

The three example filesystems above show that only about 11% of the files have been accessed in the last 60 days.   Most user data has a very short lifecycle: it's 'hot' for a month or less, then it dramatically tapers off as the business value of the data drops.  These file systems would be prime candidates for archiving.

My final report definitely supported the need for archiving, but we've yet to start a project to complete it.  I like the possibility of using EMC's Cloud Tiering Appliance, which can archive data directly to the cloud service of your choice.  I'll make another post in the future about archiving solutions once I've had more time to research it.

Can’t join CIFS Server to domain – sasl protocol violation

I was running a live disaster recovery test of our Celerra CIFS Server environment last week and I was not able to get the CIFS servers to join the replica of the domain controller on the DR network.  I would get the error ‘Sasl protocol violation’ on every attempt to join the domain.

We have two interfaces configured on the data mover, one that connects to production and one that connects to the DR private network.  The default route on the Celerra points to the DR network, and we have static routes configured for each of our remote sites in production to allow replication traffic to pass through.  Everything on the network side checked out: I could ping DCs and DNS servers, and NTP was configured to a DR network time server and was working.

I was able to ping the DNS Server and the domain controller:

[nasadmin@datamover1 ~]$ server_ping server_2 10.12.0.5
server_2 : 10.12.0.5 is alive, time= 0 ms
 
[nasadmin@datamover1 ~]$ server_ping server_2 10.12.18.5
server_2 : 10.12.18.5 is alive, time= 3 ms
 

When I tried to join the CIFS Server to the domain I would get this error:

[nasadmin@datamover1 ~]$ server_cifs prod_vdm_01 -Join compname=fileserver01,domain=company.net,admin=myadminaccount -option reuse
prod_vdm_01 : Enter Password:*********
Error 13157007706: prod_vdm_01 : DomainJoin::connect:: Unable to connect to the LDAP service on Domain Controller 'domaincontroller.company.net' (@10.12.0.5) for compname 'fileserver01'. Result code is 'Sasl protocol violation'. Error message is Sasl protocol violation.
 

I also saw this error messages during earlier tests:

Error 13157007708: prod_vdm_01 : DomainJoin::setAccountPassword:: Unable to set account password on Domain Controller ‘domaincontroller.company.net’ for compname ‘fileserver01’. Kerberos gssError is ‘Miscellaneous failure. Cannot contact any KDC for requested realm. ‘. Error message is d0000,-1765328228.
 

I noticed these error messages in the server log:

2012-06-21 07:03:00: KERBEROS: 3: acquire_accept_cred: Failed to get keytab entry for principal host/fileserver01.company.net@COMPANY.NET - error No principal in keytab matches desired name (39756033)
2012-06-21 07:03:00: SMB: 3: SSXAK=LOGON_FAILURE Client=x.x.x.x origin=510 stat=0x0,39756033
2012-06-21 07:03:42: KERBEROS: 5: Warning: send_as_request: Realm COMPANY.NET - KDC X.X.X.X returned error: Clients credentials have been revoked (18)
 

The final resolution to the problem was to reboot the data mover. EMC determined that the issue was because the Kerberos keytab entry for the CIFS server was no longer valid. It could be caused by corruption or because the machine account password expired. A reboot of the data mover causes the Kerberos keytab and SPN credentials to be resubmitted, thus resolving the problem.

Errors when creating new replication jobs

I was attempting to create a new replication job on one of our VNX5500s and was receiving several errors when selecting our DR NS-960 as the 'destination Celerra network server'.

It was displaying the following errors at the top of the window:

– “Query VDMs All.  Cannot access any Data Mover on the remote system, <celerra_name>”. The error details directed me to check that all the Data Movers are accessible, that the time difference between the source and destination doesn’t exceed 10 min, and that the passphrase matches.  I confirmed that all of those were fine.

– “Query Storage Pools All.  Remote command failed:\nremote celerra – <celerra_name>\nremote exit status =0\nremote error = 0\nremote message = HTTP Error 500: Internal Server Error”.  The error details on this message say to search powerlink, not a very useful description.

– “There are no destination pools available”.  The details on this error say to check available space on the destination storage pool.  There is 3.5TB available in the pool I want to use on the destination side, so that wasn’t the issue either.

All existing replication jobs were still running fine so I knew there was not a network connectivity problem.  I reviewed the following items as well:

– I was able to validate all of the interconnects successfully, that wasn’t the issue.

– I ran nas_cel -update on the interconnects on both sides and received no errors, but it made no difference.

– I checked the server logs and didn’t see any errors relating to replication.

Not knowing where to look next, I opened an SR with EMC.  As it turns out, it was a security issue.

About a month ago an EMC CE accidentally deleted our global security accounts during a service call.  I had recreated all of the deleted accounts and didn't think there would be any further issues.  Logging in with the re-created nasadmin account after the accidental deletion was the root cause of the problem.  Here's why:

The Clariion global user account is tied to a local user account on the control station in /etc/passwd. When nasadmin was recreated on the domain, it attempted to create the nasadmin account on the control station as well.  Because the account already existed as a local account on the control station, it created a local account named 'nasadmin1' instead, which is what caused the problem.  The two nasadmin accounts were no longer synchronized between the Celerra and the Clariion domain, so when logging in with the global nasadmin account you were no longer tied to the local nasadmin account on the control station.  Deleting all nasadmin accounts from the global domain and from the local /etc/passwd on the Celerra, and then recreating nasadmin in the domain solves the problem.  Because the issue was related only to the nasadmin account in this case, I could have also solved the problem by simply creating a new global account (with administrator privileges) and using that to create the replication job.  I tested that as well and it worked fine.

Undocumented Celerra / VNX File commands


The .server_config command is undocumented by EMC; I assume they don't want customers messing with it. Use these commands at your own risk. 🙂

Below is a list of some of those undocumented commands; most are meant for viewing performance stats. I've had EMC support use the fcp command during a support call in the past.  When using the command for fcp stats, I believe you need to run the 'reset' command first as it enables the collection of statistics.

There are likely other parameters that can be used with .server_config but I haven’t discovered them yet.

TCP Stats:

To view TCP info:
.server_config server_x -v "printstats tcpstat"
.server_config server_x -v "printstats tcpstat full"
.server_config server_x -v "printstats tcpstat reset"

Sample Output (truncated):
TCP stats :
connections initiated 8698
connections accepted 1039308
connections established 1047987
connections dropped 524
embryonic connections dropped 3629
conn. closed (includes drops) 1051582
segs where we tried to get rtt 8759756
times we succeeded 11650825
delayed acks sent 537525
conn. dropped in rxmt timeout 0
retransmit timeouts 823

SCSI Stats:

To view SCSI IO info:
.server_config server_x -v "printstats scsi"
.server_config server_x -v "printstats scsi reset"

Sample Output:
This output needs to be viewed in a fixed-width font to align properly.
Ctlr:  IO-pending  Max-IO  IO-total  Idle(ms)   Busy(ms)  Busy(%)
0:     0           53      44925729  122348758  19159954  13%
1:     0           1       1         141508682  0         0%
2:     0           1       1         141508682  0         0%
3:     0           1       1         141508682  0         0%
4:     0           1       1         141508682  0         0%

File Stats:

.server_config server_x -v "printstats filewrite"
.server_config server_x -v "printstats filewrite full"
.server_config server_x -v "printstats filewrite reset"

Sample output (Full Output):
13108 writes of 1 blocks in 52105250 usec, ave 3975 usec
26 writes of 2 blocks in 256359 usec, ave 9859 usec
6 writes of 3 blocks in 18954 usec, ave 3159 usec
2 writes of 4 blocks in 2800 usec, ave 1400 usec
4 writes of 13 blocks in 6284 usec, ave 1571 usec
4 writes of 18 blocks in 7839 usec, ave 1959 usec
total 13310 blocks in 52397489 usec, ave 3936 usec

FCP Stats:

To view FCP stats, useful for checking SP balance:
.server_config server_x -v "printstats fcp"
.server_config server_x -v "printstats fcp full"
.server_config server_x -v "printstats fcp reset"

Sample Output (Truncated):
This output needs to be viewed in a fixed-width font to align properly.
Total I/O Cmds: +0%------25%-------50%-------75%-----100%+ Total 0
FCP HBA 0 |                                           |  0%  0
FCP HBA 1 |                                           |  0%  0
FCP HBA 2 |                                           |  0%  0
FCP HBA 3 |                                           |  0%  0
# Read Cmds:    +0%------25%-------50%-------75%-----100%+ Total 0
FCP HBA 0 |                                           |  0%  0
FCP HBA 1 |                                           |  0%  0
FCP HBA 2 |                                           |  0%  0
FCP HBA 3 | XXXXXXXXXXX                               | 25%  0

Usage:

‘fcp’ options are:       bind …, flags, locate, nsshow, portreset=n, rediscover=n
rescan, reset, show, status=n, topology, version

‘fcp bind’ options are:  clear=n, read, rebind, restore=n, show
showbackup=n, write

Description:

Commands for ‘fcp’ operations:
fcp bind <cmd> ……… Further fibre channel binding commands
fcp flags ………….. Show online flags info
fcp locate …………. Show ScsiBus and port info
fcp nsshow …………. Show nameserver info
fcp portreset=n …….. Reset fibre port n
fcp rediscover=n ……. Force fabric discovery process on port n
Bounces the link, but does not reset the port
fcp rescan …………. Force a rescan of all LUNS
fcp reset ………….. Reset all fibre ports
fcp show …………… Show fibre info
fcp status=n ……….. Show link status for port n
fcp status=n clear ….. Clear link status for port n and then Show
fcp topology ……….. Show fabric topology info
fcp version ………… Show firmware, driver and BIOS version

Commands for ‘fcp bind’ operations:
fcp bind clear=n ……. Clear the binding table in slot n
fcp bind read ………. Read the binding table
fcp bind rebind …….. Force the binding thread to run
fcp bind restore=n ….. Restore the binding table in slot n
fcp bind show ………. Show binding table info
fcp bind showbackup=n .. Show Backup binding table info in slot n
fcp bind write ……… Write the binding table
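
As a quick example of how to use the list above, the informational subcommands can be passed through the same .server_config wrapper as the stats commands. This is only a sketch; the port number is an example and will vary depending on your data mover's HBA layout:

.server_config server_x -v "fcp show"        Show fibre info for all ports
.server_config server_x -v "fcp status=0"    Show link status for port 0
.server_config server_x -v "fcp topology"    Show fabric topology info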

NDMP Stats:

To check NDMP status:
.server_config server_x -v "printstats vbb show"

CIFS Stats:

This will output a CIFS report that includes all CIFS servers, domain controllers, IP addresses, interfaces, MAC addresses, and more.

.server_config server_x -v "cifs"

Sample Output:

1327007227: SMB: 6: 256 Cifs threads started
1327007227: SMB: 6: Security mode = NT
1327007227: SMB: 6: Max protocol = SMB2
1327007227: SMB: 6: I18N mode = UNICODE
1327007227: SMB: 6: Home Directory Shares DISABLED
1327007227: SMB: 6: Usermapper auto broadcast enabled
1327007227: SMB: 6:
1327007227: SMB: 6: Usermapper[0] = [127.0.0.1] state:active (auto discovered)
1327007227: SMB: 6:
1327007227: SMB: 6: Default WINS servers = 172.168.1.5
1327007227: SMB: 6: Enabled interfaces: (All interfaces are enabled)
1327007227: SMB: 6:
1327007227: SMB: 6: Disabled interfaces: (No interface disabled)
1327007227: SMB: 6:
1327007227: SMB: 6: Unused Interface(s):
1327007227: SMB: 6:  if=172-168-1-84 l=172.168.1.84 b=172.168.1.255 mac=0:60:48:1c:46:96
1327007227: SMB: 6:  if=172-168-1-82 l=172.168.1.82 b=172.168.1.255 mac=0:60:48:1c:10:5d
1327007227: SMB: 6:  if=172-168-1-81 l=172.168.1.81 b=172.168.1.255 mac=0:60:48:1c:46:97
1327007227: SMB: 6:
1327007227: SMB: 6:
1327007227: SMB: 6: DOMAIN DOMAIN_NAME FQDN=DOMAIN_NAME.net SITE=STL-Colo RC=24
1327007227: SMB: 6:  SID=S-1-5-15-7c531fd3-6b6745cb-ff77ddb-ffffffff
1327007227: SMB: 6:  DC=DCAD01(172.168.1.5) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD02(172.168.29.8) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD03(172.168.30.8) ref=2 time=0 ms
1327007227: SMB: 6:  DC=DCAD04(172.168.28.15) ref=2 time=0 ms
1327007227: SMB: 6: >DC=SERVERDCAD01(172.168.1.122) ref=334 time=1 ms (Closest Site)
1327007227: SMB: 6: >DC=SERVERDCAD02(172.168.1.121) ref=273 time=1 ms (Closest Site)
1327007227: SMB: 6:
1327007227: SMB: 6: CIFS Server SERVERFILESEMC[DOMAIN_NAME] RC=603
1327007227: UFS: 7: inc ino blk cache count: nInoAllocs 361: inoBlk 0x0219f2a308
1327007227: SMB: 6:  Full computer name=SERVERFILESEMC.DOMAIN_NAME.net realm=DOMAIN_NAME.NET
1327007227: SMB: 6:  Comment=’EMC-SNAS:T6.0.41.3′
1327007227: SMB: 6:  if=172-168-1-161 l=172.168.1.161 b=172.168.1.255 mac=0:60:48:1c:46:9c
1327007227: SMB: 6:   FQDN=SERVERFILESEMC.DOMAIN_NAME.net (Updated to DNS)
1327007227: SMB: 6:  Password change interval: 0 minutes
1327007227: SMB: 6:  Last password change: Fri Jan  7 19:25:30 2011 GMT
1327007227: SMB: 6:  Password versions: 2, 2
1327007227: SMB: 6:
1327007227: SMB: 6: CIFS Server SERVERBKUPEMC[DOMAIN_NAME] RC=2 (local users supported)
1327007227: SMB: 6:  Full computer name=SERVERbkupEMC.DOMAIN_NAME.net realm=DOMAIN_NAME.NET
1327007227: SMB: 6:  Comment=’EMC-SNAS:T6.0.41.3′
1327007227: SMB: 6:  if=172-168-1-90 l=172.168.1.90 b=172.168.1.255 mac=0:60:48:1c:10:54
1327007227: SMB: 6:   FQDN=SERVERbkupEMC.DOMAIN_NAME.net (Updated to DNS)
1327007227: SMB: 6:  Password change interval: 0 minutes
1327007227: SMB: 6:  Last password change: Thu Sep 30 16:23:50 2010 GMT
1327007227: SMB: 6:  Password versions: 2
1327007227: SMB: 6:
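Because the full report is long, it can be useful to filter it when you only need one piece of information. As a quick sketch (assuming you run it from a bash prompt on the control station, where the output goes to stdout as in the sample above), this shows just the domain controllers and their response times:

.server_config server_x -v "cifs" | grep "DC="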
 

Domain Controller Commands:

These commands, run from the control station, are useful for troubleshooting Windows domain controller connection issues. Use them along with a review of the standard server log (server_log server_2) to troubleshoot that type of problem.

To view the current domain controllers visible on the data mover:

.server_config server_2 -v "pdc dump"

Sample Output (Truncated):

1327006571: SMB: 6: Dump DC for dom='<domain_name>’ OrdNum=0
1327006571: SMB: 6: Domain=<domain_name> Next trusted domains update in 476 seconds
1327006571: SMB: 6:  oldestDC:DomCnt=1,179531 Time=Sat Oct 15 15:32:14 2011
1327006571: SMB: 6:  Trusted domain info from DC='<Windows_DC_Servername>’ (423 seconds ago)
1327006571: SMB: 6:   Trusted domain:<domain_name>.net [<Domain_Name>]
   GUID:00000000-0000-0000-0000-000000000000
1327006571: SMB: 6:    Flags=0x20 Ix=0 Type=0x2 Attr=0x0
1327006571: SMB: 6:    SID=S-1-5-15-d1d612b1-87382668-9ba5ebc0
1327006571: SMB: 6:    DC=’-‘
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x547,1355
1327006571: SMB: 6:   Trusted domain: <Domain_Name>
1327006571: SMB: 6:    Flags=0x22 Ix=0 Type=0x1 Attr=0x1000000
1327006571: SMB: 6:    SID=S-1-5-15-76854ac0-4c527104-321d5138
1327006571: SMB: 6:    DC=’\\<Windows_DC_Servername>’
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x0,0
1327006571: SMB: 6:   Trusted domain:<domain_name>.net [<domain_name>]
1327006571: SMB: 6:    Flags=0x20 Ix=0 Type=0x2 Attr=0x0
1327006571: SMB: 6:    SID=S-1-5-15-88d60754-f3ed4f9d-b3f2cbc4
1327006571: SMB: 6:    DC=’-‘
1327006571: SMB: 6:    Status Flags=0x0 DCStatus=0x547,1355
DC=DC0x0067a82c18 <Windows_DC_Servername>[<domain_name>](10.3.0.5) ref=2 time(getdc187)=0 ms LastUpdt=Thu Jan 19 20:45:14 2012
    Pid=1000 Tid=0000 Uid=0000
    Cnx=UNSUCCESSFUL,DC state Unknown
    logon=Unknown 0 SecureChannel(s):
    Capa=0x0 Nego=0x0000000000,L=0 Chal=0x0000000000,L=0,W2kFlags=0x0
    refCount=2 newElectedDC=0x0000000000 forceInvalid=0
    Discovered from: WINS

To enable or disable a domain controller on the data mover:

.server_config server_2 -v "pdc enable=<ip_address>"   Enable a domain controller

.server_config server_2 -v "pdc disable=<ip_address>"   Disable a domain controller
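
As a hypothetical example using one of the DC addresses from the sample output above, you could temporarily exclude a slow remote-site domain controller and then bring it back later:

.server_config server_2 -v "pdc disable=172.168.28.15"
.server_config server_2 -v "pdc enable=172.168.28.15"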

MemInfo:

.server_config server_2 -v "meminfo"

Sample Output (truncated):

CPU=0
3552907011 calls to malloc, 3540029263 to free, 61954 to realloc
Size     In Use       Free      Total nallocs nfrees
16       3738        870       4608   161720370   161716632
32      18039      17289      35328   1698256206   1698238167
64       6128       3088       9216   559872733   559866605
128       6438      42138      48576   255263288   255256850
256       8682      19510      28192   286944797   286936115
512       1507       2221       3728   357926514   357925007
1024       2947       9813      12760   101064888   101061941
2048       1086        198       1284    5063873    5062787
4096         26        138        164    4854969    4854943
8192        820         11        831   19562870   19562050
16384         23         10         33       5676       5653
32768          6          1          7        101         95
65536         12          0         12         12          0
524288          1          0          1          1          0
Total Used     Total Free    Total Used + Free
all sizes   18797440   23596160   42393600
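
If you just want a quick read on overall heap usage without scanning the whole table, you can filter the summary line. This is a sketch based on the column layout in the sample above (used, free, used + free) and assumes you're piping the output through awk on the control station:

.server_config server_2 -v "meminfo" | awk '/all sizes/ {printf "used=%d free=%d (%.1f%% used)\n", $3, $4, 100*$3/($3+$4)}'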

MemOwners:

.server_config server_2 -v "help memowners"

Usage:
memowners [dump | showmap | set … ]

Description:
memowners [dump] – prints memory owner description table
memowners showmap – prints a memory usage map
memowners memfrag [chunksize=#] – counts free chunks of given size
memowners set priority=# tag=# – changes dump priority for a given tag
memowners set priority=# label=’string’ – changes dump priority for a given label
The priority value can be set to 0 (lowest) to 7 (highest).

Sample Output (truncated):

1408979513: KERNEL: 6: Memory_Owner dump.
Total Frames 1703936  Registered = 75,  maxOwners = 128
1408979513: KERNEL: 6:   0 (   0 frames) No owner, Dump priority 6
1408979513: KERNEL: 6:   1 (3386 frames) Free list, Dump priority 0
1408979513: KERNEL: 6:   2 (40244 frames) malloc heap, Dump priority 6
1408979513: KERNEL: 6:   3 (6656 frames) physMemOwner, Dump priority 7
1408979513: KERNEL: 6:   4 (36091 frames) Reserved Mem based on E820, Dump priority 0
1408979513: KERNEL: 6:   5 (96248 frames) Address gap based on E820, Dump priority 0
1408979513: KERNEL: 6:   6 (   0 frames) Rmode isr vectors, Dump priority 7
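
As an example of the 'set' syntax shown in the usage above, you could raise the dump priority of the malloc heap owner (tag 2 in this sample output) so it is more likely to be included in a dump. This is only a sketch based on the help text; I'd check with EMC support before changing dump priorities on a production data mover:

.server_config server_2 -v "memowners set priority=7 tag=2"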

Auto transferring reports from VNX to an IIS web server via FTP

I previously posted on how to create a script that monitors your Celerra replication jobs.  I have an intranet web page that is updated daily with many other reports (most of which I’ve posted about here), so I thought I’d add this one to the web page as well rather than having to search through my inbox for it every day.

My challenge was developing an easy, automated method of getting files from the Celerra to a Windows-based web server, and FTP turned out to be the simplest way to do it.  Since my internal Windows web server is also my internal FTP server, I can place the file directly in the public folder for easy web publishing.  Now that the report is working and updating on the intranet page every day, my next task will be to come up with a more secure method using SSH or SCP, but this works well for now.

The big challenge in creating a bash script that uses FTP is figuring out how to pass the user ID and password.  I tried various methods unsuccessfully and finally settled on using a .netrc file.  Create a file named .netrc in your home directory (in my case /home/nasadmin) containing a single line with the following syntax:

machine <ftp_server_name> login <ftp_login_id> password <ftp_password>

Once the file is created, you need to chmod 600 the .netrc file in order for it to work.  If the permissions on that file are not set to 600, the auto-login to the FTP server will fail.
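
As a concrete example (the server name, login, and password below are made up), the .netrc file contains a single line, and then you lock down the permissions on it:

machine webserver01 login ftpuser password MyP@ssw0rd

chmod 600 /home/nasadmin/.netrc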

My next step was to create the script that sends the replication status report to the IIS web server:

#!/bin/bash
# cd to the directory that holds the report first, otherwise the put will fail with "file not found"
cd /home/nasadmin/scripts
# the .netrc file in the home directory handles the FTP login automatically
ftp <ftp_server_name> <<SCRIPT
put <filename>.csv
quit
SCRIPT
I always chmod the script to 755 (making it executable) after creating it in vi.  The script always ran fine manually, but I struggled for a while to get it working properly from crontab.  It turns out that you must cd to the correct directory within the script before calling the ftp command; otherwise you will get "file not found" errors.  I was always running it manually from within that directory, so I didn't immediately catch the problem. 🙂

I then added the above script to crontab on the Celerra.  I run it at 6AM every morning with the following entry:

0 6 * * * /scripts/repl_status.sh

For those not familiar with cron, you can add an entry using "crontab -e" and list your current entries with "crontab -l".  The first two fields in the line ("0 6") are the minute and the hour, so in this case the job runs at 6:00 AM every day.

I have a download link to the csv file on my web page, and I also use a Perl script on the web server called csv2html.pl to convert the csv file to HTML so the data can be viewed without having to download the csv and open it in Excel.  You can find csv2html.pl easily with a Google search; I've blogged about it in previous posts as well.

That's it!  It's an easy way to automatically push your reports from the Celerra to another server.  Now that I have the transfer method down, I'll be adding more daily reports in the near future.  If anyone has experience doing this type of transfer from a Celerra (or Linux server) to a Windows server via SSH or SCP, please comment! 🙂
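
For anyone who wants to try the SSH route, here's a rough sketch of what an scp version of the script might look like.  It assumes you've generated an SSH key pair on the control station (ssh-keygen), copied the public key to the destination server, and that the server is running an SSH service; the user, hostname, and paths below are placeholders:

#!/bin/bash
# copy the report over SSH instead of FTP; key-based authentication avoids storing a password
cd /home/nasadmin/scripts
scp <filename>.csv <user>@<web_server_name>:/path/to/web/root/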