Close requests fail on data mover when using Riverbed Steelhead appliance

We recently had a problem with one of our corporate applications having file close requests fail, resulting in 200,000+ open files on our production data mover.  This was causing numerous issues within the application.  We determined that the problem was a result of our Riverbed Steelhead appliance requiring a certain level of DART code in order to properly close the files.  The Steelhead applicance would fail when attempting to optimize SMBV2 connections.

Because a DART code upgrade is required to resolve the problem, the only temporary fix is to reboot the data mover.  I wrote a quick script on the Celerra to grab the number of open files, write it to a text file, and publish to our internal web server.  The command to check how many open files are on the data mover is below.

This command provides all the detailed information:

/nas/bin/server_stats server_2 -monitor cifs-std -interval 1 -count 1

server_2    CIFS     CIFS     CIFS    CIFS Avg   CIFS     CIFS    CIFS Avg     CIFS       CIFS
Timestamp   Total    Read     Read      Read     Write    Write    Write       Share      Open
            Ops/s    Ops/s    KiB/s   Size KiB   Ops/s    KiB/s   Size KiB  Connections   Files
11:15:36     3379      905     9584        11        9      272        30         1856     4915   

server_2    CIFS     CIFS     CIFS    CIFS Avg   CIFS     CIFS    CIFS Avg     CIFS       CIFS
Summary     Total    Read     Read      Read     Write    Write    Write       Share      Open
            Ops/s    Ops/s    KiB/s   Size KiB   Ops/s    KiB/s   Size KiB  Connections   Files
Minimum      3379      905     9584        11        9      272        30         1856     4915   
Average      3379      905     9584        11        9      272        30         1856     4915   
Maximum      3379      905     9584        11        9      272        30         1856     4915   

Adding a grep for Maximum and using awk to grab only the last column, this command will output only the number of open files, rather than the large output above:

/nas/bin/server_stats server_2 -monitor cifs-std -interval 1 -count 1 | grep Maximum | awk ‘{print $10}’

The output of that command would simply be ‘4915’ based on the sample full output I used above.

The solution number from Riverbed’s knowledgebase is S16257.  Your DART code needs to be at least 6.0.60.2 or 7.0.52.  You will also see in your steelhead logs a message similar to the one below indicating that the close request has failed for a particular file:

Sep 1 18:19:52 steelhead port[9444]: [smb2cfe.WARN] 58556726 {10.0.0.72:59207 10.0.0.72:445} Close failed for fid: 888819cd-z496-7fa2-2735-0000ffffffff with ntstatus: NT_STATUS_INVALID_PARAMETER
Advertisements

Collecting info on active shares, clients, protocols, and authentication on the VNX

I had a comment in one of my Celerra/VNX posts asking for more specific info on listing and counting shares, clients, protocol and authentication information, as well as virus scan information.  I knew the answers to most of those questions however  I’d need to pull out the Celerra documentation from EMC for virus scan info.  I thought it might be more useful to put this information into a new post rather than simply replying to a comment on an old post.

How to list and count shares/exports (by protocol):

You can use the server_export command to list and count how many shares you have.

server_export server_2 -Protocol cifs -list -all:  This command will give you a list of all CIFS Shares across all of your CIFS servers.  It will count a file system twice if it’s shared on more than one CIFS server.

server_export server_2 -Protocol cifs -list -all | grep :  This will give you a list of all CIFS shares on a specific CIFS server

server_export server_2 -Protocol cifs -list -all | grep | wc:  This will give you the number of CIFS shares on a specific CIFS server.  The “wc” command will output three numbers, the first number listed is the number of shares.

server_export server_2 -Protocol nfs -list -all:  This command will give you a list of all NFS exports.  It’s just like the previous commands, you can add “| wc” at the end to get a count.

How to list client connections by OS type:

To obtain information for the number of client connections you have by OS type, you’d have to use the server_cifs audit command.

To get a full list of every active connection by client type, use this command:

server_cifs server_2 -option audit,full | grep “Client(“

The output would look like this:

|||| AUDIT Ctx=0x022a6bdc08, ref=2, W2K Client(10.0.5.161) Port=49863/445
|||| AUDIT Ctx=0x01f18efc08, ref=2, XP Client(10.0.5.42) Port=3890/445
|||| AUDIT Ctx=0x02193a2408, ref=2, W2K Client(10.0.5.61) Port=59027/445
|||| AUDIT Ctx=0x01b89c2808, ref=2, Fedora6 Client(10.0.5.194) Port=17130/445
|||| AUDIT Ctx=0x0203ae3008, ref=2, Fedora6 Client(10.0.5.52) Port=55731/445
...

In this case, if I wanted to count only the number of Fedora6 clients, I’d use the command server_cifs server_2 -option audit,full | grep “Fedora6 Client”.  I could then add “| wc” at the end to get a count.

To do a full audit report:

The command server_cifs server_2 -option audit,full will do a full, detailed audit report and should capture just about anything else you’d need.  Every connection will have a detailed audit included in the report. Based on that output, it would be easy to run the command with a grep statement to pull only the information out that you need to create custom reports.

Below is a subset of what the output looks like from that command:

|||| AUDIT Ctx=0x0177206407, ref=2, W2K8 Client(10.0.0.1) Port=65340/445
||| CIFSSERVER1[DOMAIN] on if=<interface_name>
||| CurrentDC 0x0169fee808=<Domain_Controller>
||| Proto=SMB2.10, Arch=Win2K, RemBufsz=0xffff, LocBufsz=0xffff, popupMsg=1
||| Client GUID=c2de9f99-1945-11e2-a512-005056af0
||| SMB2 credits: Granted=31, Max=500
||| 0 FNN in FNNlist NbUsr=1 NbCnx=1
||| Uid=0x1 NTcred(0x0125fc9408 RC=2 KERBEROS Capa=0x2) 'DOMAIN\Username'
|| Cnxp(0x0230414c08), Name=FileSystem1, cUid=0x1 Tid=0x1, Ref=1, Aborted=0
| readOnly=0, umask=22, opened files/dirs=0
| Absolute path of the share=\Filesystem1
| NTFSExtInfo: shareFS:fsid=49, rc=830, listFS:nb=1 [fsid=49,rc=830]

|||| AUDIT Ctx=0x0210f89007, ref=2, W2K8 Client(10.0.0.1) Port=51607/445
||| CIFSSERVER1[DOMAIN] on <interface_name>
||| CurrentDC 0x01f79aa008=<Domain_Controller>
||| Proto=SMB2.10, Arch=Win2K8, RemBufsz=0xffff, LocBufsz=0xffff, popupMsg=1
||| Client GUID=5b410977-bace-11e1-953b-005056af0
||| SMB2 credits: Granted=31, Max=500
||| 0 FNN in FNNlist NbUsr=1 NbCnx=1
||| Uid=0x1 NTcred(0x01c2367408 RC=2 KERBEROS Capa=0x2) 'DOMAIN\Username'
|| Cnxp(0x0195d2a408), Name=Filesystem2, cUid=0x1 Tid=0x1, Ref=1, Aborted=0
| readOnly=0, umask=22, opened files/dirs=0
| Absolute path of the share=\Filesystem2
| NTFSExtInfo: shareFS:fsid=4214, rc=19, listFS:nb=0

|||| AUDIT Ctx=0x006aae8408, ref=2, XP Client(10.0.0.99) Port=1258/445
 ||| CIFSSERVER1[DOMAIN] on if=<interface_name>
 ||| CurrentDC 0x01f79aa008=<Domain_Controller>
 ||| Proto=NT1, Arch=Win2K, RemBufsz=0xffff, LocBufsz=0xffff, popupMsg=1
 ||| 0 FNN in FNNlist NbUsr=1 NbCnx=1
 ||| Uid=0x3f NTcred(0x01ccebc008 RC=2 KERBEROS Capa=0x2) 'DOMAIN\Username'
 || Cnxp(0x019edd7408), Name=Filesystem8, cUid=0x3f Tid=0x3f, Ref=1, Aborted=0
 | readOnly=0, umask=22, opened files/dirs=3
 | Absolute path of the share=\Filesystem8
 | NTFSExtInfo: shareFS:fsid=35, rc=43, listFS:nb=0
 | Fid=2901, FNN=0x0012d46b40(FREE,0x0000000000,0), FOF=0x0000000000  DIR=\Directory1
 |    Notify commands received:
 |    Event=0x17, wt=1, curSize=0x0, maxSize=0x20, buffer=0x0000000000
 |    Tid=0x3f, Pid=0x2310, Mid=0x3bca, Uid=0x3f, size=0x20
 | Fid=3335, FNN=0x00193baaf0(FREE,0x0000000000,0), FOF=0x0000000000  DIR=\Directory1\Subdirectory1
 |    Notify commands received:
 |    Event=0x17, wt=0, curSize=0x0, maxSize=0x20, buffer=0x0000000000
 |    Tid=0x3f, Pid=0x2310, Mid=0xe200, Uid=0x3f, size=0x20
 | Fid=3683, FNN=0x00290471c0(FREE,0x0000000000,0), FOF=0x0000000000  DIR=\Directory1\Subdirectory1\Subdirectory2
 |    Notify commands received:
 |    Event=0x17, wt=0, curSize=0x0, maxSize=0x20, buffer=0x0000000000
 |    Tid=0x3f, Pid=0x2310, Mid=0x3987, Uid=0x3f, size=0x20