Using Cron with EMC VNX and Celerra

I’ve shared numerous shell scripts over the years on this blog, many of which benefit from being scheduled to run automatically on the Control Station.  I’ve received emails and comments asking, “How do I schedule Unix or Linux crontab jobs to run at intervals like every five minutes, every ten minutes, every hour, etc.?”  I’ll run through some specific examples next. While it’s easy enough to simply type “man crontab” from the CLI to review the syntax, it can be helpful to see specific examples.

What is cron?

Cron is a time-based job scheduler used in most Unix operating systems, including the VNX File OS (or DART). It’s used to schedule jobs (either commands or shell scripts) to run periodically at fixed times, dates, or intervals. It’s primarily used to automate system maintenance and administration, and can also be used for troubleshooting.

What is crontab?

Cron is driven by a crontab (cron table) file, a configuration file that specifies shell commands to run periodically on a given schedule. The crontab files hold the lists of jobs and other instructions for the cron daemon. Users can have their own individual crontab files, and often there is a system-wide crontab file (usually in /etc or a subdirectory of /etc) that only system administrators can edit.

On the VNX, the crontab files are located at /var/spool/cron but are not intended to be edited directly; you should use the “crontab -e” command.  Each NAS user has their own crontab, and commands in any given crontab will be executed as the user who owns the crontab.  For example, the crontab file for nasadmin is stored as /var/spool/cron/nasadmin.

Best Practices and Requirements

First, let’s review how to edit and list crontab and some of the requirements and best practices for using it.

1. Make sure you use “crontab -e” to edit your crontab file. As a best practice you shouldn’t edit the crontab files directly.  Use “crontab -l” to list your entries.

2. Blank lines and leading spaces and tabs are ignored. Lines whose first non-space character is a pound sign (#) are comments, and are ignored. Comments are not allowed on the same line as cron commands as they will be taken to be part of the command.

3. If the /etc/cron.allow file exists, then you must be listed therein in order to be allowed to use this command. If the /etc/cron.allow file does not exist but the /etc/cron.deny file does exist, then you must not be listed in the /etc/cron.deny file in order to use this command.

4. Don’t execute commands directly from within crontab; place your commands within a script file that’s called from cron. Cron has no terminal to write to, so anything a job prints to stdout or stderr is collected and cron tries to mail it to the crontab owner, which is one of several reasons you shouldn’t put commands directly in your crontab schedule.  Make sure to redirect output somewhere, either a log file or /dev/null.  This is accomplished by adding “> /folder/log.file” or “> /dev/null” after the script path (append “2>&1” to capture stderr as well).

5. For scripts that will run under cron, make sure you either set PATH explicitly in the script or use fully qualified paths to all commands that you use in the script.

6. I generally add these two lines to the beginning of my scripts as a best practice for using cron on the VNX.

export NAS_DB=/nas
export PATH=$PATH:/nas/bin
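
Putting items 4 through 6 together, a cron-friendly script might look something like the sketch below. The script name (example.sh), the log file location, and the five-minute schedule are placeholders made up for illustration, so adjust them for your environment.

```shell
#!/bin/bash
# Hypothetical script: /home/nasadmin/scripts/example.sh
# Example crontab entry (assumed schedule), with stdout and stderr discarded:
#   */5 * * * * /home/nasadmin/scripts/example.sh > /dev/null 2>&1

# Cron does not read .bashrc or .profile, so set the environment here.
export NAS_DB=/nas
export PATH=$PATH:/nas/bin

# Log file location is an assumption; override it by setting LOG beforehand.
LOG="${LOG:-/tmp/example_cron.log}"

# Call commands by their fully qualified path instead of relying on PATH.
echo "$(/bin/date '+%Y-%m-%d %H:%M:%S') example.sh started" >> "$LOG"
```

Because the script writes its own log, the crontab entry can safely send everything else to /dev/null.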

Descriptions of the crontab date/time fields

Commands are executed by cron when the minute, hour, and month of year fields match the current time, and when at least one of the two day fields (day of month, or day of week) match the current time.

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 7) 
# │ │ │ │ │                          
# │ │ │ │ │
# │ │ │ │ │
# * * * * *  command to execute
# 1 2 3 4 5  6

field #   meaning          allowed values
-------   ------------     --------------
   1      minute           0-59
   2      hour             0-23
   3      day of month     1-31
   4      month            1-12
   5      day of week      0-7 (0 or 7 is Sun)
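
If it helps to see how a line breaks down into those fields, here is a small sketch that splits an entry the same way you would read it by eye. The describe_entry function is a hypothetical helper written for this post, not something cron provides, and it only splits on whitespace without validating the values.

```shell
#!/bin/sh
# describe_entry is a hypothetical helper for illustration; it is not part of cron.
# It splits one crontab line into the five date/time fields plus the command.
describe_entry() (
  # read assigns one word per variable; the last variable takes the remainder.
  read -r minute hour dom month dow command <<EOF
$1
EOF
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
  printf 'command=%s\n' "$command"
)

describe_entry "30 7 * * * /home/nasadmin/scripts/brocade.sh"
# prints:
#   minute=30 hour=7 day-of-month=* month=* day-of-week=*
#   command=/home/nasadmin/scripts/brocade.sh
```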

Run a command every minute

While it’s not as common to want to run a command every minute, there can be specific use cases for it.  It would most likely be used when you’re in the middle of troubleshooting an issue and need data to be recorded more frequently.  For example, you may want to run a command every minute to check and see if a specific process is running.  To run a Unix/Linux crontab command every minute, use this syntax:

# Run “check.sh” every minute of every day
* * * * * /home/nasadmin/scripts/check.sh

Run a command every hour

The syntax is similar when running a cron job every hour of every day.  In my case I’ve used hourly scripts for performance monitoring, for example with the server_stats VNX script. Here’s a sample crontab entry that runs at 15 minutes past the hour, 24 hours a day.

# Performance stats collection
# This command will run at 12:15, 1:15, 2:15, etc., 24 hours a day.
15 * * * * /home/nasadmin/scripts/stat_collect.sh

Run a command once a day

Here’s an example that shows how to run a command from the cron daemon once a day. In my case, I’ll usually run daily commands for report updates on our web page and for backups.  As an example, I run my Brocade Zone Backup script once daily.

# Run the Brocade backup script at 7:30am
30 7 * * * /home/nasadmin/scripts/brocade.sh

Run a command every 5 minutes

There are multiple methods to run a crontab entry every five minutes.  It is possible to enter a single, specific minute value multiple times, separated by commas.  While this method does work, it makes the crontab list a bit harder to read and there is a shortcut that you can use.

0,5,10,15,20,25,30,35,40,45,50,55  * * * * /home/nasadmin/scripts/script.sh

The crontab “step value” syntax (using a forward slash) allows you to use a crontab entry in the format shown below.  It will run a command every five minutes and accomplish the same thing as the command above.

# Run this script every 5 minutes
*/5 * * * * /home/nasadmin/scripts/test.sh

Ranges, Lists, and Step Values

I just demonstrated the use of a step value to specify a schedule of every five minutes, but you can actually get even more granular than that using ranges and lists.

Ranges.  Ranges of numbers are allowed (two numbers separated with a hyphen). The specified range is inclusive. For example, using 7-10 for an “hours” entry specifies execution at hours 7, 8, 9, & 10.

Lists. A list is a set of numbers (or ranges) separated by commas. Examples: “1,2,5,9”, “0-4,8-12”.

Step Values. Step values can be used in conjunction with ranges. Following a range with “/<number>” specifies skips of the number’s value through the range. For example, “0-23/2” can be used in the hours field to specify command execution every other hour (the alternative being “0,2,4,6,8,10,12,14,16,18,20,22”). Steps are also permitted after an asterisk, so if you want to say “every two hours” you can use “*/2”.
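
To check what a range or step value actually expands to before trusting it in a crontab, a rough sketch like the one below can help. The expand_field function is a hypothetical helper written for this post (cron has no such command). It handles a single number, a range, or an asterisk with an optional step, but not comma-separated lists, and it does no input validation.

```shell
#!/bin/sh
# expand_field SPEC MIN MAX
# Hypothetical helper, not part of cron. Expands one crontab field such as
# "*/5", "0-23/2", or "7-10" into the list of values it matches.
# MIN and MAX are the allowed bounds for the field (0 59 for minutes).
expand_field() (
  spec=$1; min=$2; max=$3
  range=${spec%%/*}              # part before the slash
  step=1
  case $spec in
    */*) step=${spec##*/} ;;     # part after the slash, if there is one
  esac
  if [ "$range" = "*" ]; then
    start=$min; end=$max         # an asterisk means the full allowed range
  else
    start=${range%%-*}; end=${range##*-}
  fi
  out=$start
  n=$((start + step))
  while [ "$n" -le "$end" ]; do
    out="$out,$n"
    n=$((n + step))
  done
  echo "$out"
)

expand_field "*/5" 0 59      # prints 0,5,10,15,20,25,30,35,40,45,50,55
expand_field "0-23/2" 0 23   # prints 0,2,4,6,8,10,12,14,16,18,20,22
expand_field "7-10" 0 23     # prints 7,8,9,10
```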

Special Strings

While I haven’t personally used these, there is a set of built-in special strings you can use, outlined below.

string         meaning
------         -------
@reboot        Run once, at startup.
@yearly        Run once a year, "0 0 1 1 *".
@annually      (same as @yearly)
@monthly       Run once a month, "0 0 1 * *".
@weekly        Run once a week, "0 0 * * 0".
@daily         Run once a day, "0 0 * * *".
@midnight      (same as @daily)
@hourly        Run once an hour, "0 * * * *".

Using a Template

Below is a template you can use in your crontab file to assist with the valid values that can be used in each column.

# Minute|Hour  |Day of Month|Month |WeekDay |Command
# (0-59)|(0-23)|(1-31)      |(1-12)|(0-7)             
  0      2      12           *      *        test.sh

Gotchas

Here’s a list of the known limitations of cron and some of the issues you may encounter.

1. When a cron job runs, it is executed as the user who owns the crontab. Verify security requirements for the job.

2. Cron jobs do not use any files in the user’s home directory (like .cshrc or .bashrc). If your script depends on anything those files normally provide, you need to handle it in the script that cron calls. This includes setting paths, sourcing files, setting environment variables, etc.

3. If your cron jobs are not running, make sure the cron daemon is running. The cron daemon can be started or stopped with the following VNX Commands (run as root):

# /sbin/service crond stop
# /sbin/service crond start

4.  If your job isn’t running properly you should also check the /etc/cron.allow and /etc/cron.deny files.

5. Environment settings in a crontab are not parsed for variable substitution. A line such as PATH=$HOME/bin:$PATH will not work the way you might expect, so set things like $PATH inside your script instead.

6. Cron does not deal with seconds; one minute is the finest granularity it allows.

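7. You cannot use a bare % in the command area. Cron treats an unescaped % as a newline and passes everything after the first one to the command as standard input, so each % must be escaped with a backslash. This comes up most often with the date command inside command substitution, using either backticks, e.g. `date +\%Y-\%m-\%d`, or bash’s $() syntax.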

8. Be cautious using the day of month and the day of week fields together.  When both fields are restricted (neither is *), cron treats them as an “or” condition, not an “and” condition: the command is executed when either field matches.
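
One common way around the “or” behavior in item 8 is to restrict only the day of month in crontab and test the day of week inside the script. Below is a minimal sketch of a “first Monday of the month” job. The script name and the 9:00am schedule are made up for illustration, and the is_monday helper assumes GNU date (for the -d option), which is what you would expect on a Linux-based Control Station.

```shell
#!/bin/bash
# Hypothetical script: /home/nasadmin/scripts/first_monday.sh
# Crontab entry (restrict only day of month; leave day of week as *):
#   0 9 1-7 * * /home/nasadmin/scripts/first_monday.sh > /dev/null 2>&1

# is_monday [DATE]: succeeds when DATE (default: today) is a Monday.
# Assumes GNU date; %u prints the day of week as 1 (Monday) through 7 (Sunday).
is_monday() {
  [ "$(/bin/date -d "${1:-today}" +%u)" = "1" ]
}

# Cron starts the script on days 1-7 of every month; only a Monday gets past here.
if is_monday; then
  echo "First Monday of the month: the real work would run here"
fi
```

Since the date format string lives in the script rather than in crontab, there is also no need to escape the % character.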

 


Storage Class Memory and Emerging Technologies

I mentioned in my earlier post, The Future of Storage Administration, that flash will continue to dominate the industry and will be embraced by the enterprise, which I believe will drive newer technologies like NVMe and diminish older technologies like Fibre Channel.  While there is a lot of agreement over the latest storage technologies that are driving the adoption of flash in the enterprise, including the aforementioned NVMe technology, there doesn’t seem to be nearly as much agreement on what the “next big thing” will be in the enterprise storage space.  NVMe and NVMe-oF are definitely being driven by the trend towards the all-flash data center, and Storage Class Memory (SCM) is certainly a relevant trend that could be that “next big thing”.  Before I continue, what are NVMe, NVMe-oF and SCM?

  • NVMe is a protocol that allows for fast access for direct attached flash storage. NVMe is considered an evolutionary step toward exploiting the inherent parallelism built into SSDs.
  • NVMe-oF allows the advantages of NVMe to be used on a fabric connecting hosts with networked storage. With the increased adoption of low latency, high bandwidth network fabrics like 10Gb+ Ethernet and InfiniBand, it becomes possible to build an infrastructure that extends the performance advantages of NVMe over standard fabrics to access low latency nonvolatile persistent storage.
  • SCM (Storage Class Memory) is a technology that places memory and storage on what looks like a standard DIMM board, which can be connected over NVMe or the memory bus.  I’ll dive in a bit more later on.

In the coming years, you’ll likely see every major storage vendor rolling out their own solutions for NVMe, NVMe-oF, and SCM.  The technologies alone won’t mean anything without optimization of the OS/hypervisor, drivers, and protocols, however. The NVMe software will need to be designed to take advantage of the low latency transport and media.

Enter Storage Class Memory

SCM is a hybrid memory and storage paradigm, placing memory and storage on what looks like a standard DIMM board.  It’s been gaining a lot of attention at storage industry conferences for the past year or two.  Modern solid-state drives are a compromise because they’re inherently all-flash and are still configured with all the bottlenecks of legacy standard drives even when bundled into modern enterprise arrays.  SCM is not exactly memory and it’s not exactly storage.  It physically connects to memory slots in a mainboard just like traditional DRAM.  It is also a little bit slower than DRAM, but it is persistent, so just like traditional storage, all content is saved after a power cycle.  Compared to flash, SCM is orders of magnitude faster and offers equal performance gains on read and write operations.  In addition, SCM tiers are much more resilient and do not have the same wear pattern problems as flash.

A large gap exists between DRAM as a main memory and traditional SSD and HDD storage in terms of performance vs. cost, and SCM looks to address that gap.

The next-generation technologies that will drive SCM aim to be denser than current DRAM along with being faster, more durable, and hopefully cheaper than NAND solutions.  SCM, when connected over NVMe technology or directly on the memory bus, will enable device latencies to be about 10x lower than those provided by NAND-based SSDs.  SCM can also be up to 10x faster than NAND flash although at a higher cost than NAND-based SSDs. Similarly, NAND flash started out at least 10x more expensive than the dominant 15K RPM HDD media when it was introduced. Prices will come down.

Because the expected media latencies for SCM (<2us) are lower than the network latencies (<5us), SCM will probably end up being more common on servers rather than on the network.  Either way, SCM on a storage system will help accelerate metadata access and result in improvement of overall system performance.  Using NVMe-oF to provide low-latency access to networked storage, SCM could potentially be used to create a different tier of network storage.

The SCM Vision

It sounds great, right?  The concept of Storage Class Memory has been around for a while, but it’s become a hard-to-reach, albeit very desirable, goal for storage professionals. The common vision seems to be a new paradigm where data can live in fast, DRAM-like storage areas in which data in memory is the center of the computer instead of the compute functions. The main problem with this vision is how we get the system and applications to recognize that something beyond just DRAM is available for use and that it can be used as either data storage or as persistent memory.

We know that SCM will allow for huge volumes of I/O to be served from memory and potentially stored in memory.  There will be less need to create multiple copies to protect against controller or server failure.  Exactly how this will be done remains to be seen, but there are obvious benefits from not having to continuously commit to slow external disks.  Once all the hurdles are overcome, SCM should have broad applicability in SSDs, storage controllers, PCI or NVMe boards and DIMMs.

Software Support

With SCM, applications won’t need to execute write IOs to get data into persistent storage. A memory-level, zero-copy operation moving data into XPoint will take care of that. That is just one example of the changes that systems and software will have to take on board when a hardware option like XPoint is treated as persistent storage-class memory, however.  Most importantly, the following must also be developed:

  • File systems that are aware of persistent memory
  • Operating system support for storage-class memory
  • Processors designed to use hybrid DRAM and XPoint memory

With that said, the industry is well on its way. Microsoft has added XPoint storage-class memory support into Windows Server 2016.  It provides zero-copy access and Direct Access Storage volumes, known as DAX volumes.  Red Hat Enterprise Linux support is in place to use these devices as fast disks in sector mode with BTT (Block Translation Table), and this use case is fully supported in RHEL 7.3.

Hardware

SCM can be implemented with a variety of current technologies, notably Intel Optane, ReRAM, and NVDIMM-P.

Intel has introduced Optane brand XPoint SSDs and XPoint DIMMs. The DIMMs connect over the memory bus instead of the relatively slower PCIe bus used by the NVMe XPoint drives.

Resistive Random-Access Memory (ReRAM) is still an up-and-coming technology and comparable to Intel’s XPoint. It is currently under development by a number of companies and is a viable replacement for flash memory. The costs and performance of ReRAM are not currently at a level that makes the technology ready for the mass market. Developers of ReRAM technology all face similar challenges: overcoming temperature sensitivity, integrating with standard CMOS technology and manufacturing processes, and limiting the effects of sneak path currents, which would otherwise disrupt the stability of the data contained in each memory cell.

NVDIMM stands for “Nonvolatile Dual-Inline Memory Module.” The NVDIMM-P specification is being developed to support NAND flash directly on the host memory interface.  NVDIMMs use predictive software that allocates data in advance between DRAM and NAND.  NVDIMM-P is limited in that even though NAND flash is physically located on the DIMM along with DRAM, the traditional memory hierarchy is still the same. The NAND implementation still works as a storage device and the DRAM implementation still works as main memory.

HP worked for years developing its Machine project.  Their effort revolved around memory-driven computing and an architecture aimed at big data workloads, and their goal was eliminating inefficiencies in how memory, storage, and processors interact.  While the project appears to now be dead, the technologies they developed will live on in current and future HP products. Here’s what we’ll likely see out of their research:

  • Now: ProLiant boxes with persistent memory for applications to use, using a mix of DRAM and flash.
  • Next year: Improved DRAM-based persistent memory.
  • Two to three years: True non-volatile memory (NVM) for software to use as slow but high volume RAM.
  • Three to four years: NVM technology across many product categories.

SCM Use Cases

I think the most exciting use case for SCM in high performance computing may be its use as nonvolatile memory that is tightly coupled to an application. SCM has the potential to dramatically affect the storage landscape in high performance computing, and application and storage developers will have fantastic opportunities to take advantage of this unique new technology.

Intel touts fast storage, cache, and extended memory as the primary use cases for their Optane product line.  Fast storage or cache refers to the tiering and layering which enable a better memory-to-storage hierarchy. The Optane product provides a new storage tier that breaks through the bottlenecks of traditional NAND storage to accelerate applications, and enable more work to get done per server. Intel’s extended memory use case describes the use of an Optane SSD to participate in a shared memory pool with DRAM at either the OS or application level enabling bigger memory or more affordable memory.

What the next generation of SCM will require is the industry coming together to agree on what we are all talking about and generate some standards.  Those standards will be critical to support innovation. Industry experts seem to be saying that the adoption of SCM will evolve around use cases and workloads, and task-specific, engineered machines that are built with real-time analytics in mind.  We’ll see what happens.

No matter what, new NVMe-based products coming out will definitely do a lot toward enabling fast data processing at a large scale, especially solutions that support the new NVMe-oF specification. SCM combined with software-defined storage controllers and NVMe-oF will enable users to pool flash storage drives and treat them as if they are one big local flash drive. Exciting indeed.

SCM may not turn out to be a panacea, and current NVMe flash storage systems will provide enough speed and bandwidth to handle even the most demanding compute requirements for the foreseeable future.  I’m looking forward to seeing where this technology takes us.