Best Practices for FAST Cache

I recently received a comment asking for more information on EMC’s FAST Cache, specifically about why increased CPU Utilization was observed after a FAST Cache expansion. It’s likely due to the rebuilding of the cache after the expansion and possibly having it enabled on LUNs that shouldn’t, like those with high sequential I/O. It’s hard to pinpoint the exact cause of an issue like that without a thorough analysis of the array itself, however.   I thought I’d do a quick write-up of EMC’s best practices for implementing FAST Cache and the caveats to consider when implementing it.

What is FAST Cache?

First, a quick overview of what it is.  EMC’s FAST Cache uses a RAID set of EFD drives that sits between DRAM Cache and the disks themselves. It holds a large percentage of the most often used data in high performance EFD drives.  It hits a price/performance sweet spot between DRAM and traditional spinning disks for cache, and can greatly increase array performance.

The theory behind FAST Cache is simple:  we divide the array’s storage up in 64KB blocks, we count the number of hits on those blocks, and then we create a cache page on the FAST Cache EFDs if there have been three read (or write) hits on that block.  If FAST Cache fills up, the array will start to seek pages in the EFDs that will make a full stripe write to the spinning disks in the array, and then force flush out to traditional spinning disks.

FAST Cache uses a “three strikes” algorithm.  If you are moving large amounts of data, the FAST Cache algorithm does not activate, which is by design, as cache does not help at all in large copy transactions.  Random hits on active blocks, however,  will ultimately cache those blocks into FAST Cache.  This is where the 64KB granularity makes a difference.  Typical workloads I/O are 64KB or less, and there is a significant chance that even if a workload is performing 4KB reads and writes to different blocks, they will still hit the same 64KB FAST Cache block, resulting in the promotion of that data into FAST Cache.  Cool, right?  It works very well in practice.  With all that said, there are still plenty of implementation considerations for an ideal FAST Cache configuration.  Below is an overview of EMC’s best practices.

Best Practices for LUNs and Pools

  • Only use it where you need it. The FAST Cache driver has to track every I/O to calculate whether a block needs promotion to FAST Cache, which then adds to the SP CPU utilization.  As a best practice, you should disabling FAST Cache for LUNs that won’t need it.  It will cut this overhead and thus can improve overall performance levels.  Having a separate storage pool for LUNs that don’t need FASTCache would be ideal.

Disable FASTCache for the following LUN types:

– Secondary Mirror and Clone destination LUNs
– LUNs with small, high sequential I/O, such as Oracle Database Logs & snapsure dvols
– LUNs in the reserved LUN pool.
– Recoverpoint Journal LUNs
– SnapView Clones and MirrorView Secondary Mirrors

  • Analyze where you need it most.  Based on a workload analysis, I’d consider restricting the use of FAST Cache to the LUNs or Pools that need it the most.  For every new block that is added into FAST Cache, old blocks that are the oldest in terms of the most recent access are removed.  If your FAST Cache capacity is limited, even frequently accessed blocks may be removed before they’re accessed again.
  • Upgrade to the latest OS Release. On the VNX platform, upgrading to the latest FLARE or MCx release can greatly improve the performance of FAST Cache.  It’s been a few years now, but as an example r32 recovers performance much faster after a FAST Cache drive failure compared to r31, as well as automatically avoiding the promotion of small sequential block I/O to FAST Cache.  It’s always a good idea to run a current version of the code.

Best Practices For VNX arrays with MCx:

  • Spread it out. Spread the drives as evenly as possible across the available backend busses.  Be careful, though, as you shouldn’t add more than 8 FAST Cache flash drives per bus including any unused flash drives for use as hot-spares.
  • Always use DAE 0. Try and use DAE 0 on each bus for flash drives as it provides for the lowest latency.

Best Practices for VNX and CX4 arrays with FLARE 30-32: 

  • CX4? No more than 4 per bus. If you’re still using an older CX4 series array, don’t use more than 4 FAST Cache drives per bus, and don’t put all of them on bus 0. If they are all on the same bus, they could completely saturate this bus with I/O.
  • Spread it out. Spread the FAST Cache drives over as many buses as possible. This would especially be an issue if the drives were all on bus 0, because it is used to access the vault drives.  Note that the VNX has six times the back-end bandwidth per bus compared to a CX, so it’s less of a concern.
  • Match the drive sizes. All the drives in FAST Cache must be of the same capacity; otherwise the workload on each drive would rise proportionately with its capacity.  In other words, a 200GB drive would have double the workload of a 100Gb drive.
  • VNX? Use enclosure 0. Put the EFD drives in the first DAE on any bus (i.e. Enclosure 0).  The I/O has to pass through the LCC of each DAE between the drive and the SP, and each extra LCC it passes through will add a small amount of latency. The latency would normally be negligible, but is significant for flash drives.  Note that on the CX4, all I/O has to pass through every LCC anyway.
  • Mind the order the disks are added.  The order the drives are added dictates which drives are primary & secondary. The first drive added is the primary for the first mirror, the next drive added is its secondary for the first mirror, the third drive is the primary for the second mirror, etc.
  • Location, Location, Location. It’s a more advanced configuration and requires the use of the CLI, but for highest availability place the primary and secondary for each FAST Cache RAID 1 pair are on different buses.