Storage Performance Benchmarking with FIO

Flexible IO tester (FIO) is an open-source synthetic benchmark tool first developed by Jens Axboe.  FIO can generate a wide range of IO workloads (sequential or random, reads or writes, synchronous or asynchronous), all based on the options provided by the user.  Its many global options let you shape the type of workload it generates.  FIO is one of the easiest and most versatile tools for quickly running IO performance tests on a storage system: it lets you simulate different types of IO loads and tweak many parameters, including the read/write mix and the number of processes.  I’ll likely make a few additional posts about some of the other storage benchmarking tools I’ve used, but I’m focusing on FIO for this post.  Why FIO?  It’s a great tool, and its pros outweigh its cons for me.

Pros:
  • It has a batch mode and a very extensive set of parameters.
  • Unlike IOMeter, it is still being actively developed.
  • It has multi-OS support.
  • It’s free.

Cons:
  • It is CLI only; there are no GUI or graphics features.
  • It has a rather complex syntax and it takes some time to get the hang of it.

Download and Installation

FIO can be run on either Linux or Windows, although building it on Windows first requires an installation of Cygwin (once built, the fio binary runs on stock Windows).  FIO works on Linux, Solaris, AIX, HP-UX, OSX, NetBSD, OpenBSD, Windows, FreeBSD, and DragonFly.  Some features and options are only available on some platforms, typically because those features only apply to that platform (like the solarisaio engine, or the splice engine on Linux).  You can check GitHub for the latest version before you get started.

You can run the following commands from a Linux server to download and install the FIO package:

cd /root

yum install -y make gcc libaio-devel || ( apt-get update && apt-get install -y make gcc libaio-dev  </dev/null )

wget ; tar xf fio*

cd fio*/

./configure && make && make install

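Before running any tests, it's worth confirming the build works.  Here's a minimal bash sketch (check_fio is just an illustrative name) that checks that a fio binary is present, prints its version, and looks for libaio in the list of engines that fio --enghelp prints.  If libaio-devel/libaio-dev wasn't installed before building, that engine may be missing and the libaio commands below won't run.

```bash
#!/usr/bin/env bash
# Check that a fio binary is usable and was built with the libaio engine.
# $1: path to the fio binary (defaults to "fio" on the PATH).
check_fio() {
  local fio_bin="${1:-fio}"
  if ! command -v "$fio_bin" >/dev/null 2>&1; then
    echo "fio not found: $fio_bin" >&2
    return 1
  fi
  # Record the version alongside your results.
  "$fio_bin" --version
  # With no argument, --enghelp lists the IO engines compiled into this binary.
  if ! "$fio_bin" --enghelp | grep -qw libaio; then
    echo "libaio engine missing; install libaio-devel/libaio-dev and rebuild" >&2
    return 1
  fi
}

# Example: check_fio ./fio
```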

How to compile FIO on 64-bit Windows:

Install Cygwin.  Install **make** and all packages starting with **mingw64-i686** and **mingw64-x86_64**.

Open the Cygwin Terminal.

Go to the fio directory (source files).

Run ``make clean && make -j``.

To build fio on 32-bit Windows, run ``./configure --build-32bit-win`` before ``make``.

FIO Cheat sheet

With FIO compiled, we can now run some tests.  For reference, I’ll start off with some basic commands for simulating different types of workloads.

Sequential Reads – Async mode – 8K block size – Direct IO – 100% Reads

fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=8k --numjobs=8 --size=1G --runtime=600  --group_reporting

Sequential Writes – Async mode – 32K block size – Direct IO – 100% Writes

fio --name=seqwrite --rw=write --direct=1 --ioengine=libaio --bs=32k --numjobs=4 --size=2G --runtime=600 --group_reporting

Random Reads – Async mode – 8K block size – Direct IO – 100% Reads

fio --name=randread --rw=randread --direct=1 --ioengine=libaio --bs=8k --numjobs=16 --size=1G --runtime=600 --group_reporting

Random Writes – Async mode – 64K block size – Direct IO – 100% Writes

fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=64k --numjobs=8 --size=512m --runtime=600 --group_reporting

Random Read/Writes – Async mode – 16K block size – Direct IO – 90% Reads/10% Writes

fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=8 --rwmixread=90 --size=1G --runtime=600 --group_reporting
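If you want to run all five of these workloads in one go, fio also accepts job files, where each option is written without its leading dashes.  Here's a sketch of a small bash function (write_cheatsheet_jobfile is a hypothetical name) that writes the cheat-sheet commands above into one job file.  The stonewall option makes each workload wait for the previous one to finish, so they run one after another rather than all at once.

```bash
#!/usr/bin/env bash
# Write the five cheat-sheet workloads into a single fio job file.
# Job-file options are the command-line options without the leading "--".
# "stonewall" makes each job wait until the previous one has finished, so the
# workloads run sequentially (each one also starts a new reporting group).
write_cheatsheet_jobfile() {
  local out="${1:-cheatsheet.fio}"
  cat > "$out" <<'EOF'
[global]
direct=1
ioengine=libaio
runtime=600
group_reporting

[seqread]
rw=read
bs=8k
numjobs=8
size=1G
stonewall

[seqwrite]
rw=write
bs=32k
numjobs=4
size=2G
stonewall

[randread]
rw=randread
bs=8k
numjobs=16
size=1G
stonewall

[randwrite]
rw=randwrite
bs=64k
numjobs=8
size=512m
stonewall

[randrw]
rw=randrw
rwmixread=90
bs=16k
numjobs=8
size=1G
stonewall
EOF
}

# Example: write_cheatsheet_jobfile cheatsheet.fio && fio cheatsheet.fio
```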

Host Considerations

To keep IOs from being served out of the host system’s cache, use the --direct=1 option, which reads from and writes to the disk directly.  Use Linux native asynchronous IO by setting --ioengine=libaio.  When FIO is launched, it creates a file named after --name, sized as specified by --size, and issues IO to it in blocks of --bs.  If --numjobs is provided, it creates one file per job in the format name.n.0, where n runs from 0 to numjobs-1.
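To predict what a run will leave on disk, here's a minimal sketch of that naming scheme and the space it needs (one --size per job); the function names are hypothetical.  It assumes you haven't set --filename, --directory, or --nrfiles.  My understanding is that with an explicit --filename, every job reads and writes that one shared file instead, so the footprint stays at a single --size.

```bash
#!/usr/bin/env bash
# List the data files fio is expected to lay out for one job section, using
# the default <name>.<job number>.<file number> naming (one file per job).
# Assumes --filename, --directory and --nrfiles are NOT set.
expected_files() {
  local name="$1" numjobs="$2" n
  for ((n = 0; n < numjobs; n++)); do
    echo "${name}.${n}.0"
  done
}

# Total space the files need, in MiB: one --size per job.
footprint_mib() {
  local size_mib="$1" numjobs="$2"
  echo $((size_mib * numjobs))
}

# Example, matching the sample run later in this post (8 jobs x 512 MiB):
# expected_files randrw 8    -> randrw.0.0 ... randrw.7.0
# footprint_mib 512 8        -> 4096
```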

--numjobs: The more jobs, the higher the performance can be, depending on resource availability.  If your server is limited on resources (such as TCP/IP network or Fibre Channel bandwidth), I’d recommend running FIO across multiple servers to push a higher workload to the storage array.
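One way to do that is to start the same fio command on several load-generating hosts at once.  The sketch below assumes passwordless SSH, fio on each host's PATH, and fio arguments (for example --directory) that point each host at storage on the array under test; the host names are hypothetical.  fio also has a built-in client/server mode (fio --server on each host, fio --client=<host> on the controlling machine) that is worth checking in its documentation as an alternative.

```bash
#!/usr/bin/env bash
# Launch the same fio command on several hosts in parallel so no single
# host's network or FC links cap the load on the array.
# Assumptions: passwordless SSH, fio on each host's PATH, and fio arguments
# that target storage on the array under test. Host names are hypothetical.
# DRY_RUN=1 (the default) only prints the commands it would run.
run_on_hosts() {
  local fio_args="$1"
  shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "ssh $host fio $fio_args > fio-$host.log 2>&1"
    else
      # Each host's report is saved locally as fio-<host>.log.
      ssh "$host" "fio $fio_args" > "fio-$host.log" 2>&1 &
    fi
  done
  wait  # block until every remote fio run has finished
}

# Example (hypothetical hosts):
# DRY_RUN=0 run_on_hosts "--name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=32k --numjobs=8 --size=1G --runtime=300 --time_based --group_reporting" loadgen1 loadgen2
```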

Block Size Matters

Many storage vendors advertise performance benchmarks based on 4K block sizes, which can artificially inflate the IOPS figure the array appears capable of handling.  In my professional experience with the workloads I’ve supported, the most common read size is between 32KB and 64KB and the most common write size is between 8KB and 32KB.  VMware-heavy environments may skew a bit lower in read block size.  Read IO is typically more common than write IO, at a ratio of around 3:1.  It’s important to know the characteristics of your workload before you begin testing, because IO size acts as a weight attached to each IO.  An IO of 64KB carries 8 times the weight of an 8KB IO, since it moves 8 times as many bytes, and a 256K block has 64 times the payload of a 4K block.  Larger IOs like these take substantially more effort from every component of the storage stack to satisfy.  Applications and the operating systems they run on generate a wide, ever-changing mix of block sizes based on the characteristics of the application and the workloads being serviced, and reads and writes are often delivered using different block sizes as well.  Block size has a significant impact on the latency your applications see.
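To make the weighting concrete: throughput equals IOPS × block size, so the same bandwidth can be reached with very different IOPS numbers.  The sketch below converts between the two (the function names and the 100 MiB/s target are illustrative): 25,600 IOPS at 4KB and 1,600 IOPS at 64KB move the same amount of data, which is why a 4K IOPS number on its own says little about real workloads.

```bash
#!/usr/bin/env bash
# Throughput in MiB/s for a given IOPS figure and block size in KiB.
throughput_mib() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.1f\n", iops * bs_kib / 1024 }'
}

# IOPS needed to reach a throughput (MiB/s) at a given block size in KiB.
iops_needed() {
  awk -v mib="$1" -v bs_kib="$2" 'BEGIN { printf "%d\n", mib * 1024 / bs_kib }'
}

# throughput_mib 25600 4   -> 100.0  (25,600 IOPS at 4 KiB)
# iops_needed 100 64       -> 1600   (the same 100 MiB/s at 64 KiB)
```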

Try to understand the IO size distributions of your workload and use those IO size modalities when you develop your FIO test commands. If a single IO size is a requirement for a quick rule-of-thumb comparison, then 32KB has been a pretty reasonable number for me to use, as it is a logical convergence of the weighted IO size distribution of most of the shared workload arrays I’ve supported. Your mileage may vary, of course.
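Rather than collapsing everything to one size, fio can replay a size distribution directly with its --bssplit option, which takes blocksize/percentage pairs separated by colons.  The sketch below uses a hypothetical distribution to show the weighting effect: it prints the mean IO size by count and each size's share of the bytes moved.  It only handles sizes written with a k suffix.

```bash
#!/usr/bin/env bash
# Summarize a bssplit-style distribution such as "8k/30:32k/40:64k/30".
# Sizes are KiB with a k suffix; percentages are shares of the IO count.
bssplit_stats() {
  echo "$1" | awk -F: '{
    mean = 0
    for (i = 1; i <= NF; i++) {
      split($i, pair, "/")            # "32k/40" -> pair[1]="32k", pair[2]="40"
      kib = pair[1]
      sub(/[kK]$/, "", kib)           # strip the k suffix
      size[i] = kib + 0
      pct[i] = pair[2] + 0
      mean += size[i] * pct[i] / 100  # mean IO size, weighted by IO count
    }
    printf "mean_io_kib=%.1f\n", mean
    for (i = 1; i <= NF; i++)
      # share of all bytes moved that comes from this block size
      printf "%dk carries %.1f%% of the bytes\n", size[i], size[i] * pct[i] / mean
  }'
}

# Example (hypothetical mix): bssplit_stats "8k/30:32k/40:64k/30"
```

The same list can then be passed to fio, for example fio --name=mixed --rw=randrw --rwmixread=75 --bssplit=8k/30:32k/40:64k/30 --direct=1 --ioengine=libaio --size=1G --runtime=300 --time_based.  In this made-up mix, the 64K IOs are only 30% of the operations but carry more than half of the bytes.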

Because block sizes have different effects on different storage systems, visibility into this metric is critical. The storage fabric, the protocol, the processing overhead on the HBAs, the switches, the storage controllers, and the storage media are all affected by it.

General Tips on Testing

Work on large datasets.  Your dataset should be at least double the amount of RAM in the OS.  For example, if the OS has 16GB of RAM, test with 32GB datasets multiplied by the number of CPU cores.
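Reading the rule above as one job per CPU core, each with a dataset twice the size of RAM, a small helper can turn it into fio options.  This is a sketch under that interpretation; the function name is hypothetical, and the automatic RAM detection reads /proc/meminfo, so it only works on Linux.  Make sure the total fits on the volume under test.

```bash
#!/usr/bin/env bash
# Turn the dataset rule into fio options: one job per CPU core, each job's
# file at least twice the host's RAM (interpretation of the rule, not fio's).
# $1: RAM in GiB (default: MemTotal from /proc/meminfo, Linux only)
# $2: CPU cores  (default: nproc)
dataset_plan() {
  local ram_gib="${1:-$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 }' /proc/meminfo)}"
  local cores="${2:-$(nproc)}"
  local per_job_gib=$((ram_gib * 2))
  echo "--numjobs=$cores --size=${per_job_gib}G (total $((per_job_gib * cores))G)"
}

# dataset_plan 16 4   -> --numjobs=4 --size=32G (total 128G)
```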

The Rule of Thumb:  75/25.  Although it really depends on your workloads, typically the rule of thumb is that there are 25% writes and 75% reads on the dataset.

Test from small to large blocks of I/O.  Consider testing block sizes from small to large in the following order: 512 bytes, 4K, 16K, 64K, 1MB, to get measurements that can be visualized as a histogram.  This makes the results easier to interpret.
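A block-size sweep like this is easy to script.  The sketch below runs one random-read job per size from the list above and writes each result as JSON (--output-format=json) so the numbers can be charted later; the function name, job count, size, and runtime are illustrative placeholders.  Note that 512-byte direct IO may be rejected on devices with 4K logical sectors.

```bash
#!/usr/bin/env bash
# Block-size sweep: one random-read job per size, run one after another.
# Each run writes a JSON report (fio-bs-<size>.json) for charting later.
# numjobs, size and runtime are illustrative placeholders; adjust them.
# DRY_RUN=1 (the default) prints the commands instead of running them.
bs_sweep() {
  local bs
  local -a cmd
  for bs in 512 4k 16k 64k 1m; do
    cmd=(fio --name="bs-$bs" --rw=randread --direct=1 --ioengine=libaio
         --bs="$bs" --numjobs=4 --size=1G --runtime=120 --time_based
         --group_reporting --output-format=json --output="fio-bs-$bs.json")
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Example: DRY_RUN=0 bs_sweep
```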

Test multiple workload patterns.  Not everything is sequential read/write. Test all scenarios: read / write, write only, read only, random read / random write, random read only, and random write only.
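The same approach works for access patterns.  This sketch (with a hypothetical function name) runs one job for each pattern listed above at a fixed block size, 32K by default following the rule of thumb from the block size section, and applies the 75/25 read/write mix to the two mixed patterns.  As before, the job parameters are illustrative.

```bash
#!/usr/bin/env bash
# Pattern sweep: one fio job per access pattern, run one after another.
# rw=rw is mixed sequential read/write; rw=randrw is mixed random read/write.
# $1: block size (default 32k). DRY_RUN=1 (default) only prints the commands.
pattern_sweep() {
  local bs="${1:-32k}" mode
  local -a cmd
  for mode in rw write read randrw randwrite randread; do
    cmd=(fio --name="$mode" --rw="$mode" --direct=1 --ioengine=libaio --bs="$bs"
         --numjobs=4 --size=1G --runtime=120 --time_based --group_reporting
         --output="fio-$mode-$bs.log")
    # The mixed patterns use the 75/25 read/write rule of thumb.
    case "$mode" in
      rw|randrw) cmd+=(--rwmixread=75) ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Example: DRY_RUN=0 pattern_sweep 32k
```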

Sample Output

Here’s a sample command string for FIO that includes many of the command switches you’ll want to use.  Each parameter can be tweaked to your specific environment.  It creates 8 files (numjobs=8), each 512MB in size (size=512m), and performs random reads and writes (rw=randrw) at a 64K block size (bs=64k) with a mix of 70% reads and 30% writes (rwmixread=70).  Because of time_based, the job runs for the full 5 minutes (runtime=300), looping over the files again if it finishes reading and writing them early.

[root@server1 fio]# fio --name=randrw --ioengine=libaio --iodepth=1 --rw=randrw --bs=64k --direct=1 --size=512m --numjobs=8 --runtime=300 --group_reporting --time_based --rwmixread=70


 Starting 8 processes

 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 randrw: Laying out IO file(s) (1 file(s) / 512MB)
 Jobs: 8 (f=8): [mmmmmmmm] [2.0% done] [252.0MB/121.3MB/0KB /s] [4032/1940/0 iops] [eta 04m:55s]
randrw: (groupid=0, jobs=8): err= 0: pid=31900: Mon Jun 13 01:01:08 2016
 read : io=78815MB, bw=269020KB/s, iops=4203, runt=300002msec
 slat (usec): min=6, max=173, avg= 9.99, stdev= 3.63
 clat (usec): min=430, max=23909, avg=1023.31, stdev=273.66
 lat (usec): min=447, max=23917, avg=1033.46, stdev=273.78
 clat percentiles (usec):
 | 1.00th=[ 684], 5.00th=[ 796], 10.00th=[ 836], 20.00th=[ 892],
 | 30.00th=[ 932], 40.00th=[ 964], 50.00th=[ 996], 60.00th=[ 1032],
 | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1208], 95.00th=[ 1288],
 | 99.00th=[ 1560], 99.50th=[ 2256], 99.90th=[ 3184], 99.95th=[ 3408],
 | 99.99th=[13888]
 bw (KB /s): min=28288, max=39217, per=12.49%, avg=33596.69, stdev=1709.09
 write: io=33899MB, bw=115709KB/s, iops=1807, runt=300002msec
 slat (usec): min=7, max=140, avg=11.42, stdev= 3.96
 clat (usec): min=1246, max=24744, avg=2004.11, stdev=333.23
 lat (usec): min=1256, max=24753, avg=2015.69, stdev=333.36
 clat percentiles (usec):
 | 1.00th=[ 1576], 5.00th=[ 1688], 10.00th=[ 1752], 20.00th=[ 1816],
 | 30.00th=[ 1880], 40.00th=[ 1928], 50.00th=[ 1976], 60.00th=[ 2040],
 | 70.00th=[ 2096], 80.00th=[ 2160], 90.00th=[ 2256], 95.00th=[ 2352],
 | 99.00th=[ 2576], 99.50th=[ 2736], 99.90th=[ 4256], 99.95th=[ 4832],
 | 99.99th=[16768]
 bw (KB /s): min=11776, max=16896, per=12.53%, avg=14499.30, stdev=907.78
 lat (usec) : 500=0.01%, 750=1.61%, 1000=33.71%
 lat (msec) : 2=50.35%, 4=14.27%, 10=0.04%, 20=0.02%, 50=0.01%
 cpu : usr=0.46%, sys=1.60%, ctx=1804510, majf=0, minf=196
 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued : total=r=1261042/w=542389/d=0, short=r=0/w=0/d=0
 latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
 READ: io=78815MB, aggrb=269020KB/s, minb=269020KB/s, maxb=269020KB/s, mint=300002msec, maxt=300002msec
 WRITE: io=33899MB, aggrb=115708KB/s, minb=115708KB/s, maxb=115708KB/s, mint=300002msec, maxt=300002msec
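A quick way to sanity-check output like this is to recompute a couple of numbers.  Bandwidth should be close to IOPS × block size, and, by Little's law, the mean latency should be close to the number of IOs in flight divided by total IOPS.  Assuming every job keeps its single queue slot busy, 8 jobs × iodepth 1 gives 8 IOs in flight.  The sketch below (sanity_check is a hypothetical name) does that arithmetic with the values from this run; it predicts about 268,992 and 115,648 KB/s (fio reported 269,020 and 115,709) and a mean latency of about 1,331 µs, close to the IOPS-weighted average of the reported lat lines (about 1,329 µs).

```bash
#!/usr/bin/env bash
# Cross-check a fio summary:
#   bandwidth (KB/s) ~= IOPS x block size (KB)
#   mean latency     ~= IOs in flight / total IOPS   (Little's law)
# Assumes every job keeps its queue full, so IOs in flight = numjobs x iodepth.
# Args: read_iops write_iops bs_kib numjobs iodepth
sanity_check() {
  awk -v r="$1" -v w="$2" -v bs="$3" -v jobs="$4" -v qd="$5" 'BEGIN {
    printf "read_kib_s=%d\n", r * bs
    printf "write_kib_s=%d\n", w * bs
    printf "expected_mean_lat_us=%.1f\n", jobs * qd / (r + w) * 1000000
  }'
}

# Values from the run above: iops=4203 (read) / 1807 (write), bs=64k,
# numjobs=8, iodepth=1
# sanity_check 4203 1807 64 8 1
```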

Additional Samples

I’ll also run through a few additional simple examples of using FIO with different workload patterns.

Random read/write performance

If you want to compare disk performance with a simple 3:1 4K read/write test, use the following command:

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

This command creates a 4GB file and performs 4KB reads and writes using a 75%/25% read/write split within the file, with 64 operations in flight at a time (iodepth=64).  The 3:1 ratio represents a typical database workload.

The output is below; the IO numbers to look at are the iops= values on the read and write lines.

Jobs: 1 (f=1): [m] [100.0% done] [43496K/14671K /s] [10.9K/3667 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31214: Fri May 9 16:01:53 2014
read : io=3071.1MB, bw=39492KB/s, iops=8993 , runt= 79653msec
write: io=1024.7MB, bw=13165KB/s, iops=2394 , runt= 79653msec
cpu : usr=16.26%, sys=71.94%, ctx=25916, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=786416/w=262160/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=3071.1MB, aggrb=39492KB/s, minb=39492KB/s, maxb=39492KB/s, mint=79653msec, maxt=79653msec
WRITE: io=1024.7MB, aggrb=13165KB/s, minb=13165KB/s, maxb=13165KB/s, mint=79653msec, maxt=79653msec
Disk stats (read/write):
vda: ios=786003/262081, merge=0/22, ticks=3883392/667236, in_queue=4550412, util=99.97%

This test shows the array performing 8,993 read operations per second and 2,394 write operations per second.
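You can also check the issued line against the command.  Without time_based, a run bounded by --size issues size ÷ bs IOs in total, split by --rwmixread.  The sketch below (expected_ios is a hypothetical name) computes the expected counts.  My understanding is that fio picks the direction of each IO randomly according to the mix, so the actual r=786416/w=262160 sits close to, but not exactly on, the 786,432/262,144 target, while the total of 1,048,576 matches exactly.

```bash
#!/usr/bin/env bash
# Expected IO counts for a size-bounded (not time_based) fio run:
# total = size / bs, split between reads and writes by rwmixread.
# Args: size_kib bs_kib rwmixread_percent
expected_ios() {
  local total=$(( $1 / $2 ))
  local reads=$(( total * $3 / 100 ))
  echo "total=$total reads=$reads writes=$(( total - reads ))"
}

# 4 GiB file, 4 KiB blocks, 75% reads:
# expected_ios 4194304 4 75   -> total=1048576 reads=786432 writes=262144
```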

Random read performance

To measure random reads, we’ll change the FIO command a bit:

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randread


Jobs: 1 (f=1): [r] [100.0% done] [62135K/0K /s] [15.6K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31181: Fri May 9 15:38:57 2014
read : io=1024.0MB, bw=62748KB/s, iops=19932 , runt= 16711msec
cpu : usr=5.94%, sys=90.13%, ctx=1885, majf=0, minf=89
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=262144/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1024.0MB, aggrb=62747KB/s, minb=62747KB/s, maxb=62747KB/s, mint=16711msec, maxt=16711msec
Disk stats (read/write):
vda: ios=259063/2, merge=0/1, ticks=951356/20, in_queue=951308, util=96.83%

This test shows the storage array performing 19,932 read operations per second.

Random write performance

Modify the FIO command slightly to use randwrite instead of randread for the random write test.

./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite


Jobs: 1 (f=1): [w] [100.0% done] [0K/26326K /s] [0 /6581 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=31235: Fri May 9 16:16:21 2014
write: io=1024.0MB, bw=29195KB/s, iops=5434, runt= 35916msec
cpu : usr=77.42%, sys=13.74%, ctx=2306, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=29195KB/s, minb=29195KB/s, maxb=29195KB/s, mint=35916msec, maxt=35916msec
Disk stats (read/write):
vda: ios=0/260938, merge=0/11, ticks=0/2315104, in_queue=2316372, util=98.87%

This test shows the storage achieving 5,434 write operations per second.
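The three commands in this section differ only in --readwrite (plus --rwmixread for the mixed test), so they are easy to wrap in a helper.  This is a sketch with a hypothetical function name that assumes the ./fio binary from the build directory, as the commands above do.  One thing worth knowing: --gtod_reduce=1 turns off most of fio's timing collection, which is why these outputs have no latency lines; drop it if you need latency figures.

```bash
#!/usr/bin/env bash
# Build (and optionally run) the 4K test from this section for one mode:
# randrw, randread or randwrite. Assumes ./fio, as in the commands above.
# DRY_RUN=1 (the default) prints the command instead of running it.
fio_4k_test() {
  local mode="$1"
  local -a cmd=(./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
                --name=test --filename=test --bs=4k --iodepth=64 --size=4G
                --readwrite="$mode")
  if [ "$mode" = "randrw" ]; then
    cmd+=(--rwmixread=75)   # the 3:1 read/write mix from the first test
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: for mode in randrw randread randwrite; do DRY_RUN=0 fio_4k_test "$mode"; done
```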

8 thoughts on “Storage Performance Benchmarking with FIO”

  1. Minor nitpick: You only need a Cygwin environment to build fio for Windows. Once built a fio binary will run on appropriate “stock” Windows without any extras 🙂

  2. The example CLI command just below “The job will run for full 5 minutes…” is mis-pasted. You’ve included some of the fio output as part of the command to run. Specifically, you need to remove the following from your command as it is fio output: “randrw: (g=0): rw=randrw, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=1”

  3. In the example from the article:

    ./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

    If this is tweaked to have say multiple jobs such as numjobs=10.  Then after this is ran the drive space should be 40GB filled assuming the drive was at 0GB of drive capacity at beginning of test?

    What if you run this test and the drive space only shows 4GB filled?

  4. Hi Steve,

    I’m a performance SME for an IT company and I’m helping a customer with benchmarking his SMB FS on a backend AFA. Can we use FIO ioengine=windowsaio to simulate SMB writes to storage?

    I feel like this isn’t correct, like I may need a specific NAS performance testing tool.


