Persistent disks are the most common storage option because of their price, performance, and predictability. You can also create instances with local SSDs for even greater performance and lower latency, but without the data redundancy that persistent disks provide. When you configure storage for applications that run on your instances, use the following process:
- Determine how much space you need.
- Determine what performance characteristics your applications require.
- Configure your instances to optimize storage performance.
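For example, if you settle on an SSD persistent disk for a workload, one possible way to create and attach it with the gcloud CLI is sketched below. The disk name, size, zone, and instance name are placeholders chosen for illustration, not values from this document.

```bash
# Create a 500 GB SSD persistent disk (name, size, and zone are example values).
gcloud compute disks create example-data-disk \
    --size=500GB \
    --type=pd-ssd \
    --zone=us-central1-a

# Attach the disk to an existing instance (instance name is an example value).
gcloud compute instances attach-disk example-instance \
    --disk=example-data-disk \
    --zone=us-central1-a
```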
Disk type performance comparison
Consider your storage size and performance requirements to help you determine the correct disk type and size for your instances. Performance requirements for a given application typically fall into two distinct IO patterns:
- Small reads and writes
- Large reads and writes
For small reads and writes, the limiting factor is random input/output operations per second (IOPS).
For large reads and writes, the limiting factor is throughput.
| | Standard persistent disks | SSD persistent disks | Local SSD (SCSI) | Local SSD (NVMe) |
|---|---|---|---|---|
| Maximum sustained IOPS | | | | |
| Read IOPS per GB | 0.75 | 30 | 266.7 | 453.3 |
| Write IOPS per GB | 1.5 | 30 | 186.7 | 240 |
| Read IOPS per instance | 3000 | 15,000 | 400,000 | 680,000 |
| Write IOPS per instance | 15,000 | 15,000 | 280,000 | 360,000 |
| Maximum sustained throughput (MB/s) | | | | |
| Read throughput per GB | 0.12 | 0.48 | 1.04 | 1.77 |
| Write throughput per GB | 0.12 | 0.48 | 0.73 | 0.94 |
| Read throughput per instance | 180 | 240 | 1,560 | 2,650 |
| Write throughput per instance | 120 | 240 | 1,090 | 1,400 |
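If you want to see which of these two IO patterns dominates on one of your own volumes, a rough way to probe it is with fio, the same tool used in the local SSD benchmarking section later in this document. The file path and sizes below are placeholders, not documented values; point fio at a scratch file on the disk you want to test rather than at a raw device that holds data you care about.

```bash
# Small random reads: the limiting factor is IOPS.
sudo fio --name=rand-read --filename=/mnt/disks/scratch/fio-test --size=10G \
    --rw=randread --bs=4k --iodepth=32 --direct=1 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting

# Large sequential reads: the limiting factor is throughput.
sudo fio --name=seq-read --filename=/mnt/disks/scratch/fio-test --size=10G \
    --rw=read --bs=1M --iodepth=16 --direct=1 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting
```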
How does Standard Persistent Disk compare to a physical hard drive?
If you have never measured the IOPS or throughput that the hard drives in your existing deployments deliver, some additional context can help frame the discussion of Persistent Disk performance. The following chart shows the volume size required to match the peak performance of a typical 7200 RPM SATA drive, which typically delivers 75 IOPS or 120 MB/s.
| IO Pattern | Size of volume to approximate a typical 7200 RPM SATA drive |
|---|---|
| 75 small random reads | 250 GB |
| 75 small random writes | 50 GB |
| 120 MB/s streaming reads | 1000 GB |
| 120 MB/s streaming writes | 1333 GB |
To determine what size of volume is required to have the same optimal performance as a typical 7200 RPM SATA drive, you must first identify the I/O pattern of the volume. The chart below describes some I/O patterns and the size of each persistent disk type you would need to create for that I/O pattern.
| IO pattern | Volume size of standard persistent disk (GB) | Volume size of solid-state persistent disk (GB) |
|---|---|---|
| Small random reads | 250 | 3 |
| Small random writes | 50 | 3 |
| Streaming large reads | 1000 | 250 |
| Streaming large writes | 1333 | 250 |
Relationship between size and performance
Performance for Persistent Disk increases with volume size up to the per-VM maximums. While performance increases with volume size for both Standard and SSD PD, the performance increases much more quickly for SSD PD.
Standard Persistent Disk
Standard Persistent Disk sustained performance caps increase with the size of the volume until reaching the maximum performance available. These performance caps are for sustained activity to disk, not peak IO rates. We recognize that for many applications, IO is bursty and so for small volumes (less than 1 TB) Google has implemented a bursting capability that enables short bursts of IO above the documented caps. Once that burst is exhausted, the IO rates drop to the documented caps. This bursting capability can enable you to select your volumes based on their sustained rate rather than the peak rate. Depending on the burstiness of the workload, this can result in substantial cost savings.
The following table illustrates the performance limits for a 100 GB Standard Persistent Disk volume for each of the IO patterns.
| | Maximum sustained IOPS per 100 GB | Maximum sustained throughput per 100 GB | Maximum sustained throughput per VM |
|---|---|---|---|
| Read | 75 IOPS | 12 MB/s | 180 MB/s |
| Write | 150 IOPS | 12 MB/s | 120 MB/s |
Standard Persistent Disk IOPS performance caps increase linearly with the size of the disk, from the smallest 1 GB disk up to a 10 TB disk, but do not scale for volumes larger than 10 TB. Volumes between 10 TB and 64 TB have identical IOPS performance characteristics. If you require only 60 small random read IOPS per volume, you need only a 200 GB volume. If you require 600 small random read IOPS, you would purchase at least a 2 TB volume.
Throughput maximums also scale with volume size up to the 10 TB disk size. Throughput performance is the same for all volume sizes between 10 TB and 64 TB.
For each IO pattern, the limits listed apply to reads or writes alone. For simultaneous reads and writes, the limits fall on a sliding scale: the more reads you perform, the fewer writes you can perform, and vice versa.
The following are example IOPS limits for simultaneous reads and writes:
| Read | Write |
|---|---|
| 3000 IOPS | 0 IOPS |
| 2250 IOPS | 3750 IOPS |
| 1500 IOPS | 7500 IOPS |
| 750 IOPS | 11250 IOPS |
| 0 IOPS | 15000 IOPS |
The following are example throughput limits for simultaneous reads and writes:
| Read | Write |
|---|---|
| 180 MB/s | 0 MB/s |
| 135 MB/s | 30 MB/s |
| 90 MB/s | 60 MB/s |
| 45 MB/s | 90 MB/s |
| 0 MB/s | 120 MB/s |
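The rows in these tables suggest that the read and write limits trade off linearly. Assuming that linear interpolation holds between the listed points (an assumption made here for illustration, not a documented formula), a quick shell calculation of the remaining write IOPS budget for a given read load might look like this:

```bash
# Assumed linear trade-off between the 3000 read IOPS and 15000 write IOPS caps.
READ_IOPS=2250
WRITE_BUDGET=$(( 15000 * (3000 - READ_IOPS) / 3000 ))
echo "With ${READ_IOPS} read IOPS, about ${WRITE_BUDGET} write IOPS remain."  # prints 3750
```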
SSD persistent disks
SSD persistent disk performance increases with the size of the volume. IOPS performance increases faster for SSD persistent disks than for standard persistent disks, and throughput scales with volume size in the same way.
The following table illustrates the performance limits for a 100 GB SSD Persistent Disk volume for each of the IO patterns.
| | Expected IOPS per 100 GB | Expected throughput per 100 GB | Maximum sustained throughput per VM |
|---|---|---|---|
| Read | 3000 IOPS | 48 MB/s | 240 MB/s |
| Write | 3000 IOPS | 48 MB/s | 240 MB/s |
The VM limit for SSD Persistent Disk throughput is 240 MB/s for reads and 240 MB/s for writes. Generally speaking, larger VMs will achieve higher bandwidth.
SSD Persistent Disk volumes reach the per-VM limit of 15000 random read IOPS at 333 GB. SSD Persistent Disk volumes reach the per-VM limit of 15000 random write IOPS at 500 GB.
Persistent Disk Size, Price, and Performance Summary
While you have several inputs to consider when deciding which volume type and size is right for your application, one factor you do not need to consider is the price of using your volume. Persistent Disk has no per-IO costs, so there is no need to estimate monthly I/O to budget for what you will spend on disks.
Consider only the relative costs of Standard PD versus SSD PD. Each is priced per GB: Standard PD at $0.04 per GB and SSD PD at $0.17 per GB. However, because PD performance caps increase with the size of the volume, for IOPS-oriented workloads it is instructive to look at the price per IOPS.
Standard PD is approximately $0.133 per random read IOPS and $0.0266 per random write IOPS. SSD PD is $0.0057 per random read IOPS and $0.0057 per random write IOPS.
Note that the price per IOPS for SSD PD is true up to the point where SSD PD reaches per-VM maximums. SSD PD reaches the per-VM limit of 10,000 random read IOPS at 333 GB and the per-VM limit of 15,000 random write IOPS at 500 GB. Standard PDs reach the per-VM limits at 10 TB. Adding PD space beyond 10 TB does not provide higher IOPS or more throughput.
Viewed in this light, these figures provide a quick rule of thumb for selecting the right Persistent Disk type.
Use the chart below as a quick reference for the performance and cost of some common Standard Persistent Disk volume sizes.
| Volume Size (GB) | Monthly Price | Sustained Random Read IOPS Limit | Sustained Random Write IOPS Limit | Sustained Read Throughput Limit (MB/s) | Sustained Write Throughput Limit (MB/s) |
|---|---|---|---|---|---|
| 10 | $0.40 | * | * | * | * |
| 50 | $2 | 37.5 | 75 | 6 | 6 |
| 100 | $4 | 75 | 150 | 12 | 12 |
| 200 | $8 | 150 | 300 | 24 | 24 |
| 500 | $20 | 375 | 750 | 60 | 60 |
| 1000 | $40 | 750 | 1500 | 120 | 120 |
| 2000 | $80 | 1500 | 3000 | 180 | 120 |
| 5000 | $200 | 3000 | 7500 | 180 | 120 |
| 10000 | $400 | 3000 | 15000 | 180 | 120 |
| 16000 | $640 | 3000 | 15000 | 180 | 120 |
| 32000 | $1280 | 3000 | 15000 | 180 | 120 |
| 64000 | $2560 | 3000 | 15000 | 180 | 120 |
* We suggest that you use this volume size only for boot volumes; any meaningful workload will rely on IO bursting.
Use the chart below as a quick reference for the performance and cost of some common SSD Persistent Disk volume sizes:
| Volume Size (GB) | Monthly Price | Sustained Random Read IOPS Limit | Sustained Random Write IOPS Limit | Sustained Read Throughput Limit (MB/s) | Sustained Write Throughput Limit (MB/s) |
|---|---|---|---|---|---|
| 10 | $1.70 | 300 | 300 | 4.8 | 4.8 |
| 50 | $8.50 | 1500 | 1500 | 24 | 24 |
| 100 | $17.00 | 3000 | 3000 | 48 | 48 |
| 200 | $34.00 | 6000 | 6000 | 96 | 96 |
| 333 | $56.61 | 10000 | 10000 | 160 | 160 |
| 500 | $85.00 | 15000 | 15000 | 240 | 240 |
| 1000 | $170.00 | 15000 | 15000 | 240 | 240 |
| 5000 | $850.00 | 15000 | 15000 | 240 | 240 |
| 10000 | $1700.00 | 15000 | 15000 | 240 | 240 |
| 16000 | $2720.00 | 15000 | 15000 | 240 | 240 |
| 32000 | $5440.00 | 15000 | 15000 | 240 | 240 |
| 64000 | $10880.00 | 15000 | 15000 | 240 | 240 |
Examples
The following set of examples demonstrates how to select a Persistent Disk size based on performance requirements.
Example 1
Suppose you have a database installation (small random IOs) that requires a maximum random write rate of 300 IOPS:
Standard Persistent Disk
(100 GB / 150 IOPS) x 300 IOPS = 200 GB
200 GB x $0.04/GB = $8 per month
SSD Persistent Disk
(100 GB / 3000 IOPS) x 300 IOPS = 10 GB
10 GB x $0.170/GB = $1.70 per month
So if random write performance were your primary requirement, you would have the option to purchase a Standard Persistent Disk of at least 200 GB or an SSD Persistent Disk of at least 10 GB.
SSD Persistent Disk would be the less expensive choice.
Example 2
Suppose you have a database installation (small random IOs) that requires a maximum sustained random read rate of 450 IOPS:
Standard Persistent Disk
(100 GB / 30 IOPS) x 450 IOPS = 1500 GB
1500 GB x $0.04/GB = $60 per month
SSD Persistent Disk
(100 GB / 3000 IOPS) x 450 IOPS = 15 GB
15 GB x $0.170/GB = $2.55 per month
So if random read performance were your primary requirement, you would have the option to purchase a Standard Persistent Disk of at least 1500 GB or an SSD Persistent Disk of at least 15 GB.
SSD Persistent Disk would be the less expensive choice.
Example 3
Suppose you have a data streaming service (large IOs) that requires a maximum sustained read throughput rate of 120 MB/s:
Standard Persistent Disk
(100 GB / 12 MB/s) x 120 MB/s = 1000 GB
1000 GB x $0.04/GB = $40 per month
SSD Persistent Disk
(100 GB / 48 MB/s) x 120 MB/s = 250 GB
250 GB x $0.170/GB = $42.50 per month
So if read throughput were your primary requirement, you would have the option to purchase a Standard Persistent Disk of at least 1000 GB or an SSD Persistent Disk of at least 250 GB.
Standard Persistent Disk would be the less expensive choice.
Example 4
Suppose you have a database installation (small random IOs) that requires a maximum sustained random read rate of 450 IOPS and a maximum sustained random write rate of 300 IOPS. To satisfy the aggregate sustained performance requirements, create a volume large enough to satisfy both.
From examples 1 and 2 above:
Standard Persistent Disk
200 GB + 1500 GB = 1700 GB
1700 GB x $0.04/GB = $68 per month
SSD Persistent Disk
10 GB + 15 GB = 25 GB
25 GB x $0.17/GB = $4.25 per month
So if random read and write performance were your primary requirement, you would have the option to purchase a Standard Persistent Disk of at least 1700 GB or an SSD Persistent Disk of at least 25 GB.
SSD Persistent Disk would be the less expensive choice.
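The arithmetic in these examples is mechanical, so you could wrap it in a small helper. The function below is only a sketch, not an official tool; the per-100 GB rates passed to it are the ones used in the examples above, and you should substitute the rates that apply to your disk type and IO pattern.

```bash
# Minimum volume size (GB) for a target rate, given the rate available per 100 GB.
# Example rates from above: standard PD writes = 150 IOPS per 100 GB,
# SSD PD reads or writes = 3000 IOPS per 100 GB.
pd_size_for_rate() {
  local target_rate=$1 rate_per_100gb=$2
  # Round up so the volume meets, rather than just approaches, the target.
  echo $(( (target_rate * 100 + rate_per_100gb - 1) / rate_per_100gb ))
}

pd_size_for_rate 300 150    # Example 1, standard PD: prints 200 (GB)
pd_size_for_rate 300 3000   # Example 1, SSD PD: prints 10 (GB)
```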
Network egress caps
Each persistent disk write operation contributes to your virtual machine instance's cumulative network egress traffic. This means that persistent disk write operations are capped by your instance's network egress cap.
To calculate the maximum persistent disk write traffic that a virtual machine instance can issue, subtract an instance’s other network egress traffic from its 2 Gbit/s/core network cap. The remaining throughput represents the throughput available to you for persistent disk write traffic.
Because persistent disk storage has 3.3x data redundancy, each write has to be written 3.3 times. This means that a single write operation counts as 3.3 I/O operations.
The following figures are the persistent disk I/O caps per virtual machine instance, based on the network egress caps for the virtual machine. These figures are based on an instance that has no additional IP traffic.
| Number of cores | Standard persistent disk write limit (MB/s) | Standard volume size needed to reach limit (GB) | Solid-state persistent disk write limit (MB/s) | Solid-state volume size needed to reach limit (GB) |
|---|---|---|---|---|
| 1 | 78 | 650 | 78 | 163 |
| 2 | 120 | 1300 | 156 | 326 |
| 4 | 120 | 1333 | 240 | 500 |
| 8 | 120 | 1333 | 240 | 500 |
| 16 | 120 | 1333 | 240 | 500 |
To derive the figures above, divide the network egress cap of 2 Gbit/s per core, which is equivalent to 256 MB/s, by the data redundancy multiplier (3.3):
Maximum write traffic for one core = 256 / 3.3 = ~78 MB/s of I/O issued to your standard persistent disk
Using the standard persistent disk write throughput/GB figure provided in the performance chart presented earlier, you can now derive an appropriate disk size as well:
Desired disk size = 78 / 0.12 = ~650 GB
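The same derivation can be written out as a quick shell calculation. The 256 MB/s per core, the 3.3x redundancy multiplier, and the 0.12 MB/s-per-GB write figure all come from this document; bash arithmetic is integer-only, so the sketch below uses awk for the division. Note that the per-VM write throughput caps (such as 120 MB/s for standard persistent disk) still apply once you go beyond one core.

```bash
# Sketch of the derivation above for a single core.
CORES=1
awk -v cores="$CORES" 'BEGIN {
  egress_mb = cores * 256                  # 2 Gbit/s per core ~= 256 MB/s
  write_mb  = int(egress_mb / 3.3 + 0.5)   # divide by the 3.3x redundancy multiplier (~78 MB/s)
  size_gb   = write_mb / 0.12              # standard PD write throughput is 0.12 MB/s per GB
  printf "write limit: ~%d MB/s, volume size to reach it: ~%d GB\n", write_mb, size_gb
}'
# Prints: write limit: ~78 MB/s, volume size to reach it: ~650 GB
```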
Optimizing persistent disk and local SSD performance
You can optimize persistent disks and local SSDs to handle your data more efficiently.
Optimizing persistent disks
Persistent Disks can deliver the performance described above, but the virtual machine must drive sufficient IO to reach the performance caps. After you have sized your Persistent Disk volumes appropriately for your performance needs, your application and operating system might need some tuning.
In this section, we describe a few key elements that can be tuned for better performance and follow with discussion of how to apply some of them to specific types of workloads.
Disable lazy initialization and enable DISCARD commands
Persistent Disk supports DISCARD (or TRIM) commands, which allow operating systems to inform the disk when blocks are no longer in use. DISCARD support allows the operating system to mark disk blocks as no longer needed, without incurring the cost of zeroing out the blocks.
On most Linux operating systems, you enable DISCARD when you mount a persistent disk to your instance. Windows 2012 R2 instances enable DISCARD by default when you mount a persistent disk. Windows 2008 R2 does not support DISCARD.
Enabling DISCARD can boost general runtime performance, and it can also speed up the performance of your disk when it is first mounted. Formatting an entire disk volume can be time consuming, so lazy formatting is a common practice. The downside of lazy formatting is that the cost is often paid the first time the volume is mounted. By disabling lazy initialization and enabling DISCARD commands, you get fast format and mount. A combined format-and-mount example follows the steps below.
- Disable lazy initialization and enable DISCARD during format by passing the following parameters to mkfs.ext4:

  -E lazy_itable_init=0,lazy_journal_init=0,discard

  The lazy_journal_init=0 parameter does not work on instances with CentOS 6 or RHEL 6 images. For those instances, format persistent disks without that parameter:

  -E lazy_itable_init=0,discard

- Enable DISCARD commands on mount by passing the following flag to the mount command:

  -o discard
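Putting both steps together, a format-and-mount sequence might look like the following sketch. The device path and mount point are placeholders; substitute the actual device of your persistent disk.

```bash
# Format the persistent disk, disabling lazy initialization and enabling DISCARD.
# /dev/sdb and /mnt/disks/pd are placeholders for your device and mount point.
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# Mount the disk with DISCARD enabled.
sudo mkdir -p /mnt/disks/pd
sudo mount -o discard,defaults /dev/sdb /mnt/disks/pd
```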
IO queue depth
Many applications have settings that influence their IO queue depth to tune performance. Higher queue depths increase IOPS, but can also increase latency. Lower queue depths decrease per-IO latency, but sometimes at the expense of IOPS.
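One way to see this trade-off on your own volume is to run the same random-read workload at different queue depths with fio. The file path and sizes are placeholders, and the exact results will depend on your disk type and size; expect higher total IOPS but higher per-IO latency at the larger depth.

```bash
# Same 4 KB random-read workload at two queue depths.
for DEPTH in 1 32; do
  sudo fio --name=qd-test --filename=/mnt/disks/scratch/fio-test --size=10G \
      --rw=randread --bs=4k --iodepth="$DEPTH" --direct=1 --ioengine=libaio \
      --runtime=30 --time_based --group_reporting
done
```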
Readahead cache
To improve IO performance, operating systems employ techniques such as readahead, where more of a file than was requested is read into memory with the assumption that subsequent reads are likely to need that data. Higher readahead increases throughput at the expense of memory and IOPS. Lower readahead increases IOPS at the expense of throughput.
On Linux systems, you can get and set the readahead value with the blockdev command:
$ sudo blockdev --getra /dev/<device>
$ sudo blockdev --setra <value> /dev/<device>
The readahead value is <desired_readahead_bytes> / 512 bytes.
For example, if you desire an 8 MB readahead, 8 MB is 8388608 bytes (8 * 1024 * 1024).
8388608 bytes / 512 bytes = 16384
And you would set:
$ sudo blockdev --setra 16384 /dev/<device>
Free CPUs
Reading and writing to Persistent Disk requires CPU cycles from your virtual machine. Achieving very high, consistent IOPS levels requires having free CPUs to process IO.
IOPS-oriented workloads
Databases, whether SQL or NoSQL, have usage patterns of random access to data. The following settings are suggested for IOPS-oriented workloads (a short sketch applying them follows this list):
- Lower readahead values, as typically suggested in best practices documents for MongoDB, Apache Cassandra, and other database applications.
- An IO queue depth of 1 for every 400-800 IOPS, up to a limit of 64 on large volumes.
- One free CPU for every 2000 random read IOPS and one free CPU for every 2500 random write IOPS.
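As a sketch of how those recommendations translate into concrete settings, the commands below lower the readahead on a hypothetical database volume and compute a suggested queue depth and free-CPU count for a target IOPS level. The device path, readahead value, and target IOPS are placeholders chosen for illustration.

```bash
# Placeholder device and target for the database volume.
DEVICE=/dev/sdb
TARGET_IOPS=4000

# Lower readahead (value is in 512-byte sectors); databases do mostly random IO.
sudo blockdev --setra 32 "$DEVICE"

# Suggested queue depth: roughly 1 per 400-800 IOPS, capped at 64.
QUEUE_DEPTH=$(( TARGET_IOPS / 800 ))
[ "$QUEUE_DEPTH" -gt 64 ] && QUEUE_DEPTH=64
echo "Suggested IO queue depth: ${QUEUE_DEPTH}"

# Suggested free CPUs: roughly one per 2000 random read IOPS (rounded up).
echo "Suggested free CPUs for reads: $(( (TARGET_IOPS + 1999) / 2000 ))"
```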
Throughput-oriented workloads
Streaming operations, such as a Hadoop job, benefit from fast sequential reads, so larger block sizes can increase streaming performance. The default block size on volumes is 4 KB. For throughput-oriented workloads, block sizes of 256 KB or above are recommended.
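For example, you can get a rough feel for streaming read throughput at a large block size with dd. The file path is a placeholder for an existing large file on the volume, and the 1 MB block size is simply one value at or above the 256 KB guidance; iflag=direct bypasses the page cache so that the disk, not memory, is measured.

```bash
# Read 4 GiB sequentially in 1 MB blocks, bypassing the page cache.
sudo dd if=/mnt/disks/scratch/large-file of=/dev/null bs=1M count=4096 iflag=direct
```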
Optimizing SSD persistent disk performance
The performance by disk type chart describes the expected, achievable performance numbers for solid-state persistent disks. To optimize your application and virtual machine instance to achieve these numbers, use the following guidelines:
- Make sure your application is issuing enough I/O.

  If your application issues fewer IOPS than the limit described in the chart above, you won't reach that level of IOPS. For example, on a 500 GB disk, the expected IOPS limit is 15,000 IOPS. However, if you issue fewer IOPS than that, or if you issue I/O operations that are larger than 16 KB, you won't achieve 15,000 IOPS.

- Make sure to issue I/O with enough parallelism.

  Use a high enough queue depth that you take advantage of the parallelism of the operating system. If you issue 1000 IOPS but do so synchronously with a queue depth of 1, you will achieve far fewer IOPS than the limit described in the chart. At a minimum, your application should have a queue depth of at least 1 for every 400-800 IOPS.

- Make sure there is enough available CPU on the virtual machine instance issuing the I/O.

  If your virtual machine instance is starved for CPU, your application won't be able to manage the IOPS described above. As a rule of thumb, you should have one available CPU for every 2000-2500 IOPS of expected traffic.

- Make sure your application is optimized for reasonable temporal data locality on large disks.

  If your application accesses data distributed across different parts of a disk over a short period of time (hundreds of GB per core), you won't achieve optimal IOPS. For best performance, optimize for temporal data locality, weighing factors like the fragmentation of the disk and the randomness of the accessed parts of the disk.

- Make sure the I/O scheduler in the operating system is configured to meet your specific needs.

  On Linux-based systems, you can set the I/O scheduler to noop to achieve the highest number of IOPS on SSD-backed devices (see the example after this list).
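A minimal sketch of checking and changing the scheduler on a Linux guest follows. The device name is a placeholder; on newer kernels that use blk-mq, the equivalent minimal scheduler is typically named none rather than noop.

```bash
# Show the available and current I/O scheduler for the device (placeholder name).
cat /sys/block/sdb/queue/scheduler

# Switch to noop (or none on blk-mq kernels) for SSD-backed devices.
echo noop | sudo tee /sys/block/sdb/queue/scheduler
```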
Optimizing Local SSDs
By default, most Compute Engine-provided Linux images automatically run an optimization script that configures the instance for peak local SSD performance. The script enables certain queue sysfs settings that enhance the overall performance of your machine and masks interrupt requests (IRQs) to specific virtual CPUs (vCPUs). This script only optimizes performance for Compute Engine local SSD devices.
Ubuntu, SLES, and older images might not be configured to include this performance optimization. If you are using any of these images, or an image older than v20141218, you can run this script manually to configure your instance instead.
Benchmarking local SSD performance
The local SSD performance figures provided in the Performance section were achieved using specific settings on the local SSD instance. If your instance is having trouble reaching these performance numbers and you have already configured the instance using the recommended local SSD settings, you can compare your performance numbers against the published numbers by replicating the settings used by the Compute Engine team.
- Create a local SSD instance that has four or eight cores for each device, depending on your workload. For example, if you had four local SSD devices attached to an instance, you should use a 16-core machine type.

- Run the following script on your machine, which replicates the settings used to achieve these numbers:
```bash
# install dependencies
sudo apt-get -y update
sudo apt-get install -y build-essential git libtool gettext autoconf \
  libgconf2-dev libncurses5-dev python-dev fio

# blkdiscard
git clone git://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
cd util-linux/
./autogen.sh
./configure --disable-libblkid
make
sudo mv blkdiscard /usr/bin/
sudo blkdiscard /dev/disk/by-id/google-local-ssd-0

# full write pass
sudo fio --name=writefile --size=100G --filesize=100G \
  --filename=/dev/disk/by-id/google-local-ssd-0 --bs=1M --nrfiles=1 \
  --direct=1 --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
  --iodepth=200 --ioengine=libaio

# rand read
sudo fio --time_based --name=benchmark --size=100G --runtime=30 \
  --filename=/dev/disk/by-id/google-local-ssd-0 --ioengine=libaio --randrepeat=0 \
  --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 \
  --numjobs=4 --rw=randread --blocksize=4k --group_reporting

# rand write
sudo fio --time_based --name=benchmark --size=100G --runtime=30 \
  --filename=/dev/disk/by-id/google-local-ssd-0 --ioengine=libaio --randrepeat=0 \
  --iodepth=128 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 \
  --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
```