Processor State Control for Your EC2 Instance
C-states control the sleep levels that a core can enter when it is idle. C-states are numbered starting with C0 (the shallowest state where the core is totally awake and executing instructions) and go to C6 (the deepest idle state where a core is powered off). P-states control the desired performance (in CPU frequency) from a core. P-states are numbered starting from P0 (the highest performance setting where the core is allowed to use Intel® Turbo Boost Technology to increase frequency if possible), and they go from P1 (the P-state that requests the maximum baseline frequency) to P15 (the lowest possible frequency).
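As a rough illustration of the C-state numbering described above, the sketch below lists the idle states that the Linux kernel exposes for one CPU through the cpuidle sysfs directory. The function name list_cstates is hypothetical, and whether the directory is populated depends on the idle driver in use on your instance, so treat this as an assumption-laden example rather than a guaranteed interface.

```shell
# List the idle states (C-states) the kernel exposes for one CPU.
# Usage: list_cstates [cpuidle_dir]
# The default is the usual Linux cpuidle sysfs location for CPU 0
# (assumed here; it may be missing if no idle driver is loaded).
list_cstates() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    if [ ! -d "$dir" ]; then
        echo "no cpuidle directory at $dir" >&2
        return 1
    fi
    for state in "$dir"/state*; do
        # Skip the unexpanded pattern if no state directories exist.
        [ -r "$state/name" ] || continue
        name=$(cat "$state/name")
        latency=$(cat "$state/latency" 2>/dev/null || echo "?")
        # The latency file reports the exit latency in microseconds.
        echo "${state##*/}: $name (exit latency ${latency} us)"
    done
}
```

Calling list_cstates with no arguments reads the default directory for CPU 0. Deeper states generally report a larger exit latency, which is the trade-off discussed later on this page.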
The following instance types provide the ability for an operating system to control processor C-states and P-states:
c4.8xlarge
d2.8xlarge
m4.10xlarge
You might want to change the C-state or P-state settings to increase processor performance consistency, reduce latency, or tune your instance for a specific workload. The default C-state and P-state settings provide maximum performance, which is optimal for most workloads. However, if your application would benefit from reduced latency at the cost of higher single- or dual-core frequencies, or from consistent performance at lower frequencies as opposed to bursty Turbo Boost frequencies, consider experimenting with the C-state or P-state settings that are available to these instances.
The following sections describe the different processor state configurations and how to monitor the effects of your configuration. These procedures were written for, and apply to, Amazon Linux; however, they may also work for other Linux distributions with a Linux kernel version of 3.9 or newer. For more information about other Linux distributions and processor state control, see your system-specific documentation.
Note
The examples on this page use the turbostat utility (which is available on Amazon Linux by default) to display processor frequency and C-state information, and the stress command (which can be installed by running sudo yum install -y stress) to simulate a workload.
Highest Performance with Maximum Turbo Boost Frequency
This is the default processor state control configuration for the Amazon Linux AMI, and it is recommended for most workloads. This configuration provides the highest performance with lower variability. Allowing inactive cores to enter deeper sleep states provides the thermal headroom required for single or dual core processes to reach their maximum Turbo Boost potential.
The following example shows a c4.8xlarge instance with two cores
actively performing work reaching their maximum processor Turbo Boost
frequency.
[ec2-user ~]$ sudo turbostat stress -c 2 -t 10
stress: info: [30680] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
stress: info: [30680] successful run completed in 10s
pk cor CPU %c0 GHz TSC SMI %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7 Pkg_W RAM_W PKG_% RAM_%
5.54 3.44 2.90 0 9.18 0.00 85.28 0.00 0.00 0.00 0.00 0.00 94.04 32.70 54.18 0.00
0 0 0 0.12 3.26 2.90 0 3.61 0.00 96.27 0.00 0.00 0.00 0.00 0.00 48.12 18.88 26.02 0.00
0 0 18 0.12 3.26 2.90 0 3.61
0 1 1 0.12 3.26 2.90 0 4.11 0.00 95.77 0.00
0 1 19 0.13 3.27 2.90 0 4.11
0 2 2 0.13 3.28 2.90 0 4.45 0.00 95.42 0.00
0 2 20 0.11 3.27 2.90 0 4.47
0 3 3 0.05 3.42 2.90 0 99.91 0.00 0.05 0.00
0 3 21 97.84 3.45 2.90 0 2.11
...
1 1 10 0.06 3.33 2.90 0 99.88 0.01 0.06 0.00
1 1 28 97.61 3.44 2.90 0 2.32
...
10.002556 sec
In this example, vCPUs 21 and 28 are running at their maximum Turbo Boost frequency because
the other cores have entered the C6 sleep state to save power and
provide both power and thermal headroom for the working cores. vCPUs 3 and 10 (each
sharing a processor core with vCPUs 21 and 28) are in the C1 state,
waiting for instruction.
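To see which vCPUs share a physical core, as vCPUs 3 and 21 do in the output above, you can read the CPU topology files that Linux exposes in sysfs. The following is a minimal sketch; the function name core_siblings is hypothetical, and it assumes the thread_siblings_list file is present for the vCPU you ask about.

```shell
# Print the vCPUs that share a physical core with the given vCPU.
# Usage: core_siblings CPU_NUMBER [cpu_sysfs_root]
# The default root is the usual Linux sysfs CPU directory (assumed).
core_siblings() {
    cpu="$1"
    root="${2:-/sys/devices/system/cpu}"
    file="$root/cpu$cpu/topology/thread_siblings_list"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # The file holds a comma- or dash-separated list of sibling vCPUs.
    cat "$file"
}
```

For example, core_siblings 21 prints the sibling list for vCPU 21. Based on the turbostat output above, that list would be expected to include vCPU 3 on this instance type.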
In the following example, all 18 cores are actively performing work, so there is no headroom for maximum Turbo Boost, but they are all running at the "all core Turbo Boost" speed of 3.2 GHz.
[ec2-user ~]$ sudo turbostat stress -c 36 -t 10
stress: info: [30685] dispatching hogs: 36 cpu, 0 io, 0 vm, 0 hdd
stress: info: [30685] successful run completed in 10s
pk cor CPU %c0 GHz TSC SMI %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7 Pkg_W RAM_W PKG_% RAM_%
99.27 3.20 2.90 0 0.26 0.00 0.47 0.00 0.00 0.00 0.00 0.00 228.59 31.33 199.26 0.00
0 0 0 99.08 3.20 2.90 0 0.27 0.01 0.64 0.00 0.00 0.00 0.00 0.00 114.69 18.55 99.32 0.00
0 0 18 98.74 3.20 2.90 0 0.62
0 1 1 99.14 3.20 2.90 0 0.09 0.00 0.76 0.00
0 1 19 98.75 3.20 2.90 0 0.49
0 2 2 99.07 3.20 2.90 0 0.10 0.02 0.81 0.00
0 2 20 98.73 3.20 2.90 0 0.44
0 3 3 99.02 3.20 2.90 0 0.24 0.00 0.74 0.00
0 3 21 99.13 3.20 2.90 0 0.13
0 4 4 99.26 3.20 2.90 0 0.09 0.00 0.65 0.00
0 4 22 98.68 3.20 2.90 0 0.67
0 5 5 99.19 3.20 2.90 0 0.08 0.00 0.73 0.00
0 5 23 98.58 3.20 2.90 0 0.69
0 6 6 99.01 3.20 2.90 0 0.11 0.00 0.89 0.00
0 6 24 98.72 3.20 2.90 0 0.39
...
High Performance and Low Latency by Limiting Deeper C-states
C-states control the sleep levels that a core may enter when it is inactive. You may want to control C-states to tune your system for latency versus performance. Putting cores to sleep takes time, and although a sleeping core allows more headroom for another core to boost to a higher frequency, it takes time for that sleeping core to wake back up and perform work. For example, if a core that is assigned to handle network packet interrupts is asleep, there may be a delay in servicing that interrupt. You can configure the system to not use deeper C-states, which reduces the processor reaction latency, but that in turn also reduces the headroom available to other cores for Turbo Boost.
A common scenario for disabling deeper sleep states is a Redis database application, which stores the database in system memory for the fastest possible query response time.
To limit deeper sleep states on Amazon Linux
1. Open the /boot/grub/grub.conf file with your editor of choice.

[ec2-user ~]$ sudo vim /boot/grub/grub.conf

2. Edit the kernel line of the first entry and add the intel_idle.max_cstate=1 option to set C1 as the deepest C-state for idle cores.

# created by imagebuilder
default=0
timeout=1
hiddenmenu

title Amazon Linux 2014.09 (3.14.26-24.46.amzn1.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-3.14.26-24.46.amzn1.x86_64 root=LABEL=/ console=ttyS0 intel_idle.max_cstate=1
initrd /boot/initramfs-3.14.26-24.46.amzn1.x86_64.img

3. Save the file and exit your editor.

4. Reboot your instance to enable the new kernel option.

[ec2-user ~]$ sudo reboot
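After the instance comes back up, it can be useful to confirm that the kernel actually booted with the new option. The sketch below looks up an option in the kernel command line, which Linux exposes in /proc/cmdline. The function name kernel_option is hypothetical and is introduced here only for illustration.

```shell
# Print the value of a kernel command-line option, or fail if it is absent.
# Usage: kernel_option NAME [cmdline_file]
kernel_option() {
    name="$1"
    file="${2:-/proc/cmdline}"
    # Put each option on its own line, then match on the text before "=".
    tr ' ' '\n' < "$file" | awk -F= -v n="$name" '
        $1 == n { print substr($0, length(n) + 2); found = 1 }
        END { exit !found }
    '
}
```

With the grub.conf change above in effect, kernel_option intel_idle.max_cstate would be expected to print 1. If it prints nothing and returns a non-zero status, the option was not passed to the running kernel, for example because a different boot entry was used.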
The following example shows a c4.8xlarge instance with two cores
actively performing work at the "all core Turbo Boost" core
frequency.
[ec2-user ~]$ sudo turbostat stress -c 2 -t 10
stress: info: [5322] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
stress: info: [5322] successful run completed in 10s
pk cor CPU %c0 GHz TSC SMI %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7 Pkg_W RAM_W PKG_% RAM_%
5.56 3.20 2.90 0 94.44 0.00 0.00 0.00 0.00 0.00 0.00 0.00 131.90 31.11 199.47 0.00
0 0 0 0.03 2.08 2.90 0 99.97 0.00 0.00 0.00 0.00 0.00 0.00 0.00 67.23 17.11 99.76 0.00
0 0 18 0.01 1.93 2.90 0 99.99
0 1 1 0.02 1.96 2.90 0 99.98 0.00 0.00 0.00
0 1 19 99.70 3.20 2.90 0 0.30
...
1 1 10 0.02 1.97 2.90 0 99.98 0.00 0.00 0.00
1 1 28 99.67 3.20 2.90 0 0.33
1 2 11 0.04 2.63 2.90 0 99.96 0.00 0.00 0.00
1 2 29 0.02 2.11 2.90 0 99.98
...
In this example, the cores for vCPUs 19 and 28 are running at 3.2 GHz, and the other
cores are in the C1 C-state, awaiting instruction. Although the
working cores are not reaching their maximum Turbo Boost frequency, the inactive cores
will be much faster to respond to new requests than they would be in the deeper
C6 C-state.
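Picking the busy vCPUs out of a long turbostat table by eye is error-prone, so a small filter can help. The sketch below assumes the column layout shown in the examples on this page (pk, cor, CPU, %c0, GHz, and so on). Other turbostat versions print different columns, so the field positions are an assumption, and the function name busy_vcpus is hypothetical.

```shell
# Print the vCPU number and average frequency of every busy vCPU found in
# turbostat output on standard input.
# Assumes the column order shown on this page: pk cor CPU %c0 GHz ...
# Usage: busy_vcpus [min_c0_percent] < turbostat_output
busy_vcpus() {
    awk -v min="${1:-50}" '
        # Per-vCPU rows start with three integers: package, core, vCPU.
        # The header, the summary row, and "..." lines do not match.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $4 + 0 >= min {
            printf "vCPU %s: %s GHz (%s%% in C0)\n", $3, $5, $4
        }
    '
}
```

One way to use it is sudo turbostat stress -c 2 -t 10 2>&1 | busy_vcpus, where 2>&1 is included because turbostat may write its table to standard error. The optional argument sets the minimum %c0 value for a vCPU to count as busy and defaults to 50.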
Baseline Performance with the Lowest Variability
You can reduce the variability of processor frequency with P-states. P-states control the desired performance (in CPU frequency) from a core. Most workloads perform better in P0, which requests Turbo Boost. But you may want to tune your system for consistent performance rather than bursty performance that can happen when Turbo Boost frequencies are enabled.
Intel Advanced Vector Extensions (AVX or AVX2) workloads can perform well at lower frequencies, and AVX instructions can use more power. Running the processor at a lower frequency, by disabling Turbo Boost, can reduce the amount of power used and keep the speed more consistent. For more information about optimizing your instance configuration and workload for AVX, see http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf.
This section describes how to limit deeper sleep states and disable Turbo Boost (by
requesting the P1 P-state) to provide low latency and the lowest
processor speed variability for these types of workloads.
To limit deeper sleep states and disable Turbo Boost on Amazon Linux
1. Open the /boot/grub/grub.conf file with your editor of choice.

[ec2-user ~]$ sudo vim /boot/grub/grub.conf

2. Edit the kernel line of the first entry and add the intel_idle.max_cstate=1 option to set C1 as the deepest C-state for idle cores.

# created by imagebuilder
default=0
timeout=1
hiddenmenu

title Amazon Linux 2014.09 (3.14.26-24.46.amzn1.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-3.14.26-24.46.amzn1.x86_64 root=LABEL=/ console=ttyS0 intel_idle.max_cstate=1
initrd /boot/initramfs-3.14.26-24.46.amzn1.x86_64.img

3. Save the file and exit your editor.

4. Reboot your instance to enable the new kernel option.

[ec2-user ~]$ sudo reboot

5. When you need the low processor speed variability that the P1 P-state provides, execute the following command to disable Turbo Boost.

[ec2-user ~]$ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"

6. When your workload is finished, you can re-enable Turbo Boost with the following command.

[ec2-user ~]$ sudo sh -c "echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo"
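Because the no_turbo setting is changed at run time, it is easy to lose track of its current value. The following sketch reads the same sysfs file used in the commands above and reports the state in words. The function name turbo_status is hypothetical, and the sketch assumes the intel_pstate driver is active so that the file exists.

```shell
# Report whether Turbo Boost is currently allowed by the intel_pstate driver.
# Usage: turbo_status [no_turbo_file]
turbo_status() {
    file="${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}"
    if [ ! -r "$file" ]; then
        echo "unknown (cannot read $file)"
        return 1
    fi
    # The file contains 1 when Turbo Boost is disabled and 0 when it is allowed.
    case "$(cat "$file")" in
        0) echo "Turbo Boost enabled" ;;
        1) echo "Turbo Boost disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Running turbo_status before and after a workload is a quick way to check that Turbo Boost was re-enabled once the workload finished.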
The following example shows a c4.8xlarge instance with two vCPUs
actively performing work at the baseline core frequency, with no Turbo Boost.
[ec2-user ~]$ sudo turbostat stress -c 2 -t 10
stress: info: [5389] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
stress: info: [5389] successful run completed in 10s
pk cor CPU %c0 GHz TSC SMI %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7 Pkg_W RAM_W PKG_% RAM_%
5.59 2.90 2.90 0 94.41 0.00 0.00 0.00 0.00 0.00 0.00 0.00 128.48 33.54 200.00 0.00
0 0 0 0.04 2.90 2.90 0 99.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 65.33 19.02 100.00 0.00
0 0 18 0.04 2.90 2.90 0 99.96
0 1 1 0.05 2.90 2.90 0 99.95 0.00 0.00 0.00
0 1 19 0.04 2.90 2.90 0 99.96
0 2 2 0.04 2.90 2.90 0 99.96 0.00 0.00 0.00
0 2 20 0.04 2.90 2.90 0 99.96
0 3 3 0.05 2.90 2.90 0 99.95 0.00 0.00 0.00
0 3 21 99.95 2.90 2.90 0 0.05
...
1 1 28 99.92 2.90 2.90 0 0.08
1 2 11 0.06 2.90 2.90 0 99.94 0.00 0.00 0.00
1 2 29 0.05 2.90 2.90 0 99.95
The cores for vCPUs 21 and 28 are actively performing work at the baseline processor speed
of 2.9 GHz, and all inactive cores are also running at the baseline speed in the
C1 C-state, ready to accept instructions.

