CPU Affinity Introduction

Overview

There are two types of CPU affinity. The first, soft affinity (also called natural affinity), is the tendency of the scheduler to keep a process on the same CPU for as long as possible. It is merely an attempt; whenever that is infeasible, the process is migrated to another processor. The O(1) scheduler exhibits excellent natural affinity. At the opposite end is the scheduler in the (admittedly old) Linux 2.4.x kernels, which has poor CPU affinity. The result is the ping-pong effect: the scheduler bounces processes between processors each time they are scheduled and rescheduled.

Hard affinity, on the other hand, is what the CPU affinity system call provides. It is a requirement, and processes must adhere to a specified hard affinity. If a process is bound to CPU 1, for example, then it can run only on CPU 1.

CPU Affinity Benefits

The first benefit of CPU affinity is better cache performance. The scheduler tries hard to keep tasks on the same processor, but in some performance-critical situations, e.g. a highly threaded application, it makes sense to enforce the affinity as a hard requirement.

Multiprocessing computers try to keep the processor caches coherent. Modified data can be valid in only one processor's cache at a time; otherwise, the caches would fall out of sync. Consequently, whenever a processor writes to a line of data in its local cache, every other processor in the system that also caches that line must invalidate its copy, and this invalidation is costly.

But the real performance penalty comes when processes bounce between processors: they constantly cause cache invalidations, and the data they want is never in the cache when they need it. Cache miss rates therefore grow very large. CPU affinity protects against this and improves cache performance.

A second benefit of CPU affinity arises when multiple threads access the same data: it can make sense to bind them all to the same processor. Doing so keeps the threads from contending over the data across caches and causing cache misses. This does diminish the performance gained from multi-threading on SMP; if the threads are inherently serialized, however, the improved cache hit rate can outweigh that loss.

The third benefit is found in real-time or otherwise time-sensitive applications. In this approach, all the system processes are bound to a subset of the processors on the system, and the application is bound to the remaining processors. For example, in a dual-processor system the application would be bound to one processor and all other processes to the other. This ensures that the application receives the full attention of its processor.
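The split described above can be sketched as a small helper. This is an illustrative assumption rather than a prescribed method: the function name partition_cpus is hypothetical, and giving the highest-numbered CPUs to the application is an arbitrary choice.

```python
def partition_cpus(cpus, app_count):
    """Split `cpus` into (application_cpus, system_cpus).

    The highest-numbered CPUs go to the application; this ordering is an
    arbitrary choice made for the sketch.
    """
    ordered = sorted(cpus)
    if not 0 < app_count < len(ordered):
        raise ValueError("both groups need at least one CPU")
    return set(ordered[-app_count:]), set(ordered[:-app_count])


# Dual-processor example from the text: one CPU each.
app_cpus, system_cpus = partition_cpus({0, 1}, app_count=1)
print(app_cpus, system_cpus)  # {1} {0}
```

Each resulting set would then be applied to the relevant processes, for instance with taskset or with Python's os.sched_setaffinity.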

Implementing CPU Affinity under Linux

There are two methods of implementing CPU affinity: within the source code of the application itself, using the sched_setaffinity system call, or from the command line, using the taskset tool.
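As a minimal sketch of the in-application route, the following Python snippet uses os.sched_setaffinity and os.sched_getaffinity, which wrap the Linux system calls of the same names. The helper name pin_to_single_cpu is hypothetical, and picking the lowest-numbered permitted CPU is only for illustration; a real application would choose its CPUs according to the machine's topology.

```python
import os


def pin_to_single_cpu(pid=0):
    """Bind `pid` (0 means the calling process) to one CPU it may already use.

    Returns (original, pinned) affinity sets so the caller can restore later.
    """
    original = os.sched_getaffinity(pid)   # CPUs currently permitted
    target = min(original)                 # illustrative choice: lowest-numbered CPU
    os.sched_setaffinity(pid, {target})    # hard affinity: only this CPU from now on
    return original, os.sched_getaffinity(pid)


if __name__ == "__main__":
    original, pinned = pin_to_single_cpu()
    print("allowed before:", sorted(original))
    print("allowed after: ", sorted(pinned))
    os.sched_setaffinity(0, original)      # restore the original mask
```

These functions are available only on platforms that provide the underlying system calls, such as Linux.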

Using taskset to assign CPU affinity

Under Linux it is straightforward to bind an application to one or more cores with the taskset command. Once you know the processor type you are using, and therefore the core allocation you require, taskset can be used either to start the application bound to the correct cores or to rebind an already running application. For example:

taskset -c 2,6 <application>

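The above taskset command is for a Harpertown-based system and binds the application to the third and seventh cores: taskset numbers CPUs from 0, so CPUs 2 and 6 are cores 3 and 7 when counting from 1. Any user can bind a process they own; root privileges (the CAP_SYS_NICE capability) are needed only to change the affinity of another user's process.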
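Without the -c option, taskset expects a hexadecimal bitmask instead of a CPU list, with bit N standing for CPU N. The arithmetic behind the conversion is sketched here; the function name cpu_list_to_mask is hypothetical.

```python
def cpu_list_to_mask(cpus):
    """Return the affinity bitmask with one bit set per zero-based CPU number."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers start at 0")
        mask |= 1 << cpu   # CPU 2 -> bit 2 (0x04), CPU 6 -> bit 6 (0x40)
    return mask


mask = cpu_list_to_mask([2, 6])
print(hex(mask))  # 0x44
```

Under that reading, the CPU list 2,6 corresponds to the mask 0x44.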

taskset can also be run against an existing process to change its processor binding(s) if required. The options come first, followed by the CPU list and then the process ID:

taskset -cp 2,6 <pid>

To verify that a taskset binding has worked, or to check the binding of an already running application, run the following as any user:

taskset -c -p <pid>

This reports the core(s) on which the process is permitted to run.
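The same check can be made from inside a program. The sketch below reads the affinity set with os.sched_getaffinity and renders it as a compact list with ranges, similar in spirit to the list form taskset prints; the helper names format_cpu_list and current_affinity are hypothetical.

```python
import os


def format_cpu_list(cpus):
    """Render a set of CPU numbers as a compact list such as '0-3,6'."""
    ordered = sorted(cpus)
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next CPU number is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def current_affinity(pid=0):
    """Return the CPUs `pid` may run on (0 means the calling process)."""
    return format_cpu_list(os.sched_getaffinity(pid))


print(format_cpu_list({0, 1, 2, 3, 6}))  # 0-3,6
```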

Conclusion

CPU affinity is of most benefit to real-time processes and to network IRQs when network throughput is consistently high or suffers from micro-spikes. There should be little reason to use it for standard desktop applications.