Managing Multi-core Performance

You can obtain the best performance on systems with multi-core processors by ensuring that threads do not migrate from core to core. To do this, bind the threads to the CPU cores by setting an affinity mask for the threads. Use one of the following options:

Consider the following performance issue:

The following code example, written for the Intel(R) compiler, shows how to resolve this issue by setting an affinity mask through an operating system call. The code calls the system function sched_setaffinity to bind the threads to cores on different sockets. The Intel MKL FFT function is then called from the bound threads:

        
#define _GNU_SOURCE // enables the GNU CPU affinity interface
// (works with the appropriate kernel and glibc)
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <omp.h>
int main(void) {
	int NCPUs = (int)sysconf(_SC_NPROCESSORS_CONF);
	printf("Using thread affinity on %i NCPUs\n", NCPUs);
#pragma omp parallel default(shared)
	{
		cpu_set_t new_mask;
		cpu_set_t was_mask;
		unsigned int new_bits = 0, was_bits = 0;
		int cpu;
		int tid = omp_get_thread_num();
		
		CPU_ZERO(&new_mask);
		CPU_ZERO(&was_mask);
		
		// 2 packages x 2 cores/pkg x 1 threads/core (4 total cores)
		// Thread 0 is bound to CPU 0, any other thread to CPU 2
		CPU_SET(tid==0 ? 0 : 2, &new_mask);
		
		// A pid of 0 applies the call to the calling thread
		if (sched_getaffinity(0, sizeof(was_mask), &was_mask) == -1) {
			printf("Error: sched_getaffinity(0, sizeof(was_mask), &was_mask) in thread %d\n", tid);
		}
		if (sched_setaffinity(0, sizeof(new_mask), &new_mask) == -1) {
			printf("Error: sched_setaffinity(0, sizeof(new_mask), &new_mask) in thread %d\n", tid);
		}
		// Collect the low 32 bits of each mask for printing
		for (cpu = 0; cpu < 32; cpu++) {
			if (CPU_ISSET(cpu, &new_mask)) new_bits |= 1u << cpu;
			if (CPU_ISSET(cpu, &was_mask)) was_bits |= 1u << cpu;
		}
		printf("tid=%d new_mask=%08X was_mask=%08X\n", tid, new_bits, was_bits);
	}
	// Call Intel MKL FFT function
	return 0;
}
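
The example above prints each mask as a hexadecimal word, which shows only the first 32 CPUs. On larger systems it may be easier to list the CPU numbers in the mask. The following is a minimal sketch of one way to do that with the CPU_ISSET macro from glibc. The helper name format_cpu_set is hypothetical and is not part of Intel MKL or of the example above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the CPU numbers contained in *set into buf
 * as a comma-separated list, for example "0,2". Returns the number of
 * CPUs in the set. The text is truncated if buf is too small. */
static int format_cpu_set(const cpu_set_t *set, char *buf, size_t len)
{
    size_t used = 0; /* characters written so far */
    int count = 0;   /* CPUs found in the set */

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, set))
            continue;
        ++count;
        if (used < len) {
            /* Separate entries with a comma after the first one */
            int n = snprintf(buf + used, len - used, "%s%d",
                             used ? "," : "", cpu);
            if (n > 0)
                used += (size_t)n;
        }
    }
    return count;
}
```

Inside the parallel region, a call such as format_cpu_set(&new_mask, text, sizeof(text)) with a local char array text could replace the hexadecimal output.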
 
        

Compile the application with the Intel compiler using the following command:

icc test_application.c -openmp 

 

where test_application.c is the filename for the application.

Run the application in two threads, for example, by using the OMP_NUM_THREADS environment variable to set the number of threads:

env OMP_NUM_THREADS=2 ./a.out

See the Linux Programmer's Manual (in man pages format) for particulars of the sched_setaffinity function used in the above example.
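
Two particulars from that manual page matter for the example: a pid argument of 0 refers to the calling thread, and on failure the function returns -1 and sets errno. The sketch below wraps both points in small helpers. The names first_allowed_cpu and pin_calling_thread are hypothetical and are used only for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for sched_setaffinity and the CPU_* macros */
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the lowest-numbered CPU the calling thread
 * is currently allowed to run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == -1)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}

/* Hypothetical helper: binds the calling thread to a single CPU.
 * Returns 0 on success; on failure reports the reason and returns -1. */
static int pin_calling_thread(int cpu)
{
    cpu_set_t mask;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);

    /* A pid of 0 means "the calling thread" */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        /* EINVAL usually means the mask names no CPU this thread may use */
        fprintf(stderr, "sched_setaffinity(cpu %d): %s\n",
                cpu, strerror(errno));
        return -1;
    }
    return 0;
}
```

Checking the return value this way makes it visible when the requested CPU number does not exist on the machine, which the hard-coded CPU numbers 0 and 2 in the example would otherwise hide.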



Copyright © 2006 - 2010, Intel Corporation. All rights reserved.