Elemental functions are a general language construct to express a data parallel algorithm. An elemental function is written as a regular C/C++ function, and the algorithm within describes the operation on one element, using scalar syntax. The function can then be called as a regular C/C++ function to operate on a single element, or it can be called in a data parallel context, providing many elements to operate on. In Intel® Cilk™ Plus, the data parallel context is provided as an array.
How Elemental Functions Work
When you write an elemental function, the Intel® compiler generates a short vector form of the function, which can perform your function's operation on multiple arguments in a single invocation. The short vector version may be able to perform multiple operations as fast as the regular implementation performs a single one by utilizing the vector ISA in the CPU. In addition, upon invocation of the function, if the data set is large enough, the compiler may assign different copies of the elemental functions to different workers, executing them concurrently. The end result is that your data parallel operation executes on the CPU utilizing both the parallelism available in the multiple cores and the parallelism available in the vector ISA.
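As a rough mental model of the paragraph above, the plain C sketch below writes out by hand the two forms of one function: the scalar form you author, and a short vector form that handles several elements per call. The names `ef_add_v4`, `add_arrays`, and `VLEN`, and the width of 4, are hypothetical choices for illustration. The code the compiler actually generates uses vector registers and does not appear in your source.

```c
#include <stddef.h>

#define VLEN 4  /* hypothetical vector width, chosen for illustration */

/* Scalar form: the operation on one element, as you would write it. */
double ef_add(double x, double y) {
    return x + y;
}

/* Hand-written stand-in for a short vector form:
   one call handles VLEN elements. */
void ef_add_v4(const double *x, const double *y, double *out) {
    for (size_t k = 0; k < VLEN; ++k) {
        out[k] = x[k] + y[k];
    }
}

/* Apply the operation to n elements: full groups of VLEN go through the
   short vector form, leftover elements through the scalar form. */
void add_arrays(double *a, const double *b, const double *c, size_t n) {
    size_t i = 0;
    for (; i + VLEN <= n; i += VLEN) {
        ef_add_v4(&b[i], &c[i], &a[i]);
    }
    for (; i < n; ++i) {
        a[i] = ef_add(b[i], c[i]);
    }
}
```

In this sketch the vector path is still an ordinary loop; it only illustrates the calling pattern, not the speedup that real vector instructions provide.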
Declaring an Elemental Function
In order for the compiler to generate the short vector function, you need to provide an indication in your code. Use the __declspec(vector) syntax, as follows:
__declspec (vector (optional clauses)) return_type function_name (arguments)
Write the code inside the function using existing C/C++ syntax.
Invoking an Elemental Function with Parallel Context
Typically, the invocation of an elemental function provides arrays wherever scalar arguments are specified as formal parameters. Use the array notation syntax available in Intel® Cilk™ Plus to provide the arrays succinctly. Alternatively, you can invoke the function from a _Cilk_for loop.
The following example shows how to use elemental functions to add two large arrays and store the result in a third array, taking advantage of the parallelism available in both the cores and the vectors in the CPU:
Example

    //declaring the function body
    __declspec (vector) double ef_add (double x, double y){
        return x + y;
    }

    //invoking the function using array notations
    a[:] = ef_add(b[:],c[:]); //operates on the whole extent of the arrays a,b,c
    a[0:n:s] = ef_add(b[0:n:s],c[0:n:s]); //use the full array notation construct to also specify n as an extent and s as a stride

    //Use the _Cilk_for construct to invoke the elemental function in a data parallel context
    _Cilk_for (j = 0; j < n; ++j) {
        a[j] = ef_add(b[j],c[j]);
    }
Only code that calls the function from a _Cilk_for loop can use all available parallelism. The array notation syntax, like a call from a regular for loop, invokes the short vector function in each iteration and uses the vector parallelism, but the invocation happens in a serial loop and does not use multiple cores.
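To make the array section syntax in the example concrete, the plain C sketch below spells out what `a[0:n:s] = ef_add(b[0:n:s],c[0:n:s])` computes: n elements, starting at index 0, spaced s apart. It is a serial loop and omits the `__declspec (vector)` annotation so that it compiles with any C compiler. The helper name `add_strided` is hypothetical.

```c
#include <stddef.h>

/* Scalar operation on one element (annotation omitted for portability). */
double ef_add(double x, double y) {
    return x + y;
}

/* Serial equivalent of a[0:n:s] = ef_add(b[0:n:s], c[0:n:s]).
   Touches indices 0, s, 2*s, ..., (n-1)*s, so for n > 0 each array
   needs at least (n-1)*s + 1 elements. */
void add_strided(double *a, const double *b, const double *c,
                 size_t n, size_t s) {
    for (size_t i = 0; i < n; ++i) {
        a[i * s] = ef_add(b[i * s], c[i * s]);
    }
}
```

With n = 3 and s = 2, for instance, only the elements at indices 0, 2, and 4 are read and written; the elements in between are left untouched.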
Limitations
The following language constructs are disallowed within elemental functions:
Loops and jumps, in particular the keywords for, while, do, and goto
switch statements
Operations on classes and structs (other than member selection)
Function calls to non-elemental functions
_Cilk_spawn
Expressions with array notations
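As an illustration of a function body that stays within these limits, the sketch below clamps a value to a range using only conditional expressions: no loops, no switch, and no calls to other functions. It assumes conditional expressions are permitted, because they do not appear in the list above. The function name `ef_clamp` is hypothetical, and the `__declspec (vector)` annotation is left in a comment so that the code compiles with any C compiler.

```c
/* With the Intel compiler, the declaration would be prefixed with
   __declspec (vector), as described under Declaring an Elemental Function. */
double ef_clamp(double x, double lo, double hi) {
    /* One element in, one element out; no loops or switch statements. */
    return x < lo ? lo : (x > hi ? hi : x);
}
```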
Copyright © 1996-2010, Intel Corporation. All rights reserved.