User-mandated or SIMD Vectorization

User-mandated or SIMD vectorization supplements automatic vectorization just like OpenMP parallelization supplements automatic parallelization. The following figure illustrates this relationship. User-mandated vectorization is implemented as a single-instruction-multiple-data (SIMD) feature and is referred to as SIMD vectorization.

The SIMD vectorization feature is available for both Intel® microprocessors and non-Intel microprocessors. Vectorization may call library routines that can result in a greater performance gain on Intel microprocessors than on non-Intel microprocessors. Vectorization can also be affected by certain options, such as /arch or /Qx (Windows) or -m or -x (Linux and Mac OS X).

The following figure illustrates how SIMD vectorization is positioned among various approaches that you can take to generate vector code that exploits vector hardware capabilities. The programs written with SIMD vectorization are very similar to those written using auto-vectorization hints. You can use SIMD vectorization to minimize the amount of code changes that you may have to go through in order to obtain vectorized code.

SIMD vectorization uses the !DIR$ SIMD directive to effect loop vectorization. You must add the directive to a loop and recompile for the loop to be vectorized (the options -Qsimd [on Windows* OS] or -simd [on Linux* OS] are enabled by default).

Consider an example in Fortran where the compiler does not automatically vectorize the loop due to the unknown data dependence distance "X". You can use the data dependence assertion via the auto-vectorization hint, !DIR$ IVDEP, to let the compiler decide to vectorize the loop or not, or you can enforce vectorization of the loop using !DIR$ SIMD.

Example: without !DIR$ SIMD
[D:/simd] cat example1.f
      subroutine add(A, N, X)
      integer N, X
      real A(N)
      DO I=X+1, N
        A(I) = A(I) + A(I-X)
      ENDDO
      end
Command-line entry and output
[D:/simd] ifort example1.f -nologo -Qvec-report2
D:\simd\example1.f(6): (col. 9) remark: loop was not vectorized: existence of vector dependence.
Example: with !DIR$ SIMD
[D:/simd] cat example1.f
      subroutine add(A, N, X)
      integer N, X
      real A(N)
!DIR$ SIMD
      DO I=X+1, N
        A(I) = A(I) + A(I-X)
      ENDDO
      end
Command-line entry and output
[D:/simd] ifort example1.f -nologo -Qvec-report2 -Qsimd
D:\simd\example1.f(7): (col. 9) remark: LOOP WAS VECTORIZED.

The one big difference between using the SIMD directive and auto-vectorization hints is that with the SIMD directive, the compiler generates a warning when it is unable to vectorize the loop. With auto-vectorization hints, actual vectorization is still at the discretion of the compiler, even when you use the !DIR$ VECTOR ALWAYS hint.

The SIMD directive has five optional clauses to guide the compiler on how vectorization must proceed. Use these clauses appropriately so that the compiler obtains enough information to generate correct vector code. For more information on the clauses, see the !DIR$ SIMD description.

Additional Semantics

Note the following points when using the !DIR$ SIMD directive:

A variable may belong to at most one of private, linear, or reduction (or none of them).
Within the vector loop, an expression is evaluated as a vector value if it is private, linear, reduction, or it has a sub-expression that is evaluated to a vector value. Otherwise, it is evaluated as a scalar value (that is, broadcast the same value to all iterations). Scalar value does not necessarily mean loop invariant, although that is the most frequently seen usage pattern of scalar value.
A vector value may not be assigned to a scalar L-value. It is an error.
A scalar L-value may not be assigned under a vector condition. It is an error.
The computed GOTO statement is not supported.

Using vector Declaration

Consider the following C++ example code with a loop containing the math function, sinf(). All code examples in this section are applicable for the Windows* operating system only.

Example: Loop with math function is auto-vectorized
[D:/simd] cat example2.c
void vsin(float *restrict a, float *restrict b, int n){
  int i;
  for (i=0; i<n; i++) {
    a[i] = sinf(b[i]);
  }
}
Command-line entry and output
[D:/simd] icl example2.c -nologo -O3 -Qvec-report2 -Qrestrict
example2.c
D:\simd\example2.c(3): (col. 3) remark: LOOP WAS VECTORIZED.

When you compile the above code, the loop with sinf() function is auto-vectorized using the appropriate SVML library function provided by the Intel compiler. The auto-vectorizer identifies the entry points, matches up the scalar math library function to the SVML function and invokes it.

However, if the loop instead calls your own function foo(), which has the same prototype as sinf(), the auto-vectorizer fails to vectorize the loop because it does not know what foo() does unless foo() is inlined at the call site.

Example: Loop with user-defined function is NOT auto-vectorized
[D:/simd] cat example2.c
float foo(float);
void vfoo(float *restrict a, float *restrict b, int n){
  int i;
  for (i=0; i<n; i++){
    a[i] = foo(b[i]);
  }
}
Command-line entry and output
[D:/simd] icl example2.c -nologo -O3 -Qvec-report2 -Qrestrict
example2.c
D:\simd\example2.c(4): (col. 3) remark: loop was not vectorized: existence of vector dependence.

In such cases, you can use the !DIR$ ATTRIBUTES VECTOR :: function-name-list declaration (__declspec(vector) in the C example that follows) to vectorize the loop. Add the vector declaration to the function declaration and recompile both the caller and callee code. The loop and the function are then vectorized.

Example: Loop with user-defined function with vector declaration is vectorized
[D:/simd] cat example3.c
// foo() and vfoo() do not have to be in the same compilation unit as long
// as both see the same declspec.
__declspec(vector)
float foo(float);
void vfoo(float *restrict a, float *restrict b, int n){
  int i;
  for (i=0; i<n; i++) {
    a[i] = foo(b[i]);
  }
}
float foo(float x) {
  ...
}
Command-line entry and output
[D:/simd] icl example3.c -nologo -O3 -Qvec-report3 -Qrestrict
example3.c
D:\simd\example3.c(9): (col. 3) remark: LOOP WAS VECTORIZED
D:\simd\example3.c(14): (col. 3) remark: FUNCTION WAS VECTORIZED

Object-level compatibility considerations: The functions vfoo() and foo() do not have to reside in the same compilation unit. However, if vfoo() is compiled with the vector declaration, foo() also has to be compiled with the same declaration because the vectorization of vfoo() creates a function call to some_mangled_name_for_vectorized_foo(), and the compilation of foo() has to provide the new function, some_mangled_name_for_vectorized_foo(), in addition to the original foo() itself.

Passing multi-dimensional arrays: You can pass a multi-dimensional array to a vector-declared function. When the corresponding parameter is classified as scalar, a single array is passed for that parameter. If it is classified as private, multiple arrays are passed, just as if the original scalar function were called multiple times for consecutive iterations. A parameter classified as linear additionally requires a step: the array parameter A must have defined semantics for A+step. For example, for a vector length of 4, the callee sees [A, A+step, A+2*step, A+3*step]. If the parameter does not have defined semantics for +step, classifying it as linear is a syntax error.

Example: Passing multi-dimensional array to a vector declared function
__declspec(vector(linear(a:1)))
float foo(float *a) {
  return *a;
}
__declspec(vector(linear(a:1)))
float foo1(float **a) {
  return **a;
}
void vfoo(float *X[]) {
  float A[100][100];
  int i, j;  // loop indices, declared here so the fragment compiles
  for (i=0; i<100; i++) {
    for (j=0; j<100; j++) {
      foo(A+j);
      foo1(X+j);
    }
  }
}

Calling vector declaration under a condition: The following example code illustrates the use of the vector declaration with the mask clause. The mask clause is used when the vector declaration is called under a condition. In the code below, the function fib() is called in the main() function and also recursively called in the function fib() under an if condition. The compiler creates a masked vector version and a non-masked vector version for the function fib() while retaining the original scalar version of the fib() function.

Example: Using vector declaration with mask clause
#include <stdio.h>
#include <stdlib.h>
#define N 45
int a[N], b[N], c[N];
// The "mask" clause needs to be used when the vector function is
// called under a condition. Function fib() is called in the main() function and
// also recursively called in the fib() function under an if-condition.
// The compiler creates masked and non-masked vector versions for function fib()
// while keeping the original scalar version of fib().
__declspec(vector(mask))
int fib(int n){
  if (n <= 2) return n;
  else {
    return fib(n-1) + fib(n-2);
  }
}
int main(int argc, char *argv[]){
  int i;
  for (i=0; i < N; i++) b[i] = i;
  #pragma simd
  for (i=0; i < N; i++) {
    a[i] = fib(b[i]);
    // after vectorization, non-masked vector fib() is called
  }
  printf("Done a[%d] = %d\n", N-1, a[N-1]);
}
Command-line entry and output
[D:/simd] icl example3.c -nologo -O3 -Qvec-report3 -Qrestrict example3.c
D:\SIMD\vfib_example.c(20) (col. 3): remark: LOOP WAS VECTORIZED.
D:\SIMD\vfib_example.c(23) (col. 3): remark: SIMD LOOP WAS VECTORIZED.
D:\SIMD\vfib_example.c(9) (col. 1): remark: FUNCTION WAS VECTORIZED.
D:\SIMD\vfib_example.c(9) (col. 1): remark: FUNCTION WAS VECTORIZED.

Restrictions for Using vector Declaration

Vectorization depends on two major factors: hardware and the style of source code. The current implementation of the vector declaration imposes certain restrictions. The following features are not allowed:

Thread creation and joining through _Cilk_spawn, _Cilk_for, OpenMP* parallel/for/sections/task, and explicit threading API calls
Using setjmp, longjmp, exception handling (EH), or structured exception handling (SEH)
Inline ASM code and VML
Calling non-vector functions (note that all SVML functions are considered vector functions)
Locks, barriers, atomic constructs, and critical sections (presumably a special case of the previous item)
Goto statements
Intrinsics (for example, SVML intrinsics)
Function calls through function pointers and virtual functions
Any loop/array notation constructs
Struct access
Computed GOTO statements

Formal parameters must be of one of the following data types:

(un)signed 8, 16, 32, or 64-bit integer
32 or 64-bit floating point
64 or 128-bit complex
a pointer (C++ reference is considered a pointer data type)
