Data race

The data usage pattern in an OpenMP* parallel region creates a data race.

A data race occurs when two threads access the same memory location without proper synchronization and at least one of the accesses is a write. This can cause the program to produce non-deterministic results in parallel mode. In OpenMP*, loops are parallelized by assigning different loop iterations to different threads. This strategy requires the loop iterations to be independent so they can run in any order.

There are many ways that race conditions can be created. The simplest way is to write to a shared variable in a parallel region. In this case, the final value of the variable depends on which iteration writes last. In sequential mode, this is always the last iteration, but in parallel mode, this is not guaranteed.
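
For illustration, here is a minimal sketch of this simplest case; the function and variable names are assumed for this sketch and do not come from the example below.

#include <stdio.h>
#include <omp.h>

void shared_write_race()
{
    int i, last = 0;            // "last" is shared by all threads

#pragma omp parallel for
    for (i = 0; i < 100; i++) {
        last = i;               // data race: unsynchronized writes from
                                // many threads
    }

    // In sequential mode this always prints 99; in parallel mode it
    // prints whichever iteration happened to write last.
    printf("%d\n", last);
}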

Another common cause of race conditions is a so-called loop-carried data dependency. This refers to a case where a value written in one loop iteration is read or written by another iteration. Put another way, a variable is read in a loop iteration without having been previously written in that same iteration. Usually this occurs when an array is indexed improperly. For example, if "x" is a loop counter and "a" is an array, then "a[x]" refers to a different array element in every loop iteration. Therefore, a loop can write "a[x]" without creating a loop-carried dependency. However, if that loop also reads or writes "a[x + 1]", then we do have a loop-carried dependency, because "a[x]" in one loop iteration is the same element as "a[x + 1]" in the previous iteration.
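
As a minimal sketch, with hypothetical arrays "a" and "b" and length "n" assumed for illustration:

// Writing "a[x]" alone is safe: each iteration touches only its own
// element.  Reading "a[x + 1]" creates the loop-carried dependency,
// because "a[x + 1]" in iteration x is the element that iteration
// x + 1 writes.  This loop therefore cannot safely run in parallel.
void carried_dependency(int *a, int *b, int n)
{
    int x;

    for (x = 0; x < n - 1; x++) {
        a[x] = 2 * x;
        b[x] = a[x + 1];    // loop-carried: reads another iteration's write
    }
}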

There are many possible solutions to race conditions, and it is up to you to decide which is best for your program. Not every sequential algorithm can be parallelized as-is. Sometimes it is possible to change the algorithm to group the units of work in a different way that can be parallelized.

Beyond changing the algorithm, there are two main solutions to data race conditions. The first solution is the copy-in, copy-out or "privatization" approach: each thread copies the value of the shared variable into a thread-private variable. Each thread can then read and write its thread-private variable without fear of interference. At the end of the parallel region, the private copies can be merged back into the shared variable. OpenMP supports a number of data-sharing constructs for this purpose, such as REDUCTION, PRIVATE, THREADPRIVATE, FIRSTPRIVATE, and LASTPRIVATE.
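
For example, a minimal sketch of privatization using the REDUCTION clause; each thread accumulates into a private copy of "sum", and the private copies are added back into the shared variable when the loop ends:

#include <stdio.h>
#include <omp.h>

void sum_with_reduction()
{
    int i, sum = 0;

#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 100; i++) {
        sum = sum + i;      // each thread updates its own private "sum"
    }

    printf("%d\n", sum);    // always 4950: the copies were merged safely
}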

The copy-in, copy-out strategy does not work in every situation. The OpenMP REDUCTION clause provides a number of ways to accumulate the separate private variables back into the shared copy: for example, they can all be added, or a maximum or minimum computed, and so on. If the operation needed to combine the values at the copy-out step is not one of the supported reductions, then it may not be possible to use the available OpenMP facilities to join the values together.
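
For instance, a sketch of a maximum computed with REDUCTION; note this assumes the "max" reduction operator is available for C code, which requires OpenMP 3.1 or later:

#include <omp.h>

// "a" and "n" are hypothetical inputs assumed for this sketch.
int array_max(const int *a, int n)
{
    int i, m = a[0];

#pragma omp parallel for reduction(max:m)
    for (i = 1; i < n; i++) {
        if (a[i] > m)
            m = a[i];       // each thread tracks a private maximum; the
                            // maxima are combined at the end of the loop
    }

    return m;
}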

The second solution is to employ synchronization to ensure that the variable is accessed in the right order. This can have a performance impact because the threads no longer execute freely in parallel; they must spend time waiting for their turn. For example, it is possible to constrain parts of parallel loops to execute in strict iteration order using the ORDERED construct.
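
For example, a minimal sketch using ORDERED; "work" is a stand-in function assumed for this sketch:

#include <stdio.h>
#include <omp.h>

int work(int i) { return i * i; }   // stand-in for real computation

void print_in_order()
{
    int i;

#pragma omp parallel for ordered
    for (i = 0; i < 100; i++) {
        int v = work(i);            // still runs in parallel
#pragma omp ordered
        {
            printf("%d\n", v);      // executes in strict iteration order
        }
    }
}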

Sometimes data dependencies can be resolved by splitting a loop into two loops. For example, suppose we have a loop where "x" is the loop counter and we are accessing an array "a". The loop contains one statement that assigns to "a[x]" and a later statement that reads from "a[x - 1]". This creates a data race, because there is no guarantee that the (x - 1)-th loop iteration will complete before the x-th iteration in parallel mode. However, if the second statement can be moved into a second loop, then the problem is resolved, as shown in the sketch below.
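
A minimal sketch of this loop splitting, with hypothetical arrays "a" and "b" of length "n" ("a[0]" is assumed to be initialized by the caller):

#include <omp.h>

void split_loops(int *a, int *b, int n)
{
    int x;

    // Combined into one loop, the statements below would make
    // iteration x read "a[x - 1]", which iteration x - 1 writes, so
    // the iterations could not safely run in parallel.  Split in two,
    // neither loop reads a value written by another of its own
    // iterations, so each can be parallelized independently.
#pragma omp parallel for
    for (x = 1; x < n; x++) {
        a[x] = x * x;
    }

#pragma omp parallel for
    for (x = 1; x < n; x++) {
        b[x] = a[x - 1];
    }
}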

ID    Observation        Description
2-N   Bad memory access  The place where a variable was read or written that contributed to the race condition

Example


#include <stdio.h>
#include <omp.h>

void do_work1()
{
    int i, sum = 0;
    int a[100]; 

    for (i = 1; i < 100; i++) {
       a[i] = i;
    }
 
// The loop-carried data dependency on "a" in the following loop
// means the iterations cannot be run safely in parallel.
// In sequential mode, the i-th array element is only modified
// in the i-th iteration.  Therefore "a[i + 1]" is never modified
// by a previous loop iteration, and will contain the value
// assigned earlier: "i + 1".  This is not true in parallel mode.
// For example, suppose the i = 10 iteration runs before the
// i = 9 iteration.  Then the statement a[i] = a[i] + 1; in
// iteration 10 will set a[10] to 11 before the statement
// sum = sum + a[i + 1] in iteration 9 is executed.
// When it is, it will add 11 to sum instead of 10.
// This is not what happens in sequential mode.
        
#pragma omp parallel for reduction(+:sum)
    for (i = 1; i < 99; i++) {
        a[i] = a[i] + 1;
        sum = sum + a[i + 1];
    }

    printf("%d\n", sum); 
}

void do_work2()
{
    int i, sum = 0;
    int a[100], b[100];

    for (i = 1; i < 100; i++) {
        a[i] = i;
        b[i] = 100 - i;
    }

// The following loop demonstrates a "possible" data dependency.
//
// A data dependency exists if the same element of an array
// is written on one loop iteration and read on another or
// written on two different loop iterations.  This loop writes
// the "a[i]-th" element of "b" and reads the "i-th" element.
// Therefore a write-write dependency exists if and only if
// "a[i]" in one iteration is equal to "a[i]" in another iteration.
// A read-write dependency exists if and only if "a[i]" in one
// iteration is equal to "i" in another iteration.
//
// These facts are hard to determine statically, so a "possible data
// dependency" error will be issued.  In this program it is a
// false positive, because the previous loop sets "a[i]" equal to "i".
// Therefore "a[i]" has a different value on every iteration, so no
// write-write dependency exists.  Also, "a[i]" in one iteration never
// equals "i" in a different iteration, so no read-write dependency
// exists either.

#pragma omp parallel for reduction(+:sum)
    for (i = 1; i < 100; i++) {
        b[a[i]] = i;
        sum = sum + b[i];
    }
    printf("%d\n", sum);
}

int main(int argc, char **argv)
{
    do_work1();
    do_work2();
    return 0;
}
        

Copyright © 2010, Intel Corporation. All rights reserved.