Jabir Hussain
OpenMP 02: Variables and Synchronisation
Conceptual Focus
Parallelisation introduces multiple memory contexts. Each thread must know:
- which data it can safely modify (private), and
- which data must remain globally visible (shared).
Getting this wrong leads to race conditions, false sharing, or incorrect results.
Variable Scope
Default behaviour
#pragma omp parallel default(shared) private(i)
or, safer:
#pragma omp parallel default(none) shared(n,a,b) private(i)
Always prefer default(none) — the compiler forces you to declare every variable’s scope explicitly.
| Clause | Meaning |
|---|---|
| shared(list) | One storage location shared by all threads |
| private(list) | Each thread has its own uninitialised copy |
| firstprivate(list) | Private + initialised from the master copy |
| lastprivate(list) | Private + value from the last iteration copied back after the region |
| threadprivate(list) | Global variable replicated per thread (avoid if possible) |
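A minimal sketch of the first three clauses together (all variable names here are illustrative): a is shared, tmp is per-thread scratch, and scale is copied into each thread from the master's value.

```c
#include <stdio.h>

int main(void) {
    enum { N = 8 };
    double a[N];             /* shared: one copy visible to all threads */
    double scale = 2.0;      /* firstprivate: each thread starts from the master's 2.0 */
    double tmp;              /* private: per-thread scratch, initially undefined */
    int i;

    #pragma omp parallel for default(none) shared(a) firstprivate(scale) private(tmp, i)
    for (i = 0; i < N; ++i) {
        tmp = i * scale;     /* no race: tmp and scale are per-thread copies */
        a[i] = tmp;          /* no race: each iteration writes a distinct element */
    }

    for (i = 0; i < N; ++i)
        printf("a[%d] = %g\n", i, a[i]);
    return 0;
}
```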
Synchronisation Constructs
1. Barriers
All threads stop until everyone reaches the barrier.
Implicit at end of every parallel or for region.
#pragma omp barrier
Add nowait to a work-sharing loop to skip its implicit barrier, but only if you are certain that nothing after the loop depends on its results before the next synchronisation point.
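A sketch of a safe nowait (the arrays and sizes are illustrative): the second loop never reads what the first writes, so the barrier between them can be dropped.

```c
#include <stdio.h>

int main(void) {
    enum { N = 8 };
    double a[N], b[N];
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = i; }

    #pragma omp parallel
    {
        #pragma omp for nowait      /* skip the barrier after this loop */
        for (int i = 0; i < N; ++i)
            a[i] = 2.0 * a[i];      /* touches only a */

        #pragma omp for             /* implicit barrier at the end remains */
        for (int i = 0; i < N; ++i)
            b[i] = b[i] + 1.0;      /* touches only b, independent of a */
    }

    printf("a[1] = %g, b[1] = %g\n", a[1], b[1]);
    return 0;
}
```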
2. Critical / Atomic / Single
#pragma omp critical
{ /* executed by one thread at a time */ }
#pragma omp atomic
sum += x[i]; // low-overhead for simple ops
#pragma omp single
{ initialise(); } // exactly one thread executes
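A sketch contrasting atomic and critical (hist, bin, and best are illustrative): atomic covers a single memory update cheaply, while a compound read-compare-write needs a critical section.

```c
#include <stdio.h>

int bin(double v) { return v < 0.5 ? 0 : 1; }   /* toy two-bin classifier */

int main(void) {
    enum { N = 1000 };
    double x[N];
    for (int i = 0; i < N; ++i) x[i] = (i % 10) / 10.0;

    int hist[2] = {0, 0};
    double best = -1.0;

    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        #pragma omp atomic              /* single scalar update: atomic suffices */
        hist[bin(x[i])] += 1;

        #pragma omp critical            /* read-compare-write: needs mutual exclusion */
        {
            if (x[i] > best) best = x[i];
        }
    }

    printf("hist = {%d, %d}, best = %g\n", hist[0], hist[1], best);
    return 0;
}
```

In practice the max-tracking would be better written with reduction(max:best); the critical form is shown only to illustrate the construct.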
Parallel for Loops
Valid canonical form
for (i = start; i < end; i += step)
- Only one entry and one exit point.
- No break allowed (use a flag instead, as sketched below); continue is permitted.
- Loop bounds and increment must be loop-invariant.
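A minimal sketch of the flag idiom for early exit (the array, target value, and flag name are illustrative): remaining iterations still run, but skip their work once the flag is set.

```c
#include <stdio.h>

int main(void) {
    enum { N = 100 };
    int a[N], found = 0;                /* found: shared early-exit flag */
    for (int i = 0; i < N; ++i) a[i] = i * 3;

    #pragma omp parallel for shared(found)
    for (int i = 0; i < N; ++i) {
        int f;
        #pragma omp atomic read         /* race-free read of the flag */
        f = found;
        if (f) continue;                /* continue is legal: skip the work */
        if (a[i] == 42) {
            #pragma omp atomic write    /* race-free write of the flag */
            found = 1;
        }
    }

    printf("found = %d\n", found);      /* a[14] == 42, so prints 1 */
    return 0;
}
```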
Ordered Execution
Normally, iteration output interleaves arbitrarily:
#pragma omp parallel for
for (int i = 0; i < n; ++i)
printf("%d\n", i); // nondeterministic order
To enforce sequential order:
#pragma omp parallel for ordered
for (int i = 0; i < n; ++i) {
compute(i);
#pragma omp ordered
printf("%d\n", i);
}
Only one ordered block per iteration is allowed.
Which Loops Can Be Parallelised?
| Pattern | Parallel-safe? | Why |
|---|---|---|
| a[i] = a[i] + a[i-1]; | ✗ | each iteration depends on the previous iteration's result |
| a[i] = a[i] + a[i+n/2]; | ✓ | reads and writes hit independent locations (when the loop runs over i < n/2) |
| a[idx[i]] = a[idx[i]] + b[idx[i]]; | ✓ if idx is a permutation | otherwise repeated indices race on the same element |
Before adding #pragma omp parallel for, ensure each iteration is independent: no read-after-write, write-after-read, or write-after-write conflicts between iterations.
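As a concrete instance of the safe second pattern (the size here is illustrative): the loop runs only over the first half, so the write set a[0..N/2-1] and the read set a[N/2..N-1] never overlap.

```c
#include <stdio.h>

int main(void) {
    enum { N = 8 };                       /* illustrative; even so the halves split cleanly */
    double a[N];
    for (int i = 0; i < N; ++i) a[i] = i;

    #pragma omp parallel for default(none) shared(a)
    for (int i = 0; i < N / 2; ++i)
        a[i] = a[i] + a[i + N / 2];       /* writes first half, reads second half */

    for (int i = 0; i < N; ++i)
        printf("a[%d] = %g\n", i, a[i]);
    return 0;
}
```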
lastprivate and firstprivate
lastprivate – carry last iteration’s value out
int j;                                    /* declared in the enclosing scope so the clause can privatise it */
#pragma omp parallel for private(j) lastprivate(t)
for (int r = 0; r < r_max; ++r) {
    for (j = 1; j < n; ++j)
        t[j] = t[j-1] * r;
    t_sum[r] = sum(t, n);                 /* sum of the n elements of t */
}
After the loop, the final t from iteration r_max - 1 is visible to the master thread.
firstprivate – copy master initial value in
t[0] = expensive_init();
int j;
#pragma omp parallel for private(j) firstprivate(t)
for (int r = 0; r < r_max; ++r) {
    for (j = 1; j < n; ++j)
        t[j] = t[j-1] * r;
    t_sum[r] = sum(t, n);
}
Prevents recomputation of t[0] per thread.
Nested / Compound Loops
Only the outer loop is distributed unless collapse(n) is specified.
int i, j;
double x, y;                              /* per-thread temporaries (used in the full loop body, elided here) */
#pragma omp parallel for shared(m,n) private(i,j,x,y)
for (i = 0; i < m; ++i)
    for (j = 0; j < n; ++j)
        work(i, j);
Here:
- m, n → shared
- i, j, x, y → private
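To distribute the combined iteration space rather than just the outer loop, collapse(2) can be added; a minimal runnable sketch (the grid and its sizes are illustrative):

```c
#include <stdio.h>

int main(void) {
    enum { M = 4, N = 3 };
    double grid[M][N];

    /* collapse(2) requires perfectly nested loops (no statements between
       the two for headers) and fuses them into one M*N iteration space. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            grid[i][j] = i * N + j;       /* each (i,j) writes a distinct cell */

    printf("grid[3][2] = %g\n", grid[3][2]);   /* prints 11 */
    return 0;
}
```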
Summary
| Topic | Key Takeaway |
|---|---|
| Variable scope | Declare explicitly; default(none) prevents silent bugs. |
| Synchronisation | Use barriers/critical/atomic/single to control access. |
| Loop structure | Must be canonical; ensure iteration independence. |
| Ordered regions | For deterministic output, but slows execution. |
| first/lastprivate | Initialise or propagate values across threads safely. |
Practical Checkpoints for PX457 Labs
- Compile with -O3 -fopenmp -Wall -Wextra -std=c11.
- Use default(none) and declare every variable's scope.
- Measure scaling with and without synchronisation to see its cost.
- Visualise false sharing or ordering effects with micro-benchmarks.
- When in doubt, use reduction() instead of manual accumulation (Lecture 7 topic); a preview sketch follows.
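A minimal preview of that reduction pattern (array contents are illustrative): each thread accumulates into its own private copy of total, and the copies are combined once at the end of the region.

```c
#include <stdio.h>

int main(void) {
    enum { N = 1000 };
    double x[N], total = 0.0;
    for (int i = 0; i < N; ++i) x[i] = 1.0;

    /* reduction(+:total): per-thread private accumulators, summed into
       the shared total at the end of the region, with no manual atomics. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < N; ++i)
        total += x[i];

    printf("total = %g\n", total);      /* prints 1000 */
    return 0;
}
```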