Jabir Hussain
Conceptual Overview
Why OpenMP?
- Modern CPUs no longer get much faster per core → the only route to higher throughput is parallelism.
- OpenMP (“Open Specifications for Multi-Processing”) is a shared-memory parallel programming API that:
  - adds compiler directives (e.g. `#pragma omp parallel`),
  - exposes runtime library functions (e.g. `omp_get_num_threads()`),
  - and uses environment variables (e.g. `OMP_NUM_THREADS`) to control execution.
- Used mainly for multi-core CPUs, it complements MPI (distributed memory) and CUDA (GPUs).
Programming Model
| Concept | Meaning |
|---|---|
| Master thread | The single thread that begins execution. |
| Fork | Creation of a team of threads at a parallel region. |
| Join | Synchronisation where threads finish and control returns to the master. |
| Thread | The smallest independent sequence of instructions scheduled by the OS. |
Execution alternates between serial regions (one thread) and parallel regions (multiple threads):
Serial → Fork → Parallel region → Join → Serial → …
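A minimal sketch of this fork-join flow (assuming a C compiler with OpenMP support; the messages are illustrative):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Serial region: master thread only\n");

    #pragma omp parallel          /* fork: a team of threads is created */
    {
        printf("Parallel region: thread %d\n", omp_get_thread_num());
    }                             /* join: threads synchronise, master continues */

    printf("Serial region again: master thread only\n");
    return 0;
}
```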
Core Compiler Directives
In C / C++
#pragma omp parallel default(shared) private(x, y)
In Fortran
!$omp parallel default(shared) private(x, y)
!$omp end parallel
- Lines beginning with `#pragma omp` (C) or `!$omp` (Fortran) are ignored by compilers that lack OpenMP support, so the code still compiles and runs sequentially.
- Directives apply to the next structured block (`{...}` in C, the `do`/`end do` pair in Fortran).
Example – Parallel Hello World
#include <stdio.h>
#include <omp.h>
int main() {
int nthreads, tid;
#pragma omp parallel private(tid)
{
tid = omp_get_thread_num();
printf("Hello world from thread %d\n", tid);
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %d\n", nthreads);
}
}
}
Key runtime calls:
- `omp_get_thread_num()` → returns the current thread ID (0 for the master)
- `omp_get_num_threads()` → returns the team size
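To enable the directives at compile time, build with OpenMP support, e.g. `gcc -fopenmp hello.c -o hello` (the source file name is illustrative); the program then prints one greeting per thread plus the team size.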
Controlling the Number of Threads
Three mechanisms, in increasing order of precedence:

- Environment variable: `export OMP_NUM_THREADS=8`
- Library call inside the program: `omp_set_num_threads(8);`
- Clause on the directive: `#pragma omp parallel num_threads(8)`
The default is implementation-defined; most runtimes use the number of available cores.
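A minimal sketch combining the last two mechanisms (the thread counts 8 and 4 are arbitrary; the environment variable would be set in the shell before running):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(8);               /* library call: affects subsequent regions */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("Region 1 team size: %d\n", omp_get_num_threads());
    }

    #pragma omp parallel num_threads(4)   /* clause: overrides the call for this region only */
    {
        if (omp_get_thread_num() == 0)
            printf("Region 2 team size: %d\n", omp_get_num_threads());
    }
    return 0;
}
```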
Parallel Regions vs Work-Sharing
| Directive | Effect | Example Output (loop of N iterations, P threads) |
|---|---|---|
| `#pragma omp parallel` | All threads execute the entire block. Work is replicated. | P × N messages |
| `#pragma omp parallel for` | Threads divide loop iterations among themselves. Work is shared. | N messages total |
#pragma omp parallel for
for (int i=0; i<10; i++)
printf("Hello world %d\n", i);
Each iteration is independent ⇒ safe for concurrent execution.
With `for`/`do` work-sharing, the loop index `i` is private by default.
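A minimal sketch of a safely shared loop (the array `a` and the scaling factor are arbitrary):

```c
#include <stdio.h>

#define N 10

int main(void) {
    double a[N];

    /* i is private to each thread; each iteration writes a distinct
       element of the shared array a, so there is no race. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    for (int i = 0; i < N; i++)
        printf("a[%d] = %.1f\n", i, a[i]);
    return 0;
}
```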
Conditional Compilation
To compile OpenMP-specific code only when OpenMP support is enabled, guard it with the `_OPENMP` macro:
#ifdef _OPENMP
printf("Compiled with OpenMP\n");
#endif
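A slightly fuller sketch that also guards the header include and falls back to serial execution (the `_OPENMP` macro expands to the specification date, e.g. 201511):

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void) {
#ifdef _OPENMP
    printf("Compiled with OpenMP (spec date %d)\n", _OPENMP);
    #pragma omp parallel
    printf("Hello from thread %d\n", omp_get_thread_num());
#else
    printf("Compiled without OpenMP, running serially\n");
#endif
    return 0;
}
```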
Summary
| Topic | Key Idea |
|---|---|
| Motivation | Serial CPU performance has plateaued — need thread-level parallelism. |
| Model | Fork–Join, shared-memory threads. |
| API Components | Compiler directives, runtime library, environment variables. |
| Work-sharing | `parallel for`/`do` splits independent iterations across threads. |
| Thread control | `OMP_NUM_THREADS`, `omp_set_num_threads()`, `num_threads()` clause. |
| Compilation safety | `#ifdef _OPENMP` guards. |
Takeaway for PX457 Assignments
When benchmarking, you should:
- Compile with `-fopenmp` (GCC/Clang) or `/openmp` (MSVC).
- Verify scaling by timing loops with different `OMP_NUM_THREADS` values (see the sketch after this list).
- Check for data dependencies: OpenMP will not protect you from race conditions.
- Use the `private`, `shared`, `reduction`, and `schedule` clauses properly when analysing loop performance.
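A possible timing harness for such a scaling test, offered as a sketch only (the loop body and problem size are arbitrary; `omp_get_wtime()` returns wall-clock seconds):

```c
#include <stdio.h>
#include <omp.h>

#define N 100000000L

int main(void) {
    double sum = 0.0;
    double t0 = omp_get_wtime();                  /* start wall-clock timer */

    /* reduction(+:sum) gives each thread a private partial sum and
       combines them at the join, avoiding a race on sum. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += 1.0 / (double)(i + 1);

    double t1 = omp_get_wtime();
    printf("threads = %d, sum = %f, time = %.3f s\n",
           omp_get_max_threads(), sum, t1 - t0);
    return 0;
}
```

Rerun with different `OMP_NUM_THREADS` settings and compare the reported times to assess scaling.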