Introduction to Parallel Programming with OpenMP

By Jorrit Klein Bramel
Published in Programming
May 14, 2023

Table Of Contents

1. Introduction
2. Why Parallel Programming?
3. What is OpenMP?
4. Setting Up a Compilation Environment
5. The Basic Parallel Construct
6. Further Constructs in OpenMP
7. Clauses in OpenMP
8. Advanced OpenMP Constructs
9. Conclusion
10. Additional Resources

Introduction

The emergence of multicore processors and the subsequent proliferation of multicore systems have resulted in a paradigm shift in how we approach computation, emphasising the need for parallel programming [1]. This advancement has led to a significant boost in computational capabilities but also poses new challenges in terms of software development [2].

To effectively harness the power of these multicore systems, developers have to design their programs so that computations are carried out simultaneously on different cores. This is where parallel programming comes into play [3]. With parallel programming, multiple computations can be performed concurrently, leading to faster execution times and increased program efficiency [4].

Among the several paradigms for parallel programming, the shared-memory model has been particularly attractive for its simplicity and applicability to multicore systems [5]. Open Multi-Processing (OpenMP) is one such shared-memory parallel programming model, which has gained widespread adoption due to its ease of use and efficiency [6]. OpenMP offers an application programming interface (API) that facilitates the development of parallel programs for shared-memory architectures [7].

This post aims to provide an introduction to parallel programming using OpenMP. It covers a range of topics, from setting up the compilation environment to exploring the various constructs and clauses provided by OpenMP for parallel programming. By the end of this post, readers should have a solid understanding of OpenMP and be equipped with the knowledge to start writing their own parallel programs.

Why Parallel Programming?

As computer architectures have evolved, so too have the methods of programming these machines. In the era of single-processor computers, serial programming, that is, executing one instruction after another, was the norm [3]. However, with the advent of multicore processors, this traditional approach became less efficient [1].

Over two decades ago, in 2001, IBM introduced the world’s first multicore processor, the POWER4, marking a turning point in the computing industry. Since then, processors have continually increased their core counts, and specialised hardware has even more cores [8].

To illustrate, consider one of the login nodes of the ARCHER2 supercomputer [0], which has two physical processors, each with 64 cores. Furthermore, each core supports simultaneous multithreading (SMT) with two hardware threads, resulting in a total of 2 × 64 × 2 = 256 logical processors. The architecture of such powerful machines brings significant potential for increased performance, but it also poses new challenges in terms of software development and computational efficiency [2].

We can retrieve the processor information on a Linux-based machine using the lscpu command-line utility:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7742 64-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA:
NUMA node(s): 8

These multicore systems demand a different approach to programming: parallel programming [9]. When an application is launched, its process has only one main thread by default. To fully utilise all available logical processors, it’s crucial to employ an application programming interface that supports multithreaded programming [7]. Parallel programming allows computations to be performed simultaneously on different cores, maximising efficiency and reducing execution times [4]. Therefore, learning parallel programming has become a necessity for the modern software developer.

What is OpenMP?

OpenMP (Open Multi-Processing) is an application programming interface (API) that facilitates shared-memory multiprocessing programming in C, C++, and Fortran [5]. Introduced in the late 1990s, OpenMP was designed to standardise directive-based extensions for shared-memory parallelisation, and it has since become an industry standard [6].

Developed collaboratively by a consortium of industry and academic organisations, OpenMP provides a high-level, portable, and scalable model for developers to write parallel applications for shared-memory architectures [7]. OpenMP’s power lies in its simplicity. It primarily relies on compiler directives, which are special annotations embedded in the source code. The compiler interprets these directives to generate parallelised code, thereby freeing the programmer from the intricacies of thread management [10].

OpenMP’s directives, library routines, and environment variables allow developers to specify parallelism at a high level, making it a straightforward and efficient tool for creating parallel programs [11]. This ability to focus on the application logic rather than the details of thread management has led to the widespread adoption of OpenMP in high-performance computing [12].

In essence, OpenMP is a critical tool for parallel programming on many CPU architectures and operating systems, supporting the C, C++, and Fortran programming languages. As such, understanding OpenMP is pivotal to harnessing the full power of modern multicore systems.

Setting Up a Compilation Environment

In order to run and test the parallel computing capabilities of OpenMP, a suitable compilation environment is necessary. This process encompasses the choice of an operating system, the installation of a compiler, and the subsequent verification of the environment with a simple program.

Choosing an Operating System

The choice of the operating system can significantly impact the development and performance of parallel applications. This article covers both Pop!_OS 22.04, a Linux distribution by System76, and macOS. Pop!_OS, an Ubuntu-based Linux operating system, provides a robust environment for software development and works well with compilers such as GCC and Clang. macOS, with its UNIX-based architecture, offers similar robustness for software development.

It is important to note that the subsequent steps and code examples can be replicated in any Ubuntu/Debian-based Linux distribution or macOS, given their similarity in structure and functionality.

Installing the Compiler

The next critical step is the installation of the compiler. A compiler is a program that transforms human-readable source code into machine code that a computer can execute [13]. In our case, we will be using the GNU Compiler Collection (GCC) due to its widespread use and excellent support for OpenMP.

On Ubuntu/Debian-based systems like Pop!_OS, the GCC compiler can be installed using the command below:

$ sudo apt install gcc

On macOS, you would use the Homebrew package manager to install the GCC compiler. If you haven’t installed Homebrew yet, you can do so by running the following command in the Terminal:

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Once Homebrew is installed, you can install GCC:

$ brew install gcc

Alternatively, Clang, another widely used compiler, can also be used for OpenMP programs. It may require an additional package (libomp-dev) to ensure full OpenMP support. To install Clang and the necessary OpenMP library, use the commands:

On Ubuntu/Debian-based systems:

$ sudo apt install clang libomp-dev

On macOS:

$ brew install llvm libomp

Testing the Environment with a Simple Program

With the operating system chosen and the compiler installed, it is now essential to verify the functionality of the environment. To this end, we will employ the well-known “Hello, World!” program, albeit with a parallel computing twist.

#include <omp.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    int thread_number, number_of_threads;
    // the private clause declares one or more list items to be private to a task or to a SIMD lane.
    #pragma omp parallel private(thread_number, number_of_threads)
    {
        thread_number = omp_get_thread_num();
        number_of_threads = omp_get_num_threads();
        printf("Hola Mundo, from thread %d out of %d threads.\n", thread_number, number_of_threads);
    }
    return 0;
}

Like the traditional “Hello, World!”, this program prints a greeting message (in Spanish, because why not?). However, this program spawns multiple threads and prints the greeting message from each thread, illustrating the fork-join parallel execution model inherent in OpenMP.

Depending on the compiler installed and the operating system, the code can be compiled and executed with one of the following command sets:

For GCC on Ubuntu/Debian-based systems:

$ gcc -fopenmp hello.c -o hello
$ ./hello

For Clang on Ubuntu/Debian-based systems:

$ clang -fopenmp hello.c -o hello
$ ./hello

For GCC on macOS (if your gcc version is 13, replace ‘gcc’ with ‘gcc-13’):

$ gcc-13 -fopenmp hello.c -o hello
$ ./hello

For Clang on macOS (use ‘clang’ installed from llvm, for example ‘/opt/homebrew/Cellar/llvm/16.0.6/bin/clang’):

$ /opt/homebrew/Cellar/llvm/16.0.6/bin/clang -fopenmp hello.c -o hello
$ ./hello

After executing the above commands, the output should look similar to this:

$ ./hello
Hola Mundo, from thread 5 out of 12 threads.
Hola Mundo, from thread 0 out of 12 threads.
Hola Mundo, from thread 11 out of 12 threads.
Hola Mundo, from thread 3 out of 12 threads.
Hola Mundo, from thread 8 out of 12 threads.
Hola Mundo, from thread 1 out of 12 threads.
Hola Mundo, from thread 6 out of 12 threads.
Hola Mundo, from thread 10 out of 12 threads.
Hola Mundo, from thread 9 out of 12 threads.
Hola Mundo, from thread 4 out of 12 threads.
Hola Mundo, from thread 7 out of 12 threads.
Hola Mundo, from thread 2 out of 12 threads.

The output demonstrates that the environment is set up correctly, with OpenMP successfully creating a parallel region and spawning multiple threads.

In summary, setting up a compilation environment for OpenMP requires a suitable operating system, a suitable compiler, and an initial program to verify the setup. Pop!_OS and macOS, along with GCC (or Clang), offer a reliable and robust environment for parallel programming with OpenMP.

The Basic Parallel Construct

The essence of parallel programming with OpenMP lies in its ability to distribute computations across multiple processor cores [14]. This is achieved with the parallel construct, a foundational component of OpenMP that is defined by a compiler directive and a structured block [10].

Introduction to the Parallel Construct

At the heart of OpenMP programming lies the parallel construct, which is used to specify regions of code to be executed in parallel by multiple threads [5]. When a thread encounters a parallel construct, it creates a team of threads and becomes the master thread of that team.

#include <omp.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    #pragma omp parallel
    {
        printf("Hello, world!\n");
    }
    return 0;
}

The #pragma omp parallel directive marks the start of a parallel region. The code within the subsequent structured block is executed by every thread in the team. Thus, this code will print “Hello, world!” once per thread in the team.

Understanding Compiler Directives and Structured Blocks

The OpenMP programming model makes extensive use of compiler directives and structured blocks. A compiler directive in OpenMP starts with #pragma omp in C / C++ and with !$omp in Fortran, followed by the directive name [10].

In the context of a parallel construct, the directive name is parallel, indicating that the block of code that follows should be executed in parallel. This block of code, enclosed in braces {...}, is referred to as a structured block. Every thread in the team executes this block independently of the others.

Together, the compiler directive and the structured block form a construct that defines the parallel region [14]. In the case of the parallel construct, the #pragma omp parallel directive and the subsequent structured block collectively specify a region of code to be executed in parallel.

Fork-Join Parallel Execution Model

OpenMP employs a parallel execution model that hinges on the notion of “fork-join”, a structure integral to how multithreaded applications function [5].

Explanation of the Fork-Join Model

In the context of OpenMP, the fork-join model is a scheme where the flow of the program forks off parallel threads at the beginning of parallel regions [14]. These threads, created by the fork operation, run in parallel and execute the structured block associated with the parallel construct [10].

After the completion of the parallel region, the threads join back together, and the program resumes single-threaded execution. The join acts as a synchronisation point: it ensures that all threads have completed their work before the program moves forward [11].

The simplicity and predictability of the fork-join model make it an ideal choice for parallel programming, offering a balance between parallel execution and synchronisation [5].

Role of the Initial Thread in OpenMP

When an OpenMP program commences, it starts as a single thread of execution, referred to as the initial thread [10]. When the initial thread encounters a parallel construct, it becomes the master of the new team of threads, which includes itself [5].

The initial thread plays a crucial role: it runs the sequential parts of the program, performs I/O operations, interacts with the operating system, and coordinates the creation and termination of parallel regions.

The fork-join model of execution underscores the importance of the initial thread, with its role dynamically changing from being a single executor to a coordinator for parallel execution [11].

Further Constructs in OpenMP

OpenMP provides several constructs to allow developers to control the execution of code in a multithreaded environment. These constructs can be combined to implement complex parallel operations [5].

Parallelising Loop Construct

The loop construct in OpenMP can be used to distribute the iterations of a loop across different threads [10]. This parallelisation can significantly increase the performance of programs that perform large computations within loops [14]. The syntax for the loop construct is #pragma omp for; to divide the iterations among several threads, it must be used inside a parallel region.

#include <omp.h>
#include <stdio.h>
#define N 1000
int main() {
    int a[N], b[N], c[N];
    // initialise arrays a and b with some values
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    // perform parallel addition of arrays a and b into c
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
    // check the results: a[i] + b[i] = i + 2 * i = 3 * i
    for (int i = 0; i < N; i++) {
        if (c[i] != 3 * i) {
            printf("Addition is incorrect. Element %d is %d but expected %d\n", i, c[i], 3 * i);
            return -1;
        }
    }
    printf("Addition is correct!\n");
    return 0;
}

In this example, arrays a and b are initialised with specific values, and then a parallel for construct is used to perform the addition of a and b into c in parallel. After the parallel section, a check is performed to validate that the addition was performed correctly. As a side note, #pragma omp parallel for is a shortcut for specifying #pragma omp parallel and #pragma omp for.

Critical Construct

The critical construct in OpenMP is used to specify a section of code that must be executed by one thread at a time [11]. This construct is crucial for handling shared resources and avoiding race conditions, where multiple threads attempt to modify the same resource simultaneously [5]. The syntax for the critical construct is #pragma omp critical.

#include <omp.h>
#include <stdio.h>
#define N 1000
int main() {
    int x = 0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp critical
        {
            x += i;
        }
    }
    printf("x = %d\n", x);
    return 0;
}

In this example, the #pragma omp critical directive ensures that the update of x is done by one thread at a time, preventing a potential race condition.

Atomic Construct

Similar to the critical construct, the atomic construct allows one thread at a time to execute a specific memory operation [14]. This construct is used to prevent inconsistencies in read-modify-write operations and is generally used for simpler, single-statement operations [10]. The syntax for the atomic construct is #pragma omp atomic, followed by a single update statement rather than a block.

#include <omp.h>
#include <stdio.h>
#define N 1000
int main() {
    int x = 0;
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        #pragma omp atomic
        x++;
    }
    printf("x = %d\n", x);
    return 0;
}

In this example, #pragma omp atomic ensures that the increment operation on x is done atomically.

Flush Construct

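The flush construct in OpenMP is used to ensure memory consistency between threads [6]. It makes a thread’s temporary view of shared variables consistent with memory, which addresses the problem of one thread not seeing the changes another thread has made to a shared variable [10]. The syntax for the flush construct is #pragma omp flush.

#include <omp.h>
#include <stdio.h>
int main() {
    int x = 0;
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        if (tid == 0) {
            x = 10;
            // publish the new value of x
            #pragma omp flush(x)
        }
        // flush alone does not order the write before the reads; the barrier does
        #pragma omp barrier
        if (tid != 0) {
            // refresh this thread's view of x before reading it
            #pragma omp flush(x)
            printf("x = %d\n", x);
        }
    }
    return 0;
}

In this example, thread 0 writes x and flushes it, the barrier makes every other thread wait until that write has happened, and the remaining threads flush x before reading it, so each of them prints x = 10. A flush on its own does not order operations between threads. In practice, synchronisation constructs such as barrier and critical already imply a flush, so explicit flushes are rarely needed.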

Ordered Construct

The ordered construct in OpenMP allows developers to control the order in which part of a parallel loop is executed [14]. While parallel loops generally execute iterations in any order across different threads, the code inside an ordered region is executed in the same order as it would be in a serial loop; the rest of each iteration can still run concurrently [5]. The syntax for the ordered construct is #pragma omp ordered, and the enclosing loop directive must carry the ordered clause.

#include <omp.h>
#include <stdio.h>
#define N 10
int main() {
    #pragma omp parallel for ordered
    for (int i = 0; i < N; i++) {
        #pragma omp ordered
        printf("%d\n", i);
    }
    return 0;
}

In this example, the #pragma omp ordered directive ensures that the print operation is done in order, even though the loop iterations are done in parallel.

Clauses in OpenMP

OpenMP provides a variety of clauses to offer control over the execution of parallel constructs [6]. Clauses are added to the OpenMP directives to specify additional behaviour.

Collapse Clause

The collapse clause in OpenMP specifies how many loops in a nested loop should be combined (or “collapsed”) into one larger loop, creating a larger set of iterations that can be run in parallel [14]. This can provide more fine-grained control of the execution of the loops and better load balancing [6]. The syntax for the collapse clause is #pragma omp for collapse(n), where n is the number of loops to be collapsed.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define N 10000
int main() {
    /*
     * in c and c++, when declaring arrays within functions, the memory for these arrays is allocated on the stack.
     * however, the stack has a limited size, and creating large arrays can exceed this limit, causing a stack overflow and a segmentation fault.
     * to overcome this, we can dynamically allocate memory on the heap using the `malloc` function.
     * the heap generally has a much larger available memory space than the stack, allowing for larger data structures.
     * after dynamically allocating these arrays, we perform our computations as usual.
     * finally, freeing the allocated memory using the `free` function is important to avoid memory leaks.
     */
    int **a, **b, **c;
    // allocate memory
    a = malloc(N * sizeof(int *));
    b = malloc(N * sizeof(int *));
    c = malloc(N * sizeof(int *));
    for (int i = 0; i < N; i++) {
        a[i] = malloc(N * sizeof(int));
        b[i] = malloc(N * sizeof(int));
        c[i] = malloc(N * sizeof(int));
    }
    // initialise arrays
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }
    }
    // parallelise computation
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }
    // cleanup
    for (int i = 0; i < N; i++) {
        free(a[i]);
        free(b[i]);
        free(c[i]);
    }
    free(a);
    free(b);
    free(c);
    return 0;
}

In this example, two-dimensional arrays a, b and c are declared and initialised. A collapse(2) clause is used to collapse the two nested loops into a single larger loop that is then parallelised. The resulting N × N iterations are divided among the threads, which can lead to better load balancing and higher utilisation of the CPU cores. Furthermore, in this code, a, b, and c are pointers to pointers, and memory for these arrays is dynamically allocated on the heap. This allows us to handle much larger N values without causing a stack overflow.

Private Clause

The private clause in OpenMP declares one or more list items to be private to each thread, meaning each thread will have its own local copy [6]. This clause is used to prevent data races by ensuring that threads do not overwrite each other’s data [11]. The private copies start out uninitialised. The syntax for the private clause is #pragma omp parallel private(var).

The firstprivate clause is similar to the private clause, but it also initialises each private copy with the original variable’s value [14]. The syntax for the firstprivate clause is #pragma omp parallel firstprivate(var).

The lastprivate clause also creates a private copy of a variable for each thread, but it additionally assigns the value from the sequentially last iteration to the original variable [6]. Because it refers to a last iteration, it belongs on a worksharing construct such as a loop rather than on a bare parallel construct. The syntax for the lastprivate clause is #pragma omp parallel for lastprivate(var).

#include <omp.h>
#include <stdio.h>
#define N 1000
int main() {
    int x = 1;
    #pragma omp parallel for firstprivate(x) lastprivate(x)
    for (int i = 0; i < N; i++) {
        x = i;
    }
    printf("x = %d\n", x);
    return 0;
}

In this example, x is made firstprivate, so each thread gets a copy of x that’s initialised with the original value of x. After the loop, the original x is updated with the value from the last iteration of the loop (N - 1, so the program prints x = 999), due to the lastprivate clause.

Reduction Clause

The reduction clause in OpenMP performs a reduction operation (such as addition or multiplication) on a variable: each thread accumulates into its own private copy [11], and at the end of the construct the private copies are combined with the original variable, which is then visible to the code after the region [14]. The syntax for the reduction clause is #pragma omp parallel reduction(operator:var).

#include <omp.h>
#include <stdio.h>
#define N 1000
int main() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        sum += i;
    }
    printf("sum = %d\n", sum);
    return 0;
}

In this example, the reduction(+:sum) clause is used to perform a summation across all threads, and the final result is stored in sum.

Schedule Clause

The schedule clause in OpenMP allows developers to specify how iterations of a loop should be distributed among the threads [6]. The most commonly used schedule kinds are static, dynamic, and guided [11]; the specification also defines auto and runtime. The syntax for the schedule clause is #pragma omp parallel for schedule(kind, chunk_size), where the chunk size is optional.

#include <omp.h>
#include <stdio.h>
#define N 10
int main() {
    #pragma omp parallel for schedule(static, 2)
    for (int i = 0; i < N; i++) {
        printf("%d executed by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}

In this example, the schedule(static, 2) clause is used to distribute the iterations of the loop among the threads in chunks of 2.

Advanced OpenMP Constructs

Beyond the basic constructs, OpenMP provides advanced constructs that allow for even more complex and optimised parallel execution [6].

Sections Construct

The sections construct in OpenMP provides a mechanism to assign different sections of code to different threads [11]. The sections are executed independently and in parallel, allowing tasks to be distributed based on the nature of the code [14]. The syntax for the sections construct is #pragma omp sections.

#include <omp.h>
#include <stdio.h>
void process(int i) {
    printf("Processing %d\n", i);
}
int main() {
    #pragma omp parallel sections
    {
        #pragma omp section
        process(1);
        #pragma omp section
        process(2);
    }
    return 0;
}

In this example, two sections are defined within the #pragma omp parallel sections construct. Each section is executed exactly once by one thread of the team, so the two calls can run concurrently on different threads.

Master and Single Construct

The master construct in OpenMP ensures that a section of the code is executed by the master thread only [6]. This is useful for operations that should only be done once and do not require parallelism [11]. The syntax for the master construct is #pragma omp master.

The single construct in OpenMP also ensures that a section of code is executed by one thread only, but it does not specify which thread [14]. The syntax for the single construct is #pragma omp single.

#include <omp.h>
#include <stdio.h>
void master_operation() {
    printf("This operation is done by the master thread\n");
}
void single_operation() {
    printf("This operation is done by a single thread\n");
}
int main() {
    #pragma omp parallel
    {
        #pragma omp master
        master_operation();
        #pragma omp single
        single_operation();
    }
    return 0;
}

In this example, the master_operation function is executed only by the master thread, while single_operation is executed by one of the threads (not necessarily the master thread).

SIMD Construct

The SIMD (Single Instruction, Multiple Data) construct in OpenMP provides a method to specify that the loop should be vectorised [6]. This can result in significant performance improvements for loops where the iterations are independent and can be executed in parallel [11]. The syntax for the SIMD construct is #pragma omp simd.

#include <omp.h>
#include <stdio.h>
#define N 100
int main() {
    int a[N], b[N], c[N];
    // initialise arrays a and b with some values
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    // perform vectorised operation
    #pragma omp simd
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
    // check the results: a[i] + b[i] = i + 2 * i = 3 * i
    for (int i = 0; i < N; i++) {
        if (c[i] != 3 * i) {
            printf("Addition is incorrect. Element %d is %d but expected %d\n", i, c[i], 3 * i);
            return -1;
        }
    }
    printf("Addition is correct!\n");
    return 0;
}

In this example, a SIMD construct is used to vectorise the addition of two arrays.

Barrier Construct

The barrier construct in OpenMP is used to synchronise all threads at a certain point in the program [14]. Once a thread reaches the barrier, it waits until all other threads reach the same point before proceeding [6]. The syntax for the barrier construct is #pragma omp barrier.

#include <omp.h>
#include <stdio.h>
void process(int i) {
    printf("Processing %d\n", i);
}
int main() {
    #pragma omp parallel
    {
        process(1);
        #pragma omp barrier
        process(2);
    }
    return 0;
}

In this example, all threads synchronise at the barrier after processing the first task, ensuring that all threads have completed the first task before any thread starts the second task.

Task Construct

The task construct in OpenMP allows the developer to define explicit tasks that will be executed in parallel [11]. This can provide more flexibility and better load balancing compared to the simple parallel and for constructs [14]. The syntax for the task construct is #pragma omp task.

#include <omp.h>
#include <stdio.h>
#define N 10
void process(int i) {
    printf("Processing %d\n", i);
}
int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < N; i++) {
                #pragma omp task
                process(i);
            }
        }
    }
    return 0;
}

In this example, several tasks are created inside a loop, each tasked with processing a different iteration of the loop. The tasks are created by a single thread but executed in parallel by multiple threads.

Taskloop Construct

The taskloop construct in OpenMP creates tasks from the iterations of a loop [6]. This can provide finer-grained control of the execution of the loop and better load balancing [11]. The syntax for the taskloop construct is #pragma omp taskloop.

#include <omp.h>
#include <stdio.h>
#define N 10
void process(int i) {
    printf("Processing %d\n", i);
}
int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp taskloop
            for (int i = 0; i < N; i++) {
                process(i);
            }
        }
    }
    return 0;
}

In this example, the taskloop construct divides the loop iterations into tasks (a task may cover one or several iterations, as chosen by the runtime). The tasks are then executed in parallel by the threads of the team.

Taskgroup Construct

The taskgroup construct in OpenMP allows the developer to group related tasks together [14]. The taskgroup construct ensures that all tasks in the group have completed before continuing [6]. The syntax for the taskgroup construct is #pragma omp taskgroup.

#include <omp.h>
#include <stdio.h>
void process(int i) {
    printf("Processing %d\n", i);
}
int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp taskgroup
            {
                #pragma omp task
                process(1);
                #pragma omp task
                process(2);
            }
        }
    }
    return 0;
}

In this example, two tasks are grouped together in a taskgroup construct. The taskgroup ensures that both tasks are completed before the program continues.

Conclusion

OpenMP offers a comprehensive yet accessible approach to parallel programming, enabling developers to optimise their applications for modern multicore and multiprocessor systems [14]. Through its constructs and clauses, programmers can control thread execution, parallelise loops, and effectively manage shared and private data. By understanding these constructs and using them well, one can significantly enhance the performance of computationally intensive applications [6].

However, parallel programming with OpenMP, like any other programming technique, requires practice and deep understanding. Even though OpenMP simplifies the process, the onus of implementing correct, efficient, and effective parallelism lies on the programmer [11].

Additional Resources

The study of OpenMP doesn’t end here. Many resources can further deepen your understanding and provide hands-on experiences:

  • Official OpenMP Documentation
    • The official documentation is a definitive guide for understanding the functionalities and syntax of OpenMP.
  • R. Chandra, Ed., Parallel Programming in OpenMP. San Francisco, CA: Morgan Kaufmann Publishers, 2001.
    • This book is a comprehensive guide that delves deep into parallel programming concepts using OpenMP.
  • B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable shared memory parallel programming. The MIT Press, 2007.
    • This book discusses advanced OpenMP topics and is suitable for those already familiar with the basics.

References

  [0] ARCHER2 is a state-of-the-art HPE Cray EX system with 5,860 compute nodes connected through a superfast HPE Cray Slingshot interconnect hosted at EPCC at the University of Edinburgh.
  [1] K. Asanovic et al., ‘A view of the parallel computing landscape’, Commun. ACM, vol. 52, no. 10, pp. 56–67, Oct. 2009, doi: 10.1145/1562764.1562783.
  [2] E. A. Lee, ‘The Problem with Threads’, Computer, vol. 39, no. 5, pp. 33–42, May 2006, doi: 10.1109/MC.2006.180.
  [3] D. Padua, Encyclopedia of parallel computing. Springer Publishing Company, Incorporated, 2011.
  [4] A. Grama, Ed., Introduction to parallel computing, 2nd ed. Harlow, England ; New York: Addison-Wesley, 2003.
  [5] L. Dagum and R. Menon, ‘OpenMP: an industry standard API for shared-memory programming’, IEEE Comput. Sci. Eng., vol. 5, no. 1, pp. 46–55, Mar. 1998, doi: 10.1109/99.660313.
  [6] R. Chandra, Ed., Parallel Programming in OpenMP. San Francisco, CA: Morgan Kaufmann Publishers, 2001.
  [7] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable shared memory parallel programming (scientific and engineering computation). The MIT Press, 2007.
  [8] W. J. Dally and B. P. Towles, Principles and practices of interconnection networks. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2004.
  [9] M. Herlihy, N. Shavit, V. Luchangco, and M. Spear, The art of multiprocessor programming, Second edition. Cambridge, MA: Morgan Kaufmann, 2021.
  [10] E. Ayguade et al., ‘The Design of OpenMP Tasks’, IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 3, pp. 404–418, Mar. 2009, doi: 10.1109/TPDS.2008.105.
  [11] T. G. Mattson, B. A. Sanders, and B. Massingill, Patterns for parallel programming. Boston: Addison-Wesley, 2005.
  [12] W. J. Dally and B. P. Towles, Principles and practices of interconnection networks. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2004.
  [13] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools (2nd edition). USA: Addison-Wesley Longman Publishing Co., Inc., 2006.
  [14] R. van der Pas, E. Stotzer, and C. Terboven, Using OpenMP—The Next Step: Affinity, Accelerators, Tasking, and SIMD. The MIT Press, 2017.
