**src/intro-to-parallel-comp/challenges.md**
Make sure to clone a copy of **your** challenges repo onto M3, ideally in a personal folder on vf38_scratch.
> Note: For every challenge you will be running the programs as SLURM jobs, so that we don't overload the login nodes. A template [SLURM job script](https://github.com/MonashDeepNeuron/HPC-Training-Challenges/blob/main/challenges/distributed-computing/job.slurm) is provided at the root of this directory. You can use it to submit your own jobs to SLURM by copying it into each challenge's sub-directory and filling in the missing details. You may need more than one for some challenges. This template will put the output that would otherwise be printed to the terminal in a file named `slurm-<job-name>.out`.
In this context, you can consider a query to be a job that carries out a series of steps on a particular input in order to achieve something, e.g. a SORT query on a table. It's fairly straightforward to execute multiple queries at the same time using a parallel/distributed system, but what if we want to parallelise and speed up the individual operations within a query?
This is where things like synchronisation, data/workload distribution and aggregation need to be considered. In this chapter we will provide some theoretical context before learning how to implement parallelism using OpenMP & MPI.
**src/intro-to-parallel-comp/message-passing.md**
# Message Passing
In distributed computing each processor has its own local memory with its own address space, so we need a way for the distributed processes to communicate and share data. Message passing is the mechanism of exchanging data between processes: each process can communicate with one or more other processes by sending messages over a network.
MPI (Message Passing Interface) is a communication protocol standard that defines message passing between processors in distributed environments. Its main design goals are high performance, scalability, and portability.
OpenMPI is one implementation of the MPI standard. It consists of a set of headers and library functions that you call from your program (in C, C++, Fortran, etc.).
```c
int MPI_Finalize(void);
```
Terminology:
- **World Size**: The total number of processes involved in your distributed computing job.
- **Rank**: A unique ID for a particular process.
> Use the OpenMPI man pages to find out more about each routine.
When a process sends data, it packs up all of the necessary data into a buffer for the receiving process. These buffers are often referred to as envelopes, since the data is packed into a single message before transmission (similar to how letters are packed into envelopes before transmission to the post office).
### Elementary MPI Data types
The command `top` or `htop` looks into a process. As you can see from the image below …
- The command `time` checks the overall performance of the code.
- By running this command, you get real time, user time and system time.
- Real is wall clock time - the time from start to finish of the call, including overhead.
- User is the amount of CPU time spent outside the kernel within the process
- Sys is the amount of CPU time spent in the kernel within the process.
- User time + Sys time will tell you how much actual CPU time your process used.
**src/intro-to-parallel-comp/multithreading.md**
# Multithreading
Hopefully by now you are all familiar with multi-threading and how parallel computing works. We'll now go through how to implement parallel computing using OpenMP in order to speed up the execution of our C programs.
## OpenMP
OpenMP is an Application Program Interface (API) that is used to implement multi-threaded, shared-memory parallelism in C/C++ programs. It's designed to be a very minimal add-on to serial C code: all you have to do is use the `#pragma` (C preprocessor directive) mechanism to wrap the parallel regions of your code.
### Fork-Join Parallel Execution Model
OpenMP uses the *fork-join model* of parallel execution.
**FORK**: All OpenMP programs begin with a *single master thread*, which executes sequentially until a *parallel region* is encountered. After that, it spawns a *team of threads* to carry out the multi-threaded parallel computing.
The OpenMP runtime library maintains a pool of potential OS threads that can be added to the thread team during parallel region execution. When a thread encounters a parallel construct (pragma directive) and needs to create a team of more than one thread, the thread will check the pool and grab idle threads from the pool, making them part of the team.
This speeds up thread spawning through a *warm start* mechanism, minimising the kernel scheduler context-switching overhead involved in creating threads.
> If you're unclear how the kernel scheduler context switching works, revisit the operating systems chapter and explore/lookup the topics introduced there.
**JOIN**: Once the team of threads completes the parallel region, they **synchronise** and return to the pool, leaving only the master thread, which continues executing sequentially.

### Imperative vs Declarative
Imperative programming specifies and directs the control flow of the program. On the other hand, declarative programming specifies the expected result and core logic without directing the program's control flow, i.e. you tell the computer *what* to do instead of *how* to do it.
OpenMP follows a declarative programming style. Instead of manually creating, managing, synchronizing, and terminating threads, we can achieve the desired outcome by simply declaring pragma directives in our code.
Here is a template script provided in the home directory in M3. Notice that we can dynamically change the number of threads using `export OMP_NUM_THREADS=12`.
> The `export` statement is a bash command you can type into a WSL/Linux terminal. It allows you to set environment variables in order to manage runtime configuration.
**src/intro-to-parallel-comp/parallel-algos.md**

This is how one of the simplest parallel algorithms - **parallel sum** - works. …

Besides the difference between serial & parallel regions, another important concept to note here is **partitioning**, a.k.a. chunking. Often when you're parallelising your serial algorithm you will have to define local, parallel tasks that execute on different parts of your input simultaneously in order to achieve a speedup. This can be anything from a sum operation in this case, to a local/serial sort, or even something as complex as training a CNN model on a particular batch of images.