This repository investigates the various loop-scheduling policies available in OpenMP. The investigation is performed for two different workloads: one that is slightly unbalanced, called loop1, and one that is very unbalanced, with most of the work concentrated in the first few iterations, called loop2.
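For illustration only (the actual kernels live in src/loops/), workloads with these two flavours of imbalance might look roughly like this:

```c
/* Illustrative only -- not the repository's actual kernels (src/loops/). */
#define N 1024

static double a[N][N];

/* loop1-style: triangular iteration space, so the work per outer
 * iteration shrinks linearly -- a mild load imbalance. */
void slightly_unbalanced(void) {
    for (int i = 0; i < N; i++)
        for (int j = i; j < N; j++)
            a[i][j] += 1.0;
}

/* loop2-style: almost all the work sits in the first few outer
 * iterations -- a severe load imbalance. */
void very_unbalanced(void) {
    for (int i = 0; i < N; i++) {
        int reps = (i < 8) ? 10000 : 1;   /* hypothetical skew */
        for (int r = 0; r < reps; r++)
            for (int j = 0; j < N; j++)
                a[i][j] += 1.0;
    }
}
```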
The following schedulers provided by OpenMP are investigated:
- STATIC,n
- DYNAMIC,n
- GUIDED,n

where n is the selected chunksize.
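As a reminder of how these kinds map onto code, an OpenMP loop selects its policy through the schedule clause; a minimal sketch, not taken from this repository's sources:

```c
/* Illustrative only: how the investigated schedule kinds appear in code. */
void scale(double *x, int n) {
    /* STATIC,n: iterations are divided into chunks of size n and dealt
     * out to the threads round-robin, fixed before the loop runs. */
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < n; i++) x[i] *= 2.0;

    /* DYNAMIC,n: idle threads grab the next chunk of n iterations
     * on demand, at the cost of more synchronisation. */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++) x[i] *= 2.0;

    /* GUIDED,n: chunks start large and shrink as the loop progresses,
     * but never below n iterations. */
    #pragma omp parallel for schedule(guided, 4)
    for (int i = 0; i < n; i++) x[i] *= 2.0;
}
```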
Additionally, a hand-written scheduler, called the affinity scheduler, was designed to combine the characteristics of the aforementioned schedulers, and its performance is compared against them.
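Affinity-style schedulers of this kind typically give each thread its own block of iterations, let it claim a shrinking fraction of its remaining block (guided-style), and allow idle threads to steal work from the most loaded peer (dynamic-style). A heavily simplified, hypothetical sketch of the chunk-claiming step, assuming a per-thread iteration set protected by a lock (see src/affinity/ for the real implementation):

```c
/* Heavily simplified, hypothetical sketch of an affinity-style scheduler;
 * the real implementation lives in src/affinity/. */
#include <omp.h>

#define MAX_THREADS 64

typedef struct {
    int lo, hi;          /* remaining local iterations [lo, hi) */
    omp_lock_t lock;     /* protects lo and hi */
} local_set_t;

static local_set_t sets[MAX_THREADS];

/* Claim the next chunk from thread t's set: 1/nthreads of what remains.
 * Returns 1 and the chunk bounds in *b, *e, or 0 if the set is empty. */
static int claim_chunk(int t, int nthreads, int *b, int *e) {
    int got = 0;
    omp_set_lock(&sets[t].lock);
    int remaining = sets[t].hi - sets[t].lo;
    if (remaining > 0) {
        int chunk = (remaining + nthreads - 1) / nthreads;  /* ceil */
        *b = sets[t].lo;
        *e = sets[t].lo + chunk;
        sets[t].lo += chunk;
        got = 1;
    }
    omp_unset_lock(&sets[t].lock);
    return got;
}
```

In such a design, a worker loops on claim_chunk for its own id, and once its set is exhausted it scans for the thread with the most remaining iterations and claims from that set instead.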
The repository is organised as follows:

- includes/: Contains the header file resources.h, necessary for the development of the code. Additionally, contains affinity_structs.h and macros.h, necessary for the development of the affinity scheduler.
- src/main.c: The main source file, used to execute each scheduling option for the two available workloads.
- src/loops/: Contains all the functions relevant to the workloads, i.e. initialisation and validation as well as execution of each workload.
- src/omplib/: Contains wrapper functions around OpenMP commands, in an effort to hide the API's functions.
- src/affinity/: Contains all the functions used to develop the affinity scheduler.
- scripts/performance/: Contains all the performance tests available to measure the performance of the code.
- scripts/pbs/: Contains the performance tests to be run on the back-end of CIRRUS.
- scripts/plots/: Contains the scripts used to plot the results of the performance tests.
- res/: Contains the raw results and plots for each test.
The designed affinity scheduler comes in two versions: the first uses critical regions to synchronise the threads, while the second uses locks. One can choose between the two versions by compiling the code with a different DEFINE flag. Moreover, one can also choose which scheduler to use when measuring performance. In other words, another DEFINE flag selects between the best scheduling option determined for each workload and choosing the scheduling option at runtime.
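For context, the two synchronisation styles differ roughly as follows; an illustrative sketch, not the repository's actual affinity code:

```c
/* Illustrative sketch of the two synchronisation styles used by the two
 * affinity builds; the real scheduler in src/affinity/ is more elaborate. */
#include <omp.h>

static int counter = 0;
static omp_lock_t counter_lock;

void init_sync(void) {
    omp_init_lock(&counter_lock);
}

/* Version 1: a critical region -- all unnamed critical regions in the
 * program exclude each other, forming a single global gate. */
void take_work_critical(void) {
    #pragma omp critical
    {
        counter++;
    }
}

/* Version 2: an explicit lock -- locks can be made per-thread or
 * per-queue, so threads touching different data need not serialise. */
void take_work_lock(void) {
    omp_set_lock(&counter_lock);
    counter++;
    omp_unset_lock(&counter_lock);
}
```

The usual motivation for the lock variant is that the scheduler can keep one lock per thread's work queue, whereas unnamed critical regions all funnel through a single program-wide gate.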
The following options are available:
- -DRUNTIME: Select the scheduling option at runtime.
- -DBEST_SCHEDULE: Use the best scheduling option determined for each workload.
- -DBEST_SCHEDULE_LOOP2: Use the best scheduling option determined for each workload, after a further investigation of loop2.
- -DAFFINITY: Use the affinity scheduler.
- -DLOCK: If set, the affinity scheduler with locks is used; otherwise, the one with critical regions.
Note that only one of the four main options shown above should be selected. If no option is selected, the serial version of the code is executed.
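A plausible sketch of how these flags could gate the code paths (hypothetical structure; src/main.c may differ, and GUIDED,16 for loop1 is taken from the best_schedule description below):

```c
/* Hypothetical sketch of how the DEFINE flags could select a code path;
 * the real src/main.c may be structured differently. */
void run_affinity_loop1(double *x, int n);  /* hypothetical entry point */

void run_loop1(double *x, int n) {
#if defined(RUNTIME)
    /* Schedule kind and chunksize read from OMP_SCHEDULE at runtime. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++) x[i] *= 2.0;
#elif defined(BEST_SCHEDULE) || defined(BEST_SCHEDULE_LOOP2)
    /* GUIDED,16 for loop1, as reported in the execution section below. */
    #pragma omp parallel for schedule(guided, 16)
    for (int i = 0; i < n; i++) x[i] *= 2.0;
#elif defined(AFFINITY)
    run_affinity_loop1(x, n);
#else
    for (int i = 0; i < n; i++) x[i] *= 2.0;  /* serial fallback */
#endif
}
```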
To compile all the available versions of the code use:
$ make all

This will create all the necessary directories for the code to be executed and compile all the versions of the code using the different options shown above. This results in the following executables:
- bin/serial: Serial version of the code.
- bin/runtime: Parallel version of the code, where the scheduling can be selected at runtime. Note that only the scheduling options provided by OpenMP can be selected.
- bin/best_schedule: The best scheduling options provided by OpenMP are used for each workload.
- bin/best_schedule_loop2: The best scheduling options provided by OpenMP are used for each workload, after the best scheduling option for loop2 was tuned based on its chunksize.
- bin/affinity: The affinity scheduler with critical regions is used.
- bin/affinity_lock: The affinity scheduler with locks is used.
Alternatively, one can compile each version separately. First, create the required directories using:
$ make dir

Build the serial version:
$ make bin/serial -B
icc -O3 -qopenmp -std=c99 -Wall -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc obj/omplib.o obj/workload.o obj/main.o -o bin/serial -lm -qopenmp

Build the runtime version:
$ make bin/runtime DEFINE=-DRUNTIME -B
icc -O3 -qopenmp -std=c99 -Wall -DRUNTIME -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DRUNTIME -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DRUNTIME -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc obj/omplib.o obj/workload.o obj/main.o -o bin/runtime -lm -qopenmp

Build the best_schedule version:
$ make bin/best_schedule DEFINE=-DBEST_SCHEDULE -B
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/main.o -o bin/best_schedule -lm -qopenmp

Build the best_schedule version for loop2:
$ make bin/best_schedule_loop2 DEFINE=-DBEST_SCHEDULE_LOOP2 -B
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE_LOOP2 -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE_LOOP2 -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DBEST_SCHEDULE_LOOP2 -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/main.o -o bin/best_schedule_loop2 -lm -qopenmp

Build the affinity version with critical regions:
$ make bin/affinity DEFINE=-DAFFINITY -B
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/affinity.o -c src/affinity/affinity.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/mem.o -c src/affinity/mem.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/affinity.o obj/mem.o obj/main.o -o bin/affinity -lm -qopenmp

Build the affinity version with locks:
$ make bin/affinity_lock DEFINE=-DAFFINITY DEFINE+=-DLOCK -B
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/omplib.o -c src/omplib/omplib.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/workload.o -c src/loops/workload.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/affinity.o -c src/affinity/affinity.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/mem.o -c src/affinity/mem.c
icc -O3 -qopenmp -std=c99 -Wall -DAFFINITY -DLOCK -Iincludes -Isrc/affinity -Isrc/loops -Isrc/omplib -o obj/main.o -c src/main.c
icc -O3 -qopenmp -std=c99 -Wall obj/omplib.o obj/workload.o obj/affinity.o obj/mem.o obj/main.o -o bin/affinity_lock -lm -qopenmp

To clean the project, run:
$ make clean

To execute the serial code:
$ ./bin/serial

To execute the parallel code, one has to choose the number of threads to run on. This can be done using:
$ export OMP_NUM_THREADS=$(THREADS)

where $(THREADS) is the selected number of threads.
To execute the runtime version:
$ export OMP_SCHEDULE=$(KIND,n)
$ ./bin/runtime

where $(KIND,n) is the selected scheduling option and chunksize.
The available scheduling options are:
- STATIC,n: Static scheduler.
- DYNAMIC,n: Dynamic scheduler.
- GUIDED,n: Guided scheduler.

where n is the selected chunksize.
Example:
$ export OMP_NUM_THREADS=4
$ export OMP_SCHEDULE=DYNAMIC,2
$ ./bin/runtime

This will execute the code on 4 threads, using a dynamic scheduler with a chunksize of 2 for each workload.
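This works because the runtime build defers the scheduling decision to OpenMP's runtime schedule kind, which reads OMP_SCHEDULE when the loop starts; a minimal sketch, not the repository's code:

```c
/* Minimal illustration of the OpenMP "runtime" schedule kind;
 * the actual loop bodies live in src/loops/. */
void run(double *x, int n) {
    /* The schedule kind and chunksize are taken from OMP_SCHEDULE
     * (e.g. "DYNAMIC,2") when the loop begins executing. */
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++) {
        x[i] *= 2.0;
    }
}
```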
To execute the best_schedule version:
$ ./bin/best_schedule

This will execute the code with GUIDED,16 for loop1 and DYNAMIC,8 for loop2.
To execute the best_schedule_loop2 version:
$ ./bin/best_schedule_loop2

This will execute the code with GUIDED,16 for loop1 and DYNAMIC,4 for loop2.
To execute the affinity version with critical regions, use:
$ ./bin/affinity

To execute the affinity version with locks, use:
$ ./bin/affinity_lock

The runtime test executes the bin/runtime executable multiple times: each time, the performance of each OpenMP scheduling option is measured for different chunksizes. The number of threads is kept constant in order to determine the best scheduling option and chunksize for each workload.
Running on the front-end:
$ make runtime_test

Submitting a job on the back-end of CIRRUS:
$ make runtime_test_back

To plot the results once the test is finished, run:
$ make plot_runtime_test

Based on the results of the previous test, the best scheduling option is selected for each workload. The best_schedule test runs the bin/best_schedule executable multiple times over a range of thread counts. The performance is then evaluated for each thread count and each workload.
Running on the front-end:
$ make best_schedule_test

Submitting a job on the back-end of CIRRUS:
$ make best_schedule_test_back

To plot the results once the test is finished, run:
$ make plot_runtime_test

As the performance of loop2 saturates for the scheduling option and chunksize selected in the best_schedule test, a further investigation is performed. The bin/best_schedule_loop2 executable is run multiple times over a range of thread counts and chunksizes, for the scheduling option selected as best for loop2.
Running on the front-end:
$ make best_schedule_loop2_test

Submitting a job on the back-end of CIRRUS:
$ make best_schedule_loop2_test_back

To plot the results once the test is finished, run:
$ make plot_runtime_test

The affinity test investigates the performance of the affinity scheduler for its two versions, i.e. the one using critical regions and the one using locks.
Running on the front-end:
$ make affinity_schedule_test

Submitting a job on the back-end of CIRRUS:
$ make affinity_schedule_test_back

To plot the results once the test is finished, run:
$ make plot_runtime_test

Finally, the performance comparison test evaluates and compares the performance of all the implemented versions for each loop.
Running on the front-end:
$ make performance_comparison_test

Submitting a job on the back-end of CIRRUS:
$ make performance_comparison_test_back

To plot the results once the test is finished, run:
$ make plot_performance_comparison_test

Instead of submitting the test scripts one by one, one can use the following to perform all the tests together:
$ make run_tests_front

To submit the tests on the back-end, run:
$ make run_tests_back

Once all the tests are finished, the results can be plotted using:
$ make plot_tests