This repository is part of the 100 Days of GPU Challenge, a 100-day challenge to learn GPU programming.

| Day | Kernel | Description |
|---|---|---|
| 1 | Vector Addition | Implemented a basic element-wise addition kernel using CUDA to add two vectors. Read the first two chapters from the PMPP Book. |
| 2 | Matrix Addition | Implemented a basic matrix addition kernel using CUDA to add two matrices. |
| 3 | RGB to Grayscale Conversion | Implemented an RGB to grayscale conversion kernel using CUDA. Read the first two sections of the third chapter of the PMPP Book. |
| 4 | Blur an RGB Image | Implemented an RGB image blur kernel using CUDA. Read section 3 from the PMPP Book, and also this blog. |
| 5 | Matrix Multiplication | Implemented a Matrix Multiplication kernel using CUDA. Finished chapter 3 of PMPP Book. |
| 6 | Matrix Transpose | Implemented a Matrix Transpose kernel using CUDA. Started reading Chapter 4 and gained a comprehensive understanding of the architecture of modern CUDA-capable GPUs, including block scheduling, synchronization, and transparent scalability. |
| 7 | Softmax | Implemented a Softmax kernel using CUDA. |
| 8 | ReLU | Implemented a ReLU kernel using CUDA. Finished Chapter 4. Gained an understanding of warp scheduling, latency tolerance, and control divergence. |
| 9 | Tiled Matrix Multiplication | Implemented a Matrix Multiplication kernel using shared memory. |
| 10 | GeLU | Implemented a GeLU kernel using CUDA. Finished Chapter 5 and learned about the different types of CUDA memory and how tiling helps reduce memory traffic. |
| 11 | Conv1D | Implemented 1D Convolution with shared memory. |
| 12 | Online Softmax | Implemented Online Softmax. |
| 13 | Softmax (Shared Memory) | Implemented Softmax with shared memory using CUDA. |
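
For readers new to CUDA, here is a minimal sketch of what a Day 1 style element-wise vector addition might look like. It is not the code from this repository: the names `vector_add_kernel` and `vector_add`, and the block size of 256 threads, are assumptions chosen for illustration.

```cuda
#include <cuda_runtime.h>

// Each thread computes one output element: c[i] = a[i] + b[i].
__global__ void vector_add_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have threads past the end
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper: copies inputs to the GPU, launches the kernel,
// and copies the result back. Returns false if any CUDA call fails.
bool vector_add(const float* h_a, const float* h_b, float* h_c, int n) {
    if (n <= 0) return n == 0;  // nothing to do for empty input
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_c, bytes) == cudaSuccess
           && cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // ceil(n / threads)
        vector_add_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe on partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

Calling `vector_add(a, b, c, n)` with three host arrays of length `n` fills `c` with the element-wise sum. The later kernels in the table follow the same allocate, copy, launch, copy-back pattern, with more work done inside the kernel.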