Commit a4f55d2

Update README
1 parent 32df948 commit a4f55d2

2 files changed: +20 -12 lines changed

README.md

+20 -12
@@ -6,23 +6,31 @@ A attempt at the [Ray Tracing in One Weekend](https://raytracing.github.io/books
 
 ```
 $ make
-$ ./main > section14.ppm
+$ ./main > cover.ppm
 ```
 
-The resulting image is the cover image of the book: 1200x675px, 500 samples/pixel. Running `./main -small` generates a smaller image for testing purposes - 400x225px, 100 samples/pixel.
+The resulting image is the cover image of the book: 1200x675px, 500 samples/pixel.
 
-The original, single-threaded implementation ran in 18m 41s with optimization flags enabled. Further optimizations reduced the time to 3m 31s, a 5.3x speedup. A writeup is in the works, so here's a summary:
+![Spheres of different size, color and material](cover.jpg)
 
-* Multi-threading (4 threads) - 62% faster
-* Implementation changes - 18% faster
-  * Changing from function pointers to enum + union for `scatter()`
-  * Change `raycolor()` from recursive to iterative
-  * Use custom random function instead of stdlib `rand()`
-* SIMD - 37% faster
-  * Add [ARM Neon](https://community.arm.com/arm-community-blogs/b/operating-systems-blog/posts/arm-neon-programming-quick-reference) instructions in `vec3` functions
-  * Vectorize `spherelisthit()` further to calculate 4 spheres at once
+The original, single-threaded implementation ran in 18m 41s with optimization flags enabled. Further optimizations reduced the time to 3m 31s, a 5.3x speedup. Here's a summary of optimizations with a smaller test image (`./main -small`) - 400x225px, 100 samples/pixel.
+
+| Optimization | real | user | sys | cpu% |
+| --------------------------------------------------------- | ------ | ----- | ---- | ---- |
+| Optimization flags (`-O2`, `-flto`) | 24.568 | 24.52 | 0.01 | 99 |
+| Multi-threaded pixel rendering (4 threads) | 13.817 | 33.93 | 8.31 | 305 |
+| Multi-threaded scanline rendering | 9.459 | 36.43 | 0.02 | 385 |
+| Multi-threaded scanlines rendering | 9.261 | 36.68 | 0.01 | 396 |
+| Changing function pointer to enums + union | 7.855 | 31.10 | 0.01 | 396 |
+| Use custom inline random instead of stdlib `rand()` | 7.690 | 30.40 | 0.01 | 395 |
+| Change `raycolor()` to be iterative instead of recursive | 7.519 | 29.72 | 0.01 | 395 |
+| Use ARM Neon SIMD instructions in vector functions | 7.066 | 27.91 | 0.01 | 395 |
+| Use SIMD instructions to calculate 4 spheres at once | 6.085 | 24.01 | 0.01 | 394 |
+| Improved data loading for "4 spheres at once" | 5.847 | 23.09 | 0.01 | 395 |
+| Vectorized discriminant check in `spherelisthit()` | 4.902 | 19.32 | 0.01 | 394 |
+| Small improvements in `spherelisthit()` | 4.743 | 18.70 | 0.01 | 394 |
 
 ## Resources
-- [jacobvosmaer/raytracingweekend](https://github.com/jacobvosmaer/raytracingweekend/) + accompaniying [blog post](http://blog.jacobvosmaer.nl/0022-ray-tracing-weekend/)
+- [jacobvosmaer/raytracingweekend](https://github.com/jacobvosmaer/raytracingweekend/) + accompanying [blog post](http://blog.jacobvosmaer.nl/0022-ray-tracing-weekend/)
 - [jfeintzeig/ray_tracer](https://github.com/JFeintzeig/ray_tracer) + accompanying [blog post](https://www.jakef.science/posts/simd-parallelism/)
 - [ARM Neon Instruction Set](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon])
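
The "function pointer to enums + union" row in the README table names a common C pattern: store a tag plus a union of per-material parameters, and dispatch with a `switch` instead of an indirect call. The sketch below is only an illustration of that pattern. The README names `scatter()` but shows none of its code, so the material kinds, field names, and the simplified `float` return value are all assumptions, not the repository's actual types.

```c
#include <assert.h>

/* Hypothetical material kinds; not taken from the repository. */
typedef enum { MAT_LAMBERTIAN, MAT_METAL, MAT_DIELECTRIC } material_kind;

typedef struct {
    material_kind kind;              /* tag selecting the active union member */
    union {
        struct { float albedo; } lambertian;
        struct { float albedo, fuzz; } metal;
        struct { float refraction_index; } dielectric;
    } as;
} material;

/* A single switch replaces an indirect call through a per-material
 * function pointer, so the compiler can see and inline every branch.
 * The returned attenuation is a stand-in for real scattering math. */
static inline float scatter(const material *m)
{
    switch (m->kind) {
    case MAT_LAMBERTIAN: return m->as.lambertian.albedo;
    case MAT_METAL:      return m->as.metal.albedo * (1.0f - m->as.metal.fuzz);
    case MAT_DIELECTRIC: return 1.0f;  /* placeholder: no absorption */
    }
    assert(0 && "unknown material kind");
    return 0.0f;
}
```

A caller would build a `material` with a designated initializer such as `{ .kind = MAT_METAL, .as.metal = { 0.75f, 0.5f } }` and pass its address to `scatter()`. Whether this beats function pointers depends on the compiler and the call site; the table reports the effect measured for this project only.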

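The "custom inline random instead of stdlib `rand()`" row does not say which generator the author chose. As a minimal sketch of the idea, the block below uses Marsaglia's xorshift32, which is an assumption swapped in for illustration. The `rng` struct and both function names are hypothetical. Keeping the state in a caller-owned struct avoids the hidden global state behind `rand()`, which is convenient when each render thread needs its own generator.

```c
#include <stdint.h>

/* Hypothetical per-thread generator state; the seed must be non-zero. */
typedef struct { uint32_t state; } rng;

/* xorshift32: three shift/xor steps, no library call, easy to inline. */
static inline uint32_t rng_next(rng *r)
{
    uint32_t x = r->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->state = x;
    return x;
}

/* Uniform float in [0, 1): keep the top 24 bits so the integer converts
 * to float exactly, then scale by 2^-24. */
static inline float rng_float(rng *r)
{
    return (float)(rng_next(r) >> 8) * (1.0f / 16777216.0f);
}
```

Each thread would own one `rng`, seeded with a distinct non-zero value, and call `rng_float()` wherever the renderer previously derived a float from `rand()`.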
cover.jpg

129 KB
