34 commits
0e2234d
basic things working, adding direct lighting
Oct 9, 2016
2adb728
direct lighting kinda working
Oct 9, 2016
3b9dc94
done mostly, still need to sort by material i guess
Oct 11, 2016
1525b81
Added timers, won't work without stream compaction for some reason
Oct 11, 2016
099cf10
Mostly done, added some images
Oct 11, 2016
0fab397
Update README.md
Oct 11, 2016
6a61e2c
Update README.md
Oct 11, 2016
b4f8367
rendered missing images
Oct 11, 2016
ff15397
Merge branch 'master' of https://github.com/xnieamo/Project3-CUDA-Pat…
Oct 11, 2016
e94c489
Update README.md
Oct 11, 2016
f9b507f
Update README.md
Oct 11, 2016
8b23d17
performance plot
Oct 11, 2016
104de71
Merge branch 'master' of https://github.com/xnieamo/Project3-CUDA-Pat…
Oct 11, 2016
cf80980
Update README.md
Oct 11, 2016
59790e7
More plots
Oct 11, 2016
69e291a
Merge branch 'master' of https://github.com/xnieamo/Project3-CUDA-Pat…
Oct 11, 2016
773da33
update plot
Oct 11, 2016
652a11f
Update README.md
Oct 11, 2016
e32cc79
Cache plot
Oct 11, 2016
b88646c
Merge branch 'master' of https://github.com/xnieamo/Project3-CUDA-Pat…
Oct 11, 2016
e7a58fc
Update README.md
Oct 11, 2016
ad5fe48
equations
Oct 11, 2016
e14e255
Merge branch 'master' of https://github.com/xnieamo/Project3-CUDA-Pat…
Oct 11, 2016
96f34af
dof image
Oct 11, 2016
4834dc7
Update README.md
Oct 11, 2016
6e4b472
no aa image
Oct 12, 2016
af685b7
Merge branch 'master' of https://github.com/xnieamo/Project3-CUDA-Pat…
Oct 12, 2016
25f060b
Update README.md
Oct 12, 2016
5cf4137
Malley plot
Oct 12, 2016
d927bc7
Nsight profile pics
Oct 12, 2016
f0e40fe
Update README.md
Oct 12, 2016
a223ac9
Update README.md
Oct 12, 2016
2334f9f
Update README.md
Oct 12, 2016
160d0bc
Update README.md
Oct 12, 2016
122 changes: 117 additions & 5 deletions README.md
@@ -3,11 +3,123 @@ CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Xiaomao Ding
* Tested on: Windows 8.1, i7-4700MQ @ 2.40GHz 8.00GB, GT 750M 2047MB (Personal Computer)

### (TODO: Your README)
<img src="https://raw.githubusercontent.com/xnieamo/Project3-CUDA-Path-Tracer/master/img/cornell.2016-10-11_05-34-19z.5000samp.png" width="425"/> <img src="https://raw.githubusercontent.com/xnieamo/Project3-CUDA-Path-Tracer/master/img/cornell.2016-10-11_06-15-00z.5000samp.png" width="425"/>

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
# Introduction
The code in this repository implements a CUDA-based Monte Carlo path tracer allowing us to quickly render globally illuminated images. The path tracer sends out many rays into the scene to "sample" the colors of the objects in the scene. Upon hitting an object, another ray is generated at that location. This allows the path tracer to accumulate color of light bouncing off of nearby surfaces as well. Because each new ray generated is sampled from a probability distribution described by the object's material, it takes many iterations of the algorithm before a uniform "non-noisy" image emerges.

## Features
The following features have been implemented in this project:
* Direct illumination with Multiple Importance Sampling
* Depth of field via camera jittering
* Stochastic Sample Anti-aliasing
* Realistic reflective and refractive materials via Fresnel dielectrics and conductors
* Path termination via stream compaction
* First bounce caching

## Code guide
Features are enabled or disabled via `#define`s at the top of `interactions.h` and `pathtrace.cu`. MIS and Fresnel effects are toggled at the top of `interactions.h`. All other features are toggled at the top of `pathtrace.cu`.

#### Controls

* Esc to save an image and exit.
* S to save an image. Watch the console for the output filename.
* Space to re-center the camera at the original scene lookAt point
* left mouse button to rotate the camera
* right mouse button on the vertical axis to zoom in/out
* middle mouse button to move the LOOKAT point in the scene's X/Z plane

# Performance Analysis
## Direct illumination
The idea of direct illumination is to directly sample the light at each bounce by shooting and evaluating an additional ray. This in theory allows us to converge to a stable image much quicker than waiting for the rays to randomly hit light. In this project, the base global illumination renderer multiplicatively stacks the color of each surface hit. However, in the direct illumination renderer, I use a more realistic implementation of the light transport equation, found in the Physically Based Rendering Textbook [PBRT]. Thus, the images generated by the two algorithms will look different. For this section, it is more important to just compare how "grainy" the images are. Below are three sets of images, rendered with 10, 100, and 500 iterations, using the base renderer as well as the direct illumination renderer.

Base Renderer | Direct Illumination
:-------------------------:|:-------------------------:
![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_20-39-59z.10samp.png?raw=true) | ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-38-05z.10samp.png?raw=true)
![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_20-50-11z.100samp.png?raw=true) | ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-36-46z.100samp.png?raw=true)
![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_21-26-35z.500samp.png?raw=true) | ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png?raw=true)

It is clear that the direct illumination renderer gives a much nicer image at the same number of iterations. However, the increased rendering quality comes with a performance tradeoff. With the direct renderer, we shoot a ray to the light at each bounce, requiring an additional intersection test. The images above were also rendered with multiple importance sampling, which shoots another ray from the intersection. This totals 2 additional rays for each bounce. The plot below shows how much slower the direct illumination renderer is. All values in this plot, as well as in the plots later on, are averaged over 10 iterations of the renderer.

<p align="center">
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/BasicRenderer-DL-Time.png?raw=true">
</p>

Shockingly, the time taken for the actual path tracing is now over 10 times greater than in the basic renderer! Overall, the direct illumination renderer takes about 3 times longer per iteration, so we could have run the basic renderer for about 3 times as many iterations in the same time. Below, I ran the basic renderer for 1500 iterations, compared to a 500 iteration image from the direct illumination renderer. Even though the basic renderer gets 1000 extra iterations, the direct illumination image looks a bit better.

Base Renderer 1500 iterations | Direct Illumination 500 iterations
:-------------------------:|:-------------------------:
![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/basic.2016-10-11_21-52-13z.1500samp.png?raw=true) | ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png?raw=true)
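
As a rough illustration of how the light sample and the BSDF sample can be weighted against each other in MIS, here is a minimal host-side C++ sketch of the power heuristic described in [PBRT]. The function and parameter names are my own for illustration, and I am assuming the power heuristic with exponent 2; the code in this repository may combine the two samples differently.

```cpp
#include <cassert>

// Power heuristic (exponent 2) for combining two sampling strategies.
// nf, ng:     number of samples drawn with each strategy (hypothetical names).
// fPdf, gPdf: pdf of the chosen direction under each strategy.
// Returns the weight for the sample that was drawn with strategy f.
inline float powerHeuristic(int nf, float fPdf, int ng, float gPdf) {
    assert(fPdf >= 0.0f && gPdf >= 0.0f);
    const float f = static_cast<float>(nf) * fPdf;
    const float g = static_cast<float>(ng) * gPdf;
    const float denom = f * f + g * g;
    // If both pdfs are zero the sample contributes nothing, so give it no weight.
    return denom > 0.0f ? (f * f) / denom : 0.0f;
}
```

The light sample would be weighted with `powerHeuristic(1, lightPdf, 1, bsdfPdf)` and the BSDF sample with the arguments swapped, so the two weights for a given direction sum to 1.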

## Path termination via stream compaction
The direct illumination renderer's runtime is fairly undesirable. Luckily, several of the other features in this project allow us to drastically reduce the runtime. The first feature is path termination. The idea is to remove paths that have terminated (hit a light source or finished all their bounces) each time before we execute the path tracing kernel. In theory, this reduces the number of threads that need to be executed as the rays are evaluated. We use `thrust::partition` to perform stream compaction on the rays that have terminated. Here is a plot of the number of rays remaining, as well as the execution time of a single kernel call, against the bounce number.
<p align="center">
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/SCRayCount.png?raw=true"/>
<br><br>
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/SCRuntime.png?raw=true"/>
</p>

<p align="center">
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/WithStreamCompaction.png?raw=true">
</p>
The plots show that the number of rays that need to be processed drops tremendously as the algorithm progresses! While the first two stream compaction and path trace calls take longer than they do without compaction, every bounce from the third onward takes much less time than if we were to run the algorithm without stream compaction. By reducing the number of rays as we continue, we save a lot of time. In fact, we nearly halve our runtime (consider that the number of intersections calculated initially is also reduced).
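
The compaction step can be sketched on the host with `std::partition`, which has the same contract as `thrust::partition`: elements satisfying the predicate are moved to the front and an iterator to the first non-matching element is returned. The `PathSegment` layout and the `compactPaths` helper below are assumptions for illustration, not the repository's actual definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical path record; the project's own struct may hold more fields.
struct PathSegment {
    int pixelIndex;
    int remainingBounces;  // 0 means the path has terminated
};

// Moves the live paths among the first numActive entries to the front and
// returns how many are still alive. Terminated paths stay in the array
// (after the live ones) so their colors can still be gathered later.
inline std::size_t compactPaths(std::vector<PathSegment>& paths,
                                std::size_t numActive) {
    assert(numActive <= paths.size());
    auto first = paths.begin();
    auto last = first + static_cast<std::ptrdiff_t>(numActive);
    auto firstDead = std::partition(first, last, [](const PathSegment& p) {
        return p.remainingBounces > 0;
    });
    return static_cast<std::size_t>(firstDead - first);
}
```

The returned count would then be used as the thread count for the next bounce, which is where the per-bounce savings in the plots come from.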

## First bounce caching
Because we are sampling for many iterations, we can also cache the rays first cast from the camera. This is because the camera will always be in the same position when we render. This saves us time by removing the need to cast the initial set of rays on each iteration. This naturally saves time as a function of the number of iterations we are running. Below is a plot of the time saved by caching the first rays for 10 iterations of the algorithm.

<p align="center">
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/FirstBounceCache.png?raw=true">
</p>

We save roughly 3 milliseconds per iteration when we cache the first bounces. For a 5000 iteration rendering, this amounts to roughly 15000 milliseconds saved, or about 43 (15000/350) additional iterations. This seems like a fairly insignificant boost in performance, especially because caching the first ray cast will prevent us from performing anti-aliasing.
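
The caching logic itself is small. Below is a host-side sketch under the assumption that the first-bounce intersections are stored in a plain array and reused until the camera moves; `Hit`, `FirstBounceCache`, and the `computeFirstHits` callback are hypothetical names, and the real implementation keeps this data in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical intersection record for one camera ray.
struct Hit {
    float t;
    int materialId;
};

// Stores the camera-ray intersections from the first iteration and hands
// them back on later iterations instead of recomputing them.
class FirstBounceCache {
public:
    // computeFirstHits is any callable that fills a std::vector<Hit>&
    // with one entry per pixel.
    template <typename ComputeFn>
    const std::vector<Hit>& get(std::size_t pixelCount, ComputeFn computeFirstHits) {
        if (!valid_) {
            hits_.resize(pixelCount);
            computeFirstHits(hits_);
            valid_ = true;
        }
        assert(hits_.size() == pixelCount);
        return hits_;
    }

    // Call whenever the camera moves, since the cached hits are then stale.
    void invalidate() { valid_ = false; }

private:
    std::vector<Hit> hits_;
    bool valid_ = false;
};
```

Note that this only works when every iteration casts exactly the same camera rays, which is why it conflicts with jittered anti-aliasing.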

## Anti-aliasing
If we only cast our initial ray from a single location, aliasing can occur. This is when there may be multiple objects in a single pixel but only one gets sampled. The result is that lines and boundaries in the image may appear jagged. To address this issue, we jitter our initial ray cast slightly within the pixel. This stochastic jittering lets us sample every object that may be in a pixel since our initial ray is no longer fixed. As mentioned in the previous section, we can then no longer cache our first bounce. One possible solution would be to cache a large number of initial casts. However, if we do so, we still have a higher risk of aliasing than with fresh jitter on every iteration! Personally, I think the first bounce caching doesn't provide enough of a benefit to make pre-allocating a large amount of space to store so many more rays worthwhile. Below are two images, one anti-aliased, one not. Notice how the edges of the box and the border of the spheres are jagged in the version without anti-aliasing.

No Anti-alias | With Anti-alias
:-------------------------:|:-------------------------:
![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/noaa.2016-10-11_23-58-49z.500samp.png?raw=true) | ![](https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/MIS-Comparisons/MIS.2016-10-11_20-32-42z.500samp.png?raw=true)

Because anti-aliasing is something that occurs at the start of each iteration, it is essentially free! (minus 2 random number generations). Unless we were caching our initial bounces, there are no performance tradeoffs for this feature.
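
A minimal sketch of the jitter, assuming the two random numbers are uniform in [0, 1) and are supplied by the caller (the names `Sample2D` and `jitterPixel` are illustrative, not taken from the repository):

```cpp
#include <cassert>

// Continuous image-plane position of one sample.
struct Sample2D {
    float x;
    float y;
};

// Returns a sub-pixel sample position inside pixel (px, py).
// u and v are uniform random numbers in [0, 1); without anti-aliasing the
// renderer would instead always use the same fixed position in the pixel.
inline Sample2D jitterPixel(int px, int py, float u, float v) {
    assert(u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f);
    return Sample2D{static_cast<float>(px) + u, static_cast<float>(py) + v};
}
```

The camera ray is then generated through the returned position rather than through the integer pixel coordinate, so the only extra cost is the two random numbers.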


## Depth of field
Depth of field refers to the blurring effect on objects that are not in the plane of focus. We can mimic depth of field by slightly jittering our initial ray cast through an imaginary lens. The image below shows the result. Notice how the objects become less blurry the further away they are from the camera. This occurs because our focus is far away.

<p align="center">
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/depth.2016-10-11_22-51-27z.5000samp.png?raw=true">
</p>


Similar to anti-aliasing, there are no performance costs for this feature because it is a ray pre-processing step.
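
The lens jitter can be sketched with the usual thin-lens model: find where the pinhole ray crosses the plane of focus, move the ray origin to a random point on the lens disk, and aim the new ray at that focus point. The code below is a host-side sketch under those assumptions. `lensRadius` and `focalDistance` are meant to correspond to the `LENSRADIUS` and `FOCALDIST` scene entries, but all type and function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin; Vec3 direction; };

inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 a) { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// pinhole:  the unjittered camera ray (unit direction).
// forward:  unit view direction; right/up: unit vectors spanning the lens plane.
// u, v:     uniform random numbers in [0, 1).
inline Ray thinLensRay(const Ray& pinhole, Vec3 forward, Vec3 right, Vec3 up,
                       float lensRadius, float focalDistance, float u, float v) {
    assert(lensRadius >= 0.0f && focalDistance > 0.0f);
    const float cosTheta = dot(pinhole.direction, forward);
    assert(cosTheta > 0.0f);
    // Point where the pinhole ray crosses the plane of focus.
    const Vec3 focus =
        add(pinhole.origin, scale(pinhole.direction, focalDistance / cosTheta));
    // Uniform point on the lens disk (polar mapping).
    const float kTwoPi = 6.28318530718f;
    const float r = lensRadius * std::sqrt(u);
    const float phi = kTwoPi * v;
    const Vec3 offset = add(scale(right, r * std::cos(phi)),
                            scale(up, r * std::sin(phi)));
    const Vec3 origin = add(pinhole.origin, offset);
    return Ray{origin, normalize(sub(focus, origin))};
}
```

With `lensRadius` set to 0 this returns the pinhole ray unchanged, and every jittered ray still passes through the same focus point, so only objects away from the plane of focus are blurred.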

## Reflection/Refraction
We make use of two Fresnel equations to implement reflective and refractive materials. Note that the Fresnel implementations are only available in the direct illumination renderer. The basic renderer just assumes perfectly specular surfaces and uses Snell's law to calculate refracted rays. For reflections, we assume the material to be a Fresnel conductor. For refractions, we assume a Fresnel dielectric. The formulas for the two materials are as follows:
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/equations/Cond.PNG?raw=true" width="425"/> <img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/equations/Diel.PNG?raw=true" width="350"/>

We use an approximation of the two equations [PBRT 8.2-8.3] in our implementations. Additionally, the Fresnel equations are relatively cheap. The runtime assuming a perfect specular material is only 3.5 milliseconds faster per iteration. This means that the Fresnel materials are essentially free!
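
For reference, here is a host-side sketch of the standard unpolarized Fresnel reflectance for a dielectric boundary, following the general form given in [PBRT]. This is not the repository's code: the function name and parameters are my own, and the project's implementation uses its own approximation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fraction of light reflected at a boundary between two dielectrics.
// cosThetaI: cosine of the incident angle, measured from the normal on the
//            incident side (clamped to [0, 1]).
// etaI, etaT: indices of refraction on the incident and transmitted sides.
inline float fresnelDielectric(float cosThetaI, float etaI, float etaT) {
    assert(etaI > 0.0f && etaT > 0.0f);
    cosThetaI = std::min(1.0f, std::max(0.0f, cosThetaI));
    const float sinThetaI = std::sqrt(std::max(0.0f, 1.0f - cosThetaI * cosThetaI));
    // Snell's law gives the sine of the transmitted angle.
    const float sinThetaT = etaI / etaT * sinThetaI;
    if (sinThetaT >= 1.0f) {
        return 1.0f;  // total internal reflection
    }
    const float cosThetaT = std::sqrt(std::max(0.0f, 1.0f - sinThetaT * sinThetaT));
    const float rParl = (etaT * cosThetaI - etaI * cosThetaT) /
                        (etaT * cosThetaI + etaI * cosThetaT);
    const float rPerp = (etaI * cosThetaI - etaT * cosThetaT) /
                        (etaI * cosThetaI + etaT * cosThetaT);
    // Average the two polarizations for unpolarized light.
    return 0.5f * (rParl * rParl + rPerp * rPerp);
}
```

The result is the probability mass assigned to the reflected ray; the remainder goes to the refracted ray. The cost is a couple of square roots and divisions per hit, which is consistent with the small runtime difference measured above.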

# Other small things
## Malley's Method
For diffuse materials, the BRDF is Lambertian, meaning light is scattered equally in all directions over the hemisphere. Because the contribution of each bounce is also scaled by the cosine of the angle to the normal, it is best to sample bounce directions from a cosine-weighted distribution over the hemisphere rather than a uniform one. However, the direct way of doing this requires several cosine/sine calls, which are very costly on the GPU since they depend on a limited number of special computation units. Malley's method is a way to obtain a cosine distribution by sampling a point on the unit disk and projecting it up onto the hemisphere. The great thing about the version used here is that it requires NO cosines or sines. Instead, we only need two random number generations. This means that using this method avoids the bottleneck on the GPU! Below is a plot showing how Malley's method can improve performance. One thing to note is that the time for intersections has decreased slightly as well. This is possibly because the bounce directions are distributed differently and send more stray rays, but I am not certain what the root cause is.

<p align="center">
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/plots/MalleysApprox.png?raw=true">
</p>
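
One trig-free way to realize Malley's method is sketched below: pick a uniform point in the unit disk by rejection sampling, then lift it onto the hemisphere. This is an illustrative host-side variant with hypothetical names, not necessarily how the repository does it; in particular, the rejection loop uses about 4/π ≈ 1.27 pairs of random numbers on average rather than exactly two.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Direction in the local frame where +z is the surface normal.
struct Dir { float x, y, z; };

// Lifts a point (dx, dy) inside the unit disk onto the unit hemisphere.
// Uniform disk points lifted this way are cosine-distributed on the hemisphere.
inline Dir liftToHemisphere(float dx, float dy) {
    const float r2 = dx * dx + dy * dy;
    assert(r2 <= 1.0f);
    return Dir{dx, dy, std::sqrt(1.0f - r2)};
}

// Draws a uniform point in the unit disk by rejection (no sin/cos),
// then projects it up onto the hemisphere.
template <typename Rng>
Dir sampleCosineHemisphere(Rng& rng) {
    std::uniform_real_distribution<float> uniform(-1.0f, 1.0f);
    for (;;) {
        const float dx = uniform(rng);
        const float dy = uniform(rng);
        if (dx * dx + dy * dy <= 1.0f) {
            return liftToHemisphere(dx, dy);
        }
    }
}
```

The returned direction is in the local tangent frame and still has to be rotated so that +z lines up with the surface normal at the hit point.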

## Odd performance boost without MIS
For some reason, taking out MIS from the code cuts the path tracing kernel runtime nearly in half! I have no idea why this happens. This is surprising because commenting out that section of the MIS code does not remove half the calculations (at least not in code). It is true though that 1 of the intersections is no longer used when MIS is disabled, so perhaps the compiler has also removed the intersection test call?

<p align="center">
With MIS.
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/profile/MISProfile.PNG?raw=true">
<br><br>
Without MIS.
<img src="https://github.com/xnieamo/Project3-CUDA-Path-Tracer/blob/master/img/profile/noMISProfile.PNG?raw=true">
</p>

# Reference
[PBRT] Physically Based Rendering, Second Edition: From Theory To Implementation. Pharr, Matt and Humphreys, Greg. 2010.
Binary file added img/cornell.2016-10-11_05-34-19z.5000samp.png
Binary file added img/cornell.2016-10-11_06-15-00z.5000samp.png
Binary file added img/depth.2016-10-11_22-51-27z.5000samp.png
Binary file added img/equations/Cond.PNG
Binary file added img/equations/Diel.PNG
Binary file added img/noaa.2016-10-11_23-58-49z.500samp.png
Binary file added img/plots/BasicRenderer-DL-Time.png
Binary file added img/plots/FirstBounceCache.png
Binary file added img/plots/MalleysApprox.png
Binary file added img/plots/SCRayCount.png
Binary file added img/plots/SCRuntime.png
Binary file added img/plots/WithStreamCompaction.png
Binary file added img/profile/MISProfile.PNG
Binary file added img/profile/noMISProfile.PNG
49 changes: 39 additions & 10 deletions scenes/cornell.txt
@@ -12,7 +12,7 @@ EMITTANCE 5
MATERIAL 1
RGB .98 .98 .98
SPECEX 0
SPECRGB 0 0 0
SPECRGB 1 1 1
REFL 0
REFR 0
REFRIOR 0
@@ -43,30 +43,43 @@ MATERIAL 4
RGB .98 .98 .98
SPECEX 0
SPECRGB .98 .98 .98
REFL 1
REFL 0
REFR 1
REFRIOR 1.5
EMITTANCE 0
ABSORPTION 3

// Specular white
MATERIAL 5
RGB .98 .98 .98
SPECEX 0
SPECRGB 0 1 1
REFL 0
REFR 0
REFRIOR 0
REFRIOR 1.25
EMITTANCE 0
ABSORPTION 4

// Camera
CAMERA
RES 800 800
FOVY 45
ITERATIONS 5000
ITERATIONS 2
DEPTH 8
FILE cornell
FILE naa
EYE 0.0 5 10.5
LOOKAT 0 5 0
UP 0 1 0

LENSRADIUS 0.5
FOCALDIST 15

// Ceiling light
OBJECT 0
cube
sphere
material 0
TRANS 0 10 0
ROTAT 0 0 0
SCALE 3 .3 3
SCALE 3 1 3

// Floor
OBJECT 1
@@ -112,6 +125,22 @@ SCALE .01 10 10
OBJECT 6
sphere
material 4
TRANS -1 4 -1
ROTAT 0 0 0
TRANS -1 5 -1
ROTAT 0 45 0
SCALE 3 3 3

// Sphere
OBJECT 7
sphere
material 4
TRANS 3 3 3
ROTAT 0 45 0
SCALE 3 3 3

// Sphere
OBJECT 8
cube
material 5
TRANS -2 2 3
ROTAT 0 45 0
SCALE 2 2 2