Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be able to share compiled OpenCL binary caches among identical GPUs #187

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

harokyang
Copy link

This change solved the following issue

  1. When rendering with multiple GPUs/CPUs, the generated inputmap.cl of worker threads are different to each other
    Because each worker thread use its own instance of default_material in ClwSceneController, which has its own SceneObject id
    Different SceneObject id means different inputmap id, lead to different inputmap.cl

Change default_material from member variable to static variable solve this issue

  1. If rendering with multiple identical GPUs, they should have share the same kernel binary caches (with the first issue solved)
    But if there is no cache exists, all the GPUs will try to compile and generate the very same binary cache, at the same time
    This may cause a huge spike in memory consumption with lots of wasted works

Added a lock mechanism to allow only one worker to compile and generate binary cache, while the others will simply wait for it
Any workers with different GPU/CPU model won't be affected and will be able to generate their own version of kernel binaries

@harokyang
Copy link
Author

Can somebody give me a hint about the jenkins build report?
I have no access

@@ -262,6 +281,9 @@ CLWProgram CLProgram::GetCLWProgram(const std::string &opts)
// Save binaries
result.GetBinaries(0, binary);
SaveBinaries(cached_program_path, binary);

// Block other workers until binary cache generated
cache_lock.unlock();
Copy link
Contributor

@dtarakanov1 dtarakanov1 Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unlock is excessive since the cache_lock goes out of scope right after the call.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@@ -95,5 +96,7 @@ namespace Baikal
CLWContext m_context;
std::set<std::string> m_included_headers; ///< Set of included headers

static std::unordered_map<std::string, std::shared_ptr<std::mutex>> s_binary_cache_names;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a part of the class interface, so it might be moved into the .cpp file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into cl_program.cpp

@AvKhokhlov
Copy link
Contributor

Hello, according to the build logs the test 'AovTest.Aov_ObjectId' is failed. You should be able to reproduce this issue on Ubuntu/Windows machines. Looks like this is permanent issue for all of them. Try to launch Baikal unit tests

@harokyang
Copy link
Author

Thanks, I will look into it

@AlexanderVeselov
Copy link
Contributor

The AovTest.Aov_ObjectId test has failed because ClwSceneController::m_default_material has become a static field. Therefore, construction order for scene objects has changed. IDs for the objects are given one by one as they constructs, so now the test objects have other IDs => other AOV color.

@harokyang
Copy link
Author

harokyang commented Aug 23, 2018

Yes, seems the only way to solve the issue is by update the unit test reference image
Also I made a change again, m_default_material is now part of the scene properties
A material should stay with data object instead of controller
Data required to construct a inputmap should only come from the scene, not the controller

And no more static required when put m_default_material inside the scene object
All controllers can generate the same inputmap this way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants