
GPU option? #58

Open
HazenBabcock opened this issue Apr 12, 2020 · 6 comments

Comments

@HazenBabcock
Contributor

Some processes could be faster if they were done on a GPU. Thoughts on whether this is worth testing? GPU clusters are, I think, rare and probably expensive, so it may not make much sense for the typical use case of analyzing hundreds of FOVs in parallel.

@emanuega
Owner

emanuega commented May 4, 2020

Ideally the whole decoding process would be computationally efficient enough to run on a single modest workstation within 3-6 hours. It's difficult to fit more than ~100 cores into a workstation, so GPUs could be worth pursuing if a single GPU such as the Titan RTX can achieve more than a 100-fold speedup over a single CPU core when decoding a single FOV. A 500- or 1000-fold speedup on a single GPU would be even more interesting.
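The break-even logic above can be sketched in a few lines: a single GPU only competes with a whole workstation when its per-core speedup exceeds the core count you can fit in one box (the numbers below are the illustrative figures from this comment, not benchmarks):

```python
# Break-even sketch: when does one GPU beat a multi-core workstation?
# Assumed numbers from the discussion, not measurements.
WORKSTATION_CORES = 100  # roughly the most cores that fit in one workstation

def gpu_equivalent_workstations(gpu_speedup_vs_one_core: float) -> float:
    """How many full workstations one GPU is worth at a given speedup."""
    return gpu_speedup_vs_one_core / WORKSTATION_CORES

for speedup in (50, 100, 500, 1000):
    print(f"{speedup:4d}x per core -> {gpu_equivalent_workstations(speedup):.1f} workstations")
```

So a 100-fold speedup merely matches the workstation, which is why the 500- to 1000-fold range is where a GPU port starts to look compelling.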

@HazenBabcock
Contributor Author

Can you elaborate a bit more on the performance target? Ideally, an X-bit dataset with Y FOVs and Z z planes could be processed on a single workstation in 3-6 hours. 22 bits? 100 FOVs? 8 z planes?

@emanuega
Owner

emanuega commented May 5, 2020

Ideally a 22-bit dataset with 2500 FOVs, each 2048x2048 pixels with 7 z planes, could be decoded in 3-6 hours. This would cover a typical experiment of 1 square centimeter of 10-micron-thick tissue.
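This target implies a concrete per-FOV time budget and pixel throughput. A back-of-envelope calculation, assuming 2048x2048-pixel FOVs as stated above:

```python
# Per-FOV budget for decoding 2500 FOVs in 3-6 hours on one workstation.
# Numbers taken from the target in this thread; this is only arithmetic.
n_fovs = 2500
pixels_per_fov = 2048 * 2048 * 7  # 7 z planes per FOV

for hours in (3, 6):
    budget_s = hours * 3600 / n_fovs       # wall-clock seconds per FOV
    mpix_per_s = pixels_per_fov / budget_s / 1e6  # required throughput
    print(f"{hours} h total -> {budget_s:.2f} s/FOV, {mpix_per_s:.1f} Mpixel/s")
```

In other words, the whole pipeline gets only about 4-9 wall-clock seconds per FOV, so every per-pixel step (decoding, nearest-barcode search) has to sustain a few million pixels per second end to end.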

@r3fang

r3fang commented May 22, 2020

There are other ways to speed this up using only the CPU. 1) You can remove pixels with low intensity, or filter on some other feature (e.g. number of non-zero rounds >= 4); this reduces the time spent searching for the nearest barcode. 2) Finding the nearest barcode for each pixel trace can be parallelized on a GPU; there are multiple GPU-supported nearest-neighbor search algorithms. 3) Barcode extraction could also be optimized.
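Suggestion (1) can be sketched roughly as follows. This is an illustrative NumPy version, not MERlin's actual decoding code; the function name, threshold values, and brute-force distance computation are all placeholders (a GPU k-NN library could replace the distance step, per suggestion 2):

```python
import numpy as np

def decode_pixels(pixel_traces, codebook, min_on_rounds=4, on_threshold=0.5):
    """Assign each pixel trace to its nearest codebook barcode,
    skipping traces with too few bright rounds (suggestion 1)."""
    # pixel_traces: (n_pixels, n_bits) normalized intensities
    # codebook:     (n_barcodes, n_bits) binary barcodes
    on_counts = (pixel_traces > on_threshold).sum(axis=1)
    keep = on_counts >= min_on_rounds  # cheap prefilter before the search

    # Brute-force Euclidean nearest barcode for the surviving pixels only.
    d = np.linalg.norm(
        pixel_traces[keep][:, None, :] - codebook[None, :, :], axis=2
    )
    assigned = np.full(len(pixel_traces), -1)  # -1 = not decoded
    assigned[keep] = d.argmin(axis=1)
    return assigned
```

Since background pixels typically dominate an FOV, rejecting them before the nearest-neighbor search shrinks the expensive step by a large factor without changing which foreground pixels get decoded.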

@HazenBabcock
Contributor Author

For your suggestion (1), see pull request #64 :).

@seichhorn
Collaborator

@r3fang in my hands, a typical dataset takes about 20 seconds to finish the neighbor search for each z plane, so it's not clear to me that this step needs to be sped up.

I think we're not far off from @emanuega's target performance. One aspect that could be tuned to improve overall performance is providing more oversight to task scheduling. An example I've run into: a set of optimize jobs gets placed in the queue behind a batch of segmentation jobs. This slows down optimization and bottlenecks the pipeline, since segmentation could run at many different points without slowing the overall run, but optimization must finish before many other tasks can start.
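The scheduling fix described above amounts to ordering the queue by how critical each task type is to the pipeline's dependency chain. A toy priority-queue sketch (the task names and priority values are illustrative, not MERlin's actual scheduler):

```python
import heapq

# Lower number = higher priority. "optimize" gates many downstream steps,
# so it should never sit behind a backlog of "segment" jobs.
PRIORITY = {"optimize": 0, "decode": 1, "segment": 2}

def run_order(jobs):
    """Return jobs in execution order, critical-path tasks first.

    The enqueue index is used as a tiebreaker so tasks of equal
    priority keep their original submission order.
    """
    heap = [(PRIORITY[kind], i, kind) for i, kind in enumerate(jobs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Segmentation jobs submitted first no longer block optimization:
print(run_order(["segment", "segment", "optimize", "segment"]))
# -> ['optimize', 'segment', 'segment', 'segment']
```

Since segmentation has no downstream dependents in this scenario, deferring it costs nothing, while pulling optimize forward unblocks the rest of the pipeline.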
