GPU option? #58
Ideally the whole decoding process would be computationally efficient enough to run on a single modest workstation computer within 3-6 hours. It's somewhat difficult to fit more than ~100 cores into a workstation, so GPUs could be worth pursuing if it's possible to achieve more than a 100-fold speed-up on a single GPU such as the Titan RTX vs a single CPU core for decoding a single FOV. If a 500- or 1000-fold speed-up is possible using a GPU it could be even more interesting.
Can you elaborate a bit more on the performance target? Ideally an X-bit dataset with Y FOVs and Z z planes could be processed on a single workstation in 3-6 hours. 22 bits? 100 FOVs? 8 z planes?
Ideally a 22-bit dataset with 2500 FOVs, each 2048x2048 pixels with 7 z planes, could be decoded in 3-6 hours. This would cover a typical experiment of 1 square centimeter of 10-micron thick tissue.
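Back-of-envelope arithmetic for the target above (a sketch, using only the numbers stated in this thread):

```python
# Pixel throughput implied by the target: 2500 FOVs, 2048x2048 pixels,
# 7 z planes, decoded within a 3-6 hour budget.
fovs, z_planes, height, width = 2500, 7, 2048, 2048
total_pixels = fovs * z_planes * height * width  # ~7.3e10 pixel traces

for hours in (3, 6):
    rate = total_pixels / (hours * 3600)
    print(f"{hours} h budget -> {rate:.2e} pixels/s sustained")
```

So the pipeline would need to sustain roughly 3.4-6.8 million decoded pixels per second end to end, which gives a concrete number to compare CPU and GPU implementations against.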
There are other ways to speed this up using just the CPU. 1) You can remove pixels with low intensity or based on some other feature (e.g. number of non-zero rounds >= 4); this reduces the time spent searching for the nearest barcode. 2) Finding the nearest barcodes for pixel traces can be parallelized on a GPU; there are multiple GPU-supported nearest neighbor search algorithms. 3) Barcode extraction could also be optimized.
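A minimal sketch of points (1) and (2) above, on random illustrative data (the codebook size and all values here are made up; only the 22 bits and the >= 4 non-zero-round threshold come from this thread):

```python
import numpy as np

# (1) prefilter pixel traces, (2) brute-force nearest-barcode search.
# The final matrix multiply is the kind of computation that maps well
# onto a GPU (e.g. via a GPU array library) with the same code shape.
rng = np.random.default_rng(0)
n_bits, n_barcodes, n_pixels = 22, 140, 10_000
codebook = rng.integers(0, 2, size=(n_barcodes, n_bits)).astype(float)
# sparse random "pixel traces": most rounds are off for most pixels
traces = rng.random((n_pixels, n_bits)) * (rng.random((n_pixels, n_bits)) > 0.7)

# (1) keep only traces with at least 4 "on" rounds
keep = (traces > 0).sum(axis=1) >= 4
traces = traces[keep]

# (2) nearest barcode by cosine similarity on unit-normalized vectors
norm_t = traces / np.linalg.norm(traces, axis=1, keepdims=True)
norm_c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
assignments = (norm_t @ norm_c.T).argmax(axis=1)
print(f"kept {traces.shape[0]} of {n_pixels} traces")
```

The prefilter shrinks the search input before the expensive step, and the similarity computation is a single dense matmul, so it parallelizes trivially across pixels.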
For your suggestion (1) see pull #64 :).
@r3fang in my hands a typical dataset takes about 20 seconds to finish the neighbor search for each z plane, so it's not clear to me this step needs to be sped up; I think we're not that far off from @emanuega's target performance. One aspect that could be tuned to improve overall performance is providing more oversight to the scheduling of tasks. An example I've run into is when a set of optimize jobs gets placed in a job queue behind a bunch of segmentation jobs. This slows down optimize and bottlenecks the pipeline: segmentation could happen at many different points without slowing down the overall run, but optimize needs to finish before a lot of other things can happen.
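The scheduling point above can be sketched with a simple priority queue. The task names and priority values here are illustrative, not the pipeline's actual API; the idea is just that pipeline-critical jobs (optimize) should jump ahead of jobs that can run at any point (segmentation):

```python
import heapq

# Lower number = higher priority; values are hypothetical.
PRIORITY = {"optimize": 0, "decode": 1, "segment": 2}

# Three segment jobs are enqueued first, then two optimize jobs,
# mimicking the "optimize stuck behind segmentation" situation.
queue = []
for seq, kind in enumerate(["segment"] * 3 + ["optimize"] * 2):
    # seq breaks ties so equal-priority jobs run in arrival order
    heapq.heappush(queue, (PRIORITY[kind], seq, kind))

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # optimize jobs run first despite being queued last
```

With a plain FIFO queue the optimize jobs would wait behind all three segmentation jobs; the priority ordering removes that bottleneck without changing how many workers are available.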
Some processes could be faster if they were done on a GPU. Thoughts on whether this is worth testing? Clusters of GPUs are, I think, rare and probably expensive, so maybe it doesn't make that much sense for the typical use case of analyzing 100s of FOVs in parallel?