Section on reading from blobs with Python #71

CPBridge · 2025-03-04T21:34:39Z

@fedorov here is a new page on using Python tools to read directly from blob objects

This is the first time I've used gitbook, so not sure if there's anything else I need to do to make sure the page is included etc?

CPBridge · 2025-03-04T21:52:33Z

Ok I figured it out, it appears to be showing correctly in the preview

CPBridge · 2025-03-04T22:31:44Z

I'm getting a load of errors in the CI pipeline about incorrect URLs, but these seem unrelated to my changes

fedorov · 2025-03-06T17:13:38Z

@CPBridge thank you for the PR! I would like to first fix ImagingDataCommons/idc-index#102, since otherwise it will be difficult to explain where users can get those file paths.

Sorry about the false alarms with the failing checks.

fedorov · 2025-03-27T20:29:16Z

@CPBridge I finally implemented support for getting URLs from GCS, and while testing it I discovered that the current sample code seems to require the user to authenticate and provide credentials. Is this right? Do you know whether it is possible to access the bucket without signing in, since the content is public? idc-index can download the entire file without login.

You can see the error and the updated part getting file URLs via idc-index in this notebook: https://colab.research.google.com/drive/1_5cc_kb-FgUl9r0jePX_QbSvwBFEluMg?usp=sharing

CPBridge · 2025-03-27T21:27:17Z

@fedorov this can be avoided by using an anonymous client. I have updated the examples in the PR to demonstrate this
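
For readers following the thread, here is a minimal sketch of what an anonymous client looks like with the `google-cloud-storage` package. It is not copied from the PR; the helper names `split_gs_url` and `open_public_blob` are hypothetical, and the sketch assumes the target bucket allows public reads.

```python
from urllib.parse import urlparse


def split_gs_url(url: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def open_public_blob(url: str):
    """Open a publicly readable GCS blob as a file-like object, without credentials.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that split_gs_url works without the package installed.
    from google.cloud import storage

    # An anonymous client sends unauthenticated requests, which is sufficient
    # for buckets that permit public reads.
    client = storage.Client.create_anonymous_client()
    bucket_name, blob_name = split_gs_url(url)
    return client.bucket(bucket_name).blob(blob_name).open("rb")
```

The returned object can be used in a `with` block and passed to any reader that accepts a binary file-like object.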

fedorov · 2025-04-07T17:35:30Z

@CPBridge FYI, the below takes ~30 sec on Colab (and about 47 sec when I ran it first). Is this expected?

CPBridge · 2025-04-18T15:22:40Z

@fedorov I have updated the recommendations for S3 in most cases to avoid the slow behavior you observed.

I tracked it down to the way smart_open buffers data. It maintains a buffer of data it pulls down but it re-populates this whenever you do a seek operation, even if the new seek position exists within the current buffer. This is problematic because pydicom does a lot of little seek operations when reading in order to jump around through tags. There is some discussion on this topic on the smart_open repo here. I just made some suggestions, we'll see if they are interested in improving this.
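
To make the cost concrete, the sketch below models the behavior described above. It is a simplified stand-in, not smart_open's actual implementation: `RangeReader`, its `reuse_buffer` flag, and `count_fetches` are hypothetical names, and an in-memory `bytes` object plays the role of the remote blob.

```python
class RangeReader:
    """Seekable reader over a remote object that keeps one buffered chunk."""

    def __init__(self, fetch, size, chunk_size=1024, reuse_buffer=True):
        self._fetch = fetch            # callable(start, end) -> bytes for [start, end)
        self._size = size
        self._chunk_size = chunk_size
        self._reuse_buffer = reuse_buffer
        self._pos = 0
        self._buf_start = 0
        self._buf = b""
        self.fetch_count = 0           # number of simulated range requests

    def seek(self, pos):
        self._pos = max(0, min(pos, self._size))
        if not self._reuse_buffer:
            # Naive behavior: every seek discards the buffer, even when the
            # new position lies inside it.
            self._buf = b""
        return self._pos

    def read(self, n):
        out = bytearray()
        while n > 0 and self._pos < self._size:
            offset = self._pos - self._buf_start
            if not (0 <= offset < len(self._buf)):
                # Position is outside the buffered range: fetch a new chunk.
                end = min(self._pos + self._chunk_size, self._size)
                self._buf = self._fetch(self._pos, end)
                self._buf_start = self._pos
                self.fetch_count += 1
                offset = 0
            piece = self._buf[offset:offset + n]
            if not piece:
                break
            out += piece
            self._pos += len(piece)
            n -= len(piece)
        return bytes(out)


DATA = bytes(range(256)) * 16  # 4096 bytes standing in for a remote file


def count_fetches(reuse_buffer):
    """Perform 100 small seek-and-read steps, roughly like a tag-by-tag parser."""
    reader = RangeReader(lambda s, e: DATA[s:e], len(DATA), reuse_buffer=reuse_buffer)
    for pos in range(0, 1000, 10):
        reader.seek(pos)
        reader.read(4)
    return reader.fetch_count
```

With these settings every read falls inside the first 1024-byte chunk, so the buffer-aware reader issues a single fetch while the naive one issues a fetch per seek.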

CPBridge · 2025-05-14T21:51:32Z

@fedorov reminder about this

fedorov · 2025-05-15T16:01:02Z

@CPBridge I updated, please review and let me know if you have any concerns!

* added section on getting file URLs using idc-index
* updated all examples with the details on getting bucket file URLs using idc-index
* replaced direct use of tag numbers with keyword lookup
* updated importance of offset tables to clarify it is about SM

This notebook contains all of the code samples for convenient testing - I plan to update this notebook and add it to IDC-Tutorials. https://colab.research.google.com/drive/1_5cc_kb-FgUl9r0jePX_QbSvwBFEluMg?usp=sharing

CPBridge · 2025-05-15T19:40:04Z

Thanks @fedorov. I found a few bugs in your examples (some lines looked like they were out of date or spliced from the wrong example). So I fixed them and made some minor cosmetic changes at the same time.

One small concern I have is that while the method used to find the largest image in the pyramid nicely demonstrates some of the capabilities listed on the page, it is also quite inefficient and not how I would recommend people do this. I would recommend doing this either with bigquery or even idc-index, which if memory serves has an instance-level table for SM images that would contain this information. What do you think? I think at the very least we should point out the availability of better options.
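
As a rough illustration of the table-based alternative, the sketch below picks the largest pyramid level from instance-level metadata rows instead of opening every file. The rows and UIDs are made up, and the assumption that such a table exposes the DICOM attributes `TotalPixelMatrixColumns` and `TotalPixelMatrixRows` as columns is mine, not confirmed in this thread.

```python
def largest_level(instances):
    """Return the row whose total pixel matrix covers the most pixels."""
    return max(
        instances,
        key=lambda row: row["TotalPixelMatrixColumns"] * row["TotalPixelMatrixRows"],
    )


# Hypothetical rows for one SM series, as they might come from a metadata table.
rows = [
    {"SOPInstanceUID": "1.2.3.1", "TotalPixelMatrixColumns": 2500, "TotalPixelMatrixRows": 2000},
    {"SOPInstanceUID": "1.2.3.2", "TotalPixelMatrixColumns": 40000, "TotalPixelMatrixRows": 32000},
    {"SOPInstanceUID": "1.2.3.3", "TotalPixelMatrixColumns": 10000, "TotalPixelMatrixRows": 8000},
]

base_level = largest_level(rows)
```

Only the selected instance then needs to be opened from the bucket, rather than every instance in the series.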

CPBridge · 2025-05-15T19:43:15Z

P.s. I also added a small snippet to figure out whether a given image has an EOT/BOT, which I figured out a few weeks ago for something else but it might be useful on this page
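
The snippet itself is not reproduced in this thread. As a standalone sketch of the underlying idea (not the code added to the page), the check for a Basic Offset Table can be made on the raw bytes: in encapsulated Pixel Data the first item is always the BOT, and a zero-length item means it is empty. The function names below are hypothetical, and the input is assumed to be the Pixel Data element's value, starting at its first item.

```python
import struct

ITEM_TAG = (0xFFFE, 0xE000)  # DICOM item tag that introduces each fragment


def basic_offset_table_length(pixel_data_value: bytes) -> int:
    """Return the byte length of the BOT item at the start of encapsulated Pixel Data."""
    # Item header: group (2 bytes), element (2 bytes), length (4 bytes), little endian.
    group, element, length = struct.unpack("<HHL", pixel_data_value[:8])
    if (group, element) != ITEM_TAG:
        raise ValueError("encapsulated pixel data must start with an item tag")
    return length


def has_basic_offset_table(pixel_data_value: bytes) -> bool:
    """True if the first item carries at least one frame offset."""
    return basic_offset_table_length(pixel_data_value) > 0


# Synthetic examples: an empty BOT, and a BOT holding offsets for two frames.
empty_bot = struct.pack("<HHL", 0xFFFE, 0xE000, 0)
two_frame_bot = struct.pack("<HHL", 0xFFFE, 0xE000, 8) + struct.pack("<LL", 0, 1000)
```

The Extended Offset Table is a separate attribute, (7FE0,0001), so its presence is a property of the dataset rather than of the Pixel Data bytes.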


fedorov · 2025-05-16T22:31:51Z

> I found a few bugs in your examples

Thank you for fixing them, sorry!

> I would recommend doing this either with bigquery or even idc-index, which if memory serves has an instance-level table for SM images that would contain this information. What do you think?

It took me a little longer than I hoped, but I added a helper function in idc-index 0.8.7 (otherwise it would be ugly to build AWS/GCP-specific URLs) and updated the relevant code snippet!

If this looks good, I am going to merge - let me know!

CPBridge · 2025-05-16T23:51:49Z

@fedorov looks good to me!

CPBridge added 7 commits March 4, 2025 16:07

Add page on direct loading of data

06908dc

Various tweaks

7a891cf

Adapt image size

9c1bf30

Adjust image again

038d5b5

Center the image

9c0ab51

Correct image size again

ce77620

Further minor tweaks

765b91e

CPBridge added the documentation Improvements or additions to documentation label Mar 4, 2025

CPBridge requested a review from fedorov March 4, 2025 21:35

CPBridge added 3 commits March 4, 2025 16:37

Add page to summar.md

7521e26

Add section on offset tables

27f7de6

Fix typo, minor clarification

8f3a832

CPBridge added 2 commits March 4, 2025 18:10

Further typo

09c04eb

remove extra #

1efc9b2

CPBridge added 10 commits March 18, 2025 10:38

Add S3 examples

d842799

Change title to match conventio

594a9cd

Re-organization of pydicom section

1b70072

Fix text wrapping

811fb71

Update index entry title

72836b6

Rephrase pydicom section

68899ba

Further restructuring

c439d5c

GDS -> GCS

26cdd5b

title capitalisation

149cdb6

GCS typo

f87bfde

Switch to anonymous client

b6ade85

CPBridge added 2 commits April 18, 2025 11:14

Adapt S3 reading recommendations

7e8e117

fix line wrapping

26f8371

CPBridge added 2 commits April 23, 2025 09:48

Use context managers for open methods

80a944e

Fix minor typo

e136261

fedorov force-pushed the reading-from-blobs branch from 59166d2 to c5bb50e Compare May 15, 2025 16:29

CPBridge added 2 commits May 15, 2025 15:30

Tweaks to index examples

0d5fbdf

Minor tweaks

d9c8ee2

fedorov mentioned this pull request May 16, 2025

Add functionality to get instance file URL from SOPInstanceUID ImagingDataCommons/idc-index#161

Closed

replace iteration over instances with direct selection using new idc-…

cf9b7e5

…index function

fedorov merged commit f719411 into prod May 19, 2025
2 of 3 checks passed
