-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IA-2932 Add upload to azure functionality #362
base: master
Are you sure you want to change the base?
Conversation
4fe81ac
to
ea2bc77
Compare
This reverts commit ea2bc77.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... I'm starting to think that the Google and Azure portions should be a little more separated.
Ideally, you should be able to operate on Google files if you have valid Google credentials and invalid Azure credentials. And vice versa. The same goes for running tests. Azure tests should print a "no azure credentials found message" and skip azure tests. The same goes for Google imo (I can add the Google side).
What do you think about an "azure" TNU_TESTMODE
?
It looks like there may need to be some apt dependencies that should at least be added to the README.MD
? (sudo apt install python3-gi python3-gi-cairo gir1.2-secret-1
:
test_exists (__main__.TestBlobStore) ... EnvironmentCredential.get_token failed: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
ImdsCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
Runtime dependency of PyGObject is missing.
Depends on your Linux distro, you could install it system-wide by something like:
sudo apt install python3-gi python3-gi-cairo gir1.2-secret-1
If necessary, please refer to PyGObject's doc:
https://pygobject.readthedocs.io/en/latest/getting_started.html
Traceback (most recent call last):
File "/home/quokka/venv/lib/python3.9/site-packages/msal_extensions/libsecret.py", line 21, in <module>
import gi # https://github.com/AzureAD/microsoft-authentication-extensions-for-python/wiki/Encryption-on-Linux
ModuleNotFoundError: No module named 'gi'
Also, I haven't checked all of the tests, but should the azure blobstore be included in the test_open()
test?:
terra-notebook-utils/tests/test_blobstore.py
Line 141 in 843a26b
for bs in (local_blobstore, gs_blobstore, url_blobstore): |
terra_notebook_utils/drs.py
Outdated
src_info = get_drs_info(drs_uri) | ||
src_blob = get_drs_blob(src_info, workspace_namespace) | ||
if dst.startswith("gs://"): | ||
bucket_name, key = _resolve_bucket_target(dst, src_info) | ||
dst_blob = GSBlob(bucket_name, key) | ||
# Azure url looks like https://qijlbdgpc4zqdee.blob.core.windows.net/qi-test-container/subdir/another/qi-blob3 | ||
if "windows.net" in dst: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like, ideally, we'd have a wasb:// scheme to distiguish these, especially if we want plain https:// support in the future.
Would you consider making this stricter? If so, would something like:
if url.startswth('https://') and url[len('https://'):].split('/')[0].endswith('.blob.core.windows.net'):
...
or even a function like:
def is_azure_url(self, azure_url: str) -> bool:
try:
self._resolve_azure_blob_path(azure_url)
return True
except:
return False
if is_azure_url(uri):
...
Work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think I'll use your first suggestion..._resolve_azure_blob_path
is already being called within the branch, so I think we don't want to call it twice
client.upload_blob(data) | ||
|
||
# with open("test", "wb") as data: | ||
# data.write(client.download_blob().readall()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftovers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the commented-out code is reading data which can still be interesting to test...it's commented out just becuz I was testing just write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to make this its own test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a test for the actual feature in ci, which uses an Azure access key....this manual test is really just for testing out auth/permission with DefaultAzureCredential
...we still need to figure out how exactly we do auth still in terra (there's some on-going discussion)
tests/manual_azure_test.py
Outdated
return service | ||
|
||
client = get_service("qijlbdgpc4zqdee").get_blob_client("qi-test-container", "qi-blob-1") | ||
# client = get_service("qinonmanagedapp").get_blob_client("qi-test-container", "qi-blob-1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftovers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was testing different storage accounts... I'll remove this for now
bs.blob(key).put(expected_data) | ||
self.assertEqual(expected_data, bs.blob(key).get()) | ||
bs.blob(key).delete() | ||
client = bs.blob(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a blob instead of a client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the client object for doing stuff with the blob
@@ -69,10 +77,14 @@ def _do_copy(src_blob: AnyBlob, dst_blob: AnyBlob, multipart_threshold: int, ind | |||
_download(src_blob, dst_blob, indicator_type) | |||
elif isinstance(src_blob, type(dst_blob)): | |||
_copy_intra_cloud(src_blob, dst_blob, indicator_type) | |||
elif isinstance(dst_blob, CloudBlob): | |||
elif isinstance(dst_blob, AzureBlob): | |||
assert isinstance(src_blob, (URLBlob, GSBlob)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the src_blob
is a LocalBlob
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this very well...but it's copied from https://github.com/DataBiosphere/terra-notebook-utils/blob/master/terra_notebook_utils/blobstore/copy_client.py#L73
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... I'll investigate why this is disallowed then. My preference is to replace plain asserts with exceptions and helpful debug messages, but I don't know why this wouldn't be allowed right now.
I didn't test things on local laptop..I did this, which relies on a terra image, and that should have all the system dependencies |
I'll double check that there's a test for reading data...I don't know if |
@Qi77Qi The instructions you added were very helpful! Even though most use-cases will be inside of Terra, I'd like to keep instructions for TNU's use outside of Terra, as I think some of the functions, particularly DRS resolution, are quite useful as a stand-alone python API. Right now I think just noting the additional apt dependencies is sufficient and helpful when not in a Terra-like environment. |
|
https://broadworkbench.atlassian.net/browse/IA-2932
Tested on a terra VM using
b.adm.firec
account