IA-2932 Add upload to azure functionality #362

Open
wants to merge 14 commits into
base: master

Qi77Qi
Collaborator

@Qi77Qi Qi77Qi commented Sep 28, 2021

https://broadworkbench.atlassian.net/browse/IA-2932

Tested on a terra VM using b.adm.firec account

>>> drs.copy("drs://jade.datarepo-dev.broadinstitute.org/v1_0c86170e-312d-4b39-a0a4-2a2bfaa24c7a_c0e40912-8b14-43f6-9a2f-b278144d0060", "https://qijlbdgpc4zqdee.blob.core.windows.net/qi-test-container/qi-blob-10-6")
2021-10-06 06:36:51::INFO  Using Access Key
2021-10-06 06:36:51::INFO  Request URL: 'https://qijlbdgpc4zqdee.blob.core.windows.net/qi-test-container/qi-blob-10-6'/nRequest method: 'PUT'/nRequest headers:/n    'x-ms-blob-type': 'REDACTED'/n    'Content-Length': '62043448'/n    'If-None-Match': '*'/n    'x-ms-version': 'REDACTED'/n    'Content-Type': 'application/octet-stream'/n    'Accept': 'application/xml'/n   'User-Agent': 'azsdk-python-storage-blob/12.8.1 Python/3.7.10 (Linux-5.4.144+-x86_64-with-debian-buster-sid)'/n    'x-ms-date': 'REDACTED'/n    'x-ms-client-request-id': '5f7692cc-26d4-11ec-9ae3-0242ac120005'/n    'Authorization': 'REDACTED'/nA body is sent with the request
2021-10-06 06:36:53::INFO  Response status: 201/nResponse headers:/n    'Content-Length': '0'/n    'Content-MD5': 'REDACTED'/n    'Last-Modified': 'Wed, 06 Oct 2021 18:36:53 GMT'/n    'ETag': '"0x8D988F844B358B1"'/n    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'/n    'x-ms-request-id': 'cff454e3-401e-00ef-77e1-ba31ac000000'/n    'x-ms-client-request-id': '5f7692cc-26d4-11ec-9ae3-0242ac120005'/n    'x-ms-version': 'REDACTED'/n    'x-ms-content-crc64': 'REDACTED'/n    'x-ms-request-server-encrypted': 'REDACTED'/n    'Date': 'Wed, 06 Oct 2021 18:36:53 GMT'
 https://qijlbdgpc4zqdee.blob.core.window   100%   [========================================]   59.2MiB   25.6MiB/s   2.31s

@Qi77Qi Qi77Qi changed the title WIP Add upload to azure functionality IA-2932 WIP Add upload to azure functionality Sep 28, 2021
@Qi77Qi Qi77Qi force-pushed the add-azure branch 6 times, most recently from 4fe81ac to ea2bc77 Compare October 6, 2021 16:20
@Qi77Qi Qi77Qi changed the title IA-2932 WIP Add upload to azure functionality IA-2932 Add upload to azure functionality Oct 6, 2021
@Qi77Qi Qi77Qi requested a review from DailyDreaming October 7, 2021 16:15
Member

@DailyDreaming DailyDreaming left a comment

Hmmm... I'm starting to think that the Google and Azure portions should be a little more separated.

Ideally, you should be able to operate on Google files if you have valid Google credentials and invalid Azure credentials, and vice versa. The same goes for running tests: Azure tests should print a "no azure credentials found" message and be skipped. The same goes for Google imo (I can add the Google side).

What do you think about an "azure" TNU_TESTMODE?

It looks like some apt dependencies may be needed, and they should at least be added to the README.MD (sudo apt install python3-gi python3-gi-cairo gir1.2-secret-1):

test_exists (__main__.TestBlobStore) ... EnvironmentCredential.get_token failed: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
ImdsCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
ManagedIdentityCredential.get_token failed: ManagedIdentityCredential authentication unavailable, no managed identity endpoint found.
Runtime dependency of PyGObject is missing.
Depends on your Linux distro, you could install it system-wide by something like:
    sudo apt install python3-gi python3-gi-cairo gir1.2-secret-1
If necessary, please refer to PyGObject's doc:
https://pygobject.readthedocs.io/en/latest/getting_started.html
Traceback (most recent call last):
  File "/home/quokka/venv/lib/python3.9/site-packages/msal_extensions/libsecret.py", line 21, in <module>
    import gi  # https://github.com/AzureAD/microsoft-authentication-extensions-for-python/wiki/Encryption-on-Linux
ModuleNotFoundError: No module named 'gi'

Also, I haven't checked all of the tests, but should the azure blobstore be included in the test_open() test?:

for bs in (local_blobstore, gs_blobstore, url_blobstore):

src_info = get_drs_info(drs_uri)
src_blob = get_drs_blob(src_info, workspace_namespace)
if dst.startswith("gs://"):
    bucket_name, key = _resolve_bucket_target(dst, src_info)
    dst_blob = GSBlob(bucket_name, key)
# Azure url looks like https://qijlbdgpc4zqdee.blob.core.windows.net/qi-test-container/subdir/another/qi-blob3
if "windows.net" in dst:
Member

I feel like, ideally, we'd have a wasb:// scheme to distinguish these, especially if we want plain https:// support in the future.

Would you consider making this stricter? If so, would something like:

if url.startswith('https://') and url[len('https://'):].split('/')[0].endswith('.blob.core.windows.net'):
    ...

or even a function like:

def is_azure_url(self, azure_url: str) -> bool:
    try:
        self._resolve_azure_blob_path(azure_url)
        return True
    except Exception:
        return False

if is_azure_url(uri):
    ...

Work?
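
For illustration, the stricter host check from the first suggestion could also be written with the standard library's URL parser. This is only a sketch: the function name `is_azure_blob_url` and the constant are hypothetical and not part of the PR.

```python
from urllib.parse import urlparse

# Host suffix used by Azure Blob Storage endpoints, as in the URLs above.
AZURE_BLOB_HOST_SUFFIX = ".blob.core.windows.net"


def is_azure_blob_url(url: str) -> bool:
    """Return True only for https URLs whose host is <account>.blob.core.windows.net."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return (
        parsed.scheme == "https"
        and host.endswith(AZURE_BLOB_HOST_SUFFIX)
        # Require a non-empty storage account name before the suffix.
        and len(host) > len(AZURE_BLOB_HOST_SUFFIX)
    )
```

Unlike a substring test such as `"windows.net" in dst`, this only inspects the host, so a URL that merely mentions the suffix in its path or query string is not treated as an Azure destination.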

Collaborator Author

I think I'll use your first suggestion... _resolve_azure_blob_path is already called within the branch, so I don't think we want to call it twice.

client.upload_blob(data)

# with open("test", "wb") as data:
# data.write(client.download_blob().readall())
Member

Leftovers?

Collaborator Author

The commented-out code reads the data back, which can still be interesting to test... it's commented out just because I was only testing the write.

Member

Would it make sense to make this its own test?

Collaborator Author

There is a test for the actual feature in CI, which uses an Azure access key... this manual test is really just for testing auth/permissions with DefaultAzureCredential... we still need to figure out exactly how we do auth in Terra (there's some ongoing discussion).

return service

client = get_service("qijlbdgpc4zqdee").get_blob_client("qi-test-container", "qi-blob-1")
# client = get_service("qinonmanagedapp").get_blob_client("qi-test-container", "qi-blob-1")
Member

Leftovers?

Collaborator Author

I was testing different storage accounts... I'll remove this for now

bs.blob(key).put(expected_data)
self.assertEqual(expected_data, bs.blob(key).get())
bs.blob(key).delete()
client = bs.blob(key)
Member

This seems like a blob instead of a client?

Collaborator Author

it's the client object for doing stuff with the blob

@@ -69,10 +77,14 @@ def _do_copy(src_blob: AnyBlob, dst_blob: AnyBlob, multipart_threshold: int, ind
_download(src_blob, dst_blob, indicator_type)
elif isinstance(src_blob, type(dst_blob)):
_copy_intra_cloud(src_blob, dst_blob, indicator_type)
elif isinstance(dst_blob, CloudBlob):
elif isinstance(dst_blob, AzureBlob):
assert isinstance(src_blob, (URLBlob, GSBlob))
Member

What if the src_blob is a LocalBlob?

Collaborator Author

Member

Hmmm... I'll investigate why this is disallowed then. My preference is to replace plain asserts with exceptions and helpful debug messages, but I don't know why this wouldn't be allowed right now.
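
As a rough sketch of that preference, a plain assert on the source type could become an explicit check that raises with a descriptive message. The stand-in classes and the helper name `check_upload_source` below are hypothetical and only mirror the names in the diff:

```python
# Stand-in blob classes for illustration only; the real ones live in the library.
class LocalBlob: ...
class URLBlob: ...
class GSBlob: ...


def check_upload_source(src_blob, allowed=(URLBlob, GSBlob)):
    """Raise a descriptive TypeError instead of failing a bare assert."""
    if not isinstance(src_blob, allowed):
        names = ", ".join(cls.__name__ for cls in allowed)
        raise TypeError(
            f"Cannot upload from {type(src_blob).__name__}; "
            f"supported source types: {names}"
        )
```

Unlike an assert, this check still runs under `python -O`, and the message names both the offending type and the accepted ones.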

@Qi77Qi
Collaborator Author

Qi77Qi commented Oct 12, 2021

It looks like some apt dependencies may be needed, and they should at least be added to the README.MD (sudo apt install python3-gi python3-gi-cairo gir1.2-secret-1)

I didn't test things on a local laptop... I did this, which relies on a Terra image, and that should have all the system dependencies.

@Qi77Qi
Collaborator Author

Qi77Qi commented Oct 12, 2021

Also, I haven't checked all of the tests, but should the azure blobstore be included in the test_open() test?:

I'll double check that there's a test for reading data...I don't know if test_open() applies directly

@DailyDreaming
Member

@Qi77Qi The instructions you added were very helpful! Even though most use-cases will be inside of Terra, I'd like to keep instructions for TNU's use outside of Terra, as I think some of the functions, particularly DRS resolution, are quite useful as a stand-alone python API. Right now I think just noting the additional apt dependencies is sufficient and helpful when not in a Terra-like environment.

@Qi77Qi
Collaborator Author

Qi77Qi commented Oct 12, 2021

test_put_get_delete is already testing read/write/delete...not sure exactly what test_open() is validating 🤔
