Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant manifest generations due to race condition #6850

Open
hannes-ucsc opened this issue Jan 25, 2025 · 4 comments
Open

Redundant manifest generations due to race condition #6850

hannes-ucsc opened this issue Jan 25, 2025 · 4 comments
Assignees
Labels
- [priority] Medium bug [type] A defect preventing use of the system as specified groomed [process] Issue was recently looked at during backlog grooming manifests [subject] Generation and contents of manifests orange [process] Done by the Azul team

Comments

@hannes-ucsc
Copy link
Member

hannes-ucsc commented Jan 25, 2025

The race is transiently manifesting in a failed assertion in an integration test that is intended to catch them.

https://gitlab.azul.data.humancellatlas.org/ucsc/azul/-/jobs/71737

https://ucsc-gi.slack.com/archives/C705Y6G9Z/p1737762174254489?thread_ts=1737595686.060409&cid=C705Y6G9Z

@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label Jan 25, 2025
@hannes-ucsc
Copy link
Member Author

hannes-ucsc commented Jan 25, 2025

The IT logs from the failed subtest (i=1). Note how two of the workers receive token lMQQjOlyoYCSWtaqiysTlLVdywEAAQ== while one worker gets token lMQQjOlyoYCSWtaqiysTlLVdywIAAQ==:

1 2025-01-24 23:13:58,128 INFO MainThread test.integration_test: Beginning sub-test [manifest] {'catalog': 'dcp45-it', 'format': <ManifestFormat.verbatim_jsonl: 'verbatim.jsonl'>, 'i': 1, 'fetch': False}
2 2025-01-24 23:13:58,131 INFO ThreadPoolExecutor-19_0 test.integration_test: Making PUT request to 'https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp45-it&filters=%7B%22fileSize%22%3A+%7B%22within%22%3A+%5B%5B0%2C+1817888187269%5D%5D%7D%7D&format=verbatim.jsonl'
3 2025-01-24 23:13:58,131 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … without request body
4 2025-01-24 23:13:58,133 INFO ThreadPoolExecutor-19_1 test.integration_test: Making PUT request to 'https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp45-it&filters=%7B%22fileSize%22%3A+%7B%22within%22%3A+%5B%5B0%2C+1817888187269%5D%5D%7D%7D&format=verbatim.jsonl'
5 2025-01-24 23:13:58,135 INFO ThreadPoolExecutor-19_2 test.integration_test: Making PUT request to 'https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp45-it&filters=%7B%22fileSize%22%3A+%7B%22within%22%3A+%5B%5B0%2C+1817888187269%5D%5D%7D%7D&format=verbatim.jsonl'
6 2025-01-24 23:13:58,135 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … without request body
7 2025-01-24 23:13:58,136 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … without request body
8 2025-01-24 23:13:58,460 INFO ThreadPoolExecutor-19_1 test.integration_test: Got 301 response after 0.324s from PUT to https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp45-it&filters=%7B%22fileSize%22%3A+%7B%22within%22%3A+%5B%5B0%2C+1817888187269%5D%5D%7D%7D&format=verbatim.jsonl
9 2025-01-24 23:13:58,461 WARNING ThreadPoolExecutor-19_2 urllib3.connectionpool: Connection pool is full, discarding connection: service.azul.data.humancellatlas.org. Connection pool size: 1
10 2025-01-24 23:13:58,461 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … with response headers HTTPHeaderDict({'Content-Type': 'application/json', 'Content-Length': '4', 'Connection': 'keep-alive', 'Date': 'Fri, 24 Jan 2025 23:13:58 GMT', 'X-Amzn-Trace-Id': 'Root=1-67941eb6-5bb2dc7636bf207315b8a9ba;Parent=65669abf7bb3eb4f;Sampled=0;Lineage=1:2808a99a:0', 'x-amzn-RequestId': 'd245afd4-f8a5-487f-b32f-03baba272836', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", 'x-amz-apigw-id': 'E6m8iFSUIAMEPKA=', 'Cache-Control': 'no-store', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ==', 'Retry-After': '1', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 c2ab43aa4f5ff7b5cc58894f6a8494bc.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P3', 'X-Amz-Cf-Id': '8Wi0cln5liXCZaWJYB_xnc_lwRjBTH-cOunKlEKjS_aE52niiBql3Q=='})
11 2025-01-24 23:13:58,462 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … with response body b'null'
12 2025-01-24 23:13:58,462 INFO ThreadPoolExecutor-19_1 test.integration_test: Sleeping 1.000s to honor Retry-After header
13 2025-01-24 23:13:58,464 INFO ThreadPoolExecutor-19_2 test.integration_test: Got 301 response after 0.327s from PUT to https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp45-it&filters=%7B%22fileSize%22%3A+%7B%22within%22%3A+%5B%5B0%2C+1817888187269%5D%5D%7D%7D&format=verbatim.jsonl
14 2025-01-24 23:13:58,465 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … with response headers HTTPHeaderDict({'Content-Type': 'application/json', 'Content-Length': '4', 'Connection': 'keep-alive', 'Date': 'Fri, 24 Jan 2025 23:13:58 GMT', 'X-Amzn-Trace-Id': 'Root=1-67941eb6-7a4245d81abfd06947b9520b;Parent=11e602d5ed8ac293;Sampled=0;Lineage=1:2808a99a:0', 'x-amzn-RequestId': '0135d676-87b3-4dbb-9398-e2ca4c10318c', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", 'x-amz-apigw-id': 'E6m8iGu8oAMEtLA=', 'Cache-Control': 'no-store', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ==', 'Retry-After': '1', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1e54a46a2c47322a41a5f070c060e878.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P3', 'X-Amz-Cf-Id': 'iQN2zfbwL531cZKNAbwTehQM6rEMtiJzpts2hgW_nzARQ2qEFelEYQ=='})
15 2025-01-24 23:13:58,465 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … with response body b'null'
16 2025-01-24 23:13:58,465 INFO ThreadPoolExecutor-19_2 test.integration_test: Sleeping 1.000s to honor Retry-After header
17 2025-01-24 23:13:59,162 WARNING ThreadPoolExecutor-19_0 urllib3.connectionpool: Connection pool is full, discarding connection: service.azul.data.humancellatlas.org. Connection pool size: 1
18 2025-01-24 23:13:59,164 INFO ThreadPoolExecutor-19_0 test.integration_test: Got 301 response after 1.031s from PUT to https://service.azul.data.humancellatlas.org/manifest/files?catalog=dcp45-it&filters=%7B%22fileSize%22%3A+%7B%22within%22%3A+%5B%5B0%2C+1817888187269%5D%5D%7D%7D&format=verbatim.jsonl
19 2025-01-24 23:13:59,164 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … with response headers HTTPHeaderDict({'Content-Type': 'application/json', 'Content-Length': '4', 'Connection': 'keep-alive', 'Date': 'Fri, 24 Jan 2025 23:13:59 GMT', 'X-Amzn-Trace-Id': 'Root=1-67941eb6-4914599347d60d1508ab45f1;Parent=003717684b0b3fda;Sampled=0;Lineage=1:2808a99a:0', 'x-amzn-RequestId': '8c1396c8-1898-4379-99fc-690675de2cf5', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", 'x-amz-apigw-id': 'E6m8iE72oAMEc3g=', 'Cache-Control': 'no-store', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywIAAQ==', 'Retry-After': '1', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 f7f49dad2d783fde3adeef21381de800.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P3', 'X-Amz-Cf-Id': 'jLgo3tYeZsiUMWKOrbUOB_ji7NJOdT9PPo3nLKd-BWNfOJNlTy6vhg=='})
20 2025-01-24 23:13:59,165 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … with response body b'null'
21 2025-01-24 23:13:59,165 INFO ThreadPoolExecutor-19_0 test.integration_test: Sleeping 1.000s to honor Retry-After header
22 2025-01-24 23:13:59,464 INFO ThreadPoolExecutor-19_1 test.integration_test: Making GET request to 'https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ=='
23 2025-01-24 23:13:59,464 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … without request body
24 2025-01-24 23:13:59,466 INFO ThreadPoolExecutor-19_2 test.integration_test: Making GET request to 'https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ=='
25 2025-01-24 23:13:59,466 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … without request body
26 2025-01-24 23:13:59,581 INFO ThreadPoolExecutor-19_1 test.integration_test: Got 302 response after 0.116s from GET to https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ==
27 2025-01-24 23:13:59,581 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … with response headers HTTPHeaderDict({'Content-Type': 'text/plain', 'Content-Length': '3065', 'Connection': 'keep-alive', 'Date': 'Fri, 24 Jan 2025 23:13:59 GMT', 'X-Amzn-Trace-Id': 'Root=1-67941eb7-09b794f6587c8a3303d479e2;Parent=3835f84b0e95c574;Sampled=0;Lineage=1:2808a99a:0', 'x-amzn-RequestId': '2a7fc13c-6ce5-4c1e-a4d4-de6da5c777e9', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", 'x-amz-apigw-id': 'E6m8uHKNoAMEv4A=', 'Cache-Control': 'no-store', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHLU5XHM5U&Signature=n6K4DvYAiBO5AYhezmC3M40Vbno%3D&x-amz-security-token=[MASKED]&Expires=1737764039', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 c2ab43aa4f5ff7b5cc58894f6a8494bc.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P3', 'X-Amz-Cf-Id': '-yMM5X-YaGR1fgHgVMdZrrUG9gpzAhas_XOe_hnj9sqP6WP1jL4ABw=='})
28 2025-01-24 23:13:59,581 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … with response body b'\nDownload the manifest in cmd.exe with curl using:\n\ncurl.exe --location --fail --output "hca-manifest-20dcb2a8-a9a9-5abc-80...'
29 2025-01-24 23:13:59,586 INFO ThreadPoolExecutor-19_1 test.integration_test: Making GET request to 'https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHLU5XHM5U&Signature=n6K4DvYAiBO5AYhezmC3M40Vbno%3D&x-amz-security-token=[MASKED]&Expires=1737764039'
30 2025-01-24 23:13:59,587 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … without request body
31 2025-01-24 23:13:59,615 INFO ThreadPoolExecutor-19_1 test.integration_test: Got 200 response after 0.028s from GET to https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHLU5XHM5U&Signature=n6K4DvYAiBO5AYhezmC3M40Vbno%3D&x-amz-security-token=[MASKED]&Expires=1737764039
32 2025-01-24 23:13:59,615 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … with response headers HTTPHeaderDict({'x-amz-id-2': 'OdN8trm172HLoExPeHhezEk7BtEcFjNoQFLgNjgpDo6cjhDD1yn6RY8pBJPfizGXuNZuz+fUurxf1OPx75Lc8YfMHb5nxymvr2B7UOq9Ybg=', 'x-amz-request-id': 'ZNX76YYVQK9EXX0E', 'Date': 'Fri, 24 Jan 2025 23:14:00 GMT', 'Last-Modified': 'Fri, 24 Jan 2025 23:13:59 GMT', 'x-amz-expiration': 'expiry-date="Sat, 01 Feb 2025 00:00:00 GMT", rule-id="manifests"', 'ETag': '"83477e7a5d7534b5cbed31a9cd968054"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment;filename="hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl"', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/jsonl', 'Content-Length': '185001', 'Server': 'AmazonS3'})
33 2025-01-24 23:13:59,615 DEBUG ThreadPoolExecutor-19_1 test.integration_test: … with response body b'{"value": {"links": [{"link_type": "process_link", "process_id": "7fcc4a62-cd11-46ca-90ae-4ad8fdcb9e1a", "process_type": "pro...'
34 2025-01-24 23:13:59,622 INFO ThreadPoolExecutor-19_1 test.integration_test: Manifest contains 73 replicas
35 2025-01-24 23:13:59,643 WARNING ThreadPoolExecutor-19_2 urllib3.connectionpool: Connection pool is full, discarding connection: service.azul.data.humancellatlas.org. Connection pool size: 1
36 2025-01-24 23:13:59,646 INFO ThreadPoolExecutor-19_2 test.integration_test: Got 302 response after 0.180s from GET to https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ==
37 2025-01-24 23:13:59,647 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … with response headers HTTPHeaderDict({'Content-Type': 'text/plain', 'Content-Length': '3045', 'Connection': 'keep-alive', 'Date': 'Fri, 24 Jan 2025 23:13:59 GMT', 'X-Amzn-Trace-Id': 'Root=1-67941eb7-61d77da317b0d44e5fbc34d8;Parent=091c1e3c11988438;Sampled=0;Lineage=1:2808a99a:0', 'x-amzn-RequestId': '063ff90a-69ad-4d98-97d5-781c77c9fae4', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", 'x-amz-apigw-id': 'E6m8vHLioAMEFQw=', 'Cache-Control': 'no-store', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHGYRMTT5A&Signature=kcYimdFQoOuvRJ2kZZIyJYxoDZg%3D&x-amz-security-token=[MASKED]&Expires=1737764039', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 05f4e6c9553ff5b6620e13adbd08b064.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P3', 'X-Amz-Cf-Id': 'pQ8K7ZjByn-_uYoqn1OpRtWvcL5DUWtIWo6BNP5kH3xdxoWeZKVopg=='})
38 2025-01-24 23:13:59,647 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … with response body b'\nDownload the manifest in cmd.exe with curl using:\n\ncurl.exe --location --fail --output "hca-manifest-20dcb2a8-a9a9-5abc-80...'
39 2025-01-24 23:13:59,651 INFO ThreadPoolExecutor-19_2 test.integration_test: Making GET request to 'https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHGYRMTT5A&Signature=kcYimdFQoOuvRJ2kZZIyJYxoDZg%3D&x-amz-security-token=[MASKED]&Expires=1737764039'
40 2025-01-24 23:13:59,651 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … without request body
41 2025-01-24 23:13:59,698 INFO ThreadPoolExecutor-19_2 test.integration_test: Got 200 response after 0.045s from GET to https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHGYRMTT5A&Signature=kcYimdFQoOuvRJ2kZZIyJYxoDZg%3D&x-amz-security-token=[MASKED]&Expires=1737764039
42 2025-01-24 23:13:59,698 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … with response headers HTTPHeaderDict({'x-amz-id-2': 'pVDa2y+aHYsfrQTAccMHvnjHzqAUV7qjx8dM+soZBYlHQpcI+QdcjEAVMdyszlrFoGBiTSiiepr5Zvn2rAZKyfzSaSbns+eNwldbdxFO/zM=', 'x-amz-request-id': 'ZNX6ZG8RY97CSRK4', 'Date': 'Fri, 24 Jan 2025 23:14:00 GMT', 'Last-Modified': 'Fri, 24 Jan 2025 23:13:59 GMT', 'x-amz-expiration': 'expiry-date="Sat, 01 Feb 2025 00:00:00 GMT", rule-id="manifests"', 'ETag': '"83477e7a5d7534b5cbed31a9cd968054"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment;filename="hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl"', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/jsonl', 'Content-Length': '185001', 'Server': 'AmazonS3'})
43 2025-01-24 23:13:59,698 DEBUG ThreadPoolExecutor-19_2 test.integration_test: … with response body b'{"value": {"links": [{"link_type": "process_link", "process_id": "7fcc4a62-cd11-46ca-90ae-4ad8fdcb9e1a", "process_type": "pro...'
44 2025-01-24 23:13:59,703 INFO ThreadPoolExecutor-19_2 test.integration_test: Manifest contains 73 replicas
45 2025-01-24 23:14:00,167 INFO ThreadPoolExecutor-19_0 test.integration_test: Making GET request to 'https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywIAAQ=='
46 2025-01-24 23:14:00,167 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … without request body
47 2025-01-24 23:14:00,263 INFO ThreadPoolExecutor-19_0 test.integration_test: Got 302 response after 0.096s from GET to https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywIAAQ==
48 2025-01-24 23:14:00,263 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … with response headers HTTPHeaderDict({'Content-Type': 'text/plain', 'Content-Length': '3073', 'Connection': 'keep-alive', 'Date': 'Fri, 24 Jan 2025 23:14:00 GMT', 'X-Amzn-Trace-Id': 'Root=1-67941eb8-19dd6ad85d4b59a95e08b0d9;Parent=2d3d657fff12413b;Sampled=0;Lineage=1:2808a99a:0', 'x-amzn-RequestId': '68eedc90-c73e-44fa-85ed-e4793aced0ef', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'X-XSS-Protection': '1; mode=block', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=63072000; includeSubDomains; preload', 'Access-Control-Allow-Headers': 'Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key', 'X-Frame-Options': 'DENY', 'Content-Security-Policy': "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", 'x-amz-apigw-id': 'E6m81FYQIAMEgyg=', 'Cache-Control': 'no-store', 'X-Content-Type-Options': 'nosniff', 'Location': 'https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHCUSGLOWQ&Signature=7J2v%2BQoQVIt02NjQK%2BPyO8J4Y%2Fc%3D&x-amz-security-token=[MASKED]&Expires=1737764040', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 c2ab43aa4f5ff7b5cc58894f6a8494bc.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P3', 'X-Amz-Cf-Id': 't2QGpFNg_fE9uHYdqlXokYyhXytZzEDL6js9YpRyRig0oKGRkMdQzw=='})
49 2025-01-24 23:14:00,264 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … with response body b'\nDownload the manifest in cmd.exe with curl using:\n\ncurl.exe --location --fail --output "hca-manifest-20dcb2a8-a9a9-5abc-80...'
50 2025-01-24 23:14:00,265 INFO ThreadPoolExecutor-19_0 test.integration_test: Making GET request to 'https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHCUSGLOWQ&Signature=7J2v%2BQoQVIt02NjQK%2BPyO8J4Y%2Fc%3D&x-amz-security-token=[MASKED]&Expires=1737764040'
51 2025-01-24 23:14:00,266 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … without request body
52 2025-01-24 23:14:00,303 INFO ThreadPoolExecutor-19_0 test.integration_test: Got 200 response after 0.037s from GET to https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl?response-content-disposition=attachment%3Bfilename%3D%22hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl%22&AWSAccessKeyId=ASIAX4XVRZKHCUSGLOWQ&Signature=7J2v%2BQoQVIt02NjQK%2BPyO8J4Y%2Fc%3D&x-amz-security-token=[MASKED]&Expires=1737764040
53 2025-01-24 23:14:00,303 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … with response headers HTTPHeaderDict({'x-amz-id-2': 'hGNrn99NeDY7FWa7x3WAbzVvuaElP7tFp7xIYeZ20C7BA+1ZUJ+Dm9EUEjvp4o7KING+ECx/NhUOMq2+PXM9hCG9cLCea7VaXt29wycWDqA=', 'x-amz-request-id': 'FMX86QBEMSW1VP6V', 'Date': 'Fri, 24 Jan 2025 23:14:01 GMT', 'Last-Modified': 'Fri, 24 Jan 2025 23:13:59 GMT', 'x-amz-expiration': 'expiry-date="Sat, 01 Feb 2025 00:00:00 GMT", rule-id="manifests"', 'ETag': '"83477e7a5d7534b5cbed31a9cd968054"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment;filename="hca-manifest-20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl"', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/jsonl', 'Content-Length': '185001', 'Server': 'AmazonS3'})
54 2025-01-24 23:14:00,303 DEBUG ThreadPoolExecutor-19_0 test.integration_test: … with response body b'{"value": {"links": [{"link_type": "process_link", "process_id": "7fcc4a62-cd11-46ca-90ae-4ad8fdcb9e1a", "process_type": "pro...'
55 2025-01-24 23:14:00,307 INFO ThreadPoolExecutor-19_0 test.integration_test: Manifest contains 73 replicas
56 2025-01-24 23:14:00,313 INFO MainThread test.integration_test: Failed sub-test [manifest] {'catalog': 'dcp45-it', 'format': <ManifestFormat.verbatim_jsonl: 'verbatim.jsonl'>, 'i': 1, 'fetch': False}

@hannes-ucsc
Copy link
Member Author

The request logs for that subtest, but only the initial three request, not the redirects. There are three requests, one for each worker. Two requests run more or less in parallel. One is a straggler. The first two requests observe the execution from the previous subtest (i=0) and initiate the second execution. One request actually initiates it, the other one idempotently adopts it. This execution finishes before the straggler gets to attempting to start any execution at all. When it finally does, it observes two complete executions: the one from the previous subtest (line 79) and the one from this subtest (line 81), the one that just completed. The straggler then launches a third execution (line 83) because it is still under the, albeit false, impression that the manifest object is still absent, since it only checks for manifest presence once, at the beginning.

1 START RequestId: 233c5da7-628b-42d4-9b3a-21f9d471d7d0 Version: $LATEST
2 [INFO] 2025-01-24T23:13:58.284Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.chalice Received PUT request for '/manifest/files', with {"query": {"catalog": "dcp45-it", "filters": "{"fileSize": {"within": [[0, 1817888187269]]}}", "format": "verbatim.jsonl"}, "headers": {"accept-encoding": "identity", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.20", "via": "1.1 1e54a46a2c47322a41a5f070c060e878.cloudfront.net (CloudFront)", "x-amz-cf-id": "iQN2zfbwL531cZKNAbwTehQM6rEMtiJzpts2hgW_nzARQ2qEFelEYQ==", "x-amzn-trace-id": "Root=1-67941eb6-7a4245d81abfd06947b9520b", "x-forwarded-for": "52.3.190.111, 18.68.30.135", "x-forwarded-port": "443", "x-forwarded-proto": "https"}}.
3 [INFO] 2025-01-24T23:13:58.284Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.chalice Did not authenticate request.
4 START RequestId: 9f8ef097-61db-45a2-97cd-922f825bc166 Version: $LATEST
5 [INFO] 2025-01-24T23:13:58.285Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 dynamodb.GetItem: Making POST request to https://dynamodb.us-east-1.amazonaws.com/
6 [INFO] 2025-01-24T23:13:58.286Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 dynamodb.GetItem: … with request body b'{"TableName": "azul-sources_cache_by_auth-prod", "Key": {"identity": {"S": "dcp45-it:"}}, "ProjectionExpression": "sources,ex...'
7 [INFO] 2025-01-24T23:13:58.285Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.chalice Received PUT request for '/manifest/files', with {"query": {"catalog": "dcp45-it", "filters": "{"fileSize": {"within": [[0, 1817888187269]]}}", "format": "verbatim.jsonl"}, "headers": {"accept-encoding": "identity", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.20", "via": "1.1 c2ab43aa4f5ff7b5cc58894f6a8494bc.cloudfront.net (CloudFront)", "x-amz-cf-id": "8Wi0cln5liXCZaWJYB_xnc_lwRjBTH-cOunKlEKjS_aE52niiBql3Q==", "x-amzn-trace-id": "Root=1-67941eb6-5bb2dc7636bf207315b8a9ba", "x-forwarded-for": "52.3.190.111, 18.68.30.170", "x-forwarded-port": "443", "x-forwarded-proto": "https"}}.
8 [INFO] 2025-01-24T23:13:58.286Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.chalice Did not authenticate request.
9 [INFO] 2025-01-24T23:13:58.287Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 dynamodb.GetItem: Making POST request to https://dynamodb.us-east-1.amazonaws.com/
10 [INFO] 2025-01-24T23:13:58.287Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 dynamodb.GetItem: … with request body b'{"TableName": "azul-sources_cache_by_auth-prod", "Key": {"identity": {"S": "dcp45-it:"}}, "ProjectionExpression": "sources,ex...'
11 START RequestId: d37a5351-da2f-4356-802b-6df64ca4b09e Version: $LATEST
12 [INFO] 2025-01-24T23:13:58.289Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.chalice Received PUT request for '/manifest/files', with {"query": {"catalog": "dcp45-it", "filters": "{"fileSize": {"within": [[0, 1817888187269]]}}", "format": "verbatim.jsonl"}, "headers": {"accept-encoding": "identity", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.20", "via": "1.1 f7f49dad2d783fde3adeef21381de800.cloudfront.net (CloudFront)", "x-amz-cf-id": "jLgo3tYeZsiUMWKOrbUOB_ji7NJOdT9PPo3nLKd-BWNfOJNlTy6vhg==", "x-amzn-trace-id": "Root=1-67941eb6-4914599347d60d1508ab45f1", "x-forwarded-for": "52.3.190.111, 18.68.30.148", "x-forwarded-port": "443", "x-forwarded-proto": "https"}}.
13 [INFO] 2025-01-24T23:13:58.290Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.chalice Did not authenticate request.
14 [INFO] 2025-01-24T23:13:58.293Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 dynamodb.GetItem: Got 200 response
15 [INFO] 2025-01-24T23:13:58.293Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 dynamodb.GetItem: … with response body b'{"Item":{"expiration":{"N":"1737760462"},"sources":{"S":"[\"81b6569e-bbd8-4e4a-a267-174865e293da\", \"b9aa492e-056b-490d-956a...'
16 [INFO] 2025-01-24T23:13:58.293Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 dynamodb.GetItem: Got 200 response
17 [INFO] 2025-01-24T23:13:58.293Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 dynamodb.GetItem: … with response body b'{"Item":{"expiration":{"N":"1737760462"},"sources":{"S":"[\"81b6569e-bbd8-4e4a-a267-174865e293da\", \"b9aa492e-056b-490d-956a...'
18 [DEBUG] 2025-01-24T23:13:58.294Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.manifest_service Computing content hash for manifest using filters Filters(explicit={'fileSize': {'within': [[0, 1817888187269]]}}, source_ids={…}) ...
19 [DEBUG] 2025-01-24T23:13:58.294Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.manifest_service Computing content hash for manifest using filters Filters(explicit={'fileSize': {'within': [[0, 1817888187269]]}}, source_ids={…}) ...
20 [INFO] 2025-01-24T23:13:58.297Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 elasticsearch Making POST request to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp45-it_files_aggregate/_search
21 [INFO] 2025-01-24T23:13:58.297Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 elasticsearch … with request body b'{"_source":{"includes":["entity_id","contents.projects.document_id","contents.donors.document_id","contents.analysis_protocol...'
22 [INFO] 2025-01-24T23:13:58.297Z 9f8ef097-61db-45a2-97cd-922f825bc166 elasticsearch Making POST request to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp45-it_files_aggregate/_search
23 [INFO] 2025-01-24T23:13:58.297Z 9f8ef097-61db-45a2-97cd-922f825bc166 elasticsearch … with request body b'{"_source":{"includes":["entity_id","contents.projects.document_id","contents.donors.document_id","contents.analysis_protocol...'
24 [INFO] 2025-01-24T23:13:58.310Z 9f8ef097-61db-45a2-97cd-922f825bc166 elasticsearch Got 200 response after 0.013s from POST to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp45-it_files_aggregate/_search
25 [INFO] 2025-01-24T23:13:58.310Z 9f8ef097-61db-45a2-97cd-922f825bc166 elasticsearch … with response body '{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":8,"relation":"…'
26 [INFO] 2025-01-24T23:13:58.310Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 elasticsearch Got 200 response after 0.013s from POST to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp45-it_files_aggregate/_search
27 [INFO] 2025-01-24T23:13:58.311Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 elasticsearch … with response body '{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":8,"relation":"…'
28 [INFO] 2025-01-24T23:13:58.311Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.manifest_service Manifest content hash -1718303542 was computed in 0.017s using filters Filters(explicit={'fileSize': {'within': [[0, 1817888187269]]}}, source_ids={…}).
29 [INFO] 2025-01-24T23:13:58.310Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.manifest_service Manifest content hash -1718303542 was computed in 0.016s using filters Filters(explicit={'fileSize': {'within': [[0, 1817888187269]]}}, source_ids={…}).
30 [INFO] 2025-01-24T23:13:58.312Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 s3.HeadObject: Making HEAD request to https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl
31 [INFO] 2025-01-24T23:13:58.312Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 s3.HeadObject: … without request body
32 [INFO] 2025-01-24T23:13:58.313Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 s3.HeadObject: Making HEAD request to https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl
33 [INFO] 2025-01-24T23:13:58.313Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 s3.HeadObject: … without request body
34 [INFO] 2025-01-24T23:13:58.326Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 s3.HeadObject: Got 404 response
35 [INFO] 2025-01-24T23:13:58.326Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 s3.HeadObject: Got 404 response
36 [INFO] 2025-01-24T23:13:58.326Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.boto3 s3.HeadObject: … with response body b''
37 [INFO] 2025-01-24T23:13:58.327Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.boto3 s3.HeadObject: … with response body b''
38 [INFO] 2025-01-24T23:13:58.327Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.manifest_service Cached manifest not found: ManifestKey(catalog='dcp45-it', format=<ManifestFormat.verbatim_jsonl: 'verbatim.jsonl'>, manifest_hash=UUID('20dcb2a8-a9a9-5abc-8094-925d8536cbf3'), source_hash=UUID('cd5a02ef-5b9f-5597-9e23-c576c1db5277'))
39 [INFO] 2025-01-24T23:13:58.327Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.manifest_service Cached manifest not found: ManifestKey(catalog='dcp45-it', format=<ManifestFormat.verbatim_jsonl: 'verbatim.jsonl'>, manifest_hash=UUID('20dcb2a8-a9a9-5abc-8094-925d8536cbf3'), source_hash=UUID('cd5a02ef-5b9f-5597-9e23-c576c1db5277'))
40 [INFO] 2025-01-24T23:13:58.327Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_0' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
41 [INFO] 2025-01-24T23:13:58.328Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_0' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
42 [INFO] 2025-01-24T23:13:58.328Z d37a5351-da2f-4356-802b-6df64ca4b09e botocore.credentials Found credentials in environment variables.
43 [INFO] 2025-01-24T23:13:58.392Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.async_manifest_service A completed execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_0' already exists
44 [INFO] 2025-01-24T23:13:58.392Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_1' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
45 [INFO] 2025-01-24T23:13:58.398Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.async_manifest_service A completed execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_0' already exists
46 [INFO] 2025-01-24T23:13:58.399Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_1' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
47 [INFO] 2025-01-24T23:13:58.427Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.deployment Allocated new Boto3 client for 'dynamodb' with ID 140483364102384
48 [INFO] 2025-01-24T23:13:58.430Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 dynamodb.GetItem: Making POST request to https://dynamodb.us-east-1.amazonaws.com/
49 [INFO] 2025-01-24T23:13:58.430Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 dynamodb.GetItem: … with request body b'{"TableName": "azul-sources_cache_by_auth-prod", "Key": {"identity": {"S": "dcp45-it:"}}, "ProjectionExpression": "sources,ex...'
50 [INFO] 2025-01-24T23:13:58.454Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.service.async_manifest_service Started execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_1' or it was already running
51 [INFO] 2025-01-24T23:13:58.453Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.service.async_manifest_service Started execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_1' or it was already running
52 [DEBUG] 2025-01-24T23:13:58.454Z 9f8ef097-61db-45a2-97cd-922f825bc166 azul.chalice Returning 301 response with headers {"Location": "https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ==", "Retry-After": "1", "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Content-Security-Policy": "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", "Referrer-Policy": "strict-origin-when-cross-origin", "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "X-XSS-Protection": "1; mode=block", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body.
53 null
54 [DEBUG] 2025-01-24T23:13:58.455Z 233c5da7-628b-42d4-9b3a-21f9d471d7d0 azul.chalice Returning 301 response with headers {"Location": "https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywEAAQ==", "Retry-After": "1", "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Content-Security-Policy": "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", "Referrer-Policy": "strict-origin-when-cross-origin", "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "X-XSS-Protection": "1; mode=block", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body.
55 null
56 END RequestId: 233c5da7-628b-42d4-9b3a-21f9d471d7d0
57 REPORT RequestId: 233c5da7-628b-42d4-9b3a-21f9d471d7d0 Duration: 172.67 ms Billed Duration: 173 ms Memory Size: 2048 MB Max Memory Used: 177 MB
58 END RequestId: 9f8ef097-61db-45a2-97cd-922f825bc166
59 REPORT RequestId: 9f8ef097-61db-45a2-97cd-922f825bc166 Duration: 171.00 ms Billed Duration: 172 ms Memory Size: 2048 MB Max Memory Used: 217 MB
60 [INFO] 2025-01-24T23:13:58.497Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 dynamodb.GetItem: Got 200 response
61 [INFO] 2025-01-24T23:13:58.498Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 dynamodb.GetItem: … with response body b'{"Item":{"expiration":{"N":"1737760462"},"sources":{"S":"[\"81b6569e-bbd8-4e4a-a267-174865e293da\", \"b9aa492e-056b-490d-956a...'
62 [DEBUG] 2025-01-24T23:13:58.499Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.manifest_service Computing content hash for manifest using filters Filters(explicit={'fileSize': {'within': [[0, 1817888187269]]}}, source_ids={…}) ...
63 [DEBUG] 2025-01-24T23:13:58.500Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.es Creating ES client [vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443]
64 [INFO] 2025-01-24T23:13:58.507Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.deployment Allocated new Boto3 client for 'sts' with ID 140483298719120
65 [INFO] 2025-01-24T23:13:58.527Z d37a5351-da2f-4356-802b-6df64ca4b09e botocore.credentials Found credentials in environment variables.
66 [INFO] 2025-01-24T23:13:58.532Z d37a5351-da2f-4356-802b-6df64ca4b09e elasticsearch Making POST request to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp45-it_files_aggregate/_search
67 [INFO] 2025-01-24T23:13:58.532Z d37a5351-da2f-4356-802b-6df64ca4b09e elasticsearch … with request body b'{"_source":{"includes":["entity_id","contents.projects.document_id","contents.donors.document_id","contents.analysis_protocol...'
68 [INFO] 2025-01-24T23:13:58.594Z d37a5351-da2f-4356-802b-6df64ca4b09e elasticsearch Got 200 response after 0.062s from POST to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp45-it_files_aggregate/_search
69 [INFO] 2025-01-24T23:13:58.594Z d37a5351-da2f-4356-802b-6df64ca4b09e elasticsearch … with response body '{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":8,"relation":"…'
70 [INFO] 2025-01-24T23:13:58.595Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.manifest_service Manifest content hash -1718303542 was computed in 0.095s using filters Filters(explicit={'fileSize': {'within': [[0, 1817888187269]]}}, source_ids={…}).
71 [INFO] 2025-01-24T23:13:58.696Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.deployment Allocated new Boto3 client for 's3' with ID 140483296310048
72 [INFO] 2025-01-24T23:13:58.699Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 s3.HeadObject: Making HEAD request to https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-prod-storage-prod.us-east-1/manifests/20dcb2a8-a9a9-5abc-8094-925d8536cbf3.cd5a02ef-5b9f-5597-9e23-c576c1db5277.jsonl
73 [INFO] 2025-01-24T23:13:58.699Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 s3.HeadObject: … without request body
74 [INFO] 2025-01-24T23:13:58.768Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 s3.HeadObject: Got 404 response
75 [INFO] 2025-01-24T23:13:58.769Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.boto3 s3.HeadObject: … with response body b''
76 [INFO] 2025-01-24T23:13:58.769Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.manifest_service Cached manifest not found: ManifestKey(catalog='dcp45-it', format=<ManifestFormat.verbatim_jsonl: 'verbatim.jsonl'>, manifest_hash=UUID('20dcb2a8-a9a9-5abc-8094-925d8536cbf3'), source_hash=UUID('cd5a02ef-5b9f-5597-9e23-c576c1db5277'))
77 [INFO] 2025-01-24T23:13:58.770Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_0' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
78 [INFO] 2025-01-24T23:13:58.783Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.deployment Allocated new Boto3 client for 'stepfunctions' with ID 140483292236064
79 [INFO] 2025-01-24T23:13:59.013Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.async_manifest_service A completed execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_0' already exists
80 [INFO] 2025-01-24T23:13:59.013Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_1' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
81 [INFO] 2025-01-24T23:13:59.087Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.async_manifest_service A completed execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_1' already exists
82 [INFO] 2025-01-24T23:13:59.087Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.async_manifest_service Starting execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_2' for input {'filters': {'explicit': {'fileSize': {'within': [[0, 1817888187269]]}}, 'source_ids': […]}, 'manifest_key': {'catalog': 'dcp45-it', 'format': 'verbatim.jsonl', 'manifest_hash': '20dcb2a8-a9a9-5abc-8094-925d8536cbf3', 'source_hash': 'cd5a02ef-5b9f-5597-9e23-c576c1db5277'}, 'partition': {'index': 0, 'is_last': False, 'file_name': None, 'config': None, 'multipart_upload_id': None, 'part_etags': None, 'page_index': None, 'is_last_page': None, 'search_after': None}}
83 [INFO] 2025-01-24T23:13:59.156Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.service.async_manifest_service Started execution 'arn:aws:states:us-east-1:542754589326:execution:azul-manifest-prod:8ce972a1-8092-5ad6-aa8b-2b1394b55dcb_2' or it was already running
84 [DEBUG] 2025-01-24T23:13:59.157Z d37a5351-da2f-4356-802b-6df64ca4b09e azul.chalice Returning 301 response with headers {"Location": "https://service.azul.data.humancellatlas.org/manifest/files/lMQQjOlyoYCSWtaqiysTlLVdywIAAQ==", "Retry-After": "1", "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Content-Security-Policy": "default-src 'self';img-src 'self' data:;script-src 'self';style-src 'self';frame-ancestors 'none'", "Referrer-Policy": "strict-origin-when-cross-origin", "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "X-XSS-Protection": "1; mode=block", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body.
85 null
86 END RequestId: d37a5351-da2f-4356-802b-6df64ca4b09e
87 REPORT RequestId: d37a5351-da2f-4356-802b-6df64ca4b09e Duration: 870.23 ms Billed Duration: 871 ms Memory Size: 2048 MB Max Memory Used: 161 MB Init Duration: 2636.35 ms

@hannes-ucsc
Copy link
Member Author

hannes-ucsc commented Jan 25, 2025

We can identify the following relevant event types:

A: the service detects that the requested manifest is absent
B: the service attempts to start an execution that has already completed
C: the service successfully starts a new execution
D: the execution finishes, the resulting manifest object exists

The following sequence represents the race:

request 0:   A     B   C
request 1:     A     B   C
request 2:       A           B B C
execution 1:               D
execution 2:                       D

@dsotirho-ucsc
Copy link
Contributor

@hannes-ucsc: "Assignee to file a partial PR that adds a workaround by setting the number of workers for that integration test from 3 to 1."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
- [priority] Medium bug [type] A defect preventing use of the system as specified groomed [process] Issue was recently looked at during backlog grooming manifests [subject] Generation and contents of manifests orange [process] Done by the Azul team
Projects
None yet
Development

No branches or pull requests

2 participants