Enable mirroring in anvildev and anvilbox (#7154, #7214) #7223
base: develop
85448b8 to 1cb13db
Codecov Report
@@ Coverage Diff @@
## develop #7223 +/- ##
===========================================
- Coverage 84.88% 84.86% -0.02%
===========================================
Files 157 157
Lines 22796 22816 +20
===========================================
+ Hits 19350 19363 +13
- Misses 3446 3453 +7
1cb13db to 9dc69e5
dsotirho-ucsc
left a comment
Approved.
hannes-ucsc
left a comment
This will get conflicts once the backport I just approved lands on develop.
It's not evident from the title of the commit that removes the snapshot what the defect is. Either add a ticket reference or describe the defect in the body of the commit message. Probably the former is warranted so that we can point our collaborators at it.
9dc69e5 to a0ad747
The base branch was changed.
a0ad747 to 8056e9e
hannes-ucsc
left a comment
While the link in the message body of the middle commit is helpful, the body should be self-contained and include a description of the alleged defect.
dsotirho-ucsc
left a comment
Approved.
def missing_md5(row: BigQueryRow) -> bool:
    missing = row['file_md5sum'] is None
    if missing:
        assert source.spec.name == 'ANVIL_1000G_2019_Dev_20230609_ANV5_202306121732', R(
Since this is a (hopefully) temporary workaround, there should be a FIXME comment before it. The FIXME comment should also be in every environment.py where this specific source is mentioned.
    return {'*'}
else:
    columns = set(columns)
    columns.add('datarepo_row_id')
It seems a little inefficient that we're copying the entire set, and are making the same additions/replacements, for every call. Let's use a cached property that returns a dict from table name to set of column names. We would still need this method to handle the default, but that is cheap.
Unrelated, but it would be good to add a comment explaining when we expect the default to kick in. I assume it's for non-schema tables.
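The design suggested in this thread could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `_schema` attribute, the table and column names, and the use of `functools.cached_property` are all assumptions made for the example.

```python
from functools import cached_property
from typing import AbstractSet, Mapping


class Plugin:
    # Hypothetical schema: table name -> declared column names
    _schema = {'anvil_file': ['file_id', 'file_md5sum']}

    @cached_property
    def _columns_by_table(self) -> Mapping[str, AbstractSet[str]]:
        # Computed once per instance, so the per-table sets are not
        # copied and amended again on every call
        return {
            table: frozenset(columns) | {'datarepo_row_id'}
            for table, columns in self._schema.items()
        }

    def _columns(self, table_name: str) -> AbstractSet[str]:
        # Tables that are absent from the schema fall back to all
        # columns. The lookup does not modify the cached mapping.
        return self._columns_by_table.get(table_name, {'*'})
```

With this shape, the expensive part is built once per plugin instance, while the default for unknown tables stays in the cheap `_columns` method.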
and document use case for default value
c9c334a to 259849d
@cached_property
def _columns_by_table(self) -> Mapping[str, AbstractSet[str]]:
    # Include all columns for replicas of non-schema tables
    columns_by_table = defaultdict(lambda: {'*'})
I don't like using defaultdict because read access can modify the dictionary, which contradicts the return type, or the intent behind it. I recently fixed a bug that was caused by exactly this misuse of defaultdict. As I wrote in my previous review, my preferred solution is to retain the _columns method.
If I remember correctly, this is now the second time in a short period where nuances in my review comments were (dis)missed. This worries me a bit, given that we frequently only get to do one review per PR per day, and that this PR is for a high priority issue.
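The concern about read access can be reproduced in isolation. The snippet below is a small sketch with made-up table and column names, not code from this PR. It shows that indexing a `defaultdict` with a missing key inserts that key, whereas `dict.get` on a regular dictionary returns the default and leaves the dictionary unchanged.

```python
from collections import defaultdict

# Hypothetical schema columns, for illustration only
schema_columns = {'anvil_file': {'datarepo_row_id', 'file_md5sum'}}

columns_by_table = defaultdict(lambda: {'*'}, schema_columns)
keys_before = sorted(columns_by_table)

# A plain read of a missing key inserts that key as a side effect
columns = columns_by_table['not_in_schema']
keys_after = sorted(columns_by_table)

# dict.get() on a regular dict returns the default without inserting
plain = dict(schema_columns)
fallback = plain.get('not_in_schema', {'*'})
```

After the read, `keys_after` contains `'not_in_schema'` in addition to `'anvil_file'`, while `plain` still has only its original key.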
> We would still need this method to handle the default, but that is cheap.
I misinterpreted "this method" to refer to the newly added cached property (which is technically a method, or at least implemented using one), not the original method. Hence my mistaken belief that the default ought to be provided within the body of that property definition.
    columns.add('datarepo_row_id')
    return columns
@cached_property
def _columns_by_table(self) -> Mapping[str, AbstractSet[str]]:
- def _columns_by_table(self) -> Mapping[str, AbstractSet[str]]:
+ def __columns_by_table(self) -> Mapping[str, AbstractSet[str]]:
unless the resulting name mangling breaks the caching.
It does break the caching.
>>> from azul.azulclient import AzulClient
>>> p = AzulClient().repository_plugin('anvil')
'_Plugin__schema_columns_by_table'?
>>> s1 = p._Plugin__schema_columns_by_table
>>> s2 = p._Plugin__schema_columns_by_table
>>> s1 == s2
True
>>> s1 is s2
False
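One way this kind of breakage can arise is sketched below. It assumes a cached-property implementation that stores the computed value under the decorated function's `__name__`. Whether the decorator used in this PR works that way is not shown in the thread, and both the `name_keyed_cached_property` class and the `Plugin` stand-in are hypothetical. For comparison, `functools.cached_property` takes its key from `__set_name__`, which receives the mangled attribute name.

```python
from functools import cached_property


class name_keyed_cached_property:
    """Hypothetical cache that is keyed on the function's __name__."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # func.__name__ is the unmangled '__naive', but attribute access
        # inside the class looks up '_Plugin__naive', so this entry never
        # shadows the descriptor and the value is recomputed every time
        instance.__dict__[self.func.__name__] = value
        return value


class Plugin:

    @name_keyed_cached_property
    def __naive(self):
        return {'anvil_file': {'datarepo_row_id'}}

    @cached_property
    def __stdlib(self):
        return {'anvil_file': {'datarepo_row_id'}}

    def naive(self):
        return self.__naive

    def stdlib(self):
        return self.__stdlib


p = Plugin()
naive_equal = p.naive() == p.naive()
naive_is_cached = p.naive() is p.naive()
stdlib_is_cached = p.stdlib() is p.stdlib()
```

In this sketch the name-keyed variant returns equal but distinct objects on each access, which matches the `==` and `is` results in the session above, while the standard library variant returns the same object.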
hannes-ucsc
left a comment
The split commit doesn't follow our convention that all parts in a split have the same title, except for differences in the tags. Maybe you intended to remove the split designation altogether? Please also change "Workaround" to "Work around" in the commit title.
Previously disabled due to the issues with MD5 encoding
2969660 to 4695a1a
Connected issues: #7154, #7214
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
1 when the issue title describes a problem, the corresponding PR title is Fix: followed by the issue title
Author (partiality)
p tag to titles of partial commits
partial or completely resolves all connected issues
partial label
Author (chains)
base or this PR is not chained to another PR
chained or is not chained to another PR
Author (reindex, API changes)
r tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
reindex:dev or the changes introduced by it will not require reindexing of dev
reindex:anvildev or the changes introduced by it will not require reindexing of anvildev
reindex:anvilprod or the changes introduced by it will not require reindexing of anvilprod
reindex:prod or the changes introduced by it will not require reindexing of prod
reindex:partial and its description documents the specific reindexing procedure for dev, anvildev, anvilprod and prod or requires a full reindex or carries none of the labels reindex:dev, reindex:anvildev, reindex:anvilprod and reindex:prod
API or this PR does not modify a REST API
a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
app.py or this PR does not modify a REST API
Author (upgrading deployments)
make docker_images.json and committed the resulting changes or this PR does not modify azul_docker_images, or any other variables referenced in the definition of that variable
u tag to commit title or this PR does not require upgrading deployments
upgrade or does not require upgrading deployments
deploy:shared or does not modify docker_images.json, and does not require deploying the shared component for any other reason
deploy:gitlab or does not require deploying the gitlab component
deploy:runner or does not require deploying the runner image
Author (hotfixes)
F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
anvilprod and prod) have temporary hotfixes for any of the issues connected to this PR
Author (before every review)
develop, squashed fixups from prior reviews
make requirements_update or this PR does not modify requirements*.txt, common.mk, Makefile and Dockerfile
R tag to commit title or this PR does not modify requirements*.txt
reqs or does not modify requirements*.txt
make integration_test passes in personal deployment or this PR does not modify functionality that could affect the IT outcome
Peer reviewer (after approval)
System administrator (after approval)
demo or no demo
no demo
no sandbox
N reviews label is accurate
Operator (before pushing the merge commit)
reindex:… labels and r commit title tag
no demo
develop
_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused or this PR is not labeled deploy:shared
_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply or this PR is not labeled deploy:gitlab
deploy:gitlab
deploy:gitlab
System administrator
dev.gitlab are complete or this PR is not labeled deploy:gitlab
anvildev.gitlab are complete or this PR is not labeled deploy:gitlab
Operator (before pushing the merge commit)
_select dev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
_select anvildev.gitlab && make -C terraform/gitlab/runner or this PR is not labeled deploy:runner
sandbox label or PR is labeled no sandbox
dev or PR is labeled no sandbox
anvildev or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox deployment or PR is labeled no sandbox
anvilbox deployment or PR is labeled no sandbox
sandbox or this PR does not remove catalogs or otherwise causes unreferenced indices in dev
anvilbox or this PR does not remove catalogs or otherwise causes unreferenced indices in anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
sandbox or this PR is not labeled reindex:dev
anvilbox or this PR is not labeled reindex:anvildev
System administrator
anvilbox
Operator
p if the PR is also labeled partial
Operator (chain shortening)
develop or this PR is not labeled base
chained label from the blocked PR or this PR is not labeled base
base
base label from this PR or this PR is not labeled base
Operator (after pushing the merge commit)
dev
anvildev
dev
dev
anvildev
anvildev
_select dev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
_select anvildev.shared && make -C terraform/shared apply or this PR is not labeled deploy:shared
dev
anvildev
Operator (reindex)
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR is neither labeled reindex:partial nor reindex:dev
anvildev or this PR is neither labeled reindex:partial nor reindex:anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
dev or this PR does not require reindexing dev
dev or this PR does not require reindexing dev
deploy_browser job in the GitLab pipeline for this PR in dev or this PR does not require reindexing dev
anvildev or this PR does not require reindexing anvildev
deploy_browser job in the GitLab pipeline for this PR in anvildev or this PR does not require reindexing anvildev
System administrator
anvildev
Operator
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels to the next promotion PRs or this PR carries none of these labels
deploy:shared, deploy:gitlab, deploy:runner, API, reindex:partial, reindex:anvilprod and reindex:prod labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels
Shorthand for review comments
L line is too long
W line wrapping is wrong
Q bad quotes
F other formatting problem