-
Notifications
You must be signed in to change notification settings - Fork 41
Refactored Water Data Pipeline #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,356 @@ | |||
from message_ix_models.model.water.utils import Rule | |||
|
|||
WD_CONST = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some description for each series/rule on what are they for, what should be included and what no?
Does each _rule.py script need to have a xx_CONST series with a different name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add the descriptions for the rules.
The constants are just to hold the magic numbers which were used in the rules. Some of them are common, but I have left them for now.
WD : Water Demand
WS : Water Supply
|
||
|
||
@minimum_version("python 3.10") | ||
def run_standard( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you could rename this function, as the word "run" has in our context a different meaning. Something like "build" "fill" "parametrize", might be more understandable. not sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just commenting, did not want to make a full review
Introduces a string-based DSL engine that can evaluate Python expressions within strings, plus a full suite of water-data operations. Incl unit tests for engine and operations. BREAKING CHANGE: requires Python 3.10+; minimum-version markers added to functions.
Migrated water demans to use the new DSL engine. Includes the creation of demand_rules.py. Added a new test for parity with the legacy code and updated demands test.
Migrated water supply to use the new DSL engine. Includes the creation of supply_rules. Added a new test for parity with the legacy code and updated supply test.
Similar to previous for infrastructure now.
Helpful utility to wrap citations, used to replace citations in comments with a consistent and uniform format.
Similar to previous, uses citation wrapper.
Additional to DSL migration, this breaks file into two files: water_for_ppl_rules.py and water_for_ppl_support.py. Cyclomatic complexity is reduced.
6b2de6c
to
3a255cc
Compare
Thanks @Wegatriespython to polish the PR, actually make a new one. Vignesh still has to:
|
Refactored Water Data Pipeline
Short Desc
Improve readability & testability via DSL(Domain Specific Language)
Long Desc
CitationWrapper
utility for citationsExample Before
Procedural code with repeated
make_df
/broadcast
steps and chainedconcat
on the same variable names, making it hard to follow flow and test individual pieces:Example After
How to review
PR checklist