Issue by elijahbenizzy
Monday Jul 04, 2022 at 22:38 GMT
Originally opened as stitchfix/hamilton#149
OK, so this is a pure proof of concept: not necessarily the right way to do things, and not tested. That said, I wanted to prove the following:
1. That we could build a two-step data quality pass (e.g. with a profiler and a validator). This will quickly become a blocker for whylogs.
2. That we can use config to enable/disable items at run/compile time.
3. That we can add an applies_to keyword to narrow the focus of data quality.
(1) is useful for complex integrations, e.g. an expensive profiling step followed by lots of validations.
(2) is useful for disabling checks; this will probably be the first piece we release.
(3) is useful for extract_columns: it now makes clear what the check applies to.
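To make the three points concrete, here is a minimal sketch in plain Python. It is not the code in this PR and does not use Hamilton's API: `profile_column`, `Validator`, `run_data_quality`, and the `data_quality.enabled` config key are all hypothetical names made up for illustration. It only shows the shape of the idea: profile once, validate many times against the profile, gate everything on config, and narrow checks with `applies_to`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

Profile = Dict[str, float]


def profile_column(values: Sequence[float]) -> Profile:
    """Step 1 (hypothetical): the potentially expensive profiling pass."""
    return {
        "count": float(len(values)),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


@dataclass
class Validator:
    """Step 2 (hypothetical): a cheap check that runs against a profile."""
    name: str
    check: Callable[[Profile], bool]
    applies_to: Optional[str] = None  # None means "every column"


def run_data_quality(
    columns: Dict[str, Sequence[float]],
    validators: List[Validator],
    config: Dict[str, bool],
) -> Dict[str, Dict[str, bool]]:
    # (2) config switch: the key name is an assumption, not a real setting.
    if not config.get("data_quality.enabled", True):
        return {}
    results: Dict[str, Dict[str, bool]] = {}
    for col_name, values in columns.items():
        # (3) applies_to narrows which validators see this column.
        relevant = [v for v in validators if v.applies_to in (None, col_name)]
        if not relevant:
            continue
        # (1) profile once per column, then reuse it for every validator.
        profile = profile_column(values)
        results[col_name] = {v.name: v.check(profile) for v in relevant}
    return results


columns = {"height": [1.5, 1.8, 1.7], "weight": [60.0, -5.0, 80.0]}
validators = [
    Validator("non_negative", lambda p: p["min"] >= 0),
    Validator("max_under_3", lambda p: p["max"] < 3, applies_to="height"),
]
print(run_data_quality(columns, validators, {}))
print(run_data_quality(columns, validators, {"data_quality.enabled": False}))
```

With an empty config both columns are profiled once and checked; `max_under_3` only runs on `height`. With the flag set to `False` nothing is profiled at all, which is the cheap "disable" path from (2).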
While some of this code still has placeholders and isn't tested, it demonstrates feasible solutions, and de-risks the release of data quality enough to make me comfortable.
Look through commits for more explanations.
elijahbenizzy included the following code: https://github.com/stitchfix/hamilton/pull/149/commits