Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zep PostgresDatasource returns a list of batches. #6341

Merged
merged 3 commits into from
Nov 14, 2022

Conversation

billdirks
Copy link
Contributor

@billdirks billdirks commented Nov 10, 2022

Previously get_batch_list_from_batch_request always sent out a list of 1 item since the batch request had to fully specify a batch. The PR allows one to partial specify a batch request and get a list of batches that match that criteria. Note, along any dimension, one can set it exactly or leave it unset (eg, you can see all months in a year). Functionality to filter on a dimension (eg, just the last 3 months of a year) is coming in a later PR.

As part of this work I removed the nascent namespacing support for batch parameters. These were unused and not fully thought out making this work harder than necessary. We can add namespacing back in a better way when we figure out the use case better.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

@netlify
Copy link

netlify bot commented Nov 10, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit cba2fc5
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/63729e0639f33e00080c8ff8
😎 Deploy Preview https://deploy-preview-6341--niobium-lead-7998.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@ghost
Copy link

ghost commented Nov 10, 2022

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

@Kilo59 Kilo59 self-requested a review November 11, 2022 02:31
Copy link
Contributor

@Kilo59 Kilo59 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will go back and review it in more detail. But I don't see any issues based on the config work that I'm doing.

Comment on lines +105 to +106
def _valid_batch_request_options(self, options: BatchRequestOptions) -> bool:
return set(options.keys()).issubset(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit

On closer look, I don't think my suggestion below makes sense given how open batch_request_options_template is.

Leaving the comment up in-case it inspires someone to try this technique elsewhere.


This may not be that useful. But we might be to do 2 things to improve the type narrowing here.

  1. Define a more specific PostgresBatchOptions TypedDict
  2. Make this _valid_batch_request_options method a TypeGuard for the PostgresBatchOptions type.

I'm not totally sure how a TypedDict works with extra keys though.

from typing_extensions import TypedDict, TypeGuard

class PostgresBatchOptions(TypedDict):
    foo: str
    bar: int

def _valid_batch_request_options(self, options: BatchRequestOptions) -> TypeGaurd[PostgresBatchOptions]:
        return set(options.keys()).issubset(
            set(self.batch_request_options_template().keys())
        )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra arguments are not supported in TypedDicts. Here's the github issue about it with some interesting discussion: python/mypy#4617

Originally I had implemented BatchOptions using generics and had it more strongly typed. I abandoned that after getting some UX feedback and have reverted it to this untyped dict which I'm validated at runtime for a particular DataAsset. To help with the pain of "what can I put here" I've made the batch_request_options_template method. I would be nice to make better checks in static analysis but I'm not sure how to do that with the current requirements.

I am not familiar with TypeGuards and will look into that.

default_option_values.append(
self.column_splitter.param_defaults[option]
)
for option_values in itertools.product(*default_option_values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to use itertools more 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to implement this but then did some searching. Happy to have found it!

@billdirks billdirks enabled auto-merge (squash) November 14, 2022 19:58
@billdirks billdirks merged commit f6828a5 into develop Nov 14, 2022
@billdirks billdirks deleted the f/GREAT-1314/batch_list branch November 14, 2022 20:36
Shinnnyshinshin pushed a commit that referenced this pull request Nov 15, 2022
* develop:
  [MAINTENANCE] Update column_reflection_fallback to also use schema name for Trino (#6350)
  [MAINTENANCE] Misc cleanup of GX Cloud helpers (#6352)
  [MAINTENANCE] Pin `mypy` to `0.990` (#6361)
  Subject: Support to include ID/PK in validation result for each row t… (#5876)
  [MAINTENANCE] Fix computed metrics type hint in ExecutionEngine.resolve_metrics() method (#6347)
  [MAINTENANCE] Refactor `usage_stats_opt_out` method in DataContext (#5339)
  Zep PostgresDatasource returns a list of batches. (#6341)
  [DOCS] Add `yarn snippet-check` command (#6351)
  [MAINTENANCE] Refactor out `termcolor` dependency (#6348)
  [MAINTENANCE] Move Cloud-specific enums to `cloud_constants.py` (#6349)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants