SNOW-2084165 Add dataframe operation lineage on SnowparkSQLException #3339
Conversation
| """Returns the batch_ids of the children of this node.""" | ||
| return get_dependent_bind_ids(self.stmt_cache[self.batch_id]) | ||
|
|
||
| def get_src(self) -> Optional[proto.SrcPosition]: |
In the hybrid client prototype we are using a slightly different method to get the source location: we just use `inspect` to walk the stack to the appropriate source location. We have to do this because modin does not use any of the AST machinery, but it is also relatively straightforward.
I would like to use your debugging tool for snowpandas as well, but we may want to refactor this so we don't require any of the protobuf work.
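For illustration, a minimal sketch of that stack walk (the helper name and the package filter are assumptions here, not the prototype's actual code):

```python
import inspect
from typing import Tuple


def _find_user_frame() -> Tuple[str, int]:
    # Walk outward from the current frame until we leave the snowpark
    # package; the first external frame is taken as the user's call site.
    for frame_info in inspect.stack():
        path = frame_info.filename.replace("\\", "/")
        if "snowflake/snowpark" not in path:
            return frame_info.filename, frame_info.lineno
    return "<unknown>", -1
```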
```python
def get_user_source_location(group: str) -> dict[str, str]:
```
What about using this function?
```python
def context_manager_code_location(frame_info, func) -> Tuple[str, int]:
```
Essentially we seem to have three approaches to this problem. I'm *less* of a fan of the AST approach because it doesn't help pandas for this type of debugging, but it seems like we might be able to consolidate with the OpenTelemetry approach.
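For reference, a sketch of what a frame-based helper with that signature might compute; only the signature appears in the diff, so the body below is an assumption:

```python
import inspect
from typing import Callable, Tuple


def context_manager_code_location(
    frame_info: inspect.FrameInfo, func: Callable
) -> Tuple[str, int]:
    # Assumed behavior: report where the wrapped function was invoked,
    # falling back to the function's own definition site.
    if frame_info is not None:
        return frame_info.filename, frame_info.lineno
    code = func.__code__
    return code.co_filename, code.co_firstlineno
```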
```python
_enable_dataframe_trace_on_error = False


def configure_development_features(
```
This is similar to what I would expect; see my comment on: #3380 (comment)
I'm thinking of the following to provide a unified debug config experience:
```python
@experimental(version="1.33.0")
def debug_config(
    *,
    enable_eager_schema_validation=False,
    enable_dataframe_trace_on_error=False,
):
    ...
```

When users want to enable these features, they do:

```python
import snowflake.snowpark.context

snowflake.snowpark.context.debug_config(enable_eager_schema_validation=True)
# or
snowflake.snowpark.context.debug_config(
    enable_eager_schema_validation=True,
    enable_dataframe_trace_on_error=True,
)
```
Good idea. @sfc-gh-jrose and I are aligned on the name. Let me add the @experimental decorator as well.
Could not import the decorator due to circular import issues, but I added a warning there instead.
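A minimal sketch of what that manual warning might look like when the @experimental decorator cannot be imported; the exact wording and placement are assumptions, not the merged code:

```python
import warnings


def configure_development_features(
    *, enable_dataframe_trace_on_error: bool = False
) -> None:
    # Hypothetical stand-in for @experimental, which could not be imported
    # in context.py due to a circular import.
    warnings.warn(
        "configure_development_features() is experimental since 1.33.0. "
        "The API and behavior may change in future releases.",
        stacklevel=2,
    )
```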
src/snowflake/snowpark/context.py
```python
def configure_development_features(
    *,
    enable_dataframe_trace_on_error: bool = False,
```
I think we should default to True. That way, users who want a basic development mode can call this function without any parameters.
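Under that suggestion the signature would become (a sketch of the proposed change, not the merged code):

```python
def configure_development_features(
    *,
    enable_dataframe_trace_on_error: bool = True,
) -> None:
    ...
```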
| """A node representing a dataframe operation in the DAG that represents the lineage of a DataFrame.""" | ||
|
|
||
| def __init__(self, batch_id: int, stmt_cache: Dict[int, proto.Stmt]) -> None: | ||
| self.batch_id = batch_id |
Nit: I would argue that this isn't meant to be a batch ID anymore. Within each Python session that imports the Snowpark module, each AST ID for a Table or DataFrame will be a UID.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2084165
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
See doc.