fix: Respect remote function config changes even if logic unchanged#2512

Open
TrevorBergeron wants to merge 13 commits into main from fix_cf_reuse_configs

Conversation

@TrevorBergeron (Contributor)
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Mar 13, 2026
@TrevorBergeron TrevorBergeron requested a review from tswast March 18, 2026 00:06
@TrevorBergeron TrevorBergeron marked this pull request as ready for review March 18, 2026 00:06
@TrevorBergeron TrevorBergeron requested review from a team as code owners March 18, 2026 00:06
cloud_function_memory_mib=cloud_function_memory_mib,
cloud_function_cpus=cloud_function_cpus,
cloud_function_ingress_settings=cloud_function_ingress_settings,
bq_metadata=bqrf_metadata,
Contributor

Does this mean we don't have to annotate the BQ Functions we create with virtual types in the metadata anymore? If so, probably worth a mini design if only so you can get the proper attribution for this simplification, but also to share the technique with the team.

Contributor Author

Nah, I'm just using the signature object now to capture everything about type mappings, including virtual ones.
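A rough sketch of that idea, with an illustrative type mapping (the mapping and helper below are assumptions, not the actual bigframes implementation): the signature object alone can carry the full input/output type information, so no separate metadata annotation is needed.

```python
import inspect

# Illustrative stand-in for a Python-to-BigQuery type mapping; virtual
# types would slot in as additional entries keyed by their marker classes.
_PY_TO_BQ = {int: "INT64", float: "FLOAT64", str: "STRING", bool: "BOOL"}

def signature_types(func):
    """Read (input_types, output_type) straight from the signature object."""
    sig = inspect.signature(func)
    inputs = tuple(_PY_TO_BQ[p.annotation] for p in sig.parameters.values())
    output = _PY_TO_BQ[sig.return_annotation]
    return inputs, output

def add_one(x: int) -> int:
    return x + 1
```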

session_id: str | None = None,
):
"""Get a name for the bigframes managed function for the given user defined function."""
# TODO: Move over to logic used by remote functions
Contributor

I would like some more context behind this TODO. Could you file a bug and put some more information there?

Contributor Author

Created bug 495508827, moved the TODO, and added a bug reference.

routine: bigquery.Routine, session: bigframes.Session
) -> BigqueryCallableRoutine:
udf_def = _routine_as_udf_def(routine)
override_type = _get_output_type_override(routine)
Contributor

I suspect this removal also means we don't need to look at the routine's metadata anymore, right?

Side thought: I wonder if this means we don't support timedelta/duration in remote functions?

Contributor Author

I'm just encapsulating more in the udf_def. If anything, the overall available metadata is richer now.

# Protocol version 4 is available in python version 3.4 and above
# https://docs.python.org/3/library/pickle.html#data-stream-format
_pickle_protocol_version = 4
logger = logging.getLogger(__name__)
Contributor

Nit: I've recently been trying to follow a different pattern in some of my projects where there is one logger instance per package (e.g. bigframes) rather than per-module, as the per-module version makes applying filters much more annoying.

Contributor Author

Mind if we do this as a separate PR? I'm probably already doing too much in one PR, and this was the pre-existing logger config.
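For reference, a minimal sketch of the per-package pattern being suggested: module loggers are children of the package logger by dotted name, and records propagate upward, so a filter attached once to a handler on the package logger covers every module at once. The logger names are illustrative.

```python
import logging

# One logger for the whole package, configured in one place:
package_logger = logging.getLogger("bigframes")
package_logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler()
handler.addFilter(lambda record: record.levelno >= logging.INFO)  # one filter for all modules
package_logger.addHandler(handler)

# A module elsewhere just asks for its child logger by dotted name:
module_logger = logging.getLogger("bigframes.functions.loader")
module_logger.warning("reaches the package handler via propagation")
module_logger.debug("created, but dropped by the package-level handler filter")
```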

Comment on lines +49 to +51
def __post_init__(self):
assert isinstance(self.name, str)
assert isinstance(self.dtype, (DirectScalarType, RowSeriesInputFieldV1))
Contributor

Let's add some comments for why this is necessary. If it's for the type checker, I'm surprised the class attributes above weren't sufficient.

Contributor Author

The dataclass class attributes aren't actually checked at runtime; this is why people use tools like pydantic to actually validate their records. The validation here is just a convenience to catch runtime issues early, instead of hitting some missing-attribute error later.
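A small illustration of the point (hypothetical class names, not the PR's actual types): dataclass annotations are documentation for type checkers only, so a `__post_init__` check is what actually fails fast at runtime.

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class UncheckedArg:
    name: str  # annotation only; never enforced at runtime

@dataclasses.dataclass(frozen=True)
class CheckedArg:
    name: str

    def __post_init__(self):
        assert isinstance(self.name, str), "name must be a str"

UncheckedArg(name=123)  # silently accepts the wrong type

caught = False
try:
    CheckedArg(name=123)  # fails immediately instead of much later
except AssertionError:
    caught = True
```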

inputs: tuple[UdfArg, ...] = dataclasses.field()
output: DirectScalarType | VirtualListTypeV1

def __post_init__(self):
Contributor

Same here. Docs for why this is necessary.

Contributor Author

It's not for any special reason, just runtime validation to catch things static type checking doesn't.

def protocol_metadata(self) -> str | None:
import bigframes.functions._utils

# TODO: The output field itself should handle this, to handle protocol versioning.
Contributor

bug context

Contributor Author

Removing this comment; it's not a functional requirement right now, just speculation about a refactor.

def_copy, protocol=_pickle_protocol_version
)

hash_val = hashlib.md5()
Contributor

Same here: crc32c, or at least a TODO and bug to use a better hash.

Contributor Author

Switched to crc32c.
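A sketch of the approach under discussion: derive a cache key by hashing the pickled function definition, so an unchanged definition reuses the existing artifact and any change (including a config-only change) produces a new key. The stdlib has no crc32c, so `zlib.crc32` stands in here purely for illustration; the helper name is hypothetical.

```python
import pickle
import zlib

# Protocol 4 matches the pre-existing _pickle_protocol_version constant.
_PICKLE_PROTOCOL = 4

def def_hash(udf_def) -> str:
    """Checksum of the pickled definition, as a fixed-width hex string."""
    payload = pickle.dumps(udf_def, protocol=_PICKLE_PROTOCOL)
    return format(zlib.crc32(payload), "08x")
```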



@dataclasses.dataclass(frozen=True)
class CloudRunFunctionConfig:
Contributor

Thoughts on how to keep this in sync with the arguments we add to our decorators?

Contributor Author

The function that creates the artifact uses this struct solely (along with a name), so that is the forcing mechanism - anything that needs to reflect in the artifact must first be a field in this struct.
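A minimal sketch of that forcing mechanism, with assumed field names and a hypothetical cache: because artifact creation takes only a name plus this frozen config, a new decorator argument has no effect until it becomes a field here, and any field change yields a new cache key even when the function logic is unchanged.

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class CloudRunFunctionConfig:
    memory_mib: int = 512
    cpus: int = 1
    ingress_settings: str = "all"

_artifact_cache: dict = {}

def get_or_create_artifact(name: str, config: CloudRunFunctionConfig) -> str:
    # The frozen dataclass is hashable, so (name, config) can key the cache.
    key = (name, config)
    if key not in _artifact_cache:
        _artifact_cache[key] = f"{name}-v{len(_artifact_cache)}"
    return _artifact_cache[key]
```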

@dataclasses.dataclass(frozen=True)
class ArrayMapOp(base_ops.UnaryOp):
name: typing.ClassVar[str] = "array_map"
# TODO: Generalize to chained expressions
Contributor

bug context

Contributor Author

Created bug 495513753.
