-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow definition of collector_fun
#13
base: main
Are you sure you want to change the base?
Conversation
collector_fun
A higher granularity variant would be to be able to provide for every patched functions the SE args that we should force. This specification can be generated automatically by a helper with a rule that every 1st arg is SE, then we can tweak it if desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globals$collecting is great
collector_fun is good but needs to be documented
forcing the first arg seems brittle to me, I think I'd like to at least opt in, or have a more elegant system.
funs = NULL, | ||
pkg = NULL, | ||
path = "collector", | ||
collector_fun = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation is missing for new arg
@@ -52,8 +57,16 @@ set_collector <- function(funs = NULL, pkg = NULL, path = "collector") { | |||
# Replace bodies a call to collect_and_run(), keep original call as attr | |||
for (nm in names(funs)) { | |||
val <- ns[[nm]] | |||
if (length(formals(val)) > 0 && names(formals(val))[[1]] != "...") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this force every first argument ? I can think of a few functions with a NSE 1st arg, like quote, substitute, ~, expr, quo, unquo, inject.
This is also not "true" collection because we lose the logic on how the data was built.
I get that it is useful but do we need an argument to switch this on ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the results aren't great here either. We need to find a better way to deal with nested calls. The next step would be to create a reprex to better understand the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By nested call do you mean that with something like :
df |>
mutate(...) |>
select(...)
We'd like to collect both calls ? if so globals$collecting
is indeed preventing it, the older behavior wasn't.
Since we don't collect when a function is called from its namespace (or a child of its namespace) the globals$collecting
is really doing just that as far as I understand.
I can propose a force
argument, a named list where names are functions and values a vector of argument names.
This list could be pre-computed from static analysis, and then fixed manually.
- Clean srcref attribute from call - Record sys.calls() - Fix
This started as an attempt to filter the results of the collection before saving them to disk. Unclear if this is needed, but it does no harm.
The two subsequent commits add value. With that, a collection that previously took 40 GB now takes under 200 MB. Unfortunately, these commits can't be applied directly to the main branch.