-
Notifications
You must be signed in to change notification settings - Fork 1k
rowwise(): Detect and reject non-atomic, non-list column values with clear error message #7250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7250 +/- ##
=======================================
Coverage 99.01% 99.02%
=======================================
Files 81 81
Lines 15503 15517 +14
=======================================
+ Hits 15351 15365 +14
Misses 152 152 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
No obvious timing issues in HEAD=issue_7219 Generated via commit dafc34f Download link for the artifact containing the test results: ↓ atime-results.zip
|
R/rowwiseDT.R
Outdated
nrows = length(body) %/% ncols | ||
if (length(body) != nrows * ncols) | ||
stopf("There are %d columns but the number of cells is %d, which is not an integer multiple of the columns", ncols, length(body)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please undo addition of empty lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix
hi @tdhock I did the modifications can you please have a look when you got time. |
R/rowwiseDT.R
Outdated
stopf("There are %d columns but the number of cells is %d, which is not an integer multiple of the columns", ncols, length(body)) | ||
is_problematic = vapply( | ||
body, | ||
function(v) !is.atomic(v) && !is.null(v) && typeof(v) != "list", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this check really what @aitap wrote about catering for that is.atomic(NULL)=TRUE
on older R versions?
it should also be faster to check !(is.atomic(v) || is.null(v) || typeof(v) == "list")
Note that we have our internal versions of vapply e.g. vapply_1b
R/rowwiseDT.R
Outdated
first_problem_idx = which(is_problematic)[1L] | ||
col_idx = (first_problem_idx - 1L) %% ncols + 1L | ||
col_name = header[col_idx] | ||
obj_type = typeof(body[[first_problem_idx]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if typeof
is the right choice here. For the problem raised in #7219 class
would be a better choice since class
would be "function" but typeof
would be still "closure".
R/rowwiseDT.R
Outdated
col_name = header[col_idx] | ||
obj_type = typeof(body[[first_problem_idx]]) | ||
stopf( | ||
"In column '%s', received an object of type '%s'.\nComplex objects (like functions, models, etc.) must be wrapped in list() to be stored in a data.table column.\nPlease use `list(...)` for this value.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really true? Many models
are indeed lists, e.g. lm(mpg~., data=mtcars)
.
Other thing on top of my head would be expression
s which we do not fully support. Storing environments into a data.table
seems uncommon to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also still miss a NEWS item!
closes #7219
This PR enhances rowwise() to detect all non-atomic, non-list column values (e.g., language objects, expressions, pairlists, environments, S4 objects) instead of only functions. It adds a clear error message with the offending column name and type, plus guidance to wrap values in list(...) if intended. Includes updated tests for both allowed and rejected cases.
hi @tdhock @MichaelChirico @aitap can you please have a look when you got time .
thanks.