tidyr 0.7.0
This release includes important changes to tidyr internals. tidyr now
supports the new tidy evaluation framework for quoting (NSE)
functions. It also uses the new tidyselect package as its selecting
backend.
Breaking changes
- If you see error messages about objects or functions not found, it
  is likely because the selecting functions are now stricter in their
  arguments. An example of a selecting function is gather() and its
  dots (...) argument. This change makes the code more robust by
  disallowing ambiguous scoping. Consider the following code:

      x <- 3
      df <- tibble(w = 1, x = 2, y = 3)
      gather(df, "variable", "value", 1:x)

  Does it select the first three columns (using the x defined in the
  global environment), or does it select the first two columns (using
  the column named x)?

  To solve this ambiguity, we now make a strict distinction between
  data and context expressions. A data expression is either a bare
  name or an expression like x:y or c(x, y). In a data expression,
  you can only refer to columns from the data frame. Everything else
  is a context expression in which you can only refer to objects that
  you have defined with <-.

  In practice this means that you can no longer refer to contextual
  objects like this:

      mtcars %>% gather(var, value, 1:ncol(mtcars))

      x <- 3
      mtcars %>% gather(var, value, 1:x)
      mtcars %>% gather(var, value, -(1:x))

  You now have to be explicit about where to find objects. To do so,
  you can use the quasiquotation operator !! which will evaluate its
  argument early and inline the result:

      mtcars %>% gather(var, value, !! 1:ncol(mtcars))
      mtcars %>% gather(var, value, !! 1:x)
      mtcars %>% gather(var, value, !! -(1:x))

  An alternative is to turn your data expression into a context
  expression by using seq() or seq_len() instead of the : operator.
  See the section on tidyselect for more information about these
  semantics.

- Following the switch to tidy evaluation, you might see warnings
  about the "variable context not set". This is most likely caused by
  supplying helpers like everything() to underscored versions of
  tidyr verbs. Helpers should always be evaluated lazily. To fix
  this, just quote the helper with a formula:
  drop_na(df, ~everything()).

- The selecting functions are now stricter when you supply integer
  positions. If you see an error along the lines of

      `-0.949999999999999`, `-0.940000000000001`, ... must resolve to integer column positions, not a double vector

  please round the positions before supplying them to tidyr. Double
  vectors are fine as long as they are rounded.
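
As a rough illustration of the fixes described above, the sketch below selects the first two columns of a toy data frame in three ways: by unquoting a contextual value with !!, by using seq_len() as a context expression, and by rounding double positions before supplying them. The data frame and the names n and pos are hypothetical and chosen only for this example.

```r
library(tidyr)

# Hypothetical toy data; `n` is a contextual object, not a column of `df`.
df <- data.frame(w = 1, x = 2, y = 3)
n <- 2

# Unquote the contextual value so the positions are inlined before selection.
long_unquoted <- gather(df, "variable", "value", !! seq_len(n))

# seq_len() is a function call, so it is evaluated as a context expression.
long_seq <- gather(df, "variable", "value", seq_len(n))

# Positions that were computed as doubles are rounded before being supplied.
pos <- round(c(0.9999999, 2.0000001))
long_rounded <- gather(df, "variable", "value", !! pos)
```

Naming the contextual object n rather than x sidesteps the ambiguity with the column x altogether, which is often the simplest fix when you control the calling code.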
Switch to tidy evaluation
tidyr is now a tidy evaluation grammar. See the programming vignette
in dplyr for practical information about tidy evaluation.
The tidyr port is a bit special. While the philosophy of tidy
evaluation is that R code should refer to real objects (from the data
frame or from the context), we had to make some exceptions to this
rule for tidyr. The reason is that several functions accept bare
symbols to specify the names of new columns to create (gather()
being a prime example). This is not tidy because the symbols do not
represent any actual object. Our workaround is to capture these
arguments using rlang::quo_name() (so they still support
quasiquotation and you can unquote symbols or strings). This type of
NSE is now discouraged in the tidyverse: symbols in R code should
represent real objects.
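
To make the quasiquotation support for new column names concrete, here is a minimal sketch, assuming a small hypothetical data frame. It names the key and value columns of gather() once with bare symbols and once by unquoting a string and a symbol stored in variables (key_name and value_name are illustrative names, not part of tidyr).

```r
library(tidyr)

# Hypothetical toy data for illustration.
df <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))

key_name <- "measure"             # a string held in a variable
value_name <- as.name("reading")  # a symbol held in a variable

# Bare symbols name the new columns directly.
bare <- gather(df, measure, reading, a, b)

# Unquoting a string or a symbol should yield the same column names.
unquoted <- gather(df, !! key_name, !! value_name, a, b)
```

Both calls are expected to produce the columns id, measure and reading; the unquoted form is the one to reach for when the column names are only known at run time.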
Following the switch to tidy evaluation, the underscored variants are
softly deprecated. However, they will remain available for some time,
without warnings, for backward compatibility.
Switch to the tidyselect backend
The selecting backend of dplyr has been extracted into a standalone
package, tidyselect, which tidyr now uses for selecting variables. It is
used for selecting multiple variables (in drop_na()) as well as
single variables (the col argument of extract() and separate(),
and the key and value arguments of spread()). This implies the
following changes:
- The arguments for selecting a single variable now support all
  features from dplyr::pull(). You can supply a name or a position,
  including negative positions.

- Multiple variables are now selected a bit differently. We now make a
  strict distinction between data and context expressions. A data
  expression is either a bare name or an expression like x:y or
  c(x, y). In a data expression, you can only refer to columns from
  the data frame. Everything else is a context expression in which you
  can only refer to objects that you have defined with <-.

  You can still refer to contextual objects in a data expression by
  being explicit. One way of being explicit is to unquote a variable
  from the environment with the tidy eval operator !!:

      x <- 2
      drop_na(df, 2)     # Works fine
      drop_na(df, x)     # Object 'x' not found
      drop_na(df, !! x)  # Works as if you had supplied 2

  On the other hand, select helpers like starts_with() are context
  expressions. It is therefore easy to refer to objects and they will
  never be ambiguous with data columns:

      x <- "d"
      drop_na(df, starts_with(x))

  While these special rules are in contrast to most dplyr and tidyr
  verbs (where both the data and the context are in scope), they make
  sense for selecting functions and should provide more robust and
  helpful semantics.
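
The single-variable selection described above can be sketched as follows, assuming hypothetical toy data frames (df and scores) and a hypothetical contextual variable (prefix). The first part selects the column to separate by name, by position, and by negative position (counting from the right, as dplyr::pull() does); the second part passes a contextual object to a select helper.

```r
library(tidyr)

# Hypothetical toy data: `pair` is the second and also the last column.
df <- data.frame(id = 1:2, pair = c("a-b", "c-d"), stringsAsFactors = FALSE)

by_name     <- separate(df, pair, c("left", "right"), sep = "-")
by_position <- separate(df, 2, c("left", "right"), sep = "-")
by_negative <- separate(df, -1, c("left", "right"), sep = "-")  # last column

# Select helpers are context expressions, so `prefix` is looked up in the
# calling environment rather than among the columns.
scores <- data.frame(d1 = c(1, NA), d2 = c(2, 3), e = c(NA, 5))
prefix <- "d"
complete_d <- drop_na(scores, starts_with(prefix))
```

Only the columns matched by the helper are checked for missing values, so the first row of scores should be kept even though its e value is NA.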