pivot_wider: unexpected behavior specifying id_cols to omit that are included in names/values_from #1506

@jwhendy

I was running pivot_wider on some data and was surprised that I could not use -c(col1, col2) to choose my id_cols. The call failed with this error:

Error in `pivot_wider()`:
`id_cols` can't select a column already selected by `names_from`.
Column `type` has already been selected.

Repro:

library(dplyr)
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

Base case:

tmp %>% pivot_wider(id_cols = id1, names_from = type, values_from = values)

# A tibble: 2 × 3
  id1       c     d
  <chr> <dbl> <dbl>
1 a         1     2
2 b         3     4

But say you had a lot of columns; it's more concise to remove a few than to name them all. Neither of these works; both produce the error above:

tmp %>% pivot_wider(id_cols = -c(type, values, unused), names_from = type, values_from = values)
tmp %>% pivot_wider(id_cols = c(-type, -values, -unused), names_from = type, values_from = values)

My failure mode may be covered by this statement from the docs:

id_cols [...] Defaults to all columns in data except for the columns specified through names_from and values_from. If a tidyselect expression is supplied, it will be evaluated on data after removing the columns specified through names_from and values_from.
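To make that statement concrete, here is a rough base R sketch of which columns are left for an id_cols expression to refer to. It only mimics the documented behavior with setdiff(); it is not tidyr's actual implementation, and the names pivot_cols and candidates are made up for illustration.

```r
# Same data as in the repro above
tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

# Columns claimed by names_from and values_from in the calls above
pivot_cols <- c("type", "values")

# Per the docs, id_cols is evaluated on the data with those columns removed,
# so these are the only names an id_cols expression can see
candidates <- setdiff(names(tmp), pivot_cols)
print(candidates)  # "id1" and "unused"; type and values are no longer visible
```

Under that reading, -c(type, values, unused) refers to two columns that no longer exist at the point where id_cols is evaluated, which would explain the error.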

This is why I included the "unused" column: for data with many columns, one has to think, "OK, I'm removing type and values 'for free' since they are used in other args, but I do need to remember to remove those other columns." Excluding only unused works:

tmp %>% pivot_wider(id_cols = -unused, names_from = type, values_from = values)

# A tibble: 2 × 3
  id1       c     d
  <chr> <dbl> <dbl>
1 a         1     2
2 b         3     4
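For a wide table, two possible workarounds avoid having to track which columns are removed "for free". This is only a sketch and not something confirmed by the tidyr maintainers: the first drops the unwanted columns with dplyr::select() and relies on the default id_cols; the second assumes that tidyselect's any_of() silently ignores names that are absent, so listing type and values would be harmless.

```r
library(dplyr)
library(tidyr)

tmp <- data.frame(id1 = c("a", "a", "b", "b"),
                  unused = c(NA, NA, NA, NA),
                  type = c("c", "d", "c", "d"),
                  values = c(1, 2, 3, 4))

# Workaround 1: drop unwanted columns first, then let id_cols default to
# everything that is not used by names_from or values_from
wide_a <- tmp %>%
  select(-unused) %>%
  pivot_wider(names_from = type, values_from = values)

# Workaround 2 (assumption: any_of() tolerates names that are not present
# in the data that id_cols is evaluated on)
drop_cols <- c("type", "values", "unused")
wide_b <- tmp %>%
  pivot_wider(id_cols = -any_of(drop_cols),
              names_from = type, values_from = values)

names(wide_a)  # id1, c, d
```

The first form needs no knowledge of how id_cols is evaluated, at the cost of an extra select() step.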

Thoughts:

  • this is a bug, in that there should be no problem specifying columns to drop, even if they are implicitly dropped by being passed to names_from or values_from
  • this is not a bug, but the documentation could be improved. I was confused by the message "Column type has already been selected": by what, and how? It was non-intuitive to me that the column counts as "selected" when I'm explicitly trying not to select it as an id_col
  • not a bug, and the documentation is perfectly clear. I admittedly don't use pivot functions that often, so it could be a misunderstanding on my part.

Labels: feature (a feature request or enhancement), pivoting ♻️ (pivot rectangular data to different "shapes")
