The problem
Currently, a big source of fiddliness in ggplot2 is the need to manually adjust the positions of objects in the final plot. Often (but not always) this happens when annotating, e.g. positioning labels so that they are nicely and consistently spaced or positioned relative to the data and/or the plot itself. This can be frustrating because the only way to position such elements is by supplying values in data units, so if the plot or data dimensions change, manually positioned elements also move. However, annotation usually needs to be specified in plot units (e.g., points, npc, etc.) or, even worse, in a combination of data units and plot units.
That said, {grid} has robust support for unit conversion and combination in the form of grid::unit(). If ggplot2 were able to specify positions (and even sizes, linewidths, etc.) using grid::unit()s, it would be much easier to create charts that look good even when the limits of the underlying x/y scales change or when the dimensions of the plot change.
Even better would be the ability to combine data units and plot units, e.g., to specify something like "put this text label exactly 5 points to the left of this data position". grid::unit() already has a data unit (the "native" unit) that could be used for this purpose. It is largely unused in ggplot2 because ggplot2 internally rescales everything to the range c(0, 1), so "native" and "npc" are effectively equivalent. That means "native" units could be repurposed to solve this problem.
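To make the "native" versus "npc" point concrete, here is a small grid-only sketch. It uses only existing grid functions (viewport(), pushViewport(), convertX()); the 0 to 10 scale and the 200 pt width are arbitrary values picked for illustration, and pdf(NULL) is just one way to get an off-screen device so that conversions have a context.

```r
library(grid)

pdf(NULL)  # off-screen device so unit conversion has a context

# A viewport 200 pt wide whose x scale runs from 0 to 10 (illustrative values).
pushViewport(viewport(width = unit(200, "pt"), xscale = c(0, 10)))

# "native" follows the viewport scale; "npc" always runs from 0 to 1.
native_as_npc <- convertX(unit(5, "native"), "npc", valueOnly = TRUE)

# A mixed expression: 10 pt to the left of data position 5, resolved to points.
mixed_as_pt <- convertX(unit(5, "native") - unit(10, "pt"), "pt", valueOnly = TRUE)

popViewport()

# With the default 0-1 scale, "native" and "npc" describe the same location.
pushViewport(viewport())
default_native_as_npc <- convertX(unit(0.25, "native"), "npc", valueOnly = TRUE)
popViewport()

invisible(dev.off())
```

The last conversion mirrors the situation inside ggplot2, where data have already been rescaled to 0-1 by the time grobs are built.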
A possible solution for positional aesthetics (draft PR: #5610)
I started playing with the above problem earlier this week, and I think I've come up with something that, with some polish, might be able to solve it without much surgery, at least for positional aesthetics. The solution would also allow extension packages to more-or-less automatically take advantage of it.
The basic idea is to allow unit() vectors containing at most one "native" unit to be assigned to positional aesthetics, and for this native unit to represent the (transformed) data. Then you can do stuff like this:
library(ggplot2)  # assumes a build of ggplot2 with the draft PR (#5610) applied

data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) +
  geom_point() +
  # a line exactly 5 points lower than `var2`
  geom_line(aes(y = unit(var2, "native") - unit(5, "pt"))) +
  # labels exactly 10 points left of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = unit(var1, "native") - unit(10, "pt"))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text",
    x = unit(1, "npc") - unit(10, "pt"),
    y = unit(10, "pt"),
    label = "some label", vjust = 0, hjust = 1
  )
The implementation is a work in progress (draft PR: #5610). It hides/exposes units in a way similar to what @teunbrand implemented for "AsIs" objects. It works slightly differently: when a unit column is hidden, the "native" unit contained within the unit expression is left behind so that scales can manipulate it; the transformed native values then replace their corresponding values in the hidden unit expression when it is unhidden later.
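The hide/unhide round trip can be sketched roughly as follows. This is a toy illustration, not the code in the PR: the list representation and the names hide_unit(), unhide_unit(), and scale_log10_rescaled() are all made up here, whereas the PR operates on real grid units.

```r
# Hypothetical stand-in for a unit expression: one native (data) component
# plus a fixed offset in points.
label_x <- list(native = c(1, 10, 100), offset_pt = -10)

# "Hide": hand only the native component to the scale machinery.
hide_unit <- function(u) u$native

# "Unhide": put the transformed native values back into the expression,
# leaving the non-native part untouched.
unhide_unit <- function(u, transformed) {
  u$native <- transformed
  u
}

# Stand-in for a scale: a log10 transformation followed by rescaling to 0-1.
scale_log10_rescaled <- function(x) {
  z <- log10(x)
  (z - min(z)) / (max(z) - min(z))
}

visible <- hide_unit(label_x)
restored <- unhide_unit(label_x, scale_log10_rescaled(visible))
```

After the round trip, the data part has been transformed like any other positional value, while the 10 pt offset is carried along unchanged.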
I was able to do this without modifying most components of the grammar, except for Coords: these also need to do some hiding/unhiding of units, which cannot be done in ggplot_build. The solution I came up with was to do the hiding/unhiding in Coord$transform() and move the implementation of each Coord's transform to Coord$transform_native(). This means that extension package Coords would continue to work (but without supporting units), and they would only need to rename their Coord$transform() functions to Coord$transform_native() to get support for units.
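The delegation pattern might look something like the sketch below. It uses ggproto() from ggplot2, but everything else is an assumption for illustration: the class names, the x_offset_pt column standing in for the hidden part of a unit, and the simplified panel_params are not taken from the PR.

```r
library(ggplot2)

# Hypothetical base class: transform() hides the non-native part, delegates
# to transform_native(), then restores the hidden part.
CoordBaseSketch <- ggproto("CoordBaseSketch", NULL,
  transform = function(self, data, panel_params) {
    offset <- data$x_offset_pt          # "hide" the point offset
    data$x_offset_pt <- NULL
    out <- self$transform_native(data, panel_params)
    out$x_offset_pt <- offset           # "unhide" it again
    out
  },
  # Only ever sees plain numeric (native) positions.
  transform_native = function(data, panel_params) {
    data$x <- (data$x - panel_params$x_range[1]) / diff(panel_params$x_range)
    data
  }
)

# A hypothetical extension coord only needs to supply transform_native().
CoordReversedSketch <- ggproto("CoordReversedSketch", CoordBaseSketch,
  transform_native = function(data, panel_params) {
    data$x <- 1 - (data$x - panel_params$x_range[1]) / diff(panel_params$x_range)
    data
  }
)

out <- CoordReversedSketch$transform(
  data.frame(x = c(0, 5, 10), x_offset_pt = -10),
  list(x_range = c(0, 10))
)
```

The extension class never touches the offset column, which is the property that would let existing Coords gain unit support through a rename alone.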
I also ran into some snags with grid::unit() in that it (1) does not directly support being added to data frames (needs an as.data.frame implementation); (2) doesn't support zero-length vectors (I had to construct them manually); and (3) needed some additional vctrs methods to be implemented to work within ggplot2 more easily. I put these in the draft PR (#5610), but these are probably more appropriate for some combination of vctrs and grid.
Finally, I think a subclass of unit() specific to ggplot2 (call it ggunit()) could be useful, as it would allow some simplification of syntax and improved semantics specific to the "only one native-unit component of the unit subexpression" interpretation of units. Specifically, if casting rules are written such that numerics are cast to "native" units when combined with ggunit()s, this allows instances like unit(var1, "native") above to be replaced with var1, making the code much cleaner:
data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) +
  geom_point() +
  # a line exactly 5 points lower than `var2`
  geom_line(aes(y = var2 - ggunit(5, "pt"))) +
  # labels exactly 10 points left of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = var1 - ggunit(10, "pt"))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text",
    x = ggunit(1, "npc") - ggunit(10, "pt"),
    y = ggunit(10, "pt"),
    label = "some label", vjust = 0, hjust = 1
  )
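The casting rule (a bare numeric becomes a "native" value when combined with a ggunit) can be illustrated with a toy S3 class. This is only a sketch of the idea: the class ggunit_sketch and its helpers are hypothetical and store just a native part and a point offset, whereas the proposed ggunit() would subclass grid's unit.

```r
# Toy stand-in for the proposed ggunit(): a native (data) part plus an
# offset in points. All names here are hypothetical.
new_ggunit_sketch <- function(native = 0, pt = 0) {
  structure(list(native = native, pt = pt), class = "ggunit_sketch")
}

# Casting rule: a bare numeric is treated as a "native" (data) value.
as_ggunit_sketch <- function(x) {
  if (inherits(x, "ggunit_sketch")) x else new_ggunit_sketch(native = x)
}

# Binary + and - cast both operands first, so `var1 - pt_sketch(10)` works.
Ops.ggunit_sketch <- function(e1, e2) {
  e1 <- as_ggunit_sketch(e1)
  e2 <- as_ggunit_sketch(e2)
  switch(.Generic,
    "+" = new_ggunit_sketch(e1$native + e2$native, e1$pt + e2$pt),
    "-" = new_ggunit_sketch(e1$native - e2$native, e1$pt - e2$pt),
    stop("only binary + and - are sketched here")
  )
}

# Shortcut for an offset given in points.
pt_sketch <- function(x) new_ggunit_sketch(pt = x)

var1 <- 1:5
label_x <- var1 - pt_sketch(10)
```

Because S3 group generics dispatch on either operand, the numeric can appear on the left-hand side without any wrapping, which is what makes the aes() calls above read naturally.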
Some shortcut functions for commonly used units, like as_pt(...) and as_npc(...), simplify things further:
data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) +
  geom_point() +
  # a line exactly 5 points lower than `var2`
  geom_line(aes(y = var2 - as_pt(5))) +
  # labels exactly 10 points left of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = var1 - as_pt(10))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text",
    x = as_npc(1) - as_pt(10),
    y = as_pt(10),
    label = "some label", vjust = 0, hjust = 1
  )
It's worth noting that this all works with coordinate transformations, too:
data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) +
  geom_point() +
  geom_line() +
  # labels exactly 10 points right of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = var1 + as_pt(10))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text",
    x = as_npc(1) - as_pt(10),
    y = as_pt(10),
    label = "some label", vjust = 0, hjust = 1
  ) +
  coord_polar()
On the slightly crazier side, I also prototyped an implementation of pmin and pmax for units, which makes it easy to say things like "put the label 10 pts left/up from the point, but also make sure it's at least 10 pts from the plot edge":
data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) +
  geom_point() +
  geom_text(aes(
    label = name,
    x = ggunit_pmax(var1 - as_pt(10), as_pt(10)),
    y = ggunit_pmin(var2 + as_pt(10), as_npc(1) - as_pt(10))
  ))
(I don't like the names ggunit_pmin and ggunit_pmax; it might be better to make pmin/pmax generic.)
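For comparison, grid already provides unit.pmin() and unit.pmax(), which build deferred min/max expressions that are resolved against the viewport at conversion or draw time. The sketch below is plain grid, not the ggunit_pmin()/ggunit_pmax() prototype; the viewport size and scale are arbitrary values chosen for illustration.

```r
library(grid)

pdf(NULL)  # off-screen device so unit conversion has a context

# A viewport 200 pt wide with an x scale of 0-10 (illustrative values).
pushViewport(viewport(width = unit(200, "pt"), xscale = c(0, 10)))

# Wanted label positions: 10 pt left of data values 0.2 and 5 ...
wanted <- unit(c(0.2, 5), "native") - unit(10, "pt")

# ... but never closer than 10 pt to the left edge of the viewport.
clamped <- unit.pmax(wanted, unit(10, "pt"))

# Resolve the deferred expression in the current viewport.
clamped_pt <- convertX(clamped, "pt", valueOnly = TRUE)

popViewport()
invisible(dev.off())
```

The first label would have landed outside the viewport and is pulled back to 10 pt from the edge; the second is unaffected.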
unit in non-positional aesthetics
Getting unit to work in non-positional aesthetics, like size and linewidth, is a bit more complicated. The issue is that the corresponding properties in grid grobs for these aesthetics (e.g. fontsize and lwd) don't take units, so it is necessary to wrap the grid version of the grob to get the desired functionality.
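One way such a wrapper could work is to defer the conversion to draw time with grid's makeContent() hook. The following is a rough sketch under several assumptions: the class unit_lwd_segments is made up, it wraps segments rather than points, and it assumes the common device convention that one lwd unit is 1/96 inch. It is not the approach taken in the PR for geom_point().

```r
library(grid)

# Hypothetical wrapper: a segments grob whose line width is given as a unit.
unit_lwd_segments <- function(x0, y0, x1, y1, lwd) {
  gTree(x0 = x0, y0 = y0, x1 = x1, y1 = y1, lwd = lwd,
        cl = "unit_lwd_segments")
}

# makeContent() is called at draw time, inside the grob's viewport, so the
# unit can be resolved against the actual size of the drawing region.
makeContent.unit_lwd_segments <- function(x) {
  lwd_in <- convertWidth(x$lwd, "inches", valueOnly = TRUE)
  child <- segmentsGrob(x$x0, x$y0, x$x1, x$y1,
    gp = gpar(lwd = lwd_in * 96)  # assumption: 1 lwd = 1/96 inch
  )
  setChildren(x, gList(child))
}

pdf(NULL)  # off-screen device so unit conversion has a context
pushViewport(viewport(width = unit(2, "inches")))

g <- unit_lwd_segments(
  unit(0, "npc"), unit(0.5, "npc"), unit(1, "npc"), unit(0.5, "npc"),
  lwd = unit(0.5, "npc")
)
drawn <- makeContent(g)  # what grid does just before drawing
resolved_lwd <- drawn$children[[1]]$gp$lwd

popViewport()
invisible(dev.off())
```

Every grob that should accept units for such properties would need a wrapper along these lines, which is the per-grob cost discussed in this section.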
I only did this to geom_point() to test it. Here are some points that are always the same width in data space when resized:
ggplot(data.frame(var1 = 1:5, var2 = c(1, 3, 3, 3, 5)), aes(var1, var2)) +
  geom_point(size = unit(1.33, "native"))
While a bit of meta-programming could make this a straightforward task, it might make more sense to petition to get the underlying grid grobs to fully support units for these properties. Otherwise, extension package developers would not necessarily get these changes "for free", but would have to change to using ggplot's version of each grob. Tagging @pmur002 for thoughts.
Why I think this should be in ggplot2, not an extension
Fundamentally, annotation is very important to good visualization, and can be very fiddly to do well in ggplot2 currently (in fact, in a study of ggplot2 experts I conducted a while back, this was one of their big pain points; see Sec 5.2.1). Comprehensive support for unit()s could go a long way to making this easier, and I think it can only be done within core ggplot2. Plus, if the solution works well, all (or nearly all) extension packages would then support it too.