Lines by jurek.d is licensed under CC BY-NC 2.0
introduction
line color
line type
line size
reference lines vertical
reference lines horizontal
reference lines sloped
exercises
references
Here we use lines to represent data or we add reference lines to assist the reader’s comprehension.
library("tidyverse")
library("seplyr")
library("graphclassmate")

Some color assignments I like to use.

my_color_6 <- c(rcb("dark_BG"), rcb("mid_BG"), rcb("light_BG"),
                rcb("light_Br"), rcb("mid_Br"), rcb("dark_Br"))
my_color_3 <- my_color_6[c(1, 2, 6)]

Editing lines is similar to editing symbols, except that we use geom_line() instead of geom_point().
I’ll use the UKLungDeaths dataset in base R for three time series
giving the monthly deaths from bronchitis, emphysema, and asthma in the
UK, 1974–1979, males (mdeaths), females (fdeaths), and total
(ldeaths).
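Before converting, it can help to see what these objects are. The following is a minimal base R sketch, assuming only the datasets package that ships with R. It inspects mdeaths and builds the id/time/value layout by hand; the names first_month and md are introduced here for illustration only.

```r
# mdeaths is a monthly ts object from the datasets package (attached by default)
class(mdeaths)
start(mdeaths)      # year and month of the first observation
frequency(mdeaths)  # number of observations per year

# Build a date for the first month, then one date per observation
first_month <- as.Date(sprintf("%d-%02d-01",
                               as.integer(start(mdeaths)[1]),
                               as.integer(start(mdeaths)[2])))
md <- data.frame(
  id    = "mdeaths",
  time  = seq(first_month, by = "month", length.out = length(mdeaths)),
  value = as.numeric(mdeaths),
  stringsAsFactors = FALSE
)
head(md, 3)
```

Doing this for three series and stacking the results is tedious, which is one reason to reach for a helper package.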
I use the tsbox package to convert the Time Series data format to a tidy data frame.
library("tsbox")
# collect the time series
collected_ts <- ts_c(mdeaths, fdeaths, ldeaths)
# convert to data frame
df <- ts_df(collected_ts)
# then to a tibble
df <- as_tibble(df)
df
#> # A tibble: 216 x 3
#> id time value
#> <chr> <date> <dbl>
#> 1 mdeaths 1974-01-01 2134
#> 2 mdeaths 1974-02-01 1863
#> 3 mdeaths 1974-03-01 1877
#> 4 mdeaths 1974-04-01 1877
#> 5 mdeaths 1974-05-01 1492
#> 6 mdeaths 1974-06-01 1249
#> 7 mdeaths 1974-07-01 1280
#> 8 mdeaths 1974-08-01 1131
#> 9 mdeaths 1974-09-01 1209
#> 10 mdeaths 1974-10-01 1492
#> # ... with 206 more rows

Distinguish by color. First, convert id to a factor and order its levels by value (fct_reorder() uses the median of value by default).

df <- df %>%
  mutate(id = fct_reorder(id, value))

Then assign color to the id variable in the aes() function.
ggplot(data = df, mapping = aes(x = time, y = value, color = id)) +
  geom_line() +
  guides(color = guide_legend(reverse = TRUE))

We can specify line colors using the scale_* functions.
ggplot(data = df, mapping = aes(x = time, y = value, color = id)) +
  geom_line() +
  guides(color = guide_legend(reverse = TRUE)) +
  scale_color_manual(values = my_color_3)

Just as geom_point() has a shape argument, geom_line() has a linetype argument with 7 levels, numbered 0 to 6: blank, solid, dashed, dotted, dotdash, longdash, and twodash.
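The seven line types can be viewed side by side. This is a minimal sketch using a small made-up data frame (lt_demo is a hypothetical name, not part of the analysis) and scale_linetype_identity(), so that each row's linetype value is used as given.

```r
library("ggplot2")

# Hypothetical demo data: one horizontal segment per built-in line type
lt_names <- c("blank", "solid", "dashed", "dotted",
              "dotdash", "longdash", "twodash")
lt_demo <- data.frame(
  y     = seq_along(lt_names) - 1,  # integer codes 0 to 6
  label = lt_names,
  stringsAsFactors = FALSE
)

p_lt <- ggplot(lt_demo) +
  geom_segment(aes(x = 0, xend = 1, y = y, yend = y, linetype = label)) +
  scale_linetype_identity() +
  scale_y_continuous(breaks = lt_demo$y, labels = lt_demo$label) +
  labs(x = NULL, y = NULL)
p_lt
```

The "blank" row draws nothing, which is the expected behavior for line type 0.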
Distinguish by line type.

ggplot(data = df, mapping = aes(x = time, y = value, linetype = id)) +
  geom_line() +
  guides(linetype = guide_legend(reverse = TRUE))

If we want to assign the same color to every line, we set it in the geom.
ggplot(data = df, mapping = aes(x = time, y = value, linetype = id)) +
  guides(linetype = guide_legend(reverse = TRUE)) +
  geom_line(color = rcb("dark_BG"))

Line type and color can both be assigned to a variable. When we do that, however, we get two legends.
ggplot(data = df,
       mapping = aes(x = time, y = value,
                     linetype = id,
                     color = id)) +
  guides(linetype = guide_legend(reverse = TRUE)) +
  geom_line() +
  scale_color_manual(values = my_color_3)

We’d prefer one legend. To keep the top legend with color and omit the bottom legend, we use guides() with its linetype argument set to "none".
ggplot(data = df,
       mapping = aes(x = time, y = value,
                     linetype = id,
                     color = id)) +
  guides(linetype = guide_legend(reverse = TRUE)) +
  geom_line() +
  scale_color_manual(values = my_color_3) +
  guides(color = guide_legend(reverse = TRUE), linetype = "none")

If we want all lines to be the same size, we set the size argument in the geom.
ggplot(data = df, mapping = aes(x = time, y = value, linetype = id)) +
  guides(linetype = guide_legend(reverse = TRUE)) +
  geom_line(color = rcb("dark_BG"), size = 1)

Using facets can reduce the number of line types and colors needed to distinguish between lines.
ggplot(data = df, mapping = aes(x = time, y = value)) +
  geom_line(size = 1, color = rcb("dark_BG")) +
  facet_wrap(vars(id), as.table = FALSE, ncol = 3)

geom_vline() draws a vertical line at a specified xintercept. Start with a basic scatterplot.
# diamonds dataset from ggplot2
df <- diamonds %>%
  filter(carat <= 1 & price <= 4000) %>%
  drop_na()
ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3)
Here xintercept is an arbitrary value.

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_vline(xintercept = 0.69,
             color = rcb("dark_PR"),
             size = 1)
Here xintercept depends on the data. Note that the data and mapping arguments appear in geom_vline().

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_vline(data = df,
             mapping = aes(xintercept = median(carat)),
             color = rcb("dark_PR"),
             size = 1)
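Because median(carat) does not vary by row, an equivalent sketch computes the value first and passes it as a constant, which needs neither data nor mapping in geom_vline(). The names df_sub, med_all, and p_med are introduced here for illustration, and a plain color name stands in for rcb("dark_PR") so that the block runs without graphclassmate.

```r
library("ggplot2")
library("dplyr")
library("tidyr")

# Same subset of diamonds as above
df_sub <- diamonds %>%
  filter(carat <= 1 & price <= 4000) %>%
  drop_na()

# Compute the overall median once, outside aes()
med_all <- median(df_sub$carat)

p_med <- ggplot(data = df_sub, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_vline(xintercept = med_all, color = "purple")
p_med
```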
The xintercept is the same in every facet if we use the same geom_vline() arguments as above.

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_vline(data = df,
             mapping = aes(xintercept = median(carat)),
             color = rcb("dark_PR"),
             size = 1) +
  facet_wrap(vars(cut), as.table = FALSE, ncol = 1)
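The line is identical in every panel because median(carat) is evaluated over the whole data frame, not within each facet. A quick check, sketched here with dplyr's group_by() and summarize() (an assumption of this example, not the function the text goes on to use), compares the overall median with the medians by cut. The names df_sub and by_cut are hypothetical.

```r
library("ggplot2")  # for the diamonds dataset
library("dplyr")

df_sub <- diamonds %>%
  filter(carat <= 1 & price <= 4000)

# One value for the whole data frame: this is what every panel shows
median(df_sub$carat)

# One value per level of cut: this is what each panel should show
by_cut <- df_sub %>%
  group_by(cut) %>%
  summarize(med_carat = median(carat))
by_cut
```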
To get the vertical line to represent the median in every panel, we
construct a new variable, med_carat, grouped by cut because that’s
what we are faceting on.
grouping_variables <- c("cut")
find_medians <- df %>%
  group_summarize(grouping_variables,
                  med_carat = median(carat),
                  med_price = median(price)) %>%
  glimpse()
#> Observations: 5
#> Variables: 3
#> $ cut <ord> Fair, Good, Very Good, Premium, Ideal
#> $ med_carat <dbl> 0.71, 0.51, 0.50, 0.42, 0.41
#> $ med_price <dbl> 2117, 1351, 1213, 1087, 1075
df <- left_join(df, find_medians, by = "cut")

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_vline(data = df,
             mapping = aes(xintercept = med_carat),
             color = rcb("dark_PR"),
             size = 1) +
  facet_wrap(vars(cut), as.table = FALSE, ncol = 1)

geom_hline() draws a horizontal line at a specified yintercept.
Here yintercept is an arbitrary value.

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_hline(yintercept = 3000,
             color = rcb("dark_PR"),
             size = 1)
Here yintercept depends on the data. Note that the data and mapping arguments appear in geom_hline().

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_hline(data = df,
             mapping = aes(yintercept = median(price)),
             color = rcb("dark_PR"),
             size = 1)
The yintercept is the same in every facet if we use the same geom_hline() arguments as above.

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_hline(data = df,
             mapping = aes(yintercept = median(price)),
             color = rcb("dark_PR"),
             size = 1) +
  facet_wrap(vars(cut), as.table = FALSE, ncol = 5)
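A per-panel line does not strictly require the medians to be columns of the full data frame. This minimal sketch assumes a small summary table with one row per cut (price_by_cut is a hypothetical name) and passes that table as the data for geom_hline(); facet_wrap() then matches rows to panels through the shared cut column. A plain color name stands in for rcb("dark_PR").

```r
library("ggplot2")
library("dplyr")

df_sub <- diamonds %>%
  filter(carat <= 1 & price <= 4000)

# One row per cut, holding that cut's median price
price_by_cut <- df_sub %>%
  group_by(cut) %>%
  summarize(med_price = median(price))

p_facet <- ggplot(data = df_sub, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_hline(data = price_by_cut,
             mapping = aes(yintercept = med_price),
             color = "purple") +
  facet_wrap(vars(cut), as.table = FALSE, ncol = 5)
p_facet
```

The trade-off is one extra object to keep track of, in exchange for a layer that holds five rows instead of one row per diamond.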
To get the horizontal line to represent the median in every panel, we use the med_price variable we constructed earlier.

ggplot(data = df, aes(x = carat, y = price)) +
  geom_jitter(alpha = 0.3) +
  geom_hline(data = df,
             mapping = aes(yintercept = med_price),
             color = rcb("dark_PR"),
             size = 1) +
  facet_wrap(vars(cut), as.table = FALSE, ncol = 5)

Lines of any slope are constructed using geom_abline(), which takes two arguments: slope and intercept.
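As a minimal illustration of the two arguments, the sketch below draws the line y = 1 + 2x over a few made-up points. The data frame pts is hypothetical and not part of the analysis that follows.

```r
library("ggplot2")

# Hypothetical points lying exactly on y = 1 + 2x
pts <- data.frame(x = 0:4)
pts$y <- 1 + 2 * pts$x

p_ab <- ggplot(pts, aes(x = x, y = y)) +
  geom_abline(slope = 2, intercept = 1, color = "gray60") +
  geom_point()
p_ab
```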
grouping_variables <- c("race", "path", "sex")
df <- nontraditional %>%
  seplyr::group_summarize(grouping_variables,
                          enrolled = mean(enrolled))
df
#> # A tibble: 16 x 4
#> race path sex enrolled
#> <chr> <chr> <chr> <dbl>
#> 1 Asian Nontraditional Female 4.06
#> 2 Asian Nontraditional Male 4.22
#> 3 Asian Traditional Female 4.14
#> 4 Asian Traditional Male 4.27
#> 5 Black Nontraditional Female 4.14
#> 6 Black Nontraditional Male 4.33
#> 7 Black Traditional Female 4.08
#> 8 Black Traditional Male 4.28
#> 9 Hispanic Nontraditional Female 3.95
#> 10 Hispanic Nontraditional Male 4.13
#> 11 Hispanic Traditional Female 4.04
#> 12 Hispanic Traditional Male 4.26
#> 13 White Nontraditional Female 3.84
#> 14 White Nontraditional Male 4.14
#> 15 White Traditional Female 3.93
#> 16 White Traditional Male 4.19

I’d like to compare traditional students to nontraditional students by having one on each axis of a scatterplot, so I need to reshape the data frame to create two new variables, Nontraditional and Traditional.

df <- cdata::pivot_to_rowrecs(df,
                              columnToTakeKeysFrom = "path",
                              columnToTakeValuesFrom = "enrolled",
                              rowKeyColumns = c("race", "sex"))
knitr::kable(df)

| race | sex | Nontraditional | Traditional |
|---|---|---|---|
| Asian | Female | 4.057262 | 4.143750 |
| Asian | Male | 4.215313 | 4.272284 |
| Black | Female | 4.136124 | 4.084110 |
| Black | Male | 4.330957 | 4.282462 |
| Hispanic | Female | 3.948345 | 4.042273 |
| Hispanic | Male | 4.125917 | 4.257056 |
| White | Female | 3.839782 | 3.925950 |
| White | Male | 4.138989 | 4.193669 |
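The same reshaping can be sketched with tidyr::pivot_wider(), shown here as an alternative to cdata::pivot_to_rowrecs() and not as the method used above. To keep the block self-contained, the input (named enrolled_long here for illustration) is four rows typed in from the summary printed earlier, with its rounded values.

```r
library("tidyr")
library("tibble")

# Four rows copied from the printed summary (values rounded as shown there)
enrolled_long <- tribble(
  ~race,   ~path,            ~sex,     ~enrolled,
  "Asian", "Nontraditional", "Female", 4.06,
  "Asian", "Nontraditional", "Male",   4.22,
  "Asian", "Traditional",    "Female", 4.14,
  "Asian", "Traditional",    "Male",   4.27
)

# One row per race and sex, one column per level of path
enrolled_wide <- pivot_wider(enrolled_long,
                             names_from  = path,
                             values_from = enrolled)
enrolled_wide
```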
The point was to plot Nontraditional students on one axis and Traditional students on the other.

ggplot(data = df, mapping = aes(x = Traditional, y = Nontraditional)) +
  geom_point()

To this I can add a 45-degree line using geom_abline(). Points below the line indicate groups for which Traditional students take more years to graduate.
ggplot(data = df, mapping = aes(x = Traditional, y = Nontraditional)) +
  geom_abline(slope = 1, intercept = 0, color = rcb("light_Gray")) +
  geom_point()

Add coord_fixed() to ensure the line is at 45 degrees.
p <- ggplot(data = df, mapping = aes(x = Traditional, y = Nontraditional)) +
  geom_abline(slope = 1, intercept = 0, color = rcb("light_Gray")) +
  geom_point() +
  coord_fixed(ratio = 1)
p

The graph tells us that for most groups by race and sex, traditional students on average take longer to graduate than nontraditional students.

Try a facet.
ggplot(data = df, mapping = aes(x = Traditional, y = Nontraditional)) +
  geom_abline(slope = 1, intercept = 0, color = rcb("light_Gray")) +
  geom_point() +
  coord_fixed(ratio = 1) +
  facet_wrap(vars(sex), as.table = FALSE, ncol = 2)

We might add horizontal and vertical reference lines at 4 years.
ggplot(data = df, mapping = aes(x = Traditional, y = Nontraditional)) +
  geom_hline(yintercept = 4, color = rcb("light_Gray")) +
  geom_vline(xintercept = 4, color = rcb("light_Gray")) +
  geom_abline(slope = 1, intercept = 0, color = rcb("light_Gray")) +
  geom_point() +
  coord_fixed(ratio = 1) +
  facet_wrap(vars(sex), as.table = FALSE, ncol = 2)

Only some groups of women are graduating, on average, in under 4 years. Of course, these are means.
1. nontraditional
Edit this graph (developed earlier) so that the data markers are replaced with a letter to represent race and sex is encoded using color.
Wickham H and Grolemund G (2017) R for Data Science. O’Reilly Media, Inc., Sebastopol, CA https://r4ds.had.co.nz/























