---
title: "Visualization"
---
[DatVis](https://socviz.co/lookatdata.html)
- Chapter 1: Look at data
- Chapter 3: Make a plot
- Chapter 4: Show the right numbers
- Chapter 5, section 5.3 & 5.4 Plot text directly and label outliers
# Why data visualisations?
- communicate results
- explore data
- if done correctly: an efficient way of processing and remembering data, because a good visualisation reduces cognitive load and makes it easier to move information into long-term memory. Working memory is limited: we can keep only about 7 variables in mind at once.
- Reducing cognitive load makes the audience:
    - More willing to read your analysis
    - More likely to understand the data/results
    - More prone to accept the results
    - More likely to remember them
- Often the only part of the analysis that the audience ever sees.
## How to communicate via visualization?
- How do we make sure that the graphs we make convey:
    - the right part of the data, and
    - with the least effort possible (minimal cognitive load)?
- First step in a data visualization task: Write down the main message you want to convey
**Central questions:**

1. What are the main elements of a graph? (labels, dots, bars, facets ...)
2. What type of plot should you use?
    - Barplots for a categorical and a numerical variable: compare frequencies
    - Scatterplots for two numerical variables: show the covariance and the relationship between the two variables
3. How can we make a plot look more professional?
    - keep it as minimal as possible: no "junk", no color unless color is needed, comprehensible scales
4. How to guide the reader?
    - highlight the central aspect
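
For question 2, the two plot types can be sketched in ggplot2 as follows. This is only an illustration: it assumes the `gapminder` package is installed and restricts the data to 2007 so that each country appears once; the object names `gap_2007`, `p_bar` and `p_scatter` are made up for this example.

```r
library(ggplot2)
library(gapminder)  # assumed to be installed; provides the gapminder data frame

# One row per country (hypothetical helper object for this example)
gap_2007 <- subset(gapminder, year == 2007)

# Categorical variable: bars show how many countries fall in each continent
p_bar <- ggplot(gap_2007, aes(x = continent)) +
  geom_bar()

# Two numerical variables: points show how they vary together
p_scatter <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
```

Printing `p_bar` or `p_scatter` draws the respective plot.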
## Criteria for good graphs and visualization
Guidelines for routine plotting:
- properly chosen format and design
- use words, numbers and drawing together
- display an accessible complexity of detail
- avoid content-free decoration
- maximize the "data-to-ink" ratio
- simplify, remove everything that is not necessary
- no cherry-picking of data; choose the visual form to suit the data, e.g. age cohorts in bar charts, longitudinal changes in point charts
- reduce aesthetics to a minimum and use colour etc. only where it carries meaning.
- the human ability to see contrast is stronger for monochrome images than for color
- using color in data visualization introduces a number of other complications, because color combines hue with chrominance or chroma (the intensity or vividness of the color):
    - how bright an object looks depends partly on the brightness of objects near it.
    - distances between values should also be reflected perceptually in the choice of colors, not only numerically
- "preattentive pop-out": Some objects in our visual field are easier to see than others → indicate with shapes, color & position.
- Most people see the Poisson-generated pattern (a randomly generated pattern) as having more structure, or less 'randomness', than the Matérn pattern (a more evenly distributed one), whereas the reverse is true.
- humans are always looking for structure; this tendency to infer relationships is described by the "gestalt rules":
    - Similarity: Things that look alike seem to be related.
    - Connection: Things that are visually tied to one another seem to be related.
    - Continuity: Partially hidden objects are completed into familiar shapes.
    - Closure: Incomplete shapes are perceived as complete.
    - Figure and Ground: Visual elements are taken to be either in the foreground or the background.
    - Common Fate: Elements sharing a direction of movement are perceived as a unit.
- how accurately humans estimate the percentage difference between two sizes depends on how the values are drawn in the graph (the figure with the test results is not available here)
## Channels and types of graph in overview
(Five overview figures are not available here.)
## Principles of Design
**Practical advice**

- Reduce cognitive load:
    - Remove unnecessary clutter
    - More professional/aesthetically pleasing
- Contrast:
    - Eliminate unnecessary lines (all frames; use gray grid lines, etc.)
    - Don't use a gray background
    - White space is your friend (allows for “breathing”)
    - Enlarge the labels
    - Use vector graphics (svg/pdf/eps) to avoid blurry figures → edit them in Adobe Illustrator (AI) or Inkscape
- Repetition: be consistent across different figures
- Alignment: make sure you align subplots/labels
- Proximity: when possible, label data directly (instead of using legends)
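
Several of these points can be tried out directly in ggplot2. The sketch below is one possible way, not the only one: it assumes the `gapminder` package is installed, uses `theme_minimal()` for the white background, and writes a PDF to a temporary directory. The names `gap_2007` and `out_file` and the file name are hypothetical.

```r
library(ggplot2)
library(gapminder)  # assumed to be installed

# One year only, so each country appears once
gap_2007 <- subset(gapminder, year == 2007)

p <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(colour = "grey30") +
  scale_x_log10(labels = scales::dollar) +
  labs(x = "GDP per capita", y = "Life expectancy (years)") +
  theme_minimal(base_size = 16) +               # white background, larger labels
  theme(panel.grid.minor = element_blank(),     # fewer lines
        panel.grid.major = element_line(colour = "grey85"))  # light gray grid

# Vector output (PDF) stays sharp at any zoom level
out_file <- file.path(tempdir(), "gdp_lifeexp.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

The resulting PDF can then be opened in Inkscape or a similar editor for final touches.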
## Guide the reader
- We read plots in a Z-shaped flow: top-left to top-right to bottom-left to bottom-right
With these elements (figure not available here):

Color is the most useful pre-attentive attribute:

- Increases contrast
- Allows for consistency (same country with the same color)
Color affects emotion, and the effect is culture-dependent. Some responses are nearly universal:

- Warm colors → alive/alert
- Blue colors → calming/focus

Colorblind-friendly palettes: <https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40>

In addition to highlighting, colours can be used to:

- Represent categories (not more than 4 colors)
- Represent values:
    - Only if necessary (i.e. you are using the x and y axis for more important variables)
    - Not accurate (shows trends only)

In the comparison figure (not available here):

- left, too much: you are lost
- right: your attention is guided to the important aspects
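
Guiding attention with color can be sketched as follows: everything is drawn in gray except the group the message is about. This is an illustration under assumptions: the `gapminder` package is installed, "Africa" is an arbitrary choice of group, the column `group` is made up for this example, and `#D81B60` is the first color in the colorblind palette link above.

```r
library(ggplot2)
library(gapminder)  # assumed to be installed

gap_2007 <- subset(gapminder, year == 2007)

# Hypothetical helper column: the group to highlight vs. everything else
gap_2007$group <- ifelse(gap_2007$continent == "Africa",
                         "Africa", "Other continents")

p <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, colour = group)) +
  geom_point() +
  scale_x_log10(labels = scales::dollar) +
  # One strong color for the message, gray for the context
  scale_colour_manual(values = c("Africa" = "#D81B60",
                                 "Other continents" = "grey75"),
                      name = NULL) +
  labs(x = "GDP per capita", y = "Life expectancy (years)") +
  theme_minimal()
```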
Types of color palette:

- Qualitative: categorical data
- Sequential: the minimum or maximum is important
- Diverging: the middle value is the important one, against which comparisons are drawn
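
The three palette types map onto different ggplot2 scale functions. The sketch below is one way to set them up, assuming the `gapminder` package is installed. The palette names ("Dark2", "Blues"), the hex colors and the choice of the mean life expectancy as midpoint are illustrative assumptions, not recommendations from the source.

```r
library(ggplot2)
library(gapminder)  # assumed to be installed

# Dropping Oceania keeps the qualitative palette at four colors
gap_2007 <- subset(gapminder, year == 2007 & continent != "Oceania")

base <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10(labels = scales::dollar) +
  theme_minimal()

# Qualitative: unordered categories (continent)
p_qual <- base + geom_point(aes(colour = continent)) +
  scale_colour_brewer(palette = "Dark2")

# Sequential: ordered values, low to high (log of population)
p_seq <- base + geom_point(aes(colour = log10(pop))) +
  scale_colour_distiller(palette = "Blues", direction = 1)

# Diverging: distance from a meaningful midpoint (here the mean life expectancy)
mid <- mean(gap_2007$lifeExp)
p_div <- base + geom_point(aes(colour = lifeExp)) +
  scale_colour_gradient2(low = "#2166AC", mid = "grey90", high = "#B2182B",
                         midpoint = mid)
```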
## How the ggplot function works
required library: `library(tidyverse)`
In R, the grammar of graphics is implemented in `ggplot()`, a function in the `ggplot2` package.

Elements of a graph:

- The data: `ggplot(data = gapminder)`
- Aesthetic mappings (position, shape, color, …) map variables to visual channels: `mapping = aes(x = gdpPercap, y = pop)`
- Geometric objects (points, lines, bars, …) use those mappings: `+ geom_point()`
- Labels (titles, caption, axes labels): `+ labs(x = "GDP per capita", y = "Population")`
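
Put together, the four elements form one plot call. A minimal sketch, assuming the `gapminder` package is installed; note that the gapminder data stores GDP per capita in the column `gdpPercap`.

```r
library(ggplot2)
library(gapminder)  # assumed to be installed

p <- ggplot(data = gapminder,                         # 1. the data
            mapping = aes(x = gdpPercap, y = pop)) +  # 2. aesthetic mappings
  geom_point() +                                      # 3. geometric object
  labs(x = "GDP per capita", y = "Population")        # 4. labels
```

Printing `p` draws one point per country and year.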
- `ggplot()` is the *function* to plot
- `aes` or *aesthetic mappings* is the logical connection between your data and the plot elements
- `geom` defines the type of plot, for example:
    - `geom_point`
    - `geom_bar`
    - `geom_boxplot`
- additional elements such as scales, labels and so on can be added to the plot

The ggplot function is additive: you add layer by layer, e.g.:
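```
library(tidyverse)  # loads ggplot2
library(gapminder)  # provides the gapminder data frame
# Base plot: data and aesthetic mappings
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() +                        # layer: one point per observation
  geom_smooth(method = "gam") +           # layer: smoothed trend line
  scale_x_log10(labels = scales::dollar)  # log-scaled x axis with dollar labels
```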
Overview of ggplot aesthetics: <https://ggplot2.tidyverse.org/reference/index.html#section-aesthetics>

Overview of ggplot geoms: <https://ggplot2.tidyverse.org/reference/index.html#section-geoms>