Commit 5f4bd24

committed
differences for PR #149
1 parent e684e92 commit 5f4bd24

4 files changed

+112
-5
lines changed

04-data-structures-part2.md

+111 -4
@@ -36,10 +36,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
3636

3737
::::::::::::::::::::::::::::::::::::::::: instructor
3838

39-
Pay attention to and explain the errors and warnings generated from the
39+
Pay attention to and explain the errors and warnings generated from the
4040
examples in this episode.
4141

42-
:::::::::::::::::::::::::::::::::::::::::
42+
:::::::::::::::::::::::::::::::::::::::::
4343

4444

4545
```r
@@ -77,7 +77,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
7777

7878
- You can read directly from Excel spreadsheets without
7979
converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
80-
80+
8181

8282
::::::::::::::::::::::::::::::::::::::::::::::::::
8383

@@ -99,7 +99,8 @@ str(gapminder)
9999
$ gdpPercap: num 779 821 853 836 740 ...
100100
```
101101

102-
We can also examine individual columns of the data frame with our `class` function:
102+
We can also examine individual columns of the data frame with the `class` or
103+
`typeof` functions:
103104

104105

105106
```r
@@ -110,6 +111,14 @@ class(gapminder$year)
110111
[1] "integer"
111112
```
112113

114+
```r
115+
typeof(gapminder$year)
116+
```
117+
118+
```{.output}
119+
[1] "integer"
120+
```
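
The two functions answer different questions: `class` reports how R treats an object, while `typeof` reports how the object is stored internally. For the `year` column they agree, but they often differ. A minimal sketch on toy values (the names `continent` and `toy` are made up for illustration and are not part of the lesson data):

```r
# Toy objects (hypothetical, not taken from the gapminder data)
continent <- factor(c("Asia", "Europe", "Asia"))
toy <- data.frame(year = c(1952L, 1957L), lifeExp = c(28.801, 30.332))

class(continent)     # "factor": how R treats the object
typeof(continent)    # "integer": factors are stored as integer codes

class(toy)           # "data.frame"
typeof(toy)          # "list": a data frame is a list of columns

class(toy$lifeExp)   # "numeric"
typeof(toy$lifeExp)  # "double"
```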
121+
113122
```r
114123
class(gapminder$country)
115124
```
@@ -424,6 +433,104 @@ tail(gapminder_norway)
424433

425434
To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
426435

436+
437+
## Removing columns and rows in data frames
438+
439+
To remove columns from a data frame, we can use the `subset` function.
440+
This function allows us to remove columns using their names:
441+
442+
443+
```r
444+
life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
445+
head(life_expectancy)
446+
```
447+
448+
```{.output}
449+
country year lifeExp below_average
450+
1 Afghanistan 1952 28.801 TRUE
451+
2 Afghanistan 1957 30.332 TRUE
452+
3 Afghanistan 1962 31.997 TRUE
453+
4 Afghanistan 1967 34.020 TRUE
454+
5 Afghanistan 1972 36.088 TRUE
455+
6 Afghanistan 1977 38.438 TRUE
456+
```
457+
458+
We can also use a logical vector to achieve the same result. Make sure the
459+
vector's length matches the number of columns in the data frame (to avoid vector
460+
recycling):
461+
462+
463+
```r
464+
life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
465+
head(life_expectancy)
466+
```
467+
468+
```{.output}
469+
country year lifeExp below_average
470+
1 Afghanistan 1952 28.801 TRUE
471+
2 Afghanistan 1957 30.332 TRUE
472+
3 Afghanistan 1962 31.997 TRUE
473+
4 Afghanistan 1967 34.020 TRUE
474+
5 Afghanistan 1972 36.088 TRUE
475+
6 Afghanistan 1977 38.438 TRUE
476+
```
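
To see why the length of the logical vector matters, the following sketch shows recycling on a small made-up data frame (`toy`, `recycled`, and `keep` are hypothetical names used only here). A vector shorter than the number of columns is repeated silently, so checking its length first is a cheap safeguard:

```r
# Hypothetical toy data frame with four columns
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# A logical vector shorter than ncol(toy) is recycled without any warning:
# c(TRUE, FALSE) is treated as c(TRUE, FALSE, TRUE, FALSE)
recycled <- toy[c(TRUE, FALSE)]
names(recycled)   # "a" "c"

# Checking the length first turns a silent surprise into an explicit error
keep <- c(TRUE, FALSE, FALSE, TRUE)
stopifnot(length(keep) == ncol(toy))
names(toy[keep])  # "a" "d"
```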
477+
478+
Alternatively, we can use the columns' positions:
479+
480+
481+
```r
482+
life_expectancy <- gapminder[-c(3, 4, 6)]
483+
head(life_expectancy)
484+
```
485+
486+
```{.output}
487+
country year lifeExp below_average
488+
1 Afghanistan 1952 28.801 TRUE
489+
2 Afghanistan 1957 30.332 TRUE
490+
3 Afghanistan 1962 31.997 TRUE
491+
4 Afghanistan 1967 34.020 TRUE
492+
5 Afghanistan 1972 36.088 TRUE
493+
6 Afghanistan 1977 38.438 TRUE
494+
```
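
Positions are compact, but they select different columns if the column order ever changes. One possible alternative, sketched here on made-up data (the data frame `toy`, its values, and the vector `drop_cols` are assumptions for illustration), is to build the selection from the column names:

```r
# Hypothetical toy data frame; the values are invented for this example
toy <- data.frame(
  country = c("A", "B"),
  year    = c(2002L, 2007L),
  pop     = c(100, 200)
)

# Names of the columns we want to remove
drop_cols <- c("pop")

# Keep every column whose name is not listed in drop_cols
kept <- toy[!(names(toy) %in% drop_cols)]
names(kept)   # "country" "year"

# setdiff() expresses the same idea with set operations on the names
kept2 <- toy[setdiff(names(toy), drop_cols)]
names(kept2)  # "country" "year"
```

Note that a minus sign cannot be applied to names directly: `toy[-c("pop")]` raises an error, because negation is only defined for numbers.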
495+
496+
Note that the easiest way to remove rows from a data frame is often to select the
497+
rows we want to keep instead.
498+
To remove rows directly, we can use negative row positions:
499+
500+
501+
```r
502+
# Filter data for Afghanistan during the 21st century:
503+
afghanistan_20c <- gapminder[gapminder$country == "Afghanistan" &
504+
gapminder$year > 2000, ]
505+
506+
# Now remove data for 2002, that is, the first row:
507+
afghanistan_20c[-1, ]
508+
```
509+
510+
```{.output}
511+
country year pop continent lifeExp gdpPercap below_average
512+
12 Afghanistan 2007 31889923 Asia 43.828 974.5803 TRUE
513+
```
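
The advice to select the rows to keep can be illustrated with a short sketch on made-up data (`toy` and `no_match` are hypothetical names). It also shows a known pitfall of combining a minus sign with `which()`: when nothing matches, every row is dropped instead of none.

```r
# Hypothetical toy data frame; the values are invented for this example
toy <- data.frame(year = c(1997L, 2002L, 2007L), value = c(10, 20, 30))

# Keep the rows that do NOT match the condition
toy[toy$year != 2002, ]

# Pitfall: which() returns integer(0) when no row matches,
# and indexing with -integer(0) selects zero rows
no_match <- which(toy$year == 1900)
nrow(toy[-no_match, ])          # 0

# The logical condition behaves as expected in the same situation
nrow(toy[toy$year != 1900, ])   # 3
```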
514+
515+
516+
A common case is removing rows that contain `NA` values:
517+
518+
519+
```r
520+
# Turn some values into NAs:
521+
afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
522+
afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA
523+
524+
# Remove every row that contains an NA
525+
na.omit(afghanistan_20c)
526+
```
527+
528+
```{.output}
529+
country year pop continent lifeExp gdpPercap below_average
530+
12 Afghanistan 2007 31889923 Asia 43.828 974.5803 TRUE
531+
```
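
`na.omit` drops a row if any of its columns contains an `NA`. When only one column matters, a logical condition built with `is.na` is more selective. A minimal sketch on made-up data (the data frame `toy` and the name `by_year` are assumptions for illustration):

```r
# Hypothetical toy data frame with NAs in two different columns
toy <- data.frame(year = c(NA, 2002L, 2007L), value = c(10, NA, 30))

# na.omit() removes every row that has an NA in any column
nrow(na.omit(toy))                 # 1

# Keep rows where 'year' is known, regardless of the other columns
by_year <- toy[!is.na(toy$year), ]
nrow(by_year)                      # 2

# complete.cases() gives the logical vector behind the na.omit() result
nrow(toy[complete.cases(toy), ])   # 1
```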
532+
533+
427534
## Factors
428535

429536
Here is another thing to look out for: in a `factor`, each different value

fig/06-rmd-generate-figures.sh

100755 → 100644
File mode changed.

fig/12-plyr-generate-figures.sh

100755 → 100644
File mode changed.

md5sum.txt

+1 -1
@@ -6,7 +6,7 @@
66
"episodes/01-rstudio-intro.Rmd" "f4e11815e378019213cd8bc32bd5d292" "site/built/01-rstudio-intro.md" "2023-11-21"
77
"episodes/02-project-intro.Rmd" "00024461ca6e3ea1ec659cf9434377d4" "site/built/02-project-intro.md" "2023-11-21"
88
"episodes/03-data-structures-part1.Rmd" "a83070b1d04789704c8173e6813aba66" "site/built/03-data-structures-part1.md" "2023-11-21"
9-
"episodes/04-data-structures-part2.Rmd" "22100d1539c25cba0459d909f346f516" "site/built/04-data-structures-part2.md" "2023-11-21"
9+
"episodes/04-data-structures-part2.Rmd" "df5db7ccfc08dc2a55831652fc07de31" "site/built/04-data-structures-part2.md" "2024-01-11"
1010
"episodes/05-data-subsetting.Rmd" "b673744f991a865b9996504197cc013e" "site/built/05-data-subsetting.md" "2023-11-21"
1111
"episodes/06-dplyr.Rmd" "5d6106566981f73f1e3dc6a5c011fa28" "site/built/06-dplyr.md" "2023-11-21"
1212
"episodes/07-plot-ggplot2.Rmd" "7cbd4da57c055ecbc3ee80bd2694497a" "site/built/07-plot-ggplot2.md" "2023-11-21"
