@@ -36,10 +36,10 @@ So far, you have seen the basics of manipulating data frames with our nordic dat
::::::::::::::::::::::::::::::::::::::::: instructor
- Pay attention to and explain the errors and warnings generated from the
+ Pay attention to and explain the errors and warnings generated from the
examples in this episode.
- :::::::::::::::::::::::::::::::::::::::::
+ :::::::::::::::::::::::::::::::::::::::::
``` r
@@ -77,7 +77,7 @@ gapminder <- read.csv("https://datacarpentry.org/r-intro-geospatial/data/gapmind
- You can read directly from Excel spreadsheets without
converting them to plain text first by using the [readxl](https://cran.r-project.org/package=readxl) package.
-
+
::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -99,7 +99,8 @@ str(gapminder)
$ gdpPercap: num 779 821 853 836 740 ...
```
- We can also examine individual columns of the data frame with our `class` function:
+ We can also examine individual columns of the data frame with the `class` or
+ `typeof` functions:
``` r
@@ -110,6 +111,14 @@ class(gapminder$year)
[1] "integer"
```
+ ``` r
+ typeof(gapminder$year)
+ ```
+
+ ``` {.output}
+ [1] "integer"
+ ```
+
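For the integer `year` column the two functions agree, but they answer different questions: `class` reports the high-level class used for method dispatch, while `typeof` reports the internal storage type. The following is a minimal sketch of where they differ; the `toy` data frame and its values are made up for illustration and are not part of the gapminder data.

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(
  country = factor(c("Norway", "Sweden")),
  year    = c(1952L, 1957L),
  lifeExp = c(72.5, 71.5)
)

class(toy$year)      # "integer"
typeof(toy$year)     # "integer"

class(toy$lifeExp)   # "numeric"
typeof(toy$lifeExp)  # "double"

class(toy$country)   # "factor"
typeof(toy$country)  # "integer": factors are stored as integer codes

class(toy)           # "data.frame"
typeof(toy)          # "list": a data frame is a list of columns
```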
``` r
class(gapminder$country)
```
@@ -424,6 +433,104 @@ tail(gapminder_norway)
To understand why R is giving us a warning when we try to add this row, let's learn a little more about factors.
+
+ ## Removing columns and rows in data frames
+
+ To remove columns from a data frame, we can use the `subset` function.
+ Its `select` argument lets us drop columns by name when we negate them with `-`:
+
+
+ ``` r
+ life_expectancy <- subset(gapminder, select = -c(continent, pop, gdpPercap))
+ head(life_expectancy)
+ ```
+
+ ``` {.output}
+       country year lifeExp below_average
+ 1 Afghanistan 1952  28.801          TRUE
+ 2 Afghanistan 1957  30.332          TRUE
+ 3 Afghanistan 1962  31.997          TRUE
+ 4 Afghanistan 1967  34.020          TRUE
+ 5 Afghanistan 1972  36.088          TRUE
+ 6 Afghanistan 1977  38.438          TRUE
+ ```
+
+ We can also use a logical vector to achieve the same result. Make sure the
+ vector's length matches the number of columns in the data frame; a shorter
+ vector is silently recycled:
+
+
+ ``` r
+ life_expectancy <- gapminder[c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
+ head(life_expectancy)
+ ```
+
+ ``` {.output}
+       country year lifeExp below_average
+ 1 Afghanistan 1952  28.801          TRUE
+ 2 Afghanistan 1957  30.332          TRUE
+ 3 Afghanistan 1962  31.997          TRUE
+ 4 Afghanistan 1967  34.020          TRUE
+ 5 Afghanistan 1972  36.088          TRUE
+ 6 Afghanistan 1977  38.438          TRUE
+ ```
+
+ Alternatively, we can use the columns' positions:
+
+
+ ``` r
+ life_expectancy <- gapminder[-c(3, 4, 6)]
+ head(life_expectancy)
+ ```
+
+ ``` {.output}
+       country year lifeExp below_average
+ 1 Afghanistan 1952  28.801          TRUE
+ 2 Afghanistan 1957  30.332          TRUE
+ 3 Afghanistan 1962  31.997          TRUE
+ 4 Afghanistan 1967  34.020          TRUE
+ 5 Afghanistan 1972  36.088          TRUE
+ 6 Afghanistan 1977  38.438          TRUE
+ ```
+
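The three approaches above can be compared side by side on a small example. This is only a sketch: the `toy` data frame and its column names `a` to `d` are hypothetical. It also shows what recycling does when the logical vector is shorter than the number of columns.

```r
# Hypothetical four-column data frame, for illustration only
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Three ways to drop columns b and d
by_name     <- subset(toy, select = -c(b, d))
by_logical  <- toy[c(TRUE, FALSE, TRUE, FALSE)]
by_position <- toy[-c(2, 4)]

names(by_name)      # "a" "c"
names(by_logical)   # "a" "c"
names(by_position)  # "a" "c"

# A logical vector shorter than the number of columns is recycled:
# c(TRUE, FALSE, FALSE) is extended to c(TRUE, FALSE, FALSE, TRUE)
recycled <- toy[c(TRUE, FALSE, FALSE)]
names(recycled)     # "a" "d"
```

Because R recycles without a warning here, a vector that is one element short can keep or drop a column you did not intend.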
+ Note that the easiest way to remove rows from a data frame is usually to select
+ the rows we want to keep instead.
+ That said, we can also remove rows by their positions:
+
+
+ ``` r
+ # Filter data for Afghanistan during the 21st century (years after 2000):
+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan" &
+                              gapminder$year > 2000, ]
+
+ # Now remove data for 2002, that is, the first row:
+ afghanistan_20c[-1, ]
+ ```
+
+ ``` {.output}
+        country year      pop continent lifeExp gdpPercap below_average
+ 12 Afghanistan 2007 31889923      Asia  43.828  974.5803          TRUE
+ ```
+
+
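As a rough sketch of the difference between keeping rows and removing them by position, consider the made-up `toy` data frame below (its columns and values are assumptions for illustration). It also shows one pitfall of negative indexing: negating an empty index selects no rows at all.

```r
# Hypothetical data frame, for illustration only
toy <- data.frame(year = c(1997L, 2002L, 2007L), value = c(10, 20, 30))

# Keep the rows we want (usually the clearest option)
kept <- toy[toy$year != 2002, ]

# Remove a row by its position
dropped <- toy[-2, ]

kept$year     # 1997 2007
dropped$year  # 1997 2007

# Pitfall: if nothing matches, which() returns integer(0), and
# negating an empty index selects zero rows instead of all rows
no_match <- which(toy$year == 1900)
nrow(toy[-no_match, ])  # 0
```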
+ A common special case is removing rows that contain `NA` values. The `na.omit`
+ function drops every row that has an `NA` in any column:
+
+
+ ``` r
+ # Turn some values into NAs:
+ afghanistan_20c <- gapminder[gapminder$country == "Afghanistan", ]
+ afghanistan_20c[afghanistan_20c$year < 2007, "year"] <- NA
+
+ # Remove NAs
+ na.omit(afghanistan_20c)
+ ```
+
+ ``` {.output}
+        country year      pop continent lifeExp gdpPercap below_average
+ 12 Afghanistan 2007 31889923      Asia  43.828  974.5803          TRUE
+ ```
+
+
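When only some columns matter, dropping every incomplete row may remove more than intended. The sketch below uses a made-up `toy` data frame (an assumption for illustration) to compare `na.omit` with `complete.cases` and with a check on a single column.

```r
# Hypothetical data frame with NAs in different columns
toy <- data.frame(year = c(1997L, NA, 2007L), pop = c(1.5, 2.5, NA))

# Drop rows with an NA in any column
all_complete <- na.omit(toy)

# The same rows, selected with a logical index
same_rows <- toy[complete.cases(toy), ]

# Drop rows only where `year` is NA
year_known <- toy[!is.na(toy$year), ]

nrow(all_complete)  # 1
nrow(same_rows)     # 1
nrow(year_known)    # 2
```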
## Factors
Here is another thing to look out for: in a `factor`, each different value