-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path1. Introduction to R Syntax.Rmd
401 lines (291 loc) · 11.8 KB
/
1. Introduction to R Syntax.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
---
title: "Introduction to R Syntax"
author: Daphne Janelyn L. Go^[De La Salle University, Manila, Philippines, [email protected]]
output: html_notebook
---
R is an open source programming language that is popular for statistics and data analysis. R was built by statisticians, so many common statistical operations are built into the base language. R also features powerful and intuitive libraries for plotting and a variety of packages for predictive modeling, making it one of the most popular languages for data science.
## Data Types
Decimal numbers (real numbers) in R are known as doubles. Doubles are the default numeric data type so when you manually enter a number in R, you are working with a double.
```{r}
typeof(1)
typeof(-10.5)
typeof(Inf)
typeof(-Inf)
typeof(NaN) # dividing zero by zero produces NaN
```
Integers are a second numeric data type that only take whole numbered values.
```{r}
as.integer(1) # Convert the double 1 to integer 1
typeof(as.integer(1))
# Convert back from integer to double
as.numeric(as.integer(1))
```
Our first non-numeric data type is the Logical. A Logical takes on the value of TRUE or FALSE. You must type TRUE and FALSE in all capital letters for R to recognize them as logical values. Data that only takes on the values of True or False are also called "Booleans".
```{r}
typeof(TRUE)
typeof(FALSE)
typeof(T)
typeof(F)
```
You can create logical values with logical comparisons. R supports a variety of standard logic comparison operators including > (greater than), < (less than), >= (greater than or equal), <= (less than or equal).
```{r}
20 > 20
20 >= 20
10 == 10
10 != 20
!FALSE
(2 > 1) & (10 == 9)
(2 > 1) | (10 == 9)
```
Strings of text in R are known as characters. Surround text with quotation marks to create a character.
```{r}
typeof("cat")
typeof("1")
typeof(as.numeric("12"))
typeof(as.character(12))
```
## Simple Arithmetic Operations¶
```{r}
12 + 6 # addition
12 - 6 # subtraction
12 * 6 # multiplication
12 / 6 # division
12^6 # exponentiation
12**6 # exponentiation
12 %% 6 # modulo (get remainder)
```
## Variables
A variable is a name you assign a value or object. After assigning a variable, you can access its associated value or object using the variable's name. To simply put, this is how we store data. In R, assign variables using <- (the less than sign followed by a hyphen.).
```{r}
var <- 3 + 3
x <- 10
y <- "R Workshop"
z <- (sqrt(144) == 12)
print(var)
print(x)
print(y)
print(z)
```
It is possible to assign variables in R using the equals symbol = instead of <-. The equal sign is used for variable assignment in many other programming languages (such as Python). One reason for using <- besides conforming to the style preferred by the R community is that the equals sign is used in places other than variable assignment statements. Functions often take named arguments and when calling a function you use the = symbol to assign values to named arguments.
## Vectors or Collections
A vector is a sequence of data elements of the same data type. You can have numeric, logical and character vectors.
To create and store a vector with specific values, use the c() function and assign the result to a variable. c() takes a comma separated sequence of elements as input and combines them into a vector. You can also combine two vectors using the c() function. If you try to combine vectors of different types, R will automatically convert the vectors into the type that fits best.
```{r}
# Creating a character vector for the days of the week
weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
weekend <- c("Saturday", "Sunday")
days <- c(weekday, weekend)
print(days)
```
**Vector Indexing Starts at 1**
The first element is at index position 1, the second element is at index position 2 and so on.
*Note: unlike many other programming languages, indexes in R start at 1 instead of 0.
When you print a vector to the screen, each line starts with a number in square brackets followed by vector values. The number in square brackets indicates the index of the next value listed on that line.
You can access a specific value in a vector by typing the name of the vector and then wrapping the index associated with the value you want to access in square brackets.
```{r}
days[1]
```
**Range of Values: Inclusive**
You can access ranges of values by placing a colon between the starting and ending indices of the range:
```{r}
days[1:3]
```
**Pulling out specific elements from Vectors**
```{r}
days[c(1, 3, 5, 7)]
```
**Subset out of a Collection**
A subset of a vector is just a shorter vector. You can access a specific subset of values by wrapping a vector in the square brackets.
```{r}
weekdays <- days[1:5]
weekdays
```
**Generate a vector using 100 random Numbers**
```{r}
random_data <- runif(100) # Create a vector of 100 random numbers
print(random_data)
print(length(random_data))
```
**Filtering of Vectors using Logical Expressions**
You can also index a vector with a logical vector of the same length. In this case, the subset is created from each index where the corresponding logical vector is TRUE. Indexing with a logical vector is a common way to filter a numeric or character vector for values that fulfill certain criteria.
```{r}
# Exclude everything except for your specified index
y <- c(1, 0, 3)
y <- y[-2]
y
# Exclude the range 2 to 9
random_data <- runif(50)
random_data_sub <- random_data[-(2:49)]
random_data_sub
# Exclude using logical expression
over_half <- (random_data > 0.5)
new_subset <- random_data[over_half]
new_subset
```
**Use %in% to filter a vector**
```{r}
my_letters <- c("a", "b", "c", "d", "a", "c")
# Get only the a's and c's
my_letters[my_letters %in% c("a", "c")]
```
## Vectorized Operations
Many R functions and operations behave in a "vectorized" manner, meaning they act upon each element of a vector individually and return the result of each of the operations in a new vector. Vectorized operations simplify the process of performing the same calculations on related data. All the basic operators and functions we've learned so far that operate on single values work on vectors longer than length 1.
```{r}
example_vector <- c(1, 2, 3)
# adds to each value in the vector
example_vector + 10
# performs subtraction on each value
example_vector - 10
```
**Different Ways to Generate Vectors**
```{r}
x <- 1:20
y <- seq(from = 1, to = 20, by = 2)
r <- rep(1, times = 10)
s <- runif(n = 5, min = 0, max = 100)
x
y
r
s
```
## Control Flow
### If, Else and Else If
An if statement checks whether some logical expression is true or false and executes a specified block of code if the logical expression is true.
In R, an if statement starts with if, followed by a logical expression in parentheses, followed by the code to execute when the if statement is true in curly braces.
If statements are often accompanied by else statements. Else statements come after if statements and allow you to execute code in the event that the logical expression of an if statement is false.
```{r}
x <- 5
if (x > 0) {
print("Positive number")
} else {
print("Negative number")
}
```
### For Loops
For loops are a programming construct that let you go through each item in a sequence and then perform some operation on each one.
```{r}
my_sequence <- seq(0, 100, 10)
# Create a new loop over the specified items
for (item in my_sequence) {
print(item)
}
```
**The next keyword causes a for loop to skip to the next iteration of the loop.**
```{r}
my_sequence <- seq(0, 100, 10)
for (item in my_sequence) {
if (item < 50) { # this if statement skips items less than 50
next
}
print(item)
}
```
**The break keyword halts the execution of the loop entirely. Use break to break out of a loop.**
```{r}
my_sequence <- seq(0, 100, 10)
for (item in my_sequence) {
if (item > 50) {
break
}
print(item)
}
```
### While Loops
While loops are similar to for loops in that they allow you to execute code over and over again. For loops execute their contents, at most, a number of iterations equal to the length of the sequence you are looping over. While loops, on the other hand, **keep executing their contents as long as a certain logical expression you supply remains true.**
```{r}
x <- 5
iterations <- 0
# Execute as long as iterations < x
while (iterations < x) {
print("Study")
iterations <- iterations + 1 # Increment iterations by 1 each time the loop executes
}
```
### If Else on Vectors
For example, imagine you have a vector of numbers and you want to set all the negative values in the vector to zero. One way to do it is to use a for loop with an inner if statement.
```{r}
my_vect <- runif(25, -1, 1) # Generate some random data between -1 and 1
for (index in 1:length(my_vect)) { # loop through the sequence 1:25
number <- my_vect[index] # look up the next number using indexing
if (number < 0) { # check if the number is less than 0
my_vect[index] <- 0 # if so, set it to 0
}
}
print(my_vect)
```
Using a for loop requires writing quite a bit of code and loops are not particularly fast.
Instead we could have used R's ifelse() function to the same thing in a vectorized manner. ifelse() takes a logical test as the first argument, a value to return if the test is true as the second argument and a value to return if the test is false as the third argument:
```{r}
data <- c(11, 7, NA, 9, NA, 13, 15, NaN, 19, 17, 14, NaN)
# Use if else statements to conditionally fill these bad values with the mean
ifelse(is.na(data) | is.nan(data), # logical check
mean(data, na.rm = T), # value to set if TRUE
data
) # value to set if FALSE
```
```{r}
# Chaining ifelse to perform multiple operations
data <- c(11, 7, NA, 9, NA, 13, 15, NaN, 19, 17, 14, NaN)
d <- ifelse(is.na(data) | is.nan(data), "missing", ifelse(data < 10, "low", ifelse(data < 15, "medium", "high")))
table(d)
```
## Functions
A function is just an R object that runs a per-defined snippet of code, usually on some input that you supply to it. A function can return an output based on the input you provide. For example, the sum() function built into R simply takes a numeric vector as input and returns their sum as output. Built-in functions and packages can take you a long way in R, but it can be useful to define your own functions to perform specific tasks outside the scope of built-in functions.
Create your own function in R using this syntax:
```{r}
# Assign the function() to a name and declare arguments within ()
new_function <- function(arguments) {
# Write a function body within the {} to execute
for (x in 1:arguments) {
print("This is a function!")
}
}
```
Here is an actual example
```{r}
exampleFunction <- function(x, y) {
c(x + 1, y + 10)
}
exampleFunction(2, 4)
```
**Functions with Return Value**
Functions in R return the last expression evaluated by default.
```{r}
add_10 <- function(number) {
number + 10
# return (number+10)
}
add_10(5)
add_20 <- function(number) {
return(number + 20) # Exit and return specified value
number + 10 # The function exits before running this line
}
```
## Function Arguments
A function can have one or more named arguments. You can assign a default value to an argument when creating a function with the argument_name = argument_value syntax.
```{r}
sum_3_items <- function(x, y, z, print_args = TRUE) {
if (print_args) {
print(x)
print(y)
print(z)
}
return(x + y + z)
}
sum <- sum_3_items(1, 2, 3)
sum2 <- sum_3_items(10, 20, 30, print_args = FALSE)
```
**Ellipsis**
The ... argument collects all extra arguments passed to a function that are not matched. The ... argument can be used in functions where the number of arguments is not known in advance.
```{r}
addition_function <- function(...) {
total <- 0
# list (...) extracts the arguments to a list
for (value in list(...)) {
# Add each argument in ... to the total
total <- total + value
}
total
}
addition_function(2, 4, 6, 8, 10, 12, 14)
```