---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# rd
<!-- badges: start -->
<!-- badges: end -->
**A quick and easy way to read multiple files from a directory.**
## Installation
You can install rd from [GitHub](https://github.com/Lightbridge-AI/rd) with:
``` r
# install.packages("devtools")
devtools::install_github("Lightbridge-AI/rd")
```
## How to read multiple files from a directory
Usually, if you want to read multiple files from a directory (folder) into a list, you can do the following.
**Step 1: List all file paths with `list.files()`**
```{r eval=FALSE}
abs.path <- list.files("path/to/dir", full.names = TRUE)
```
**Step 2: Filter the files you want**
```{r eval=FALSE}
csv.abs.path <- grep("\\.csv$", abs.path, value = TRUE) # select only .csv files
```
**Step 3: Pass each file path to a reading function** (`utils::read.csv` in this case)
```{r eval=FALSE}
list_of_df <- lapply(csv.abs.path, utils::read.csv) # or you can use `purrr::map()`
```
**Step 4: Name each component (data frame) of the list after its file**
```{r eval=FALSE}
names(list_of_df) <- sub("\\.[^\\.]+$", "", basename(csv.abs.path))
list_of_df
```
- `basename()`: returns the file names without their directory paths
- `sub("\\.[^\\.]+$", "", basename(csv.abs.path))`: removes the last dot and everything after it (the file extension)
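
The four steps above can be tried end to end without touching real data. The sketch below is only an illustration: the directory name `rd-demo` and the files `first.csv`, `second.csv` and `notes.txt` are made up for the example and are created inside R's temporary directory.

```r
# Create a temporary directory holding two small .csv files and one .txt file
# (all names here are hypothetical, chosen only for this demonstration)
dir <- file.path(tempdir(), "rd-demo")
dir.create(dir, showWarnings = FALSE)
utils::write.csv(data.frame(x = 1:2), file.path(dir, "first.csv"), row.names = FALSE)
utils::write.csv(data.frame(y = 3:5), file.path(dir, "second.csv"), row.names = FALSE)
writeLines("not a csv", file.path(dir, "notes.txt"))

# Step 1: list all file paths
abs.path <- list.files(dir, full.names = TRUE)
# Step 2: keep only the .csv files (notes.txt is dropped here)
csv.abs.path <- grep("\\.csv$", abs.path, value = TRUE)
# Step 3: read each file into a data frame
list_of_df <- lapply(csv.abs.path, utils::read.csv)
# Step 4: name each data frame after its file, without the extension
names(list_of_df) <- sub("\\.[^\\.]+$", "", basename(csv.abs.path))

names(list_of_df)
```

The resulting list has one data frame per `.csv` file, accessible by name, for example `list_of_df$second`.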
## Quick & Easy Way!
The `rd` package wraps all of the above code into a single function: `rd()`
`rd()` has 3 main arguments:
1. `.f` : Function to read files from a directory (any function that takes a file path as its first argument will work)
2. `path` : Path to the desired directory
3. `pattern` : Regular expression to match file names and/or file extensions
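
To make the role of the three arguments concrete, here is a minimal base R sketch of how such a wrapper could be put together from the four steps shown earlier. It is not the actual source of `rd()`: the name `read_dir_sketch` is hypothetical, the default `path = "."` is an assumption, and the sketch accepts only plain functions (no formula style).

```r
# Hypothetical helper for illustration only; NOT the actual source of rd::rd().
# Assumes `.f` is a plain function (no formula support) and `path` is a directory.
read_dir_sketch <- function(.f, path = ".", pattern = "\\.csv$", ...) {
  # Steps 1 and 2: list the files in `path` whose names match `pattern`
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  # Step 3: read each file with `.f`, forwarding any extra arguments
  out <- lapply(files, .f, ...)
  # Step 4: name each element after its file, minus the extension
  names(out) <- sub("\\.[^\\.]+$", "", basename(files))
  out
}

# Usage on a throwaway directory (directory and file names are made up)
demo_dir <- file.path(tempdir(), "rd-sketch-demo")
dir.create(demo_dir, showWarnings = FALSE)
utils::write.csv(data.frame(x = 1:3), file.path(demo_dir, "alpha.csv"), row.names = FALSE)
utils::write.csv(data.frame(x = 4:6), file.path(demo_dir, "beta.csv"), row.names = FALSE)

dfs <- read_dir_sketch(utils::read.csv, path = demo_dir)
names(dfs)
```

The real `rd()` offers more than this sketch, such as the formula style and the `recursive` and `snake_case` options used in the examples of this README.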
### Examples
- **Read all .csv files from the working directory** (default) using `utils::read.csv`.
- Each data frame is automatically named after its file.
```{r eval=FALSE}
library(rd)
rd(utils::read.csv) # default `pattern` is "\\.csv$" (csv files)
```
- **Read .xlsx files from a specified directory** using `readxl::read_excel`.
- You must specify a regular expression that matches the file extension.
```{r eval=FALSE}
rd(readxl::read_excel, path = "path/to/dir", pattern = "\\.xlsx$")
```
- **You can pass extra arguments to `.f` in two ways**
1. Inside `.f` : use a formula style similar to the [`purrr`](https://purrr.tidyverse.org/reference/map.html) package (recommended)
2. Outside `.f` : arguments supplied through `...` of `rd()` are passed on to `.f`
```{r eval=FALSE}
# Pass `col_names = FALSE` to `readr::read_csv`
rd(~readr::read_csv(.x, col_names = FALSE), recursive = TRUE, snake_case = TRUE) # inside `.f`
rd(readr::read_csv, recursive = TRUE, snake_case = TRUE, col_names = FALSE) # outside `.f`
```
Both calls pass `col_names = FALSE` to `readr::read_csv`; `recursive = TRUE` also reads files from sub-directories, and `snake_case = TRUE` formats the names in snake_case (requires the `snakecase` package to be installed).
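
The same "inside" versus "outside" distinction exists in base R's `lapply()`, which may help clarify what the two `rd()` calls above are doing. The sketch below uses only base R, with an anonymous function standing in for the formula style; the temporary file is made up for the example.

```r
# A small headerless .csv file in a temporary location
csv_file <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4"), csv_file)

# "Inside": wrap the reader in an anonymous function and set the argument there
inside <- lapply(csv_file, function(x) utils::read.csv(x, header = FALSE))
# "Outside": hand the extra argument to lapply(), which forwards it to the reader
outside <- lapply(csv_file, utils::read.csv, header = FALSE)

# Both approaches produce the same data frame
identical(inside, outside)
```

The "inside" form keeps the reader's arguments visibly separate from those of the calling function, which avoids confusion about which function an argument belongs to.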
- **Read files using multiple reading functions, from multiple paths, with multiple file extensions.**
Use [`purrr::pmap`](https://purrr.tidyverse.org/reference/map2.html) in combination with `rd()`.
This lets you read any files from any directory with whichever file-reading function you want.
```{r eval=FALSE}
params <- list(
  .f = list(~readr::read_csv(.x, col_names = FALSE), readxl::read_excel),
  path = c("path/to/dir1", "path/to/dir2"), # `path` can also be an individual file
  pattern = c("\\.csv$", "\\.xlsx$")
)
purrr::pmap(params, rd)
```
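
`purrr::pmap()` calls its function once per position, taking the i-th element of every entry in `params` as the arguments of call i. The base R sketch below shows the same pattern with `Map()`. It uses a hypothetical stand-in, `read_dir_sketch`, instead of `rd()`, and made-up temporary directories and files, so it should be read as an illustration of the iteration pattern rather than of the package itself.

```r
# Hypothetical stand-in for rd(); NOT the package's actual implementation
read_dir_sketch <- function(.f, path = ".", pattern = "\\.csv$") {
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  out <- lapply(files, .f)
  names(out) <- sub("\\.[^\\.]+$", "", basename(files))
  out
}

# Two temporary directories holding different file types (names are made up)
dir1 <- file.path(tempdir(), "rd-demo-csv")
dir2 <- file.path(tempdir(), "rd-demo-tsv")
dir.create(dir1, showWarnings = FALSE)
dir.create(dir2, showWarnings = FALSE)
utils::write.csv(data.frame(a = 1:2), file.path(dir1, "one.csv"), row.names = FALSE)
utils::write.table(data.frame(b = 3:4), file.path(dir2, "two.tsv"),
                   sep = "\t", row.names = FALSE)

params <- list(
  .f = list(utils::read.csv, utils::read.delim),
  path = c(dir1, dir2),
  pattern = c("\\.csv$", "\\.tsv$")
)

# Call i uses params$.f[[i]], params$path[i] and params$pattern[i]
result <- Map(read_dir_sketch, params$.f, params$path, params$pattern)
```

Here `result[[1]]` holds the data frames read from the first directory and `result[[2]]` those from the second.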
## Note
The `rd()` function was created as a wrapper around virtually any reading function, using functional programming to read multiple files.
**The main goal is to provide a simple and easy way to read multiple files from a directory.**
However, if you want to customize file reading beyond what this function can do, have a look at
the [`fs`](https://fs.r-lib.org) package for more advanced file system operations.
<br>
The `.f` argument of `rd()` can actually be **any function** that takes a file path as input (similar to [`fs::dir_map`](https://fs.r-lib.org/reference/dir_ls.html) and [`fs::dir_walk`](https://fs.r-lib.org/reference/dir_ls.html)).
However, you should use `rd()` for reading files, because that is the task it was optimized for.