A quick and easy way to read multiple files from a directory.
You can install rd from GitHub with:
# install.packages("devtools")
devtools::install_github("Lightbridge-AI/rd")
Usually, if you want to read multiple files from a directory (folder) into a list, you can do this:
Step 1 : List all file paths with list.files()
abs.path <- list.files("path/to/dir", full.names = TRUE)
Step 2 : Filter the files you want
csv.abs.path <- grep("\\.csv$", abs.path, value = TRUE) # select only .csv files
Step 3 : Pass each file path to a reading function (utils::read.csv in this case)
list_of_df <- lapply(csv.abs.path, utils::read.csv) # or you can use `purrr::map()`
Step 4 : Set each file name as the name of the corresponding component (data frame) of the list
names(list_of_df) <- sub("\\.[^\\.]+$", "", basename(csv.abs.path))
list_of_df
basename() : gives the file names
sub("\\.[^\\.]+$", "", basename(csv.abs.path)) : removes the file extension (the last dot and everything after it)
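The four steps above can be tried end to end without any existing data. The following is a minimal sketch that assumes a throwaway directory created under tempdir(); the file names first.csv, second.csv and notes.txt are made up for illustration.

```r
# Create a throwaway directory with two CSV files and one non-CSV file
demo_dir <- tempfile("rd_demo_")
dir.create(demo_dir)
utils::write.csv(data.frame(x = 1:3), file.path(demo_dir, "first.csv"), row.names = FALSE)
utils::write.csv(data.frame(y = c("a", "b")), file.path(demo_dir, "second.csv"), row.names = FALSE)
writeLines("not a csv", file.path(demo_dir, "notes.txt"))

# Step 1: list all file paths
abs.path <- list.files(demo_dir, full.names = TRUE)

# Step 2: keep only the .csv files
csv.abs.path <- grep("\\.csv$", abs.path, value = TRUE)

# Step 3: read each file into a data frame
list_of_df <- lapply(csv.abs.path, utils::read.csv)

# Step 4: name each data frame after its file, without the extension
names(list_of_df) <- sub("\\.[^\\.]+$", "", basename(csv.abs.path))

names(list_of_df)
```

The text file is dropped in step 2, so only the two CSV files end up in the list.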
The rd package wraps all of the above code into a single function: rd()
rd() has 3 main arguments:
- .f : Function to read files from a directory (any function that takes a file path as its first argument will work)
- path : Path to the desired directory
- pattern : Regular expression to match file names and/or file extensions
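To make the role of the three arguments concrete, here is a rough sketch of how such a wrapper could be written in base R. The function read_dir_sketch() is hypothetical and is not the actual source of rd(); it leaves out everything else rd() offers (for example recursive and snake_case) and only mirrors the manual steps.

```r
# Hypothetical wrapper, NOT the real rd() implementation
read_dir_sketch <- function(.f, path = ".", pattern = "\\.csv$", ...) {
  # List the files in `path` whose names match `pattern`
  files <- list.files(path, pattern = pattern, full.names = TRUE)
  # Apply the reading function to each file, forwarding extra arguments
  out <- lapply(files, .f, ...)
  # Name each result after its file, without the extension
  names(out) <- sub("\\.[^\\.]+$", "", basename(files))
  out
}

# Try it on a throwaway directory (file names are made up)
demo_dir <- tempfile("rd_sketch_")
dir.create(demo_dir)
utils::write.csv(data.frame(a = 1:2), file.path(demo_dir, "alpha.csv"), row.names = FALSE)
utils::write.csv(data.frame(b = 3:4), file.path(demo_dir, "beta.csv"), row.names = FALSE)

result <- read_dir_sketch(utils::read.csv, path = demo_dir)
names(result)
```

Any function that accepts a file path as its first argument can be supplied as .f, which is what makes a single wrapper work for many file formats.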
- Read all .csv files from the working directory (default) using utils::read.csv.
- File names are automatically set as the names of each data frame.
library(rd)
rd(utils::read.csv) # default `pattern` is "\\.csv$" (csv files)
- Read .xlsx files from a specified directory using readxl::read_excel.
- You must specify a regular expression to match the file extension.
rd(readxl::read_excel, path = "path/to/dir", pattern = "\\.xlsx$")
You can pass extra arguments to .f in 2 ways:
- Pass inside .f : using a formula style similar to the purrr package (recommended)
- Pass outside .f : arguments given to ... of rd() will be passed to .f
# Pass `col_names = FALSE` to `readr::read_csv`
rd(~readr::read_csv(.x, col_names = FALSE), recursive = TRUE, snake_case = TRUE) # inside `.f`
rd(readr::read_csv, recursive = TRUE, snake_case = TRUE, col_names = FALSE) # outside `.f`
col_names = FALSE : passed to readr::read_csv
recursive = TRUE : also reads files from sub-directories
snake_case = TRUE : formats names in snake_case (requires the snakecase package to be installed)
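The difference between passing an argument inside and outside the reading function can be illustrated with base R alone. This sketch uses lapply() and an ordinary anonymous function in place of rd() and the formula style, so it shows the general idea only and not the behavior of rd() itself.

```r
# A small headerless CSV file to read
csv_file <- tempfile(fileext = ".csv")
writeLines(c("1,2", "3,4"), csv_file)

# "Inside": the extra argument is fixed within the function that is mapped
inside <- lapply(csv_file, function(.x) utils::read.csv(.x, header = FALSE))

# "Outside": the extra argument travels through `...` to the reading function
outside <- lapply(csv_file, utils::read.csv, header = FALSE)

# Both styles end up making the same call for each file
identical(inside, outside)
```

The inside style keeps the reader's arguments visibly separate from the arguments of the mapping function, which may be why the formula style is the recommended one.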
- Read files using multiple engines from multiple paths with multiple file extensions.
Use purrr::pmap in combination with rd().
Now you can read any files from any directory with any file-reading function you want!
params <- list(
  .f = c(~readr::read_csv(.x, col_names = FALSE), readxl::read_excel),
  path = c("path/to/dir1", "path/to/dir2"), # `path` can be an individual file as well
  pattern = c("\\.csv$", "\\.xlsx$")
)
purrr::pmap(params, rd)
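A parallel map such as purrr::pmap calls the function once per position, taking the i-th element of every entry in params. The sketch below mimics that with base R's Map() and a hypothetical stand-in, describe_call(), that only reports the arguments it receives, so it runs without any files or extra packages.

```r
# Hypothetical stand-in for rd(): reports what it would read instead of reading
describe_call <- function(.f, path, pattern) {
  paste0("read files matching '", pattern, "' in '", path, "'")
}

params <- list(
  .f = list(utils::read.csv, utils::read.delim),
  path = c("path/to/dir1", "path/to/dir2"),
  pattern = c("\\.csv$", "\\.tsv$")
)

# One call per position: (1st .f, 1st path, 1st pattern), then the 2nd of each
calls <- Map(describe_call, params$.f, params$path, params$pattern)
unlist(calls)
```

Each element of the result corresponds to one directory, so the output is a list with one entry per (.f, path, pattern) combination.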
The rd() function was created as a wrapper for virtually any reading function, using functional programming to read multiple files.
The main goal is to provide a simple and easy way to read multiple files from a directory.
However, if you want to customize file reading beyond what this function can do, have a look at the fs package for more advanced file system operations.
The .f argument of rd() can actually be any function that takes a file path as input (similar to fs::dir_map and fs::dir_walk).
However, you should use rd() to read files, because it was optimized for that purpose.