% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_resource.R
\name{read_resource}
\alias{read_resource}
\alias{read_interlaced_resource}
\title{Read data from a Data Resource into a tibble data frame}
\usage{
read_resource(package, resource_name, col_select = NULL, interlaced = FALSE)
read_interlaced_resource(...)
}
\arguments{
\item{package}{Data Package object, created with \code{\link[=read_package]{read_package()}} or
\code{\link[=create_package]{create_package()}}.}
\item{resource_name}{Name of the Data Resource.}
\item{col_select}{Character vector of the columns to include in the result,
in the order provided.
Selecting columns can improve read speed.}
\item{interlaced}{Boolean value indicating if interlaced columns should
be loaded using the interlacer package.}
\item{...}{Arguments passed on to \code{read_resource()}.}
}
\value{
\code{\link[=tibble]{tibble()}} data frame with the Data Resource's tabular data.
If there are parsing problems, a warning will alert you.
You can retrieve the full details by calling \code{\link[=problems]{problems()}} on your data
frame.
}
\description{
Reads data from a \href{https://specs.frictionlessdata.io/data-resource/}{Data Resource} (in a Data
Package) into a tibble (a Tidyverse data frame).
The resource must be a \href{https://specs.frictionlessdata.io/tabular-data-resource/}{Tabular Data Resource}.
The function uses \code{\link[readr:read_delim]{readr::read_delim()}} to read CSV files, passing the
resource properties \code{path}, CSV dialect, column names, data types, etc.
Column names are taken from the provided Table Schema (\code{schema}), not from
the header in the CSV file(s).
}
\section{Resource properties}{
The \href{https://specs.frictionlessdata.io/data-resource/}{Data Resource properties} are handled as
follows:
\subsection{Path}{
\href{https://specs.frictionlessdata.io/data-resource/#data-location}{\code{path}} is
required.
It can be a local path or URL, which must resolve.
Absolute paths (\code{/}) and relative parent paths (\verb{../}) are forbidden to avoid
security vulnerabilities.
When multiple paths are provided (\verb{"path": [ "myfile1.csv", "myfile2.csv"]}),
the data are merged into a single data frame, in the order in which the
paths are listed.
}
\subsection{Data}{
If \code{path} is not present, the function will attempt to read data from the
\code{data} property.
\strong{\code{schema} will be ignored}.
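
As a minimal sketch of this case, the example below builds a package in memory and reads the resource back.
It assumes that a data frame attached with \code{add_resource()} is stored in the resource's \code{data} property; the names \code{my_data} and \code{df} are illustrative only.
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(frictionless)

# Attach an in-memory data frame as a resource (there is no path on disk)
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
package <- add_resource(create_package(), resource_name = "my_data", data = df)

# Read it back: the values come from the data property, not from a file
read_resource(package, "my_data")
}\if{html}{\out{</div>}}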
}
\subsection{Name}{
\code{name} is \href{https://specs.frictionlessdata.io/data-resource/#name}{required}.
It is used to find the resource with \code{name} = \code{resource_name}.
}
\subsection{Profile}{
\code{profile} is
\href{https://specs.frictionlessdata.io/tabular-data-resource/#specification}{required}
to have the value \code{tabular-data-resource}.
}
\subsection{File encoding}{
\code{encoding} (e.g. \code{windows-1252}) is
\href{https://specs.frictionlessdata.io/data-resource/#optional-properties}{required}
if the resource file(s) are not encoded as UTF-8.
The returned data frame will always be UTF-8.
}
\subsection{CSV Dialect}{
\code{dialect} properties are
\href{https://specs.frictionlessdata.io/csv-dialect/#specification}{required} if
the resource file(s) deviate from the default CSV settings (see below).
It can either be a JSON object or a path or URL referencing a JSON object.
Only deviating properties need to be specified, e.g. a tab delimited file
without a header row needs:
\if{html}{\out{<div class="sourceCode json">}}\preformatted{"dialect": \{"delimiter": "\\t", "header": false\}
}\if{html}{\out{</div>}}
These are the CSV dialect properties.
Some are ignored by the function:
\itemize{
\item \code{delimiter}: default \verb{,}.
\item \code{lineTerminator}: ignored, line terminator characters \code{LF} and \code{CRLF} are
interpreted automatically by \code{\link[readr:read_delim]{readr::read_delim()}}, while \code{CR} (used by
Classic Mac OS, final release 2001) is not supported.
\item \code{doubleQuote}: default \code{true}.
\item \code{quoteChar}: default \verb{"}.
\item \code{escapeChar}: only \verb{\\} is supported; any other value is ignored.
Setting it to \verb{\\} sets \code{doubleQuote} to \code{false}, as these properties are
mutually exclusive.
You can thus not escape with \verb{\\"} and \code{""} in the same file.
\item \code{nullSequence}: ignored, use \code{missingValues}.
\item \code{skipInitialSpace}: default \code{false}.
\item \code{header}: default \code{true}.
\item \code{commentChar}: not set by default.
\item \code{caseSensitiveHeader}: ignored, header is not used for column names, see
Schema.
\item \code{csvddfVersion}: ignored.
}
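For illustration, the tab delimited example above roughly corresponds to the following direct call to \code{\link[readr:read_delim]{readr::read_delim()}}.
This is a sketch, not the function's internal code: the literal input, column names and column types are made up here, whereas \code{read_resource()} would derive them from \code{path} and \code{schema}.
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(readr)

# Tab delimited text without a header row, supplied as literal data via I()
tsv <- I("1\\tabc\\n2\\tdef\\n")

# delim mirrors the dialect's delimiter; col_names stands in for the
# field names that would otherwise come from the Table Schema
read_delim(tsv, delim = "\\t", col_names = c("id", "label"), col_types = "dc")
}\if{html}{\out{</div>}}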
}
\subsection{File compression}{
Resource file(s) with \code{path} ending in \code{.gz}, \code{.bz2}, \code{.xz}, or \code{.zip} are
automatically decompressed using default \code{\link[readr:read_delim]{readr::read_delim()}}
functionality.
Only \code{.gz} files can be read directly from URL \code{path}s.
Only the extension in \code{path} can be used to indicate the compression type;
the \code{compression} property is
\href{https://specs.frictionlessdata.io/patterns/#specification-3}{ignored}.
}
\subsection{Ignored resource properties}{
\itemize{
\item \code{title}
\item \code{description}
\item \code{format}
\item \code{mediatype}
\item \code{bytes}
\item \code{hash}
\item \code{sources}
\item \code{licenses}
}
}
}
\section{Table schema properties}{
\code{schema} is required and must follow the \href{https://specs.frictionlessdata.io/table-schema/}{Table Schema} specification.
It can either be a JSON object or a path or URL referencing a JSON object.
\itemize{
\item Field \code{name}s are used as column headers.
\item Field \code{type}s are used as column types (see further).
\item \href{https://specs.frictionlessdata.io/table-schema/#missing-values}{\code{missingValues}}
are interpreted as \code{NA}, with \code{""} as default.
}
\subsection{Field types}{
Field \code{type} is used to set the column type, as follows:
\itemize{
\item \href{https://specs.frictionlessdata.io/table-schema/#string}{string} as
\code{character}; or \code{factor} when \code{enum} is present.
\code{format} is ignored.
\item \href{https://specs.frictionlessdata.io/table-schema/#number}{number} as
\code{double}; or \code{factor} when \code{enum} is present.
Use \code{bareNumber: false} to ignore whitespace and non-numeric characters.
\code{decimalChar} (\code{.} by default) and \code{groupChar} (undefined by default) can
be defined, but the most frequently occurring value will be used as a global value for
all number fields of that resource.
\item \href{https://specs.frictionlessdata.io/table-schema/#integer}{integer} as
\code{double} (not integer, to avoid issues with big numbers); or \code{factor} when
\code{enum} is present.
Use \code{bareNumber: false} to ignore whitespace and non-numeric characters.
\item \href{https://specs.frictionlessdata.io/table-schema/#boolean}{boolean} as
\code{logical}.
Non-default \code{trueValues/falseValues} are not supported.
\item \href{https://specs.frictionlessdata.io/table-schema/#object}{object} as
\code{character}.
\item \href{https://specs.frictionlessdata.io/table-schema/#array}{array} as
\code{character}.
\item \href{https://specs.frictionlessdata.io/table-schema/#date}{date} as \code{date}.
Supports \code{format}, with values \code{default} (ISO date), \code{any} (guess \code{ymd})
and \href{https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior}{Python/C strptime}
patterns, such as \verb{\%a, \%d \%B \%Y} for \verb{Sat, 23 November 2013}.
\verb{\%x} is \verb{\%m/\%d/\%y}.
\verb{\%j}, \verb{\%U}, \verb{\%w} and \verb{\%W} are not supported.
\item \href{https://specs.frictionlessdata.io/table-schema/#time}{time} as
\code{\link[hms:hms]{hms::hms()}}.
Supports \code{format}, with values \code{default} (ISO time), \code{any} (guess \code{hms})
and \href{https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior}{Python/C strptime}
patterns, such as \verb{\%I\%p\%M:\%S.\%f\%z} for \verb{8AM30:00.300+0200}.
\item \href{https://specs.frictionlessdata.io/table-schema/#datetime}{datetime} as
\code{POSIXct}.
Supports \code{format}, with values \code{default} (ISO datetime), \code{any}
(ISO datetime) and the same patterns as for \code{date} and \code{time}.
\verb{\%c} is not supported.
\item \href{https://specs.frictionlessdata.io/table-schema/#year}{year} as \code{date},
with \code{01} for month and day.
\item \href{https://specs.frictionlessdata.io/table-schema/#yearmonth}{yearmonth} as
\code{date}, with \code{01} for day.
\item \href{https://specs.frictionlessdata.io/table-schema/#duration}{duration} as
\code{character}.
Can be parsed afterwards with \code{\link[lubridate:duration]{lubridate::duration()}}.
\item \href{https://specs.frictionlessdata.io/table-schema/#geopoint}{geopoint} as
\code{character}.
\item \href{https://specs.frictionlessdata.io/table-schema/#geojson}{geojson} as
\code{character}.
\item \href{https://specs.frictionlessdata.io/table-schema/#any}{any} as \code{character}.
\item Any other value is not allowed.
\item Type is guessed if not provided.
}
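The date patterns listed above can be tried out directly with readr's parsers.
The sketch below only shows how readr itself interprets such patterns; whether \code{read_resource()} passes every pattern to readr unchanged is an assumption.
\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(readr)

# Pattern taken from the date field description above
parse_date("Sat, 23 November 2013", format = "\%a, \%d \%B \%Y")

# A year or year-month on its own is completed with 01 for the missing parts
parse_date("2013", format = "\%Y")
parse_date("2013-11", format = "\%Y-\%m")
}\if{html}{\out{</div>}}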
}
}
\examples{
# Read a datapackage.json file
package <- read_package(
system.file("extdata", "datapackage.json", package = "frictionless")
)
package
# Read data from the resource "observations"
read_resource(package, "observations")
# The above tibble is merged from 2 files listed in the resource path
package$resources[[2]]$path
# The column names and types are derived from the resource schema
purrr::map_chr(package$resources[[2]]$schema$fields, "name")
purrr::map_chr(package$resources[[2]]$schema$fields, "type")
# Read data from the resource "deployments" with column selection
read_resource(package, "deployments", col_select = c("latitude", "longitude"))
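# Inspect parsing problems, if any, for a resource that was just read
# (sketch: assumes problems() is the readr function linked under Value;
# the variable name observations is illustrative)
observations <- read_resource(package, "observations")
readr::problems(observations)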
}
\seealso{
Other read functions:
\code{\link{read_package}()},
\code{\link{resources}()}
}
\concept{read functions}