Raven Computing edited this page Mar 17, 2021 · 1 revision

Developer Documentation

This is the developer documentation for the rdf library. The sections below show examples of how to use the API functions.

Install and Load the Package:

Install the library from CRAN:

install.packages("raven.rdf")

Load the library package with:

library("raven.rdf")

Files

The rdf library provides functions to directly read and write DataFrame files specified by a filepath.

Read DataFrame Files

You can read any DataFrame (.df) file into memory by calling the readDataFrame() function. For example:

df <- readDataFrame("myfile.df")

All column types of the read DataFrame are mapped to the corresponding R types. To be precise, byte, short, int and long are mapped to the R integer type. The float and double types are both mapped to the R double type. The string and char types are both mapped to the R character type. The boolean type is mapped to the R logical type. The binary type is mapped to the R list type. A binary column is therefore represented as a list of raw vectors.
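The mapping above can be checked from the R side by inspecting the storage type of each column. The following is a minimal sketch in base R: the data.frame is built by hand as a stand-in for the result of readDataFrame(), and all column names are hypothetical.

```r
# Hand-built stand-in for a data.frame returned by readDataFrame().
# Column names are hypothetical and chosen only for illustration.
df <- data.frame(
  count  = c(1L, 2L, 3L),         # byte/short/int/long -> integer
  rate   = c(0.5, 1.25, 2.0),     # float/double        -> double
  label  = c("a", "b", "c"),      # string/char         -> character
  active = c(TRUE, FALSE, TRUE),  # boolean             -> logical
  stringsAsFactors = FALSE
)

# binary -> list column whose elements are raw vectors
df$payload <- list(as.raw(c(1, 2)), as.raw(3), raw(0))

# Inspect the storage type of each column
types <- vapply(df, typeof, character(1))
print(types)
```

Running the same vapply() call on a data.frame obtained from readDataFrame() is a quick way to see which R type each column ended up with.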

Write R data.frames to DataFrame Files

You can persist data.frame objects to DataFrame files by calling the writeDataFrame() function. For example:

writeDataFrame("myfile.df", df)

The above code persists the df data.frame to a file with the name myfile.df. That file could then be read by another program written in a language for which a DataFrame API implementation exists.

In the previous example all columns are persisted in the largest type possible. For example, all columns containing elements of R type integer are persisted as LongColumns and all columns containing floating point numbers are persisted as DoubleColumns regardless of the numeric range of the data. This is due to the fact that R does not distinguish between e.g. 32-bit and 64-bit floating point numbers.
However, since the DataFrame specification allows such type differentiation, the writeDataFrame() function provides a way to directly specify the concrete type of each Column of the data.frame to be persisted. The types must be specified as a vector containing the standardized type name of each Column. Type names and Columns are matched by their position within the specified vector.

The following example shows how to specify concrete Column types in a DataFrame file:

print(df)
#    name age   id group   rate
# 1  Paul  23 1001     A 45.216
# 2 Simon  54 1002     B 60.005
# 3   Bob  31 1003     C 51.140
# 4   Joe  26 1004     A 22.845

coltypes <- c("string", "byte", "int", "char", "float")
writeDataFrame("myfile.df", df, types = coltypes)

With the above code the age column will not be saved as a LongColumn but instead as a ByteColumn. Likewise, the rate column will be saved as a 32-bit FP FloatColumn and not as a 64-bit FP DoubleColumn, and so forth.
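When choosing a narrower integer type, the values of the column have to fit into its numeric range. The following sketch shows one way to pick a type name from the data. The helper smallestIntType() is hypothetical and not part of raven.rdf, and it assumes the usual signed 8-bit, 16-bit and 32-bit ranges for byte, short and int.

```r
# Hypothetical helper, not part of raven.rdf: suggest the narrowest integer
# type name for a column, assuming signed 8-, 16- and 32-bit ranges.
smallestIntType <- function(x) {
  stopifnot(is.integer(x))
  r <- range(x, na.rm = TRUE)  # smallest and largest non-NA value
  if (r[1] >= -128L && r[2] <= 127L) return("byte")
  if (r[1] >= -32768L && r[2] <= 32767L) return("short")
  "int"  # an R integer never exceeds the 32-bit range
}

ages <- c(23L, 54L, 31L, 26L)
ids  <- c(1001L, 1002L, 1003L, 1004L)

print(smallestIntType(ages))
print(smallestIntType(ids))
```

A narrower type than the current data requires may still be the wrong choice if later values are expected to be larger, so the result is only a suggestion.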

Another important thing to note is that an R data.frame is persisted as a DefaultDataFrame instance if it does not contain any NA values. If it contains at least one NA, it is automatically persisted as a NullableDataFrame instance.
However, you may want to ensure that an R data.frame is always persisted as a NullableDataFrame instance regardless of whether it contains any NAs. For that purpose you can use the as.nullable argument flag of the writeDataFrame() function.

For example, the data.frame from the previous example can be forced to persist as a NullableDataFrame like this:

writeDataFrame("myfile.df", df, as.nullable = TRUE)
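To anticipate which DataFrame type is chosen when as.nullable is not set, the rule described above can be mirrored in base R. This is only a sketch of the documented rule, not the check the library performs internally. The data and the variable names are made up for illustration.

```r
# Example data with one missing value (made up for illustration)
df <- data.frame(
  name = c("Paul", "Simon", "Bob"),
  age  = c(23L, NA, 31L),
  stringsAsFactors = FALSE
)

# TRUE if at least one element of the data.frame is NA
hasNA <- any(is.na(df))

# Mirror of the documented rule: any NA leads to a NullableDataFrame
expectedType <- if (hasNA) "NullableDataFrame" else "DefaultDataFrame"
print(expectedType)
```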

Both the column types and the nullable flag can also be used when serializing a data.frame to a raw vector with the serializeDataFrame() function.


Serialization

The rdf library also provides functions to serialize and deserialize data.frame objects to and from the binary format of Raven DataFrames. This is useful when an R data.frame should not be persisted to a file in the filesystem but should instead be sent to, or received from, another process, for example over a network connection.

Serialize a data.frame Object

You can serialize a data.frame to a raw vector by calling the serializeDataFrame() function. For example:

df <- cars # get some data.frame
vec <- serializeDataFrame(df)

You can specify the concrete column types and DataFrame type to use by providing the corresponding arguments. Please see the example of writing DataFrame files above for further details as the same arguments can also be used with the serializeDataFrame() function.

Deserialize a DataFrame

You can deserialize a raw vector back to a data.frame by calling the deserializeDataFrame() function. For example:

# vec: the raw vector holding the serialized data.frame, obtained beforehand
df <- deserializeDataFrame(vec)
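When a serialized data.frame is sent to another process, the receiver needs to know how many bytes belong to it. The following sketch shows one common approach, a 4-byte length prefix, using only base R. The framing scheme is an assumption for illustration and is not prescribed by the rdf library. An in-memory rawConnection stands in for a real socket, and the payload is a placeholder for the output of serializeDataFrame().

```r
# Placeholder payload; with raven.rdf loaded this would be
# vec <- serializeDataFrame(df)
vec <- as.raw(c(0x01, 0x02, 0x03, 0x04, 0x05))

# Sender: write a 4-byte big-endian length prefix followed by the payload.
out <- rawConnection(raw(0), open = "w")
writeBin(length(vec), out, size = 4L, endian = "big")
writeBin(vec, out)
wire <- rawConnectionValue(out)  # the bytes that would go over the network
close(out)

# Receiver: read the length first, then exactly that many bytes.
inp <- rawConnection(wire, open = "r")
n <- readBin(inp, what = "integer", n = 1L, size = 4L, endian = "big")
received <- readBin(inp, what = "raw", n = n)
close(inp)

# With raven.rdf loaded, the data.frame would be restored with
# df <- deserializeDataFrame(received)
print(identical(received, vec))
```

With a real network connection, the same writeBin() and readBin() calls can be applied to a connection opened in binary mode, for example one created with socketConnection().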