Documentation for Blobs #141

Open · wants to merge 1 commit into base: main
1 change: 1 addition & 0 deletions docs/developers/applications/defining-schemas.md
@@ -201,6 +201,7 @@ HarperDB supports the following field types in addition to user defined (object)
* `Any`: Any primitive, object, or array is allowed.
* `Date`: A Date object.
* `Bytes`: Binary data (as a Buffer or Uint8Array).
* `Blob`: Binary data designed for large blocks of data that can be streamed. It is recommended that you use this for binary data that will typically be larger than 20KB.
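
As a sketch of how this field type might appear in a schema (the `Media` table and `video` field names here are hypothetical, not from the original):

```graphql
# Hypothetical table with a Blob-typed attribute for large, streamable data
type Media @table {
	id: ID @primaryKey
	video: Blob
}
```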

#### Renaming Tables

66 changes: 66 additions & 0 deletions docs/technical-details/reference/blob.md
@@ -0,0 +1,66 @@
Blobs are binary large objects that can be used to store any type of unstructured/binary data and are designed for large content. Blobs support streaming and offer better performance for content larger than about 20KB. HarperDB blobs extend the native JavaScript `Blob` type, providing integrated storage with the database. To use blobs, you create a blob, which writes the binary data to disk, and then include it (as a reference) in a record. For example, you can create a record with a blob like:

```javascript
let blob = await createBlob(largeBuffer);
await MyTable.put({ id: 'my-record', data: blob });
```
The `data` attribute in this example is a blob reference and can be used like any other attribute in the record, but it is stored separately and its data must be accessed asynchronously. You can retrieve the blob data with the standard `Blob` methods:

```javascript
let buffer = await blob.bytes();
```
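
Since HarperDB blobs extend the native `Blob` type, the other standard `Blob` methods should also be available. A minimal sketch using the native `Blob` (Node 18+) to illustrate that standard API:

```javascript
// Standard Blob API, which HarperDB blobs extend (native Blob shown here)
const blob = new Blob(['hello world'], { type: 'text/plain' });
const text = await blob.text();        // decode contents as a string
const buf = await blob.arrayBuffer();  // raw bytes as an ArrayBuffer
```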

If you are creating a resource method, you can return a `Response` object with a blob as the body:

```javascript
export class MyEndpoint extends MyTable {
	async get() {
		return {
			status: 200,
			headers: {},
			body: this.data, // this.data is a blob
		};
	}
}
```
One of the important characteristics of blobs is that they natively support asynchronous streaming of data, which matters for both creation and retrieval of large data. When you create a blob with `createBlob`, the returned blob creates the storage entry immediately, while the data is streamed to storage in the background. This means that you can create a blob from a buffer or from a stream, and you can create a record that references a blob before the blob is fully written to storage. For example, you can create a blob from a stream:

```javascript
let blob = await createBlob(stream);
// at this point the blob exists, but the data is still being written to storage
await MyTable.put({ id: 'my-record', data: blob });
// we now have written a record that references the blob
let record = await MyTable.get('my-record');
// we now have a record that gives us access to the blob. We can asynchronously access the blob's data or stream it, and the data will become available as the stream is written to the blob.
let stream = record.data.stream();
```
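
The returned stream can be consumed incrementally as a standard web `ReadableStream`. A minimal sketch using the native `Blob` (which HarperDB blobs extend, so `stream()` should behave the same way):

```javascript
// Consuming a blob's ReadableStream chunk by chunk (Node 18+)
const blob = new Blob(['chunked ', 'data']);
const decoder = new TextDecoder();
let received = '';
for await (const chunk of blob.stream()) {
	// each chunk is a Uint8Array; decode incrementally
	received += decoder.decode(chunk, { stream: true });
}
received += decoder.decode(); // flush any buffered trailing bytes
```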
This can be powerful functionality for large media content: content can be streamed into storage and simultaneously streamed out to users in real time as it is received.
Alternately, we can wait for the blob to be fully written to storage before creating a record that references it:

```javascript
let blob = await createBlob(stream);
// at this point the blob exists, but the data is still being written to storage
await blob.finished;
// we now know the blob is fully written to storage
await MyTable.put({ id: 'my-record', data: blob });
```

### Error Handling
Because blobs can be streamed and referenced before they are complete, an error or interruption could occur while data is still streaming to the blob (after the record is committed). We can register an error handler on the blob to handle the case of an interrupted blob:

```javascript
export class MyEndpoint extends MyTable {
	async get() {
		let blob = this.data;
		blob.on('error', () => {
			// if this was a caching table, we may want to invalidate or delete this record:
			this.invalidate();
		});
		return {
			status: 200,
			headers: {},
			body: blob,
		};
	}
}
```