
Feat/non blocking writes #44

Merged
merged 6 commits into from
Oct 16, 2024
Conversation


@Nuhvi Nuhvi commented Oct 16, 2024

Note: This PR causes a breaking change without a migration script, but Pubky app and Nexus shouldn't be affected much. That said, please try it on staging and let me know if anything broke.

What does this do?

Before this PR, we loaded the entire blob of data into memory, both for upload (PUT) and download (GET) requests. That was fine while the data was kept very small (16 KB), but since we raised that limit to 100 MB, it is no longer a good idea to hold that much data in memory. Even worse, a client streaming blobs in (or reading them out) can be very slow, causing issues in the Tokio runtime, or, worse, blocking LMDB with a write transaction for a long time.

In this PR, we change things so that we buffer incoming data to a temporary file first; once that is done, we start a write transaction and load the data from the file system into LMDB.

Secondly, this PR breaks the blob into chunks sized so that two of them fit in each OS page; this way LMDB doesn't waste pages on unaligned data.

We now have new structs that make writing an Entry, as well as reading an entry's content, iterable.

Finally, we return the content hash as an `ETag`, among other headers. More work on headers will come in other PRs, including work on `content_type` sniffing.

What does this PR break?

Previously, blobs were saved as a single entry in the blobs table, identified by their hash and carrying a reference count. Now we give up on this deduplication by hash, and just use the timestamp + chunk index as the key of each chunk. That saves space on keys, but means that the same content uploaded by two people will be stored twice... I think that is OK, and we shouldn't worry about deduplication: it is rarely useful and adds too much complexity. I would rather sacrifice disk space for simplicity for now.

What is this PR NOT?

This PR does NOT add range queries; you can't (yet) request a frame in a video or anything like that, but it certainly makes that easier in the future.


Nuhvi commented Oct 16, 2024

Seems to work ... send it

@Nuhvi Nuhvi merged commit eb5beea into main Oct 16, 2024
1 check passed
@Nuhvi Nuhvi deleted the feat/non-blocking-writes branch November 15, 2024 12:32