File descriptor caching for value log #106

Open
marvin-j97 opened this issue Feb 1, 2025 · 0 comments
Labels
blocked, enhancement, good first issue, help wanted, performance

marvin-j97 commented Feb 1, 2025

Currently, when a range read touches a batch of uncached blobs, each blob incurs its own fopen() call (and the underlying open syscall).
If the blobs sit in the same blob file, these repeated fopen() calls can be avoided by caching the file descriptor.
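To make the difference concrete, here is a minimal sketch (not the value-log implementation) that reads several blobs from one file, once by reopening the file per blob and once by reusing a single handle. `BlobHandle`, `read_blobs_reopening`, `read_blobs_reusing` and `read_blob` are hypothetical names introduced only for this illustration; it uses nothing but the standard library.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical blob location: byte offset and length inside a blob file.
#[derive(Clone, Copy)]
pub struct BlobHandle {
    pub offset: u64,
    pub len: usize,
}

/// Reads one blob through an already opened file.
fn read_blob(file: &mut File, handle: BlobHandle) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; handle.len];
    file.seek(SeekFrom::Start(handle.offset))?;
    file.read_exact(&mut buf)?;
    Ok(buf)
}

/// Opens the file once per blob, which is the pattern the issue describes.
pub fn read_blobs_reopening(path: &Path, handles: &[BlobHandle]) -> io::Result<Vec<Vec<u8>>> {
    handles
        .iter()
        .map(|h| {
            let mut file = File::open(path)?; // one open per blob
            read_blob(&mut file, *h)
        })
        .collect()
}

/// Opens the file once and reuses the handle for every blob.
pub fn read_blobs_reusing(path: &Path, handles: &[BlobHandle]) -> io::Result<Vec<Vec<u8>>> {
    let mut file = File::open(path)?; // single open for the whole scan
    handles.iter().map(|h| read_blob(&mut file, *h)).collect()
}
```

Both functions return the same bytes; the second one simply pays for one open instead of one per blob. A real descriptor cache would additionally keep the handle alive across scans.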

Blob files should also be cached the way Segment files are. This needs an adjustment of the descriptor table so that it can store both LSM segment files and blob files (probably keyed by some kind of compound key), perhaps simply using quick-cache. Because we want to cap file descriptor usage globally, there needs to be a single descriptor cache that houses both segment and blob files. I would recommend rewriting the DescriptorTable, because it's bad.

Blocked by fjall-rs/value-log#9 because the value log needs a new generic parameter to acquire a file descriptor (using a compound key ValueLogId + BlobFileId).
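As a rough illustration of the single, globally capped descriptor cache described above, the following sketch keys segment files and blob files through one enum and evicts the least recently used entry once the cap is reached. Everything here is an assumption made for the example: `FileKey`, `DescriptorCache`, `get_or_open` and the `u64` ID fields are hypothetical and are not the existing DescriptorTable API, and a plain `Mutex` with a `VecDeque` stands in for quick-cache to keep the example dependency-free.

```rust
use std::collections::{HashMap, VecDeque};
use std::fs::File;
use std::io;
use std::path::Path;
use std::sync::{Arc, Mutex};

/// Hypothetical compound key: one key space for both kinds of file.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum FileKey {
    Segment { tree_id: u64, segment_id: u64 },
    Blob { value_log_id: u64, blob_file_id: u64 },
}

struct Inner {
    files: HashMap<FileKey, Arc<File>>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<FileKey>,
}

/// One cache for segment and blob files, so the cap applies to both together.
pub struct DescriptorCache {
    capacity: usize,
    inner: Mutex<Inner>,
}

impl DescriptorCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            capacity,
            inner: Mutex::new(Inner { files: HashMap::new(), order: VecDeque::new() }),
        }
    }

    /// Returns the cached descriptor for `key`, opening `path` on a miss.
    pub fn get_or_open(&self, key: FileKey, path: &Path) -> io::Result<Arc<File>> {
        let mut inner = self.inner.lock().expect("descriptor cache lock poisoned");

        if let Some(file) = inner.files.get(&key).cloned() {
            // Hit: move the key to the most recently used position.
            inner.order.retain(|k| *k != key);
            inner.order.push_back(key);
            return Ok(file);
        }

        // Miss: open the file (done under the lock to keep the sketch simple).
        let file = Arc::new(File::open(path)?);

        // Evict least recently used entries until there is room.
        while inner.files.len() >= self.capacity {
            match inner.order.pop_front() {
                Some(victim) => {
                    inner.files.remove(&victim);
                }
                None => break,
            }
        }

        inner.files.insert(key, Arc::clone(&file));
        inner.order.push_back(key);
        Ok(file)
    }

    /// Number of descriptors currently held by the cache.
    pub fn len(&self) -> usize {
        self.inner.lock().expect("descriptor cache lock poisoned").files.len()
    }
}
```

Note that eviction only drops the cache's own `Arc`; a descriptor stays open until the last reader holding a clone drops it, so the cap bounds cached descriptors rather than in-flight ones. The linear `retain` on every hit is acceptable for a sketch but is one reason a purpose-built cache such as quick-cache would be preferable in practice.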

Benchmark of the current behaviour, scanning over 5 x 4 KiB blobs:

About 40% of the runtime is spent in fopen(). Each fopen() call takes roughly 1 µs.

Image

Reproduction

use lsm_tree::{AbstractTree, BlobCache, BlockCache};
use std::{path::Path, sync::Arc};

fn main() -> lsm_tree::Result<()> {
    let path = Path::new(".lsmdata");
    if path.try_exists()? {
        std::fs::remove_dir_all(path)?;
    }

    let tree = lsm_tree::Config::new(path)
        .compression(lsm_tree::CompressionType::None)
        .blob_compression(lsm_tree::CompressionType::None)
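        .block_cache(Arc::new(BlockCache::with_capacity_bytes(1_000_000_000)))
        // Zero-capacity blob cache, so every blob read goes to the blob file
        .blob_cache(Arc::new(BlobCache::with_capacity_bytes(0)))
        // Threshold of 1 byte, so all values are separated into blob files
        .blob_file_separation_threshold(1)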
        .open_as_blob_tree()?;

    {
        let value = "a".repeat(4_096);

        for k in 'a'..='e' {
            tree.insert((k as u8).to_be_bytes(), &value, 0);
        }
        tree.flush_active_memtable(0)?;
    }

    for _ in 0..1_000_000 {
        assert_eq!(5, tree.values(None, None).count());
    }

    Ok(())
}
marvin-j97 added the blocked, enhancement, good first issue, help wanted, and performance labels Feb 1, 2025
marvin-j97 pinned this issue Feb 5, 2025