Conversation
Can you create an issue (maybe several) to track this work and add it to your PR description?

@garious on it
Force-pushed 3bc54e4 to 8dd82a5
src/blob_store/store_impl.rs
let mut buf = [0u8; INDEX_RECORD_SIZE as usize];
while let Ok(_) = index_file.read_exact(&mut buf) {
    let index = BigEndian::read_u64(&buf[0..8]);
    if index == blob_index {
This code takes N/2 reads on average to find the index record; it could be constant time.
Wouldn't I have to ensure that indexes were written in-order to make that possible?
No, you can seek() to the spot you need to write() at in the index file. The file will grow automatically and will be filled with zeros (which you can call "invalid" offsets).
Got it. I'm already doing this in another place, will make this change tomorrow
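The seek-based approach suggested above can be sketched as follows. The 8-byte big-endian blob index comes from the snippet under review; the 16-byte record size, the file path, and the data offset are made up for this example:

```rust
use std::convert::TryInto;
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};

// Hypothetical layout: an index record is an 8-byte big-endian blob
// index followed by an 8-byte data offset.
const INDEX_RECORD_SIZE: u64 = 16;

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("blob_index_demo.idx");
    let mut index_file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)?;

    // Write the record for blob index 5 directly at its slot. Seeking
    // past the end grows the file on write, and the gap is zero-filled,
    // so all-zero records can be treated as "invalid".
    let blob_index: u64 = 5;
    index_file.seek(SeekFrom::Start(blob_index * INDEX_RECORD_SIZE))?;
    index_file.write_all(&blob_index.to_be_bytes())?;
    index_file.write_all(&4096u64.to_be_bytes())?; // made-up data offset

    // Lookup is now constant time: seek straight to the record
    // instead of scanning the file from the start.
    let mut buf = [0u8; INDEX_RECORD_SIZE as usize];
    index_file.seek(SeekFrom::Start(blob_index * INDEX_RECORD_SIZE))?;
    index_file.read_exact(&mut buf)?;
    let found = u64::from_be_bytes(buf[0..8].try_into().unwrap());
    assert_eq!(found, blob_index);
    Ok(())
}
```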
// Generate a num_reads sized random sample of indexes in range [0, total_blobs - 1],
// simulating random reads
let mut rng = rand::thread_rng();
let indexes: Vec<usize> = (0..num_reads)
are you intending to verify that caching helps, once implemented?
Yes. I've also experimented with parallelizing writes (partitioned based on slots), but I couldn't get it to work safely without needing to copy so much memory that it was much slower overall.
The little caching I've done so far made the benchmark ns/iter ~70-80% of db_ledger's, where before it was about ~110-120%.
I tried several different ways of exploiting concurrency, including using tokio and futures, and creating a separate writer thread that communicated over std::sync::mpsc and crossbeam-channel channels, etc. But I could not do it both 1) safely and 2) without so much copying that it made everything much slower.
Can you guys sync up with @sakridge on sakridge@aea2639?
pub struct RecordFile<T> {
    file: File,
    current_offset: u64,
    buf: RefCell<Vec<u8>>,
Why is buf in the RecordFile struct and not just a stack variable in the calling functions, if needed?
That was my original goal, but I ran into the fact that there's currently no way to use an associated const from a trait in an array expression [1], so I decided to use a single vector that was sized appropriately.
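For context, a minimal illustration of that limitation and the vector workaround; the trait and type names here are hypothetical, not taken from the PR:

```rust
trait Record {
    const SIZE: usize;
}

struct IndexRecord;

impl Record for IndexRecord {
    const SIZE: usize = 16;
}

// A stack buffer would look like this, but it does not compile in a
// generic context: `[0u8; T::SIZE]` is rejected because the array
// length depends on a generic parameter.
//
// fn record_buf<T: Record>() -> [u8; T::SIZE] { [0u8; T::SIZE] }

// Workaround: heap-allocate a Vec sized from the associated const,
// which is allowed.
fn record_buf<T: Record>() -> Vec<u8> {
    vec![0u8; T::SIZE]
}

fn main() {
    let buf = record_buf::<IndexRecord>();
    assert_eq!(buf.len(), 16);
}
```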
@MarkJr94, I am trying to build this PR locally and getting these build errors. Does it need a rebase?

@pgarg66 I'm looking into it right now; the issue is likely because I have been using the nightly compiler locally.
…nd blob accessors unimplemented
only accepted as individual raw Vec<u8> buffers for now
about 35% faster now, faster than DbLedger in benchmarks.
blob.index and blob.slot no longer return `Result`s
@pgarg66 should now build
@MarkJr94, I think I am making assumptions while reviewing the code. Do you happen to have a small writeup/diagram that explains how the different pieces interact with each other? My current understanding is this:
- BlobStore is the top-level interface; it exports functionality to store blobs per slot and also tracks the meta information for the slot (received, consumed, num blobs, etc.)
- Store provides get/put APIs for blobs and meta
- StoreImpl provides the caching and storing functionality
- RecordFile is a filesystem wrapper

Also, there's a ton of use of generics. Wondering if it's really needed?
I think some sort of documentation will help me understand the code better.
}

#[inline]
pub fn put_dyn<T>(&mut self, column: &str, key: Key, obj: T) -> Result<()>
These methods (gets/puts) are mostly called with the same type of T. Any good reason to use generics here?
The intention was just to make things generic so that I wasn't duplicating code. The traits are definitely excessive, I think. They'll be replaced with methods that just accept any T: Serialize for puts and return bytes (or any T: Deserialize) for gets.
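A sketch of what that simplified surface might look like. In practice the serde traits would be used, but to keep this self-contained a local ToBytes trait stands in for T: Serialize; all names here are illustrative, not from the PR:

```rust
use std::collections::HashMap;

// Stand-in for serde::Serialize in this sketch.
trait ToBytes {
    fn to_bytes(&self) -> Vec<u8>;
}

impl ToBytes for u64 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_be_bytes().to_vec()
    }
}

// Toy in-memory store keyed by (column, key).
struct Store {
    columns: HashMap<(String, u64), Vec<u8>>,
}

impl Store {
    fn new() -> Self {
        Store { columns: HashMap::new() }
    }

    // put accepts any serializable T and stores its raw bytes.
    fn put<T: ToBytes>(&mut self, column: &str, key: u64, obj: &T) {
        self.columns.insert((column.to_string(), key), obj.to_bytes());
    }

    // get returns the raw bytes; a typed variant would deserialize
    // them into any T: Deserialize instead.
    fn get(&self, column: &str, key: u64) -> Option<&[u8]> {
        self.columns.get(&(column.to_string(), key)).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = Store::new();
    store.put("index", 7, &42u64);
    assert_eq!(store.get("index", 7), Some(&42u64.to_be_bytes()[..]));
}
```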
@MarkJr94 Can we also try to split this PR into something smaller? Looks like RecordFile can be a separate PR; then look at the layers on top of that that can be merged.
This is a work-in-progress replacement for the storage functionality of DbLedger. See Issue #2566.

Upcoming subtasks:
root/0x11/22/33/44/55/66/77/88
solana::db_ledger::SlotMeta