| 1 | +// Copyright 2024 The LevelDB-Go and Pebble Authors. All rights reserved. Use |
| 2 | +// of this source code is governed by a BSD-style license that can be found in |
| 3 | +// the LICENSE file. |
| 4 | + |
| 5 | +// Package base defines fundamental types used across Pebble, including keys, |
| 6 | +// iterators, etc. |
| 7 | +// |
| 8 | +// # Iterators |
| 9 | +// |
| 10 | +// The [InternalIterator] interface defines the iterator interface implemented |
| 11 | +// by all iterators over point keys. Internal iterators are composed to form an |
| 12 | +// "iterator stack," resulting in a single internal iterator (see mergingIter in |
| 13 | +// the pebble package) that yields a merged view of the LSM. |
| 14 | +// |
| 15 | +// The SeekGE and SeekPrefixGE positioning methods take a set of flags |
| 16 | +// [SeekGEFlags] allowing the caller to provide additional context to iterator |
| 17 | +// implementations. |
| 18 | +// |
| 19 | +// ## TrySeekUsingNext |
| 20 | +// |
| 21 | +// The TrySeekUsingNext flag is set when the caller has knowledge that no action |
| 22 | +// has been performed to move this iterator beyond the first key that would be |
| 23 | +// found if this iterator were to honestly do the intended seek. This allows a |
| 24 | +// class of optimizations where an internal iterator may avoid a full naive |
| 25 | +// repositioning if the iterator is already at a proximate position. |
| 26 | +// |
| 27 | +// Let [s] be the seek key of an InternalIterator.Seek[Prefix]GE operation with |
| 28 | +// TrySeekUsingNext()=true on an internal iterator positioned at the key k_i |
| 29 | +// among k_0, k_1, ..., k_n keys known to the internal iterator. We maintain the |
| 30 | +// following universal invariants: |
| 31 | +// |
| 32 | +// U1: For all of the internal iterator's keys k_j such that j < i (all keys |
| 33 | +// before its current key k_i), one or more of the following hold: |
| 34 | +// |
| 35 | +// - (a) k_j < s |
| 36 | +// - (b) k_j is invisible at the iterator's sequence number |
| 37 | +// - (c) k_j is deleted by a visible range tombstone |
| 38 | +// - (d) k_j is deleted by a visible point tombstone |
| 39 | +// - (e) k_j is excluded by a block property filter, range key masking, etc. |
| 40 | +// |
| 41 | +// This contract must hold for every call passing TrySeekUsingNext, including |
| 42 | +// calls within the interior of the iterator stack. It's the responsibility of |
| 43 | +// each caller to preserve this relationship. Intuitively, the caller is |
| 44 | +// promising that nothing behind the iterator's current position is relevant and |
| 45 | +// the callee may search in the forward direction only. Note that the flag |
| 46 | +// imposes no obligation on the callee beyond the ordinary seek operation's |
| 47 | +// contract, and the callee may freely ignore the flag entirely. |
| 48 | +// |
| 49 | +// In addition to the universal invariants, the merging iterator and level |
| 50 | +// iterator impose additional invariants on TrySeekUsingNext due to their |
| 51 | +// responsibilities of applying range deletions and surfacing files' range |
| 52 | +// deletions respectively. |
| 53 | +// |
| 54 | +// Let [s] be the seek key of a Seek[Prefix]GE operation on a merging iterator, |
| 55 | +// and [s2] be the seek key of the resulting Seek[Prefix]GE operation on a level |
| 56 | +// iterator at level l_i among levels l_0, l_1, ..., l_n, positioned at the file |
| 57 | +// f_i among files f_0, f_1, ..., f_n and the key k_i among keys k_0, k_1, ..., |
| 58 | +// k_n known to the internal iterator. We maintain the following merging |
| 59 | +// iterator invariants: |
| 60 | +// |
| 61 | +// M1: Cascading: If TrySeekUsingNext is propagated to the level iterator at |
| 62 | +// level l_i, TrySeekUsingNext must be propagated to all the merging iterator's |
| 63 | +// iterators at levels j > i. |
| 64 | +// M2: File monotonicity: If TrySeekUsingNext is propagated to a level iterator, |
| 65 | +// the level iterator must not return a key from a file f_j where j < i, |
| 66 | +// even if file f_j includes a key k_j such that s2 ≤ k_j < k_i. |
| 67 | +// |
| 68 | +// Together, these invariants ensure that any range deletions relevant to |
| 69 | +// lower-levelled keys are either in currently open files or future files. |
| 70 | +// |
| 71 | +// Description of TrySeekUsingNext mechanics across the iterator stack: |
| 72 | +// |
| 73 | +// As the top-level entry point of user seeks, the [pebble.Iterator] is |
| 74 | +// responsible for detecting when consecutive user-initiated seeks move |
| 75 | +// monotonically forward. It saves seek keys and compares consecutive seek keys |
| 76 | +// to decide whether to propagate the TrySeekUsingNext flag to its |
| 77 | +// [InternalIterator]. |
| 78 | +// |
| 79 | +// The [pebble.Iterator] also has its own TrySeekUsingNext optimization in |
| 80 | +// SeekGE: Above the [InternalIterator] interface, the [pebble.Iterator]'s |
| 81 | +// SeekGE method detects consecutive seeks to monotonically increasing keys and |
| 82 | +// examines the current key. If the iterator is already positioned appropriately |
| 83 | +// (at a key ≥ the seek key), it elides the entire seek of the internal |
| 84 | +// iterator. |
| 85 | +// |
| 86 | +// The pebble mergingIter does not perform any TrySeekUsingNext optimization |
| 87 | +// itself, but it must preserve the universal U1 invariant, as well as the M1 |
| 88 | +// invariant specific to the mergingIter. It does both by always translating |
| 89 | +// calls to its SeekGE and SeekPrefixGE methods as equivalent calls to every |
| 90 | +// child iterator. There is one subtlety: |
| 91 | +// |
| 92 | +// - The mergingIter takes care to avoid ever advancing a child iterator |
| 93 | +// that's already positioned beyond the current iteration prefix. During |
| 94 | +// prefix iteration, some levels may omit keys that don't match the |
| 95 | +// prefix. Meanwhile the merging iterator sometimes skips keys (e.g., due to |
| 96 | +// visibility filtering). If we did not guard against iterating beyond the |
| 97 | +// iteration prefix, this key skipping could move some iterators beyond the |
| 98 | +// keys that were omitted due to prefix mismatch. A subsequent |
| 99 | +// TrySeekUsingNext could surface the omitted keys, but not relevant range |
| 100 | +// deletions that deleted them. |
| 101 | +// |
| 102 | +// The pebble levelIter makes use of the TrySeekUsingNext flag to avoid a naive |
| 103 | +// seek within the level's B-Tree of files. When TrySeekUsingNext is passed by |
| 104 | +// the caller, the relevant key must fall within the current file or a later |
| 105 | +// file. The search space is reduced from (-∞,+∞) to [current file, +∞). If the |
| 106 | +// current file's bounds overlap the key, the levelIter propagates the |
| 107 | +// TrySeekUsingNext to the current sstable iterator. If the levelIter must |
| 108 | +// advance to a new file, it drops the flag because the new file's sstable |
| 109 | +// iterator is still unpositioned. |
| 110 | +// |
| 111 | +// In-memory iterators arenaskl.Iterator and batchskl.Iterator make use of the |
| 112 | +// TrySeekUsingNext flag, attempting a fixed number of Nexts before falling back |
| 113 | +// to performing a seek using skiplist structures. |
| 114 | +// |
| 115 | +// The sstable iterators use the TrySeekUsingNext flag to avoid naive seeks |
| 116 | +// through a table's index structures. See the long comment in |
| 117 | +// sstable/reader_iter.go for more details: |
| 118 | +// - If an iterator is already exhausted, either because there are no |
| 119 | +// subsequent point keys or because the upper bound has been reached, the |
| 120 | +// iterator uses TrySeekUsingNext to avoid any repositioning at all. |
| 121 | +// - Otherwise, a TrySeekUsingNext flag causes the sstable Iterator to Next |
| 122 | +// forward a capped number of times, stopping as soon as a key ≥ the seek key |
| 123 | +// is discovered. |
| 124 | +// - The sstable iterator does not always position itself in response to a |
| 125 | +// SeekPrefixGE even when TrySeekUsingNext()=false, because bloom filters may |
| 126 | +// indicate the prefix does not exist within the file. The sstable iterator |
| 127 | +// takes care to remember when it didn't position itself, so that a |
| 128 | +// subsequent seek using TrySeekUsingNext does NOT try to reuse the current |
| 129 | +// iterator position. |
| 130 | +package base |
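
The "fixed number of Nexts before falling back to a seek" behavior described above for the in-memory and sstable iterators can be illustrated with a small standalone sketch. This is not Pebble's implementation: `sliceIter`, `maxNexts`, the `fullSeeks` counter, and the boolean `trySeekUsingNext` parameter are all hypothetical names invented for illustration, and the iterator walks a sorted slice of strings rather than a skiplist or sstable. The sketch assumes a simplified form of invariant U1 in which every key before the current position is strictly less than the seek key.

```go
package main

import (
	"fmt"
	"sort"
)

// maxNexts caps how many positions a flagged seek examines before giving up
// and performing a full seek. The value is arbitrary for this sketch.
const maxNexts = 4

// sliceIter is a hypothetical iterator over a sorted slice of distinct keys.
type sliceIter struct {
	keys      []string
	pos       int // index of the current key; len(keys) means exhausted
	fullSeeks int // number of full (binary search) seeks, for illustration
}

// SeekGE positions the iterator at the first key >= key and reports whether
// such a key exists. When trySeekUsingNext is true, the caller promises that
// every key before the current position is < key (a simplified U1), so the
// iterator may search forward only.
func (it *sliceIter) SeekGE(key string, trySeekUsingNext bool) (string, bool) {
	if trySeekUsingNext {
		for n := 0; n < maxNexts; n++ {
			if it.pos >= len(it.keys) {
				// Already exhausted: under the caller's promise nothing
				// behind us is relevant, so no repositioning is needed.
				return "", false
			}
			if it.keys[it.pos] >= key {
				return it.keys[it.pos], true
			}
			it.pos++
		}
	}
	// Fall back to a full seek. Ignoring the flag is always permitted.
	it.fullSeeks++
	it.pos = sort.SearchStrings(it.keys, key)
	if it.pos >= len(it.keys) {
		return "", false
	}
	return it.keys[it.pos], true
}

func main() {
	it := &sliceIter{keys: []string{"a", "c", "e", "g", "i", "k", "m", "o"}}

	k, _ := it.SeekGE("c", false) // unflagged: full seek
	fmt.Println(k, it.fullSeeks)

	k, _ = it.SeekGE("d", true) // nearby key: found by stepping forward
	fmt.Println(k, it.fullSeeks)

	k, _ = it.SeekGE("n", true) // distant key: step cap reached, full seek
	fmt.Println(k, it.fullSeeks)
}
```

In this sketch the second seek is satisfied by stepping forward from the current position, while the third exhausts the step budget and falls back to the binary search. Because the fallback is an ordinary seek over all keys, the result is the same whether or not the flag is honored, which mirrors the point above that a callee may ignore the flag entirely.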