Perf: Add BufferedIoRead for ~28.6% faster from_reader parsing #1294
      
        
This PR introduces a new, high-performance `Read` implementation, `BufferedIoRead`, to significantly speed up deserialization from `std::io::Read` sources.

## The Problem

The current `IoRead` implementation is a simple byte-by-byte iterator over an `io::Read` source. Because it operates one byte at a time, it has two major performance drawbacks:

- Per-byte bookkeeping logic runs on every single byte consumed.
- It cannot use `SliceRead`'s powerful, `memchr`-based optimizations (like `skip_to_escape`), because it doesn't have a slice to operate on.

This remains true even when a user manually wraps their reader in a `std::io::BufReader`: `IoRead` is unaware of the underlying buffer and cannot take advantage of it, so the per-byte overhead remains.
## The Solution: `BufferedIoRead`

This new implementation, `BufferedIoRead<R, B>`, wraps an `io::Read` source and uses an internal buffer (generic over `AsMut<[u8]>`).

Its core optimization is simple but powerful: it creates a temporary `SliceRead` over its internal buffer. This allows the deserializer to use the hyper-optimized `SliceRead` paths (like `skip_to_escape`) for large chunks of data at a time. The per-byte bookkeeping logic is deferred and runs only once per buffer refill, rather than once per byte.

The implementation handles all parsing logic (including strings, escape sequences, and `raw_value`) across buffer boundaries, ensuring correctness while maximizing performance.
## Performance Benchmarks

Benchmarking against the `canada.json` (2.2 MB) file shows a ~28.6% speedup for streaming deserialization compared to the current `from_reader` + `BufReader` approach. `BufferedIoRead` closes a significant portion of the gap between streaming parsing (`from_reader`) and non-streaming parsing (`from_slice`).

Configurations benchmarked:

- `from_slice`
- `read_to_end_then_slice`
- `from_reader` (std `BufReader`, 8 KiB)
- `BufferedIoRead` (ours, 8 KiB buffer)
- `BufferedIoRead` (ours, 16 KiB buffer)
## Testing

### Benchmark Methodology

### Correctness Tests
This PR includes extensive new integration tests (`tests/buffered_io.rs`) that torture-test the buffer boundary logic. These tests use a custom `SlowReader` to force buffer refills at critical parsing points (e.g., in the middle of a string, during a `\u` escape sequence, and while parsing a `RawValue`) to ensure correctness.