Skip to content

decoder slowness when reading a zlib/zstd .NewReader() due to small .Read() buffers #635

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
folays opened this issue Mar 16, 2025 · 2 comments
Assignees

Comments

@folays
Copy link

folays commented Mar 16, 2025

cbor decoder is slow WHEN reading from a decompressor returning small buffers.

This seems to cause the cbor validation/"wellformed" code to do a lot of probably "restart" or something like that.

When we interpose a "greedy" reader (cbor decoder <- greedyReader <- decompressor) its fast again.

Go's default json.Unmarshal() does not expose this behavior.
(the tests below show that it is undisturbed by io.Reader returning lot of small .Read() buffers)

The aforementioned slowness occurs with :

  • compress/zlib (and compress/gzip) // zlib generally returns >= 31 kB .Read()
  • compress/lzw
  • github.com/klauspost/compress/zstd // zstd generally returns >= 493 kB .Read()

Albeit zstd is slightly less worse, because zstd's Reader returns bigger .Read().

Here below are 2 * 3 * 2 combination lines of tests bench :

  • 2x : JSON / CBOR
  • 3x : no-op decompressor / zlib / zstd
  • 2x : ungreedy / greedy (reading of decompressor .Read())

The tests encode/decode an approximate of map[128]*struct { map[128*1024]int },
so roughly 256 MB of Go memory data, translating to (uncompressed) 239 MBs of JSON / 134 MBs of CBOR.

All tests takes 1-3s, except in the degenerate case of CBOR+decompression-reader,
which takes 24s with zstd, and 357s with zlib (on a 7950X CPU single thread). (Go 1.24)

Be cautious of not running it in "debug" mode or whatnot, I observed that in GoLand it would run a lot slower in "Debug" than in "Run".

*json.Decoder/io.nopCloserWriterTo/greedy=false: 239/239 MBs raw/compressed, 3.4 seconds (decompressor .Read() calls 19 average return 12335 kB)
*json.Decoder/io.nopCloserWriterTo/greedy=true : 239/239 MBs raw/compressed, 3.6 seconds (decompressor .Read() calls 19 average return 12335 kB)
*json.Decoder/*zlib.reader        /greedy=false: 239/ 72 MBs raw/compressed, 4.3 seconds (decompressor .Read() calls 7343 average return 31 kB)
*json.Decoder/*zlib.reader        /greedy=true : 239/ 72 MBs raw/compressed, 4.3 seconds (decompressor .Read() calls 19 average return 12335 kB)
*json.Decoder/main.zstdDecoder    /greedy=false: 239/  2 MBs raw/compressed, 3.6 seconds (decompressor .Read() calls 475 average return 493 kB)
*json.Decoder/main.zstdDecoder    /greedy=true : 239/  2 MBs raw/compressed, 3.9 seconds (decompressor .Read() calls 19 average return 12335 kB)
*cbor.Decoder/io.nopCloserWriterTo/greedy=false: 134/134 MBs raw/compressed, 1.6 seconds (decompressor .Read() calls 18 average return 7277 kB)
*cbor.Decoder/io.nopCloserWriterTo/greedy=true : 134/134 MBs raw/compressed, 1.7 seconds (decompressor .Read() calls 18 average return 7277 kB)
*cbor.Decoder/*zlib.reader        /greedy=false: 134/ 60 MBs raw/compressed, 357.4 seconds (decompressor .Read() calls 4111 average return 31 kB)
*cbor.Decoder/*zlib.reader        /greedy=true : 134/ 60 MBs raw/compressed, 2.2 seconds (decompressor .Read() calls 18 average return 7277 kB)
*cbor.Decoder/main.zstdDecoder    /greedy=false: 134/ 70 MBs raw/compressed, 24.1 seconds (decompressor .Read() calls 273 average return 479 kB)
*cbor.Decoder/main.zstdDecoder    /greedy=true : 134/ 70 MBs raw/compressed, 1.8 seconds (decompressor .Read() calls 18 average return 7277 kB)

Above code is here https://go.dev/play/p/8eiq0ZpZLBG but I will add a comment with the code.

A Go CPU profile run showed me that 99% of time is passed inside cbor.(*decoder).wellformedinternal.

@folays
Copy link
Author

folays commented Mar 16, 2025

here below is a copy of the test code https://go.dev/play/p/8eiq0ZpZLBG :

package main

import (
	"bytes"
	"compress/zlib"
	"encoding/json"
	"fmt"
	"io"
	"runtime"
	"time"

	"github.com/fxamacker/cbor/v2"
	"github.com/klauspost/compress/zstd"
)

// go get github.com/fxamacker/cbor/v2
// go get github.com/klauspost/compress@latest

type State struct {
	A map[int]*A
}

type A struct {
	B map[int]int
}

var state State

func init() {
	state.A = make(map[int]*A)
	for i := 0; i < 128; i++ {
		state.A[i] = &A{make(map[int]int)}
		for j := 0; j < 128*1024; j++ {
			state.A[i].B[j] = j
		}
	}
}

func main() {
	test_serializer(
		func(w io.Writer) interface{ Encode(v any) error } { return json.NewEncoder(w) },
		func(r io.Reader) interface{ Decode(v any) error } { return json.NewDecoder(r) },
	)
	test_serializer(
		func(w io.Writer) interface{ Encode(v any) error } { return cbor.NewEncoder(w) },
		func(r io.Reader) interface{ Decode(v any) error } { return cbor.NewDecoder(r) },
	)
}

type plain struct {
	io.Writer
}

func (plain) Close() error { return nil }

type zstdDecoder struct {
	*zstd.Decoder
}

func (z zstdDecoder) Close() error { z.Decoder.Close(); return nil }

func test_serializer(
	encoder func(io.Writer) interface{ Encode(v any) error },
	decoder func(io.Reader) interface{ Decode(v any) error },
) {
	var data bytes.Buffer
	encoder(&data).Encode(state)

	test_compressor(data.Bytes(),
		func(w io.Writer) io.WriteCloser { return plain{w} },
		func(r io.Reader) io.ReadCloser { return io.NopCloser(r) },
		decoder,
	)
	test_compressor(data.Bytes(),
		func(w io.Writer) io.WriteCloser { return zlib.NewWriter(w) },
		func(r io.Reader) io.ReadCloser { r2, _ := zlib.NewReader(r); return r2 },
		decoder,
	)
	test_compressor(data.Bytes(),
		func(w io.Writer) io.WriteCloser { w2, _ := zstd.NewWriter(w); return w2 },
		func(r io.Reader) io.ReadCloser { r2, _ := zstd.NewReader(r); return zstdDecoder{r2} },
		decoder,
	)
}

func test_compressor(data []byte,
	compressor func(w io.Writer) io.WriteCloser,
	decompressor func(r io.Reader) io.ReadCloser,
	decoder func(io.Reader) interface{ Decode(v any) error },
) {
	var zData bytes.Buffer
	compressWriter := compressor(&zData)
	compressWriter.Write(data)
	compressWriter.Close()

	for _, greedy := range []bool{false, true} {
		state = State{}
		runtime.GC()
		runtime.GC()

		start := time.Now()
		compressReader := decompressor(bytes.NewReader(zData.Bytes()))
		greeder := &greedyReader{
			r:      compressReader,
			greedy: greedy,
		}
		decoder(greeder).Decode(&state)
		compressReader.Close()
		elapsed := time.Since(start).Seconds()

		fmt.Printf("%T/%-20T/greedy=%-5t: %3d/%3d MBs raw/compressed, %.1f seconds (decompressor .Read() calls %d average return %d kB)\n",
			decoder(greeder), compressReader, greedy, len(data)/1e6, zData.Len()/1e6, elapsed, greeder.nbCalls, greeder.sumBytesRead/greeder.nbCalls/1024)
	}
}

type greedyReader struct {
	r            io.Reader
	greedy       bool
	nbCalls      int
	sumBytesRead int
}

// Read(): if `greedy` is false, it will behave as if its underlying io.Reader.Read() were called (+ some stats)
func (g *greedyReader) Read(p []byte) (n int, err error) {
	for len(p) > 0 && err == nil {
		var n2 int
		n2, err = g.r.Read(p)
		n += n2
		p = p[n2:]
		if g.greedy == false {
			break
		}
	}

	g.nbCalls++
	g.sumBytesRead += n

	return
}

@fxamacker
Copy link
Owner

@folays Thanks for opening this issue and providing detailed information!

I'll take a closer look (most likely next weekend).

@fxamacker fxamacker self-assigned this Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants