This repository was archived by the owner on May 8, 2019. It is now read-only.

Getting Started

dchenk edited this page Dec 21, 2017 · 5 revisions

This repository includes two things: a tool that generates code, and a Go library that the generated code uses to serialize and deserialize MessagePack data. You need both.

The primary difference between msgp and most other serialization libraries for Go is that msgp doesn't perform runtime reflection. Instead, you first use the code generator to generate the appropriate methods for the types you want to serialize or deserialize. At runtime, those generated methods call the library directly, so the shape of each type is already known when the program is compiled.

Download and install the code generator and library:

$ go get -u -t github.com/dchenk/msgp

Hello World Example

First, create a new directory in your GOPATH with a main.go file inside.

$ mkdir -p $GOPATH/src/msgp-demo
$ cd $GOPATH/src/msgp-demo
$ touch main.go

Then open main.go in your editor and add the following:

package main

import (
    "fmt"
)

//go:generate msgp

type Foo struct {
    Bar string  `msg:"bar"`
    Baz float64 `msg:"baz"`
}

func main() {
    fmt.Println("Nothing to see here yet!")
}

(You can verify that this builds and runs with $ go build && ./msgp-demo.)

Now let's add some methods to Foo by running go generate:

$ go generate
======== MessagePack Code Generator =======
>>> Input: "main.go"
>>> Wrote and formatted "main_gen.go"
>>> Wrote and formatted "main_gen_test.go"
$ ls
main.go			main_gen.go		main_gen_test.go
$ go test -v -bench .
=== RUN   TestMarshalUnmarshalFoo
--- PASS: TestMarshalUnmarshalFoo (0.00s)
=== RUN   TestEncodeDecodeFoo
--- PASS: TestEncodeDecodeFoo (0.00s)
PASS
BenchmarkMarshalMsgFoo-8	20000000	        97.9 ns/op	      32 B/op	       1 allocs/op
BenchmarkAppendMsgFoo-8 	30000000	        41.4 ns/op	 458.43 MB/s	       0 B/op	       0 allocs/op
BenchmarkUnmarshalFoo-8 	20000000	        94.7 ns/op	 200.57 MB/s	       0 B/op	       0 allocs/op
BenchmarkEncodeFoo-8    	20000000	        57.2 ns/op	 332.15 MB/s	       0 B/op	       0 allocs/op
BenchmarkDecodeFoo-8    	10000000	       135 ns/op	 140.50 MB/s	       0 B/op	       0 allocs/op
ok  	msgp-demo	9.712s

Let's break down what happened here:

  • go generate scanned each file in msgp-demo for a go:generate directive.
  • //go:generate msgp was found in main.go, which caused $GOFILE to be set to main.go.
  • msgp was invoked by go generate, and it parsed $GOFILE and extracted type declarations.
  • msgp created main_gen.go, which contains all of the generated methods, and main_gen_test.go, which has tests and benchmarks for each generated method.

The key takeaway here is that msgp works on a per-file, not a per-package basis. (You can, however, invoke the code generator on an entire directory at once by passing a directory path using the -file flag.)

There are a couple of reasons why we designed msgp to operate on files rather than on Go packages:

  • Integration with build tools like make is dead simple.
  • Reading one file is much faster than reading a whole directory. The msgp tool itself typically runs in less time than the go generate tool takes just to find the directive.

Our suggestion is that users put types requiring code generation in their own file (say, wiretypes.go), and put //go:generate msgp at the top. However, other workflows are possible.

Let's look at the generated code in main_gen.go:

(Note: the interfaces that the code generator implements are stable, but the code that it generates in order to implement those interfaces has changed over time in order to provide performance and stability improvements. Don't be alarmed if you see output that's different from what is listed below.)

package main

// NOTE: THIS FILE WAS PRODUCED BY THE
// MSGP CODE GENERATION TOOL (github.com/dchenk/msgp)
// DO NOT EDIT

import (
	"github.com/dchenk/msgp/msgp"
)

// DecodeMsg implements msgp.Decoder
func (z *Foo) DecodeMsg(dc *msgp.Reader) (err error) {
	var field []byte
	_ = field
	var isz uint32
	isz, err = dc.ReadMapHeader()
	if err != nil {
		return
	}
	for isz > 0 {
		isz--
		field, err = dc.ReadMapKeyPtr()
		if err != nil {
			return
		}
		switch msgp.UnsafeString(field) {
		case "bar":
			z.Bar, err = dc.ReadString()
			if err != nil {
				return
			}
		case "baz":
			z.Baz, err = dc.ReadFloat64()
			if err != nil {
				return
			}
		default:
			err = dc.Skip()
			if err != nil {
				return
			}
		}
	}
	return
}

// EncodeMsg implements msgp.Encoder
func (z Foo) EncodeMsg(en *msgp.Writer) (err error) {
	// map header, size 2
	// write "bar"
	err = en.Append(0x82, 0xa3, 0x62, 0x61, 0x72)
	if err != nil {
		return err
	}
	err = en.WriteString(z.Bar)
	if err != nil {
		return
	}
	// write "baz"
	err = en.Append(0xa3, 0x62, 0x61, 0x7a)
	if err != nil {
		return err
	}
	err = en.WriteFloat64(z.Baz)
	if err != nil {
		return
	}
	return
}

// MarshalMsg implements msgp.Marshaler
func (z Foo) MarshalMsg(b []byte) (o []byte, err error) {
	o = msgp.Require(b, z.Msgsize())
	// map header, size 2
	// string "bar"
	o = append(o, 0x82, 0xa3, 0x62, 0x61, 0x72)
	o = msgp.AppendString(o, z.Bar)
	// string "baz"
	o = append(o, 0xa3, 0x62, 0x61, 0x7a)
	o = msgp.AppendFloat64(o, z.Baz)
	return
}

// UnmarshalMsg implements msgp.Unmarshaler
func (z *Foo) UnmarshalMsg(bts []byte) (o []byte, err error) {
	var field []byte
	_ = field
	var isz uint32
	isz, bts, err = msgp.ReadMapHeaderBytes(bts)
	if err != nil {
		return
	}
	for isz > 0 {
		isz--
		field, bts, err = msgp.ReadMapKeyZC(bts)
		if err != nil {
			return
		}
		switch msgp.UnsafeString(field) {
		case "bar":
			z.Bar, bts, err = msgp.ReadStringBytes(bts)
			if err != nil {
				return
			}
		case "baz":
			z.Baz, bts, err = msgp.ReadFloat64Bytes(bts)
			if err != nil {
				return
			}
		default:
			bts, err = msgp.Skip(bts)
			if err != nil {
				return
			}
		}
	}
	o = bts
	return
}

func (z Foo) Msgsize() (s int) {
	s = 1 + 4 + msgp.StringPrefixSize + len(z.Bar) + 4 + msgp.Float64Size
	return
}

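As we just saw, the code generator implements five methods by default:

  • DecodeMsg, which implements msgp.Decoder
  • EncodeMsg, which implements msgp.Encoder
  • MarshalMsg, which implements msgp.Marshaler
  • UnmarshalMsg, which implements msgp.Unmarshaler
  • Msgsize, which implements msgp.Sizer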

Each of those methods is an implementation of an interface defined in the msgp library. In effect, the library at github.com/dchenk/msgp/msgp contains everything we need to encode and decode MessagePack, and the code generator exists simply to write boilerplate code using that library. We could, of course, implement all of these interfaces ourselves, but that would be unnecessarily laborious and error-prone. (Plus, the code generator can perform optimizations such as pre-encoding static strings, as in the example above, where the map header and field names are written as literal bytes. This would be especially cumbersome to write by hand!)

Memory Interfaces

The "memory interfaces" are interfaces through which chunks of memory ([]byte, in this case) are written or read as MessagePack.

Go veterans will notice that msgp.Marshaler differs slightly from the conventional Marshaler interfaces in the standard library (json.Marshaler and friends) in that it takes a []byte as its first and only argument. The semantics of msgp.Marshaler dictate that it return a slice that is the concatenation of the input slice and the body of the object itself, and that it is allowed to use the memory between len and cap if at all possible. In practice, this allows for zero-allocation marshaling. (If you don't happen to have a slice lying around that you can use, you can always pass a nil slice, and a new slice will be allocated for you.) There is a similar set of zero-allocation APIs in the standard library's strconv package.

foo1 := Foo{ /* ... */ }
foo2 := Foo{ /* ... */ }

// data contains the body of foo1
data, _ := foo1.MarshalMsg(nil)

fmt.Printf("foo1 is encoded as %x\n", data)

// data is overwritten with the body of foo2. If it fits within
// the old slice, no new memory is allocated.
data, _ = foo2.MarshalMsg(data[:0])

fmt.Printf("foo2 is encoded as %x\n", data)
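For comparison, the append-style functions in strconv follow the same convention: they take a destination slice, append to it, and return the extended slice. The sketch below uses only the standard library; appendPair is a hypothetical helper written for illustration, not part of msgp.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendPair appends "a,b" to dst and returns the extended slice, in the
// same append-style shape as MarshalMsg. Hypothetical helper.
func appendPair(dst []byte, a, b int64) []byte {
	dst = strconv.AppendInt(dst, a, 10)
	dst = append(dst, ',')
	return strconv.AppendInt(dst, b, 10)
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, reused below

	buf = appendPair(buf[:0], 1, 2)
	fmt.Println(string(buf)) // 1,2

	// Truncating to length zero keeps the capacity, so the second call
	// writes into the same backing array instead of allocating.
	buf = appendPair(buf[:0], 300, 400)
	fmt.Println(string(buf), cap(buf)) // 300,400 64
}
```

Passing data[:0] to MarshalMsg plays the same role as buf[:0] here: the length is reset while the capacity stays available for reuse.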

As you may have already guessed, the msgp.Unmarshaler interface is simply the inverse of the msgp.Marshaler interface. The returned []byte should be a sub-slice of the argument slice pointing to the memory not yet consumed.

For example, here's a convoluted way to switch the values contained in two structs:

foo1 := Foo{ /* ... */ }
foo2 := Foo{ /* ... */ }

fmt.Printf("foo1: %v\n", foo1)
fmt.Printf("foo2: %v\n", foo2)

// Append two messages to the same slice.
data, _ := foo1.MarshalMsg(nil)
data, _ = foo2.MarshalMsg(data)

// Now just decode them in reverse:
data, _ = foo2.UnmarshalMsg(data)
data, _ = foo1.UnmarshalMsg(data)

// At this point, len(data) should be 0.
fmt.Println("len(data) =", len(data))

fmt.Printf("foo1: %v\n", foo1)
fmt.Printf("foo2: %v\n", foo2)

Because MessagePack is self-describing, we can interleave it with other pieces of data without framing and still re-construct the original input. (Notably, the same cannot be said of a number of other popular protocols, including Protocol Buffers.)
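As a rough illustration of why no framing is needed, the sketch below reads two strings that sit back to back in one buffer. It uses only the standard library and handles only the "fixstr" format (strings shorter than 32 bytes); readFixStr is a hypothetical helper, not an msgp function, though it returns the unread remainder in the same way UnmarshalMsg does.

```go
package main

import (
	"errors"
	"fmt"
)

// readFixStr decodes one MessagePack fixstr (a string shorter than 32
// bytes) from the front of b and returns the bytes that follow it.
// Hypothetical helper.
func readFixStr(b []byte) (s string, rest []byte, err error) {
	if len(b) == 0 || b[0]&0xe0 != 0xa0 {
		return "", b, errors.New("not a fixstr")
	}
	n := int(b[0] & 0x1f) // the length lives in the low 5 bits of the header
	if len(b) < 1+n {
		return "", b, errors.New("short buffer")
	}
	return string(b[1 : 1+n]), b[1+n:], nil
}

func main() {
	// Two encoded strings back to back, with no separator between them.
	data := []byte{0xa3, 'f', 'o', 'o', 0xa2, 'h', 'i'}

	for len(data) > 0 {
		var s string
		var err error
		s, data, err = readFixStr(data)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(s)
	}
}
```

Each value carries its own type and length in its header, so the reader always knows where one object ends and the next begins.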

Streaming Interfaces

Streaming interfaces are interfaces through which MessagePack can be written to an io.Writer or read from an io.Reader.

msgp handles streaming a little differently than the Go standard library. The msgp.Writer and msgp.Reader types are MessagePack-aware versions of bufio.Writer and bufio.Reader, respectively.

The implementation of msgp.Encoder writes the object to the msgp.Writer. Since the buffered writer maintains its own buffer, no memory allocation is performed.

foo := Foo{ /* ... */ }

w := msgp.NewWriter(os.Stdout)
foo.EncodeMsg(w)
w.Flush()
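The call to Flush matters because a buffered writer holds data in memory until its buffer fills or it is flushed. Since msgp.Writer is described above as a MessagePack-aware counterpart to bufio.Writer, the sketch below uses bufio.Writer from the standard library to show the effect; it assumes msgp.Writer buffers in a comparable way. The function sinkLenAroundFlush is a hypothetical helper.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// sinkLenAroundFlush writes p through a bufio.Writer and reports how many
// bytes had reached the underlying sink before and after Flush.
// Hypothetical helper.
func sinkLenAroundFlush(p []byte) (before, after int, err error) {
	var sink bytes.Buffer
	w := bufio.NewWriter(&sink)
	if _, err = w.Write(p); err != nil {
		return
	}
	before = sink.Len() // small writes are still held in the buffer
	if err = w.Flush(); err != nil {
		return
	}
	after = sink.Len()
	return
}

func main() {
	before, after, err := sinkLenAroundFlush([]byte{0xa2, 'h', 'i'})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before, after) // 0 3
}
```

Forgetting to flush is an easy way to end up with a reader that blocks forever or a file that is missing its last few bytes.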

msgp.Decoder, as you may have already guessed, is the converse of msgp.Encoder. It is the interface through which objects read themselves out of a msgp.Reader.

pr, pw := io.Pipe()

go func() {
    w := msgp.NewWriter(pw)
    fooIn := Foo{ /* ... */ }
    fmt.Printf("fooIn is %v\n", fooIn)
    fooIn.EncodeMsg(w)
    w.Flush()
}()

var fooOut Foo
fooOut.DecodeMsg(msgp.NewReader(pr))

fmt.Printf("fooOut is %v\n", fooOut)

Helper Methods

msgp.Sizer is a helper interface used in a couple of places inside the msgp library, as well as in the implementation of msgp.Marshaler. Its purpose is to estimate how much memory must be allocated to hold the encoded form of an object. (In practice, it systematically over-estimates the encoded size of the object.)
