Commit 74c247c

committed
Initial commit
0 parents  commit 74c247c

7 files changed, +468 -0 lines changed

.github/workflows/ci.yml

+67
@@ -0,0 +1,67 @@
+name: CI
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+
+jobs:
+  build:
+    name: CI
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Log
+        env:
+          CI_EVENT_ACTION: ${{ github.event.action }}
+          CI_PR_TITLE: ${{ github.event.pull_request.title }}
+          CI_PR_PREV_TITLE: ${{ github.event.changes.title.from }}
+        run: |
+          echo github.event.action=$CI_EVENT_ACTION
+          echo github.event.pull_request.title=$CI_PR_TITLE
+          echo github.event.changes.title.from=$CI_PR_PREV_TITLE
+
+      - name: Set up Go
+        uses: actions/setup-go@v2
+        with:
+          go-version: '~1.16.6'
+        id: go
+
+      - name: Install utilities
+        run: |
+          go install golang.org/x/lint/golint@latest
+          go install golang.org/x/tools/cmd/goimports@latest
+          go install honnef.co/go/tools/cmd/staticcheck@latest
+          # display Go environment for reference
+          go env
+
+      - name: Check out code
+        uses: actions/checkout@v2
+
+      - uses: actions/cache@v2
+        with:
+          path: ~/go/pkg/mod
+          key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
+          restore-keys: |
+            ${{ runner.os }}-go-
+
+      - name: Get dependencies
+        run: |
+          go mod tidy
+          /usr/bin/git diff --exit-code
+
+      - name: Build
+        run: |
+          go build -v ./...
+
+      - name: Check
+        run: |
+          go vet ./...
+          golint ./...
+          staticcheck ./...
+          goimports -w .
+          /usr/bin/git diff --exit-code
+
+      - name: Test
+        run: |
+          go test -v ./...

LICENSE

+16
@@ -0,0 +1,16 @@
+Copyright 2021 The Sensible Code Company Ltd
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
+associated documentation files (the "Software"), to deal in the Software without restriction,
+including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
+and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies
+or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
+NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH
+THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

+56
@@ -0,0 +1,56 @@
+# faststringmap
+
+`faststringmap` is a fast read-only string keyed map for Go (golang).
+For our use case it is approximately 5 times faster than using Go's
+built-in map type with a string key. It also has the following advantages:
+
+* look up strings and byte slices without use of the `unsafe` package
+* minimal impact on GC due to lack of pointers in the data structure
+* data structure can be trivially serialized to disk or network
+
+The code provided implements a map from string to `uint32` which fits our
+use case, but you can easily substitute other value types.
+
+`faststringmap` is a variant of a data structure called a [Trie](https://en.wikipedia.org/wiki/Trie).
+At each level we use a slice to hold the next possible byte values.
+This slice is of length one plus the difference between the lowest and highest
+possible next bytes of strings in the map. Not all the entries in the slice are
+valid next bytes. `faststringmap` is thus more space efficient for keys using a
+small set of nearby runes, for example those using a lot of digits.
+
+## Example
+
+Example usage can be found in [``uint32_store_example_test.go``](uint32_store_example_test.go).
+
+## Motivation
+
+I created `faststringmap` in order to improve the speed of parsing CSV
+where the fields were category codes from survey data. The majority of these
+were numeric (`"1"`, `"2"`, `"3"`...) plus a distinct code for "not applicable".
+I was struck that in the simplest possible cases (e.g. `"1"` ... `"5"`) the map
+should be a single slice lookup.
+
+Our fast CSV parser provides fields as byte slices into the read buffer to
+avoid creating string objects. So I also wanted to facilitate key lookup from a
+`[]byte` rather than a string. This is not possible using a built-in Go map without
+use of the `unsafe` package.
+
+## Benchmarks
+
+Example benchmarks from my laptop:
+```
+cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
+BenchmarkUint32Store
+BenchmarkUint32Store-8          218463      4959 ns/op
+BenchmarkGoStringToUint32
+BenchmarkGoStringToUint32-8      49279     24483 ns/op
+```
+
+## Improvements
+
+You can improve the performance further by using a slice for the ``next`` fields.
+This avoids a bounds check when looking up the entry for a byte. However, it
+comes at the cost of easy serialization and introduces a lot of pointers which
+will have impact on GC. It is not possible to directly construct the slice version
+in the same way so that the whole store is one block of memory. Either create as in
+this code and then derive the slice version or create distinct slice objects at each level.
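
The slice-based variant mentioned under Improvements could be derived from the flat store roughly as follows. This is a minimal sketch, not code from this commit: `flatNode` mirrors the `byteValue` struct in `uint32_store.go`, while `sliceNode`, `derive`, and `lookup` are hypothetical names introduced for illustration. Every `next` slice is a window onto one shared backing array, so the derived form is still a single allocation, but each node now carries a pointer.

```go
package main

import "fmt"

// flatNode mirrors the byteValue struct from uint32_store.go.
type flatNode struct {
	nextLo     uint32
	nextLen    byte
	nextOffset byte
	valid      bool
	value      uint32
}

// sliceNode is a hypothetical variant in which a slice replaces nextLo and nextLen.
type sliceNode struct {
	next       []sliceNode
	nextOffset byte
	valid      bool
	value      uint32
}

// derive builds the slice form from a flat store. All next slices share the
// backing array of the returned slice.
func derive(store []flatNode) []sliceNode {
	nodes := make([]sliceNode, len(store))
	for i, bv := range store {
		lo := int(bv.nextLo)
		nodes[i] = sliceNode{
			next:       nodes[lo : lo+int(bv.nextLen)],
			nextOffset: bv.nextOffset,
			valid:      bv.valid,
			value:      bv.value,
		}
	}
	return nodes
}

// lookup walks the slice form one byte at a time. The index is checked
// against len(n.next) immediately before it is used.
func lookup(root *sliceNode, s string) (uint32, bool) {
	n := root
	for i := 0; i < len(s); i++ {
		b := s[i]
		if b < n.nextOffset {
			return 0, false
		}
		ni := int(b - n.nextOffset)
		if ni >= len(n.next) {
			return 0, false
		}
		n = &n.next[ni]
	}
	return n.value, n.valid
}

func main() {
	// Flat store for the keys "1", "2", "3" with illustrative values.
	store := []flatNode{
		{nextLo: 1, nextLen: 3, nextOffset: '1'},
		{valid: true, value: 10},
		{valid: true, value: 20},
		{valid: true, value: 30},
	}
	nodes := derive(store)
	fmt.Println(lookup(&nodes[0], "2")) // 20 true
	fmt.Println(lookup(&nodes[0], "7")) // 0 false
}
```

Whether the compiler actually drops the bounds check on `n.next[ni]` depends on the Go version, so any speed-up should be confirmed with a benchmark.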

go.mod

+3
@@ -0,0 +1,3 @@
+module github.com/sensiblecodeio/faststringmap
+
+go 1.16

uint32_store.go

+107
@@ -0,0 +1,107 @@
+// Copyright 2021 The Sensible Code Company Ltd
+// Author: Duncan Harris
+
+package faststringmap
+
+import (
+	"sort"
+)
+
+type (
+	// Uint32Store is a fast read only map from string to uint32
+	// Lookups are about 5x faster than the built-in Go map type
+	Uint32Store struct {
+		store []byteValue
+	}
+
+	byteValue struct {
+		nextLo     uint32 // index in store of next byteValues
+		nextLen    byte   // number of byteValues in store used for next possible bytes
+		nextOffset byte   // offset from zero byte value of first element of range of byteValues
+		valid      bool   // is the byte sequence with no more bytes in the map?
+		value      uint32 // value for byte sequence with no more bytes
+	}
+
+	// Uint32Source is for supplying data to initialise Uint32Store
+	Uint32Source interface {
+		// AppendKeys should append the keys of the maps to the supplied slice and return the resulting slice
+		AppendKeys([]string) []string
+		// Get should return the value for the supplied key
+		Get(string) uint32
+	}
+)
+
+// NewUint32Store creates from the data supplied in srcMap
+func NewUint32Store(srcMap Uint32Source) Uint32Store {
+	m := Uint32Store{store: make([]byteValue, 1)}
+	if keys := srcMap.AppendKeys([]string(nil)); len(keys) > 0 {
+		sort.Strings(keys)
+		m.makeByteValue(&m.store[0], keys, 0, srcMap)
+	}
+	return m
+}
+
+// makeByteValue will initialise the supplied byteValue for
+// the sorted strings in slice a considering bytes at byteIndex in the strings
+func (m *Uint32Store) makeByteValue(bv *byteValue, a []string, byteIndex int, srcMap Uint32Source) {
+	// if there is a string with no more bytes then it is always first because they are sorted
+	if len(a[0]) == byteIndex {
+		bv.valid = true
+		bv.value = srcMap.Get(a[0])
+		a = a[1:]
+	}
+	if len(a) == 0 {
+		return
+	}
+	bv.nextOffset = a[0][byteIndex]       // lowest value for next byte
+	bv.nextLen = a[len(a)-1][byteIndex] - // highest value for next byte
+		bv.nextOffset + 1 // minus lowest value +1 = number of possible next bytes
+	bv.nextLo = uint32(len(m.store)) // first byteValue struct to use
+
+	// allocate enough byteValue structs - they default to "not valid"
+	m.store = append(m.store, make([]byteValue, bv.nextLen)...)
+
+	for i, n := 0, len(a); i < n; {
+		// find range of strings starting with the same byte
+		iSameByteHi := i + 1
+		for iSameByteHi < n && a[iSameByteHi][byteIndex] == a[i][byteIndex] {
+			iSameByteHi++
+		}
+		nextStoreIndex := bv.nextLo + uint32(a[i][byteIndex]-bv.nextOffset)
+		m.makeByteValue(&m.store[nextStoreIndex], a[i:iSameByteHi], byteIndex+1, srcMap)
+		i = iSameByteHi
+	}
+}
+
+// LookupString looks up the supplied string in the map
+func (m *Uint32Store) LookupString(s string) (uint32, bool) {
+	bv := &m.store[0]
+	for i, n := 0, len(s); i < n; i++ {
+		b := s[i]
+		if b < bv.nextOffset {
+			return 0, false
+		}
+		ni := b - bv.nextOffset
+		if ni >= bv.nextLen {
+			return 0, false
+		}
+		bv = &m.store[bv.nextLo+uint32(ni)]
+	}
+	return bv.value, bv.valid
+}
+
+// LookupBytes looks up the supplied byte slice in the map
+func (m *Uint32Store) LookupBytes(s []byte) (uint32, bool) {
+	bv := &m.store[0]
+	for _, b := range s {
+		if b < bv.nextOffset {
+			return 0, false
+		}
+		ni := b - bv.nextOffset
+		if ni >= bv.nextLen {
+			return 0, false
+		}
+		bv = &m.store[bv.nextLo+uint32(ni)]
+	}
+	return bv.value, bv.valid
+}
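
The `nextOffset` and `nextLen` computation in `makeByteValue` decides how many store entries each level reserves, including entries that no key uses. The sketch below illustrates that rule in isolation and is not part of the commit: `span` is a hypothetical helper that applies the same "highest minus lowest plus one" calculation to the first byte of a set of non-empty keys and also counts how many of the reserved slots are occupied.

```go
package main

import (
	"fmt"
	"sort"
)

// span reports the lowest first byte, the number of slots reserved for the
// first level (highest - lowest + 1), and how many of those slots are used.
// Hypothetical helper: keys must be non-empty and contain no empty strings.
func span(keys []string) (offset byte, slots, used int) {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	offset = sorted[0][0]
	slots = int(sorted[len(sorted)-1][0]-offset) + 1
	for i, k := range sorted {
		// count each distinct first byte once
		if i == 0 || k[0] != sorted[i-1][0] {
			used++
		}
	}
	return offset, slots, used
}

func main() {
	// Dense digit codes: every reserved slot is used.
	fmt.Println(span([]string{"1", "2", "3", "4", "5"})) // 49 5 5
	// Digits plus a letter code: the range '1'..'N' reserves 30 slots for 3 keys.
	fmt.Println(span([]string{"1", "9", "NA"})) // 49 30 3
}
```

The second call shows the space cost the README alludes to: first bytes that are far apart widen the range even when few of them occur.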

uint32_store_example_test.go

+72
@@ -0,0 +1,72 @@
+package faststringmap_test
+
+import (
+	"fmt"
+	"sort"
+	"strings"
+
+	"github.com/sensiblecodeio/faststringmap"
+)
+
+func Example() {
+	m := exampleSource{
+		"key1": 42,
+		"key2": 27644437,
+		"l":    2,
+	}
+
+	fm := faststringmap.NewUint32Store(m)
+
+	// add an entry that is not in the fast map
+	m["m"] = 4
+
+	// sort the keys so output is the same for each test run
+	keys := make([]string, 0, len(m))
+	for k := range m {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+
+	// lookup every key in the fast map and print the corresponding value
+	for _, k := range keys {
+		v, ok := fm.LookupString(k)
+		fmt.Printf("%q: %d, %v\n", k, v, ok)
+	}
+
+	// Dump out the store to aid in understanding the implementation
+	fmt.Println()
+	dump := fmt.Sprintf("%+v", fm)
+	dump = strings.ReplaceAll(dump, "}", "}\n")
+	dump = strings.ReplaceAll(dump, "[", "[\n ")
+	fmt.Println(dump)
+
+	// Output:
+	//
+	// "key1": 42, true
+	// "key2": 27644437, true
+	// "l": 2, true
+	// "m": 0, false
+	//
+	// {store:[
+	//  {nextLo:1 nextLen:2 nextOffset:107 valid:false value:0}
+	//  {nextLo:3 nextLen:1 nextOffset:101 valid:false value:0}
+	//  {nextLo:0 nextLen:0 nextOffset:0 valid:true value:2}
+	//  {nextLo:4 nextLen:1 nextOffset:121 valid:false value:0}
+	//  {nextLo:5 nextLen:2 nextOffset:49 valid:false value:0}
+	//  {nextLo:0 nextLen:0 nextOffset:0 valid:true value:42}
+	//  {nextLo:0 nextLen:0 nextOffset:0 valid:true value:27644437}
+	// ]}
+}
+
+type exampleSource map[string]uint32
+
+func (s exampleSource) AppendKeys(a []string) []string {
+	for k := range s {
+		a = append(a, k)
+	}
+	return a
+}
+
+func (s exampleSource) Get(k string) uint32 {
+	return s[k]
+}
