Skip to content

Commit

Permalink
Merge pull request #3 from Dyalog/Sharding
Browse files Browse the repository at this point in the history
Sharding
  • Loading branch information
mkromberg committed Jan 4, 2015
2 parents d1f7874 + 8801f05 commit 44398b5
Show file tree
Hide file tree
Showing 6 changed files with 506 additions and 237 deletions.
47 changes: 33 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,42 +1,57 @@
# README #

`vecdb`
Current version: 0.1.3
Current version: 0.2.0

**Warning: Version 0.2** adds support for SHARDING, and introduces changes to the database format which are not upwards compatible.

### What is this repository for? ###
`vecdb` is a simple "columnar database": each column in the database is stored in a single memory-mapped files. It is written in and for Dyalog APL as a tool on which to base new applications which need to generate and query very large amounts of data, but do not need a "transactional" storage mechanism.
`vecdb` is a simple "columnar database": each column in the database is stored in a single memory-mapped files. It is written in and for Dyalog APL as a tool on which to base new applications which need to generate and query very large amounts of data and do a large number of high performance reads, but do not need a full set of RDBMS features. In particuler, there is no "transactional" storage mechanism, and no ability to join tables built-in to the database.

### Features

The current version supports the following data types:
#### Supported data types: ####

* 1, 2 and 4 byte integers
* 8-byte IEEE double-precision floats
* Boolean
* Char (via a "symbol table" of up to 32,767 unique strings indexed by 2-byte integers)

Database modification can only be done using Append and Update operations (no Delete).
#### Sharding ####

`vecdb` databases can be *sharded*, or *horizontally partitioned*. Each shard is a separate folder, named when the database is created (by default, there is a single shard). Each folder contains a file for each database column - which is memory mapped to an APL vector when the database is opened. A list of *sharding columns* is defined when the db is created; the values of these columns are passed as the argument to a user-defined *sharding function*, which has to return an origin-1 index into the list of shards, for each record.

#### Supported Operations ####

**Query**: At the moment, the `Query` function takes a constraint in the form of a list of (column_name values) pairs. Each one represents the relation which can be expressed in APL as (column_data∊values). If more than constraint is provided, they are AND-ed together.
Query also takes a list of column names to be retrieved for records which match the constraint.

Query results are returned as a vector with one element per database column, each item containing a vector of values for that column.

The `Query` function takes a constraint in the form of a list of (column_name values) pair. Each one represents the relation which can be expressed in APL as (column_data∊values); if there is more than one constraint they are AND-ed together. Query also accepts a list of column names to be retrieve for records which match the constraint; if no columns are requested, row indices are returned.
**Search** If the `Query`function is called with an empty list of columns, record identifiers are returned as a 2-column matrix of (shard) (record index) pairs.

A `Read` function takes a list of column names and row indices and returns the requested data.
**Read**: The `Read` function accepts a matrix in the format returned by a search query and a list of column names, and returns a vector per column.

### Goals
**Update**: The `Update` function also takes as input a search query result, a list of columns, and a vector of vectors containing new data values.

The intention is to extend `vecdb` with the following functionality. Much of this is still half-baked, discussion is welcome. owever, the one application that is being built upon `vecdb` and is driving the initial development requires the following items.
**Append**: Takes a list of column names and a vector of data vectors, one per named column. The columns involved in the Shard selection must always be included.

1. "Sharding": This idea needs to be developed, but the current thinking is that one or more key fields are identified, and a function is defined to map distinct key tuples to a "shard". A list of folder names points to the folders that will contain the mapped columns for each shard. The result of Query (and argument to Read) will become a 2-row (2-column?) matrix containing shard numbers and record offsets within the shard.
1. Parallel database queries: For a sharded database, an isolate process will be spun up to perform queries and updates on one or more shards (each shard only being handles by a single process).
1. A front-end server will allow RESTful database access (this item is perhaps optional). As it stands, `vecdb` is effectively an embedded database engine which does not support data sharing between processes on the same or on separate machines.
**Delete**: Deletion is not currently supported.

### Longer Term (Dreams)
### Short-Term Goals ###

1. Enhance the query function to accept enhanced queries consisting of column names, comparison functions and values - and support AND/OR. If possible, optimise queries to be sensitive to sharding.
1. Parallel database queries: For a sharded database: Spin a number of isolate processes up and distribute the shards between them, so that each shard is handled by a single process. Enhance the database API functions to use these processes to perform searches, reads and writes in parallel.
1. Add a front-end server with a RESTful database API. As it stands, `vecdb` is effectively an embedded database engine which does not support data sharing between processes on the same or on separate machines.

### Longer Term (Dreams) ###

There are ideas to add support for timeseries and versioning. This would include:

1. Add a single-byte indexed Char type (perhaps denoted lowercase "c"), indexing up to 127 unique strings
1. Support for deleting records
2. Performing all updates without overwriting data, and tagging old data with the timestamps defining its lifetime, allowing efficient queries on the database as it appeared at any given time in the past.
3. Built-in support for the computation of aggregate values as part of the parallel query mechanism, based on timeseries or other key values.
1. Performing all updates without overwriting data, and tagging old data with the timestamps defining its lifetime, allowing efficient queries on the database as it appeared at any given time in the past.
1. Built-in support for the computation of aggregate values as part of the parallel query mechanism, based on timeseries or other key values.

### How do I get set up? ###

Expand All @@ -55,10 +70,14 @@ The full system test creates a database containing all supported data types, ins
#.TestVecdb.RunAll
```

See doc\Usage.md for more information on usage.

### Contribution guidelines ###

At this early stage, until the project acquires a bit more direction, we ask you to contact one of the key collaborators to discuss your ideas.

Please read doc\Implementation.md before continuing.

### Key Collaborators ###

* [email protected]
Expand Down
14 changes: 14 additions & 0 deletions TODO.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# TODO #

1. Enhance queries to support conditional functions... Eg. ('price' '>' 100)('Name' 'like' 'A%')
1. Beef up error checking on file creation
1. Database status reporting function (# shards, records in each, statistics, etc)
1. Parallel queries using isolates
1. User Guide
1. "c" data type (single byte indices)
1. Timestamped non-overwriting updates
1. Delete records (AFTER non-overwriting updates)
1. Database cleanup (throw away history)
1. TimeStamp columns
1. Aggregations in queries
1. Add support for noFiles switch: run entirely in memory with no backing storage
141 changes: 93 additions & 48 deletions TestVecdb.dyalog
Original file line number Diff line number Diff line change
@@ -1,59 +1,76 @@
:Namespace TestVecdb
:Namespace TestVecdb

Updated to version 0.1.3 with char support
Call TestVecdb.RunAll to run a full system test
Updated to version 0.2.0 with sharding
Call TestVecdb.RunAll to run a full system test
assumes vecdb is loaded in #.vecdb
returns memory usage statistics (result of "memstats 0")
(⎕IO ⎕ML)1 1
LOG1

(⎕IO ⎕ML)1 1
LOG1

zRunAll;path;source
⎕FUNTIE ⎕FNUMS ⎕NUNTIE ⎕NNUMS
:Trap 6 sourceSALT_Data.SourceFile
:Else source⎕WSID
:EndTrap
path{(-/()'\/')}source
'Testing vecdb version ',#.vecdb.Version
Basic

xoutput x
:If LOG x :EndIf

rfmtnum x
Nice formatting of large integers
r(((x),20)'CI20'⎕FMT,x)~¨' '

rmemstats reset;maxws;z
:If reset=1
z0(2000)14 Reset high water mark
:Else
maxws2 ⎕NQ'.' 'GetEnvironment' 'MAXWS'
r⎕WA
r'MAXWS' '⎕WA' 'WS Used' 'Allocated' 'High Water Mark',¯20¨maxws,fmtnum r,(2000)1 13 14
:EndIf
Sharding

assert{'Assertion failed'⎕SIGNAL(=0)/11}
zSharding;columns;data;options;params;folder;types;name;db;ix;rotate
Test database with 2 shards

folderpath,'\',(name'shardtest'),'\'

:For rotate :In 0 1 2 Test with shard key in all positions

'Clearing: ',folder
:Trap 22 #.vecdb.Delete folder :EndTrap

columnsrotate'Name' 'BlockSize' 'Flag'
typesrotate,¨'C' 'F' 'C'
datarotate('IBM' 'AAPL' 'MSFT' 'GOOG' 'DYALOG')(160.97 112.6 47.21 531.23 999.99)(5'Buy' 'Sell')

options⎕NS''
options.BlockSize10000
options.ShardFolders(folder,'Shard'),¨'12'
options.(ShardFn ShardCols)'{2-2|⎕UCS ⊃¨⊃⍵}'(rotate1 3 2)

paramsname folder columns types options(3¨data)
TEST'Create sharded database (rotate=',(rotate),')'
db⎕NEW time #.vecdb params
assert 3=db.Count
assert(3¨data)db.Read(1 21(1 2 3))columns All went into shard #1

TEST'Append last 2 records'
db.Append time columns(3¨data)

assert 5=db.Count
assert(1 2,¨4 1)ixdb.Query('Name'((columns'Name')data)) Should find everything
TEST'Read it all back'
assert datadb.Read time ix columns

TEST'Erase database'
assert 0={db.Erase}time

:EndFor rotate

z'Sharding Tests Completed'

time{ t⎕AI[3]
ooutput TEST,' ... '
z ⍺⍺
ooutput(⎕AI[3]-t),'ms',⎕UCS 10
z
}

zBasic;columns;types;folder;name;db;tnms;data;numrecs;recs;select;where;expect;indices;options;params;range;rcols;rcoli;newvals;i;t;vals
zBasic;columns;types;folder;name;db;tnms;data;numrecs;recs;select;where;expect;indices;options;params;range;rcols;rcoli;newvals;i;t;vals;ix
Create and delete some tables

numrecs5000000 5 million records
memstats 1 Clear memory statistics

'Creating: ',folderpath,'\',(name'testdb1'),'\'
:Trap 11 {}(⎕NEW #.vecdb(,folder)).Erase :EndTrap
folderpath,'\',(name'testdb1'),'\'
'Clearing: ',folder
:Trap 22 #.vecdb.Delete folder :EndTrap

'Creating: ',folderpath,'\',(name'testdb1'),'\'
columns'col_',¨types#.vecdb.TypeNames
assert #.vecdb.TypeNamestnms'I1' 'I2' 'I4',,¨'FBC' Types have been added?
range2*¯1+8×1 2 4 6 0.25
Expand All @@ -68,14 +85,14 @@
paramsname folder columns types options(recs¨data)
TEST'Creating db & inserting ',(fmtnum recs),' records'
db⎕NEW time #.vecdb params
assert db.Open
assert options.BlockSize.=db.(BlockSize,Size)
assert db.isOpen
assert db.Count=recs
assert 0=db.Close
assert 0=db.Open
assert 0=db.isOpen

TEST'Reopen database'
db⎕NEW time #.vecdb(,folder) Open it again
assert db.Open
assert db.isOpen
assert db.Count=recs
TEST'Reading them back:'
assert(recs¨data)db.Read time(recs)columns
Expand All @@ -101,16 +118,16 @@
Test vecdb.Replace
indicesdb.Query where
rcolscolumns[rcolitypes,¨'I2' 'B' 'C']
TEST'Updating ',(fmtnumindices),' records'
newvals0 1-(indices)¨data[2rcoli] Update with 0-data or ~data
newvals,(indices)'changed' And new char values
TEST'Updating ',(fmtnumix2,indices),' records'
newvals0 1-(ix)¨data[2rcoli] Update with 0-data or ~data
newvals,(ix)'changed' And new char values
assert 0=db.Update time indices rcols newvals
expectdata[rcoli]
:For i :In rcoli
tiexpect t[indices]inewvals (iexpect)t
tiexpect t[ix]inewvals (iexpect)t
:EndFor
TEST'Reading two column for all ',(numrecs),' records'
assert expectdb.Read time(numrecs)rcols
TEST'Reading two columns for all ',(numrecs),' records'
assert expectdb.Read time(1,numrecs)rcols

:If LOG
'Basic tests: memstats before db.Erase:'
Expand All @@ -122,5 +139,33 @@

z'Creation Tests Completed'


xoutput x
:If LOG x :EndIf

rfmtnum x
Nice formatting of large integers
r(((x),20)'CI20'⎕FMT,x)~¨' '

rmemstats reset;maxws;z
:If reset=1
z0(2000)14 Reset high water mark
:Else
maxws2 ⎕NQ'.' 'GetEnvironment' 'MAXWS'
r⎕WA
r'MAXWS' '⎕WA' 'WS Used' 'Allocated' 'High Water Mark',¯20¨maxws,fmtnum r,(2000)1 13 14
:EndIf

assert{'Assertion failed'⎕SIGNAL(=0)/11}

time{ t⎕AI[3]
ooutput TEST,' ... '
z ⍺⍺
ooutput(⎕AI[3]-t),'ms',⎕UCS 10
z
}

:EndNamespace
74 changes: 74 additions & 0 deletions doc/Implementation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# IMPLEMENTATION #

`vecdb`is an "inverted" database written in and for Dyalog APL, based on the ⎕MAP facility for mapping APL arrays to "flat" files.

Data Types supported are

| Name | Description |
|------|------------------------------------------------------------|
| B | Boolean; 1 bit per item (0 or 1) |
| I1 | 1-Byte Integer (-128 to +127) |
| I2 | 2-Byte Integer (-32,768 to +32,767) |
| I4 | 4-Byte Integer (+/- 2,147,483,647/8) |
| F | IEEE Double-Precision Floating Point (+/- ~1.797E308) |
| C | VarChar - up to 32,767 different strings indexed by an I2 |

A single-byte character type is planned - as C but indexed by an I1, allowing only 127 different strings. The current proposal is to denote this type "c".

### File Formats ###
Data is stored as APL vectors, each one mapping to a single file representing a numeric vector of uniform type, with the file extension *.vector*. In the case of the "C" type which contains variable length characters arrays, the serialised form (created using 220⌶) of a vector of character vectors is stored in a file with extension *.symbol*, and a 2-byte integer of indices into this is stored in the corresponding *.vector* file.

#### Blocking ####
Since mapped arrays cannot grow dynamically, the files are over-allocated using a configurable *BlockSize* (*NumBlocks* tracks the number of blocks in use). All vectors have the same length; the number of records actually in used is tracked separately. When a new block is required, all maps are expunged, ⎕NAPPEND is used to add a block to each file, and the maps are re-created.

#### Meta Data ####
Meta-data is stored in a Dyalog Component File "meta.vecdb", which contains data which does not change during normal operation of the database:

| Cn# | Contents | Example |
|-----|----------------------|------------------------------------------------|
| 1 | Version number | 'vecdb 0.2.0 |
| 2 | Description | a char vec |
| 3 | Unused | Used to contain number of records |
| 4 | Properties | ('Name' 'BlockSize')('TestDB1' 10000) |
| 5 | Col names & types | ('Stock' 'Price') ((,'C') 'F') |
| 6 | Shard folders | 'c:\mydb\shard1\' 'c:\mydb\shard2\ |
| 7 | Shard fn and cols | '{1+2|⎕UCS ⊃¨⍵}' (,1) |

#### Sharding ####

`vecdb` allows the database to be *horizontally partitioned* into *shards*, based on the values of any selection of fields. If a database is created without sharding, data files are created in the same folder that the meta.vecdb file is in. Sharding is specified by passing suitable options to the vecdb constructor. The above example was set up using the following code. For a more advanced example, see the function `TestVecdb.Sharding`:

columns←'Name' 'BlockSize' ⋄ types←,¨'C' 'F'
data←('IBM' 'AAPL' 'MSFT' 'DYALOG')(160.97 112.6 47.21 999.99)
options←⎕NS''
options.BlockSize←10000
options.ShardFolders←'c:\mydb\shard'∘,¨'12'
options.(ShardFn ShardCols)←'{1+2|⎕UCS ⊃¨⍵}' 1
params←'TestDB1' 'c:\mydb' columns types options data
mydb←⎕NEW vecdb params

In the above example, the database has two shards, based on whether the first character of the Stock name has an odd or even Unicode number.

Each shard is stored in a separate folder which contains the *.vector* files described above, plus a file "counters.vecdb" which currently contains a single 8-byte floating-point value which is the number of active records in the shard (the maximum number of records in a shard is limited to 2*48).

Note that the *.symbol* files are not sharded: The complete list of unique strings for a column is shared between the shards, and resides in main database folder.

The complete set of files which would be created by the above example would be along the lines of:

Directory of c:\mydb\shardtest

04/01/2015 21:35 122 1.symbol // Symbols for Name column
04/01/2015 21:35 2,576 meta.vecdb // Meta data

Directory of c:\mydb\db\shardtest\Shard1

04/01/2015 21:35 20,000 1.vector // 1 block of I2 symbol pointers
04/01/2015 21:35 80,000 2.vector // 1 block of Floating-point prices
04/01/2015 21:35 8 counters.vecdb // Used record counter (contains 3)

Directory of c:\mydb\vecdb\shardtest\Shard2

04/01/2015 21:35 20,000 1.vector // As Shard1
04/01/2015 21:35 80,000 2.vector // As Shard1
04/01/2015 21:35 8 counters.vecdb // record counter (1)

6 changes: 6 additions & 0 deletions doc/Usage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# User Guide #

At the moment, the only "documentation" is the test suite. Load the code and trace through or inspect the function `TestVecdb.RunAll` and its subfunctions.

]load \vecdbfolder\*.dyalog
TestVecdb.RunAll
Loading

0 comments on commit 44398b5

Please sign in to comment.