Merge pull request #3 from Dyalog/Sharding

Sharding
Dyalog · Jan 4, 2015 · 44398b5 · 44398b5
2 parents d1f7874 + 8801f05
commit 44398b5
Show file tree

Hide file tree

Showing 6 changed files with 506 additions and 237 deletions.
diff --git a/README.md b/README.md
@@ -1,42 +1,57 @@
 # README #
 
 `vecdb`
-Current version: 0.1.3
+Current version: 0.2.0
+
+**Warning: Version 0.2** adds support for SHARDING, and introduces changes to the database format which are not upwards compatible.
 
 ### What is this repository for? ###
-`vecdb` is a simple "columnar database": each column in the database is stored in a single memory-mapped files. It is written in and for Dyalog APL as a tool on which to base new applications which need to generate and query very large amounts of data, but do not need a "transactional" storage mechanism.
+`vecdb` is a simple "columnar database": each column in the database is stored in a single memory-mapped files. It is written in and for Dyalog APL as a tool on which to base new applications which need to generate and query very large amounts of data and do a large number of high performance reads, but do not need a full set of RDBMS features. In particuler, there is no "transactional" storage mechanism, and no ability to join tables built-in to the database.
 
 ### Features
 
-The current version supports the following data types:
+#### Supported data types: ####
 
 * 1, 2 and 4 byte integers
 * 8-byte IEEE double-precision floats
 * Boolean
 * Char (via a "symbol table" of up to 32,767 unique strings indexed by 2-byte integers)
 
-Database modification can only be done using Append and Update operations (no Delete).
+#### Sharding ####
+
+`vecdb` databases can be *sharded*, or *horizontally partitioned*. Each shard is a separate folder, named when the database is created (by default, there is a single shard). Each folder contains a file for each database column - which is memory mapped to an APL vector when the database is opened. A list of *sharding columns* is defined when the db is created; the values of these columns are passed as the argument to a user-defined *sharding function*, which has to return an origin-1 index into the list of shards, for each record.
+
+#### Supported Operations ####
+
+**Query**: At the moment, the `Query` function takes a constraint in the form of a list of (column_name values) pairs. Each one represents the relation which can be expressed in APL as (column_data∊values). If more than constraint is provided, they are AND-ed together. 
+Query also takes a list of column names to be retrieved for records which match the constraint.
+
+Query results are returned as a vector with one element per database column, each item containing a vector of values for that column.
 
-The `Query` function takes a constraint in the form of a list of (column_name values) pair. Each one represents the relation which can be expressed in APL as (column_data∊values); if there is more than one constraint they are AND-ed together. Query also accepts a list of column names to be retrieve for records which match the constraint; if no columns are requested, row indices are returned.
+**Search** If the `Query`function is called with an empty list of columns, record identifiers are returned as a 2-column matrix of (shard) (record index) pairs.
 
-A `Read` function takes a list of column names and row indices and returns the requested data.
+**Read**: The `Read` function accepts a matrix in the format returned by a search query and a list of column names, and returns a vector per column.
 
-### Goals
+**Update**: The `Update` function also takes as input a search query result, a list of columns, and a vector of vectors containing new data values.
 
-The intention is to extend `vecdb` with the following functionality. Much of this is still half-baked, discussion is welcome. owever, the one application that is being built upon `vecdb` and is driving the initial development requires the following items.
+**Append**: Takes a list of column names and a vector of data vectors, one per named column. The columns involved in the Shard selection must always be included.
 
-1. "Sharding": This idea needs to be developed, but the current thinking is that one or more key fields are identified, and a function is defined to map distinct key tuples to a "shard". A list of folder names points to the folders that will contain the mapped columns for each shard. The result of Query (and argument to Read) will become a 2-row (2-column?) matrix containing shard numbers and record offsets within the shard.
-1. Parallel database queries: For a sharded database, an isolate process will be spun up to perform queries and updates on one or more shards (each shard only being handles by a single process).
-1. A front-end server will allow RESTful database access (this item is perhaps optional). As it stands, `vecdb` is effectively an embedded database engine which does not support data sharing between processes on the same or on separate machines.
+**Delete**: Deletion is not currently supported.
 
-### Longer Term (Dreams)
+### Short-Term Goals ###
+
+1. Enhance the query function to accept enhanced queries consisting of column names, comparison functions and values - and support AND/OR. If possible, optimise queries to be sensitive to sharding.
+1. Parallel database queries: For a sharded database: Spin a number of isolate processes up and distribute the shards between them, so that each shard is handled by a single process. Enhance the database API functions to use these processes to perform searches, reads and writes in parallel.
+1. Add a front-end server with a RESTful database API. As it stands, `vecdb` is effectively an embedded database engine which does not support data sharing between processes on the same or on separate machines.
+
+### Longer Term (Dreams) ###
 
 There are ideas to add support for timeseries and versioning. This would include:
 
 1. Add a single-byte indexed Char type (perhaps denoted lowercase "c"), indexing up to 127 unique strings
 1. Support for deleting records
-2. Performing all updates without overwriting data, and tagging old data with the timestamps defining its lifetime, allowing efficient queries on the database as it appeared at any given time in the past.
-3. Built-in support for the computation of aggregate values as part of the parallel query mechanism, based on timeseries or other key values.
+1. Performing all updates without overwriting data, and tagging old data with the timestamps defining its lifetime, allowing efficient queries on the database as it appeared at any given time in the past.
+1. Built-in support for the computation of aggregate values as part of the parallel query mechanism, based on timeseries or other key values.
 
 ### How do I get set up? ###
 
@@ -55,10 +70,14 @@ The full system test creates a database containing all supported data types, ins
     #.TestVecdb.RunAll
 ```
 
+See doc\Usage.md for more information on usage.
+
 ### Contribution guidelines ###
 
 At this early stage, until the project acquires a bit more direction, we ask you to contact one of the key collaborators to discuss your ideas.
 
+Please read doc\Implementation.md before continuing.
+
 ### Key Collaborators ###
 
 * [email protected]

diff --git a/TODO.md b/TODO.md
@@ -0,0 +1,14 @@
+# TODO #
+
+1. Enhance queries to support conditional functions... Eg. ('price' '>' 100)('Name' 'like' 'A%')
+1. Beef up error checking on file creation
+1. Database status reporting function (# shards, records in each, statistics, etc)
+1. Parallel queries using isolates
+1. User Guide
+1. "c" data type (single byte indices)
+1. Timestamped non-overwriting updates
+1. Delete records (AFTER non-overwriting updates)
+1. Database cleanup (throw away history)
+1. TimeStamp columns
+1. Aggregations in queries
+1. Add support for noFiles switch: run entirely in memory with no backing storage
diff --git a/TestVecdb.dyalog b/TestVecdb.dyalog
@@ -1,59 +1,76 @@
-:Namespace TestVecdb
+:Namespace TestVecdb
 
-    ⍝ Updated to version 0.1.3 with char support
-    ⍝ Call TestVecdb.RunAll to run a full system test 
+    ⍝ Updated to version 0.2.0 with sharding
+    ⍝ Call TestVecdb.RunAll to run a full system test
     ⍝   assumes vecdb is loaded in #.vecdb
     ⍝   returns memory usage statistics (result of "memstats 0")
-    
-    (⎕IO ⎕ML)←1 1    
-    LOG←1      
-    
+
+    (⎕IO ⎕ML)←1 1
+    LOG←1
+
     ∇ z←RunAll;path;source
       ⎕FUNTIE ⎕FNUMS ⋄ ⎕NUNTIE ⎕NNUMS
       :Trap 6 ⋄ source←SALT_Data.SourceFile
       :Else ⋄ source←⎕WSID
       :EndTrap
       path←{(-⌊/(⌽⍵)⍳'\/')↓⍵}source
+      ⎕←'Testing vecdb version ',#.vecdb.Version
       ⎕←Basic
-    ∇
-
-    ∇ x←output x
-      :If LOG ⋄ ⍞←x ⋄ :EndIf
-    ∇
-
-    ∇ r←fmtnum x
-    ⍝ Nice formatting of large integers
-      r←(↓((⍴x),20)⍴'CI20'⎕FMT⍪,x)~¨' '
-    ∇
-
-    ∇ r←memstats reset;maxws;z
-      :If reset=1
-          z←0(2000⌶)14 ⍝ Reset high water mark
-      :Else
-          maxws←⊂⍕2 ⎕NQ'.' 'GetEnvironment' 'MAXWS'
-          r←⎕WA
-          r←'MAXWS' '⎕WA' 'WS Used' 'Allocated' 'High Water Mark',⍪¯20↑¨maxws,fmtnum r,(2000⌶)1 13 14
-      :EndIf
+      ⎕←Sharding
     ∇
 
-    assert←{'Assertion failed'⎕SIGNAL(⍵=0)/11}  
+    ∇ z←Sharding;columns;data;options;params;folder;types;name;db;ix;rotate
+     ⍝ Test database with 2 shards
+
+      folder←path,'\',(name←'shardtest'),'\'
+
+      :For rotate :In 0 1 2 ⍝ Test with shard key in all positions
+
+          ⎕←'Clearing: ',folder
+          :Trap 22 ⋄ #.vecdb.Delete folder ⋄ :EndTrap
+
+          columns←rotate⌽'Name' 'BlockSize' 'Flag'
+          types←rotate⌽,¨'C' 'F' 'C'
+          data←rotate⌽('IBM' 'AAPL' 'MSFT' 'GOOG' 'DYALOG')(160.97 112.6 47.21 531.23 999.99)(5⍴'Buy' 'Sell')
+
+          options←⎕NS''
+          options.BlockSize←10000
+          options.ShardFolders←(folder,'Shard')∘,¨'12'
+          options.(ShardFn ShardCols)←'{2-2|⎕UCS ⊃¨⊃⍵}'(⊃rotate⌽1 3 2)
+
+          params←name folder columns types options(3↑¨data)
+          TEST←'Create sharded database (rotate=',(⍕rotate),')'
+          db←⎕NEW time #.vecdb params
+          assert 3=db.Count
+          assert(3↑¨data)≡db.Read(1 2⍴1(1 2 3))columns ⍝ All went into shard #1
+
+          TEST←'Append last 2 records'
+          db.Append time columns(3↓¨data)
+
+          assert 5=db.Count
+          assert(1 2,⍪⍳¨4 1)≡ix←db.Query('Name'((columns⍳⊂'Name')⊃data))⍬ ⍝ Should find everything
+          TEST←'Read it all back'
+          assert data≡db.Read time ix columns
+
+          TEST←'Erase database'
+          assert 0={db.Erase}time ⍬
+
+      :EndFor ⍝ rotate
+
+      z←'Sharding Tests Completed'
+    ∇
 
-      time←{⍺←⊣ ⋄ t←⎕AI[3]
-          o←output TEST,' ... '
-          z←⍺ ⍺⍺ ⍵
-          o←output(⍕⎕AI[3]-t),'ms',⎕UCS 10
-          z
-      }
-
-    ∇ z←Basic;columns;types;folder;name;db;tnms;data;numrecs;recs;select;where;expect;indices;options;params;range;rcols;rcoli;newvals;i;t;vals
+    ∇ z←Basic;columns;types;folder;name;db;tnms;data;numrecs;recs;select;where;expect;indices;options;params;range;rcols;rcoli;newvals;i;t;vals;ix
      ⍝ Create and delete some tables
 
       numrecs←5000000 ⍝ 5 million records
       memstats 1      ⍝ Clear memory statistics
 
-      ⎕←'Creating: ',folder←path,'\',(name←'testdb1'),'\'
-      :Trap 11 ⋄ {}(⎕NEW #.vecdb(,⊂folder)).Erase ⋄ :EndTrap
+      folder←path,'\',(name←'testdb1'),'\'
+      ⎕←'Clearing: ',folder
+      :Trap 22 ⋄ #.vecdb.Delete folder ⋄ :EndTrap
 
+      ⎕←'Creating: ',folder←path,'\',(name←'testdb1'),'\'
       columns←'col_'∘,¨types←#.vecdb.TypeNames
       assert #.vecdb.TypeNames≡tnms←'I1' 'I2' 'I4',,¨'FBC' ⍝ Types have been added?
       range←2*¯1+8×1 2 4 6 0.25
@@ -68,14 +85,14 @@
       params←name folder columns types options(recs↑¨data)
       TEST←'Creating db & inserting ',(fmtnum recs),' records'
       db←⎕NEW time #.vecdb params
-      assert db.Open
-      assert options.BlockSize∧.=db.(BlockSize,Size)
+      assert db.isOpen
+      assert db.Count=recs
       assert 0=db.Close
-      assert 0=db.Open
+      assert 0=db.isOpen
 
       TEST←'Reopen database'
       db←⎕NEW time #.vecdb(,⊂folder) ⍝ Open it again
-      assert db.Open
+      assert db.isOpen
       assert db.Count=recs
       TEST←'Reading them back:'
       assert(recs↑¨data)≡db.Read time(⍳recs)columns
@@ -101,16 +118,16 @@
       ⍝ Test vecdb.Replace
       indices←db.Query where ⍬
       rcols←columns[rcoli←types⍳,¨'I2' 'B' 'C']
-      TEST←'Updating ',(fmtnum≢indices),' records'
-      newvals←0 1-(⊂indices)∘⌷¨data[2↑rcoli] ⍝ Update with 0-data or ~data
-      newvals,←⊂(≢indices)⍴⊂'changed'        ⍝ And new char values
+      TEST←'Updating ',(fmtnum≢ix←2⊃,indices),' records'
+      newvals←0 1-(⊂ix)∘⌷¨data[2↑rcoli] ⍝ Update with 0-data or ~data
+      newvals,←⊂(≢ix)⍴⊂'changed'        ⍝ And new char values
       assert 0=db.Update time indices rcols newvals
       expect←data[rcoli]
       :For i :In ⍳⍴rcoli
-          t←i⊃expect ⋄ t[indices]←i⊃newvals ⋄ (i⊃expect)←t
+          t←i⊃expect ⋄ t[ix]←i⊃newvals ⋄ (i⊃expect)←t
       :EndFor
-      TEST←'Reading two column for all ',(⍕numrecs),' records'
-      assert expect≡db.Read time(⍳numrecs)rcols
+      TEST←'Reading two columns for all ',(⍕numrecs),' records'
+      assert expect≡db.Read time(1,⍪⊂⍳numrecs)rcols
 
       :If LOG
           ⎕←'Basic tests: memstats before db.Erase:'
@@ -122,5 +139,33 @@
 
       z←'Creation Tests Completed'
     ∇
-
+
+    ∇ x←output x
+      :If LOG ⋄ ⍞←x ⋄ :EndIf
+    ∇
+
+    ∇ r←fmtnum x
+    ⍝ Nice formatting of large integers
+      r←(↓((⍴x),20)⍴'CI20'⎕FMT⍪,x)~¨' '
+    ∇
+
+    ∇ r←memstats reset;maxws;z
+      :If reset=1
+          z←0(2000⌶)14 ⍝ Reset high water mark
+      :Else
+          maxws←⊂⍕2 ⎕NQ'.' 'GetEnvironment' 'MAXWS'
+          r←⎕WA
+          r←'MAXWS' '⎕WA' 'WS Used' 'Allocated' 'High Water Mark',⍪¯20↑¨maxws,fmtnum r,(2000⌶)1 13 14
+      :EndIf
+    ∇
+
+    assert←{'Assertion failed'⎕SIGNAL(⍵=0)/11}
+
+      time←{⍺←⊣ ⋄ t←⎕AI[3]
+          o←output TEST,' ... '
+          z←⍺ ⍺⍺ ⍵
+          o←output(⍕⎕AI[3]-t),'ms',⎕UCS 10
+          z
+      }
+
 :EndNamespace
diff --git a/doc/Implementation.md b/doc/Implementation.md
@@ -0,0 +1,74 @@
+# IMPLEMENTATION #
+
+`vecdb`is an "inverted" database written in and for Dyalog APL, based on the ⎕MAP facility for mapping APL arrays to "flat" files. 
+
+Data Types supported are
+
+| Name | Description                                                |
+|------|------------------------------------------------------------|
+| B    | Boolean; 1 bit per item (0 or 1)                           |
+| I1   | 1-Byte Integer (-128 to +127)                              |
+| I2   | 2-Byte Integer (-32,768 to +32,767)                        |
+| I4   | 4-Byte Integer (+/- 2,147,483,647/8)                       |
+| F    | IEEE Double-Precision Floating Point (+/- ~1.797E308)      |
+| C    | VarChar - up to 32,767 different strings indexed by an I2  |
+
+A single-byte character type is planned - as C but indexed by an I1, allowing only 127 different strings. The current proposal is to denote this type "c".
+
+### File Formats ###
+Data is stored as APL vectors, each one mapping to a single file representing a numeric vector of uniform type, with the file extension *.vector*. In the case of the "C" type which contains variable length characters arrays, the serialised form (created using 220⌶) of a vector of character vectors is stored in a file with extension *.symbol*, and a 2-byte integer of indices into this is stored in the corresponding *.vector* file. 
+
+#### Blocking ####
+Since mapped arrays cannot grow dynamically, the files are over-allocated using a configurable *BlockSize* (*NumBlocks* tracks the number of blocks in use). All vectors have the same length; the number of records actually in used is tracked separately. When a new block is required, all maps are expunged, ⎕NAPPEND is used to add a block to each file, and the maps are re-created.
+
+#### Meta Data ####
+Meta-data is stored in a Dyalog Component File "meta.vecdb", which contains data which does not change during normal operation of the database:
+
+| Cn# | Contents             | Example                                        |
+|-----|----------------------|------------------------------------------------|
+|   1 | Version number       | 'vecdb 0.2.0                                   |
+|   2 | Description          | a char vec                                     |
+|   3 | Unused               | Used to contain number of records              |
+|   4 | Properties           | ('Name' 'BlockSize')('TestDB1' 10000)          |
+|   5 | Col names & types    | ('Stock' 'Price') ((,'C') 'F')                |
+|   6 | Shard folders        | 'c:\mydb\shard1\' 'c:\mydb\shard2\             |
+|   7 | Shard fn and cols    | '{1+2|⎕UCS ⊃¨⍵}'  (,1)                        |
+
+#### Sharding ####
+
+`vecdb` allows the database to be *horizontally partitioned* into *shards*, based on the values of any selection of fields. If a database is created without sharding, data files are created in the same folder that the meta.vecdb file is in. Sharding is specified by passing suitable options to the vecdb constructor. The above example was set up using the following code. For a more advanced example, see the function `TestVecdb.Sharding`:
+
+      columns←'Name' 'BlockSize' ⋄ types←,¨'C' 'F'
+      data←('IBM' 'AAPL' 'MSFT' 'DYALOG')(160.97 112.6 47.21 999.99)
+      options←⎕NS''
+      options.BlockSize←10000
+      options.ShardFolders←'c:\mydb\shard'∘,¨'12'
+      options.(ShardFn ShardCols)←'{1+2|⎕UCS ⊃¨⍵}' 1 
+      params←'TestDB1' 'c:\mydb' columns types options data
+      mydb←⎕NEW vecdb params
+
+In the above example, the database has two shards, based on whether the first character of the Stock name has an odd or even Unicode number.
+
+Each shard is stored in a separate folder which contains the *.vector* files described above, plus a file "counters.vecdb" which currently contains a single 8-byte floating-point value which is the number of active records in the shard (the maximum number of records in a shard is limited to 2*48).
+
+Note that the *.symbol* files are not sharded: The complete list of unique strings for a column is shared between the shards, and resides in main database folder.
+
+The complete set of files which would be created by the above example would be along the lines of:
+
+    Directory of c:\mydb\shardtest
+
+    04/01/2015  21:35     122 1.symbol         // Symbols for Name column
+    04/01/2015  21:35   2,576 meta.vecdb       // Meta data
+
+     Directory of c:\mydb\db\shardtest\Shard1
+
+    04/01/2015  21:35  20,000 1.vector         // 1 block of I2 symbol pointers
+    04/01/2015  21:35  80,000 2.vector         // 1 block of Floating-point prices
+    04/01/2015  21:35       8 counters.vecdb   // Used record counter (contains 3)
+
+     Directory of c:\mydb\vecdb\shardtest\Shard2
+
+    04/01/2015  21:35  20,000 1.vector         // As Shard1
+    04/01/2015  21:35  80,000 2.vector         // As Shard1
+    04/01/2015  21:35       8 counters.vecdb   // record counter (1)
+
diff --git a/doc/Usage.md b/doc/Usage.md
@@ -0,0 +1,6 @@
+# User Guide #
+
+At the moment, the only "documentation" is the test suite. Load the code and trace through or inspect the function `TestVecdb.RunAll` and its subfunctions.
+
+    ]load \vecdbfolder\*.dyalog
+    TestVecdb.RunAll