Document CLI args, rename --pg-include/exclude-tables, add auth user
exAspArk committed Nov 29, 2024
1 parent 1e6239c commit 7aab0ce
Showing 7 changed files with 49 additions and 74 deletions.
4 changes: 4 additions & 0 deletions Makefile
@@ -13,6 +13,10 @@ build:
devbox run "./scripts/build-darwin.sh" && \
./scripts/build-linux.sh

build-local:
rm -rf build/bemidb-* && \
cd src && go build -o ../build/bemidb-darwin-arm64

sync:
devbox run --env-file .env "cd src && go run . sync"

93 changes: 31 additions & 62 deletions README.md
@@ -76,41 +76,17 @@ Here is an example of running BemiDB with default settings and storing data in a

```sh
./bemidb \
--port 54321 \
--database bemidb \
--user= \ # Allow any credentials
--password= \
--storage-type LOCAL \
--storage-path ./iceberg \ # $PWD/iceberg/*
--init-sql ./init.sql \
--log-level INFO \
start
```

To run BemiDB with environment variables:

```sh
# Default settings
export BEMIDB_PORT=54321
export BEMIDB_DATABASE=bemidb
export BEMIDB_USER=
export BEMIDB_PASSWORD=
export BEMIDB_STORAGE_TYPE=LOCAL
export BEMIDB_STORAGE_PATH=./iceberg
export BEMIDB_INIT_SQL=./init.sql
export BEMIDB_LOG_LEVEL=INFO

./bemidb start
```

### S3 block storage

BemiDB natively supports S3 storage. You can specify the S3 settings using the following flags:

```sh
./bemidb \
--port 54321 \
--database bemidb \
--storage-type S3 \
--storage-path iceberg \ # s3://[AWS_S3_BUCKET]/iceberg/*
--aws-region [AWS_REGION] \
@@ -120,23 +96,6 @@ BemiDB natively supports S3 storage. You can specify the S3 settings using the f
start
```

To run BemiDB with environment variables:

```sh
export BEMIDB_PORT=54321
export BEMIDB_DATABASE=bemidb
export BEMIDB_STORAGE_TYPE=S3
export BEMIDB_STORAGE_PATH=iceberg
export AWS_REGION=[AWS_REGION]
export AWS_S3_BUCKET=[AWS_S3_BUCKET]
export AWS_ACCESS_KEY_ID=[AWS_ACCESS_KEY_ID]
export AWS_SECRET_ACCESS_KEY=[AWS_SECRET_ACCESS_KEY]

./bemidb start
```

CLI arguments take precedence over environment variables. I.e. you can override the environment variables with CLI arguments.

Here is the minimal IAM policy required for BemiDB to work with S3:

```json
@@ -176,41 +135,26 @@ Note that incremental real-time replication is not supported yet (WIP). Please s

### Syncing from selective tables

You can sync only specific tables from your Postgres database using the `--include-tables` or `--exclude-tables` options.

To include specific tables during the sync:
You can sync only specific tables from your Postgres database. To include specific tables during the sync:

```sh
./bemidb \
--pg-sync-interval 1h \
--include-tables public.users,public.transactions \
--pg-database-url postgres://postgres:postgres@localhost:5432/dbname \
--include-tables schema.table1,public.users \
sync
```

To exclude specific tables during the sync:

```sh
./bemidb \
--pg-sync-interval 1h \
--exclude-tables public.cache,public.logs \
--pg-database-url postgres://postgres:postgres@localhost:5432/dbname \
--exclude-tables schema.table3,public.cache \
sync
```

Note: You cannot use `--include-tables` and `--exclude-tables` simultaneously.

Alternatively, you can set the interval and table inclusion/exclusion using environment variables:

```sh
export PG_SYNC_INTERVAL=1h
export PG_DATABASE_URL=postgres://postgres:postgres@localhost:5432/dbname
export PG_INCLUDE_TABLES=schema.table1,schema.table2
export PG_EXCLUDE_TABLES=schema.table3,schema.table4

./bemidb sync
```

### Syncing from multiple Postgres databases

BemiDB supports syncing data from multiple Postgres databases into the same BemiDB database by allowing prefixing schemas.
@@ -219,12 +163,12 @@ For example, if two Postgres databases `db1` and `db2` contain `public` schemas,

```sh
./bemidb \
--pg-schema-prefix db1_ \ # or PG_SCHEMA_PREFIX=db1_ using an env variable
--pg-schema-prefix db1_ \
--pg-database-url postgres://postgres:postgres@localhost:5432/db1 \
sync

./bemidb \
--pg-schema-prefix db2_ \ # or PG_SCHEMA_PREFIX=db2_ using an env variable
--pg-schema-prefix db2_ \
--pg-database-url postgres://postgres:postgres@localhost:5432/db2 \
sync
```
@@ -234,9 +178,34 @@ Then you can query and join tables from both Postgres databases in the same Bemi
```sh
./bemidb start

psql postgres://localhost:54321/bemidb -c "SELECT * FROM db1_public.[TABLE] JOIN db2_public.[TABLE] ON ..."
psql postgres://localhost:54321/bemidb -c \
"SELECT * FROM db1_public.[TABLE] JOIN db2_public.[TABLE] ON ..."
```

### Configuration options

| CLI argument | Environment variable | Default value | Description |
|---------------------------|-------------------------|----------------|---------------------------------------------------------------------------|
| `--port` | `BEMIDB_PORT` | `54321` | Port for BemiDB to listen on |
| `--database` | `BEMIDB_DATABASE` | `bemidb` | Database name |
| `--storage-type` | `BEMIDB_STORAGE_TYPE` | `LOCAL` | Storage type: `LOCAL` or `S3` |
| `--storage-path` | `BEMIDB_STORAGE_PATH` | `iceberg` | Path to the storage folder |
| `--log-level` | `BEMIDB_LOG_LEVEL` | `INFO` | Log level: `DEBUG`, `INFO`, `WARN`, or `ERROR` |
| `--init-sql ` | `BEMIDB_INIT_SQL` | `./init.sql` | Path to the initialization SQL file |
| `--user` | `BEMIDB_USER` | | Database user. Allows any if empty |
| `--password` | `BEMIDB_PASSWORD` | | Database password. Allows any if empty |
| `--aws-region` | `AWS_REGION` | | AWS region. Required if storage type is `S3` |
| `--aws-s3-bucket` | `AWS_S3_BUCKET` | | AWS S3 bucket name. Required if storage type is `S3` |
| `--aws-access-key-id` | `AWS_ACCESS_KEY_ID` | | AWS access key ID. Required if storage type is `S3` |
| `--aws-secret-access-key` | `AWS_SECRET_ACCESS_KEY` | | AWS secret access key. Required if storage type is `S3` |
| `--pg-database-url` | `PG_DATABASE_URL` | | PostgreSQL database URL to sync |
| `--pg-sync-interval` | `PG_SYNC_INTERVAL` | | Interval between syncs. Valid units: `ns`, `us`/`µs`, `ms`, `s`, `m`, `h` |
| `--pg-exclude-tables` | `PG_EXCLUDE_TABLES` | | List of tables to exclude from sync. Comma-separated `schema.table` |
| `--pg-include-tables` | `PG_INCLUDE_TABLES` | | List of tables to include in sync. Comma-separated `schema.table` |
| `--pg-schema-prefix` | `PG_SCHEMA_PREFIX` | | Prefix for PostgreSQL schema names |

Note that CLI arguments take precedence over environment variables. I.e. you can override the environment variables with CLI arguments.

## Architecture

BemiDB consists of the following main components:
2 changes: 1 addition & 1 deletion scripts/install.sh
@@ -1,6 +1,6 @@
#!/bin/bash

VERSION="0.14.4"
VERSION="0.15.0"

# Detect OS and architecture
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
8 changes: 4 additions & 4 deletions src/config.go
@@ -88,9 +88,9 @@ func registerFlags() {
flag.StringVar(&_config.StorageType, "storage-type", os.Getenv(ENV_STORAGE_TYPE), "Storage type: \"LOCAL\", \"S3\". Default: \""+DEFAULT_DB_STORAGE_TYPE+"\"")
flag.StringVar(&_config.Pg.SchemaPrefix, "pg-schema-prefix", os.Getenv(ENV_PG_SCHEMA_PREFIX), "(Optional) Prefix for PostgreSQL schema names")
flag.StringVar(&_config.Pg.SyncInterval, "pg-sync-interval", os.Getenv(ENV_PG_SYNC_INTERVAL), "(Optional) Interval between syncs. Valid units: \"ns\", \"us\" (or \"µs\"), \"ms\", \"s\", \"m\", \"h\"")
flag.StringVar(&_pgIncludeTables, "include-tables", os.Getenv(ENV_PG_INCLUDE_TABLES), "(Optional) Comma-separated list of tables to include in sync (format: schema.table)")
flag.StringVar(&_pgExcludeTables, "exclude-tables", os.Getenv(ENV_PG_EXCLUDE_TABLES), "(Optional) Comma-separated list of tables to exclude from sync (format: schema.table)")
flag.StringVar(&_config.Pg.DatabaseUrl, "pg-database-url", os.Getenv(ENV_PG_DATABASE_URL), "PostgreSQL database URL")
flag.StringVar(&_pgIncludeTables, "pg-include-tables", os.Getenv(ENV_PG_INCLUDE_TABLES), "(Optional) Comma-separated list of tables to include in sync (format: schema.table)")
flag.StringVar(&_pgExcludeTables, "pg-exclude-tables", os.Getenv(ENV_PG_EXCLUDE_TABLES), "(Optional) Comma-separated list of tables to exclude from sync (format: schema.table)")
flag.StringVar(&_config.Pg.DatabaseUrl, "pg-database-url", os.Getenv(ENV_PG_DATABASE_URL), "PostgreSQL database URL to sync")
flag.StringVar(&_config.Aws.Region, "aws-region", os.Getenv(ENV_AWS_REGION), "AWS region")
flag.StringVar(&_config.Aws.S3Bucket, "aws-s3-bucket", os.Getenv(ENV_AWS_S3_BUCKET), "AWS S3 bucket name")
flag.StringVar(&_config.Aws.AccessKeyId, "aws-access-key-id", os.Getenv(ENV_AWS_ACCESS_KEY_ID), "AWS access key ID")
@@ -150,7 +150,7 @@ func parseFlags() {
}
}
if _pgIncludeTables != "" && _pgExcludeTables != "" {
panic("Cannot specify both --include-tables and --exclude-tables")
panic("Cannot specify both --pg-include-tables and --pg-exclude-tables")
}
if _pgIncludeTables != "" {
_config.Pg.IncludeTables = NewSet(strings.Split(_pgIncludeTables, ","))
10 changes: 5 additions & 5 deletions src/config_test.go
@@ -152,7 +152,7 @@ func TestLoadConfig(t *testing.T) {
"--pg-database-url", "postgres://user:password@localhost:5432/db",
"--pg-sync-interval", "2h30m",
"--pg-schema-prefix", "mydb_",
"--include-tables", "public.users",
"--pg-include-tables", "public.users",
})

config := LoadConfig()
@@ -189,10 +189,10 @@ func TestLoadConfig(t *testing.T) {
}
})

t.Run("Handles exclude-tables configuration", func(t *testing.T) {
t.Run("Handles pg-exclude-tables configuration", func(t *testing.T) {
setTestArgs([]string{
"--pg-database-url", "postgres://user:password@localhost:5432/db",
"--exclude-tables", "public.secrets,public.cache",
"--pg-exclude-tables", "public.secrets,public.cache",
})
config := LoadConfig(true)

@@ -239,8 +239,8 @@ func TestLoadConfig(t *testing.T) {
"--pg-database-url", "postgres://user:password@localhost:5432/db",
"--pg-sync-interval", "2h30m",
"--pg-schema-prefix", "mydb_",
"--include-tables", "public.users",
"--exclude-tables", "public.orders",
"--pg-include-tables", "public.users",
"--pg-exclude-tables", "public.orders",
})

defer func() {
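The last `config_test.go` hunk passes both `--pg-include-tables` and `--pg-exclude-tables` and then opens a `defer func() {` block, presumably to recover the panic that `parseFlags` raises; the remainder of that test is not shown. A minimal sketch of the recover-based pattern such a test can rely on, where `validateTableFlags` (a stand-in for the check in `parseFlags`) and `panicMessage` are hypothetical helpers:

```go
package main

import "fmt"

// validateTableFlags is a hypothetical stand-in for the check in parseFlags.
func validateTableFlags(include, exclude string) {
	if include != "" && exclude != "" {
		panic("Cannot specify both --pg-include-tables and --pg-exclude-tables")
	}
}

// panicMessage runs f and reports whether it panicked and with what value.
// recover only stops a panic when called from a deferred function, and the
// deferred function can still set the named results before returning.
func panicMessage(f func()) (msg string, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			msg, panicked = fmt.Sprint(r), true
		}
	}()
	f()
	return "", false
}

func main() {
	msg, ok := panicMessage(func() { validateTableFlags("public.users", "public.orders") })
	fmt.Println(ok, msg) // true Cannot specify both --pg-include-tables and --pg-exclude-tables

	_, ok = panicMessage(func() { validateTableFlags("public.users", "") })
	fmt.Println(ok) // false
}
```

Inside a `testing` test the same deferred `recover` would typically call `t.Errorf` when no panic occurred.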
2 changes: 1 addition & 1 deletion src/main.go
@@ -6,7 +6,7 @@ import (
"time"
)

const VERSION = "0.14.4"
const VERSION = "0.15.0"

func main() {
config := LoadConfig()
4 changes: 3 additions & 1 deletion src/postgres.go
@@ -11,6 +11,8 @@ const (
PG_VERSION = "17.0"
PG_ENCODING = "UTF8"
PG_TX_STATUS_IDLE = 'I'

SYSTEM_AUTH_USER = "bemidb"
)

type Postgres struct {
@@ -173,7 +175,7 @@ func (postgres *Postgres) handleStartup() error {
return errors.New("Database does not exist")
}

if postgres.config.User != "" && params["user"] != postgres.config.User {
if postgres.config.User == "" && params["user"] != postgres.config.User && params["user"] != SYSTEM_AUTH_USER {
postgres.writeError("role \"" + params["user"] + "\" does not exist")
return errors.New("Role does not exist")
}
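The README table states that an empty `--user` allows any role, and this commit introduces a `SYSTEM_AUTH_USER` constant (`bemidb`). Below is a minimal sketch of a startup role check with the documented behavior, assuming (the diff does not confirm this) that `SYSTEM_AUTH_USER` is meant to be accepted alongside a configured user. `checkRole` and `systemAuthUser` are hypothetical names; this is not the project's `handleStartup`.

```go
package main

import (
	"errors"
	"fmt"
)

// systemAuthUser mirrors the SYSTEM_AUTH_USER constant added in the diff.
const systemAuthUser = "bemidb"

// checkRole is a hypothetical sketch of the startup role check.
// configuredUser is the --user / BEMIDB_USER value; requested is the
// "user" parameter the client sends in its startup message.
func checkRole(configuredUser, requested string) error {
	if configuredUser == "" {
		return nil // empty --user: any role is accepted, as the README states
	}
	if requested == configuredUser || requested == systemAuthUser {
		return nil
	}
	return errors.New(`role "` + requested + `" does not exist`)
}

func main() {
	fmt.Println(checkRole("", "alice"))         // <nil>
	fmt.Println(checkRole("analyst", "bemidb")) // <nil>
	fmt.Println(checkRole("analyst", "alice"))  // role "alice" does not exist
}
```

Keeping the decision in a pure function like this makes each combination of configured and requested user easy to cover in a table-driven test.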
