Skip to content

workos/pgslice

 
 

Repository files navigation

pgslice

Postgres partitioning as easy as pie. Works great for both new and existing tables, with zero downtime and minimal app changes. No need to install anything on your database server. Archive older data on a rolling basis to keep your database size under control.

🍊 Battle-tested at Instacart

Build Status

Install

pgslice is a command line tool. To install, run:

gem install pgslice

This will give you the pgslice command. If installation fails, you may need to install dependencies.

You can also install it with Homebrew or Docker.

Global Options

All commands support these global options:

  • --url: Database URL (can also be set via PGSLICE_URL environment variable)
  • --dry-run: Print SQL statements without executing them

Steps

  1. Ensure the table you want to partition has been created. We’ll refer to this as <table>.

  2. Specify your database credentials

export PGSLICE_URL=postgres://localhost/myapp_development
  1. Create an intermediate table
pgslice prep <table> <column> <period>

The column should be a timestamp, timestamptz, or date column and period can be day, month, or year.

This creates a partitioned table named <table>_intermediate using range partitioning.

Options:

  • --no-partition: Create a non-partitioned intermediate table (useful for one-off tasks)
  • --trigger-based: Use trigger-based partitioning (for PostgreSQL 9.x)
  1. Add partitions to the intermediate table
pgslice add_partitions <table> --intermediate --past 3 --future 3

Use the --past and --future options to control the number of partitions.

  1. Enable mirroring triggers for live data changes
pgslice enable_mirroring <table>

This enables triggers that automatically mirror INSERT, UPDATE, and DELETE operations from the original table to the intermediate table during the partitioning process. This ensures that any data changes made after you start the partitioning process are captured in both tables.

  1. Optional, for tables with data - Fill the partitions in batches with data from the original table
pgslice fill <table>

Options:

  • --batch-size: Number of rows per batch (default: 10000)
  • --sleep: Seconds to sleep between batches (default: 0)
  • --swapped: Fill from retired table to partitioned table (after swap)
  • --source-table: Source table name (default: original table or retired table if --swapped)
  • --dest-table: Destination table name (default: intermediate table or partitioned table if --swapped)
  • --start: Primary key value to start from (numeric or ULID)
  • --where: Additional WHERE conditions to filter rows

To sync data across different databases, check out pgsync.

  1. Analyze tables
pgslice analyze <table>

Options:

  • --swapped: Analyze the partitioned table (after swap)
  1. Sync/Validate the tables

This will ensure the two tables are definitely in sync. It should be a no-op, but will generate INSERT, UPDATE, and DELETE statements if discrepencies are discovered. On a production system, ensure you understand the --window-size, --delay, and --delay-multiplier options.

pgslice synchronize <table> [options]

Options:

  • --source-table: Source table to compare (default: <table>)
  • --target-table: Target table to compare (default: <table>_intermediate)
  • --primary-key: Primary key column name (default: detected from table)
  • --start: Primary key value to start synchronization at
  • --window-size: Number of rows to synchronize per batch (default: 1000)
  • --delay: Base delay in seconds between batches (default: 0)
  • --delay-multiplier: Delay multiplier for batch time (default: 0)
  • --dry-run: Print statements without executing
  1. Swap the intermediate table with the original table
pgslice swap <table>

The original table is renamed <table>_retired and the intermediate table is renamed <table>.

Options:

  • --lock-timeout: Lock timeout for the swap operation (default: 5s)
  1. Disable mirroring triggers
pgslice disable_mirroring <table>

After the swap, the original mirroring triggers are no longer needed since the tables have been swapped.

  1. Enable Reverse Mirroring (now-partitioned table to retired table)

This will make unswapping later less problematic as the two tables are kept in sync. Note that the tables will be slightly out of sync. Find some ID from before the swap, and run the table synchronize commands from Step 8 on the table to be sure to catch those rows.

pgslice enable_retired_mirroring <table>  # undo with pgslice disable_retired_mirroring <table>
  1. Fill the rest (rows inserted between the first fill and the swap)

This step should not be needed if you did the pgslice synchronize in step 8.

pgslice fill <table> --swapped
  1. Disable retired mirroring triggers
pgslice disable_retired_mirroring <table>

Once you're confident the retired table is no longer needed and you're ready to drop it, disable the retired mirroring triggers.

  1. Back up the retired table with a tool like pg_dump and drop it
pg_dump -c -Fc -t <table>_retired $PGSLICE_URL > <table>_retired.dump
psql -c "DROP TABLE <table>_retired" $PGSLICE_URL

Sample Output

pgslice prints the SQL commands that were executed on the server. To print without executing, use the --dry-run option.

pgslice prep visits created_at month
BEGIN;

CREATE TABLE "public"."visits_intermediate" (LIKE "public"."visits" INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING STORAGE INCLUDING COMMENTS INCLUDING STATISTICS INCLUDING GENERATED INCLUDING COMPRESSION) PARTITION BY RANGE ("created_at");

CREATE INDEX ON "public"."visits_intermediate" USING btree ("created_at");

COMMENT ON TABLE "public"."visits_intermediate" is 'column:created_at,period:month,cast:date,version:3';

COMMIT;
pgslice add_partitions visits --intermediate --past 1 --future 1
BEGIN;

CREATE TABLE "public"."visits_202408" PARTITION OF "public"."visits_intermediate" FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

ALTER TABLE "public"."visits_202408" ADD PRIMARY KEY ("id");

CREATE TABLE "public"."visits_202409" PARTITION OF "public"."visits_intermediate" FOR VALUES FROM ('2024-09-01') TO ('2024-10-01');

ALTER TABLE "public"."visits_202409" ADD PRIMARY KEY ("id");

CREATE TABLE "public"."visits_202410" PARTITION OF "public"."visits_intermediate" FOR VALUES FROM ('2024-10-01') TO ('2024-11-01');

ALTER TABLE "public"."visits_202410" ADD PRIMARY KEY ("id");

COMMIT;
pgslice fill visits
/* 1 of 3 */
INSERT INTO "public"."visits_intermediate" ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM "public"."visits"
    WHERE "id" > 0 AND "id" <= 10000 AND "created_at" >= '2024-08-01'::date AND "created_at" < '2024-11-01'::date

/* 2 of 3 */
INSERT INTO "public"."visits_intermediate" ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM "public"."visits"
    WHERE "id" > 10000 AND "id" <= 20000 AND "created_at" >= '2024-08-01'::date AND "created_at" < '2024-11-01'::date

/* 3 of 3 */
INSERT INTO "public"."visits_intermediate" ("id", "user_id", "ip", "created_at")
    SELECT "id", "user_id", "ip", "created_at" FROM "public"."visits"
    WHERE "id" > 20000 AND "id" <= 30000 AND "created_at" >= '2024-08-01'::date AND "created_at" < '2024-11-01'::date
pgslice analyze visits
ANALYZE VERBOSE "public"."visits_202408";

ANALYZE VERBOSE "public"."visits_202409";

ANALYZE VERBOSE "public"."visits_202410";

ANALYZE VERBOSE "public"."visits_intermediate";
pgslice swap visits
BEGIN;

SET LOCAL lock_timeout = '5s';

ALTER TABLE "public"."visits" RENAME TO "visits_retired";

ALTER TABLE "public"."visits_intermediate" RENAME TO "visits";

ALTER SEQUENCE "public"."visits_id_seq" OWNED BY "public"."visits"."id";

COMMIT;

Adding Partitions

To add partitions, use:

pgslice add_partitions <table> --future 3

Add this as a cron job to create a new partition each day, month, or year.

# day
0 0 * * * pgslice add_partitions <table> --future 3 --url ...

# month
0 0 1 * * pgslice add_partitions <table> --future 3 --url ...

# year
0 0 1 1 * pgslice add_partitions <table> --future 3 --url ...

Add a monitor to ensure partitions are being created.

SELECT 1 FROM
    pg_catalog.pg_class c
INNER JOIN
    pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE
    c.relkind = 'r' AND
    n.nspname = 'public' AND
    c.relname = '<table>_' || to_char(NOW() + INTERVAL '3 days', 'YYYYMMDD')
    -- for months, use to_char(NOW() + INTERVAL '3 months', 'YYYYMM')
    -- for years, use to_char(NOW() + INTERVAL '3 years', 'YYYY')

Archiving Partitions

Back up and drop older partitions each day, month, or year.

pg_dump -c -Fc -t <table>_202409 $PGSLICE_URL > <table>_202409.dump
psql -c "DROP TABLE <table>_202409" $PGSLICE_URL

If you use Amazon S3 for backups, s3cmd is a nice tool.

s3cmd put <table>_202409.dump s3://<s3-bucket>/<table>_202409.dump

Schema Updates

Once a table is partitioned, make schema updates on the master table only (not partitions). This includes adding, removing, and modifying columns, as well as adding and removing indexes and foreign keys.

Additional Commands

To undo prep (which will delete partitions), use:

pgslice unprep <table>

To undo swap, use:

pgslice unswap <table>

To enable mirroring triggers for live data changes during partitioning (before swap), use:

pgslice enable_mirroring <table>

To disable mirroring triggers after partitioning is complete, use:

pgslice disable_mirroring <table>

To show the version, use:

pgslice version
# or
pgslice --version

Additional Options

Set the tablespace when adding partitions

pgslice add_partitions <table> --tablespace fastspace

App Considerations

This set up allows you to read and write with the original table name with no knowledge it’s partitioned. However, there are a few things to be aware of.

Reads

When possible, queries should include the column you partition on to limit the number of partitions the database needs to check. For instance, if you partition on created_at, try to include it in queries:

SELECT * FROM
    visits
WHERE
    user_id = 123 AND
    -- for performance
    created_at >= '2024-09-01' AND created_at < '2024-09-02'

For this to be effective, ensure constraint_exclusion is set to partition (the default value) or on.

SHOW constraint_exclusion;

Frameworks

Rails

Specify the primary key for partitioned models to ensure it’s returned.

class Visit < ApplicationRecord
  self.primary_key = "id"
end

Other Frameworks

Please submit a PR if additional configuration is needed.

One Off Tasks

You can also use pgslice to reduce the size of a table without partitioning by creating a new table, filling it with a subset of records, and swapping it in.

pgslice prep <table> --no-partition
pgslice fill <table> --where "id > 1000" # use any conditions
pgslice analyze <table>
pgslice swap <table>
pgslice fill <table> --where "id > 1000" --swapped

Triggers

Triggers aren’t copied from the original table. You can set up triggers on the intermediate table if needed.

Data Protection

Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.

Additional Installation Methods

Homebrew

With Homebrew, you can use:

brew install pgslice

Docker

Get the Docker image with:

docker pull ankane/pgslice
alias pgslice="docker run --rm -e PGSLICE_URL ankane/pgslice"

This will give you the pgslice command.

Dependencies

If installation fails, your system may be missing Ruby or libpq.

On Mac, run:

brew install libpq

On Ubuntu, run:

sudo apt-get install ruby-dev libpq-dev build-essential

Upgrading

Run:

gem install pgslice

To use master, run:

gem install specific_install
gem specific_install https://github.com/ankane/pgslice.git

Creating Releases

Releases are automatically created when a version tag is pushed to the repository. The release workflow builds the gem and attaches it to a GitHub release.

Steps to Create a Release

  1. Create a new branch with name v0.7.1.workos3

  2. Update the version in lib/pgslice/version.rb:

    VERSION = "0.7.1.workos3"
  3. Commit and push the version change:

    git add lib/pgslice/version.rb
    git commit -m "Bump version to v0.7.1.workos3"
    git push
  4. Start a PR, get it reviewed and merged: Pull Request Example Commit Example

  5. Create a release through github UI:

Draft a new release

  1. Choose to create a new tag

Create a new tag Create new tag on publish

  1. Make sure to match version number with tag and hit generate release notes

Prepare release notes

  1. Publish

Prepare release notes

The release workflow is configured in .github/workflows/release.yml.

Reference

Related Projects

Also check out:

  • Dexter - The automatic indexer for Postgres
  • PgHero - A performance dashboard for Postgres
  • pgsync - Sync Postgres data to your local machine

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/pgslice.git
cd pgslice
bundle install
createdb pgslice_test
bundle exec rake test

To test against different versions of Postgres with Docker, use:

docker run -p=8000:5432 postgres:16
TZ=Etc/UTC PGSLICE_URL=postgres://postgres@localhost:8000/postgres bundle exec rake

On Mac, you must use Docker Desktop for the port mapping to localhost to work.

About

Postgres partitioning as easy as pie

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Ruby 99.6%
  • Dockerfile 0.4%