Slow Symbol Table vs ASOF JOIN Table #4816

Open
@nicholas-a-guerra

To reproduce

When designing a schema for multiple tables that are very similar, there seem to be two main options in QuestDB.

Either create a separate table for each and name them accordingly:

table_1

  • timestamp
  • state

table_2

  • timestamp
  • state

Or create one table with an id symbol:

table

  • id
  • timestamp
  • state

I expected that, when gathering all the data, the single table with a symbol column would usually execute faster. That does not seem to be the case. When the SAMPLE BY interval is small, an ASOF JOIN across all the separate tables performs significantly better, and the gap in execution time widens as the number of tables and the size of the dataset grow. Below is a minimal reproducible set that begins to show the difference; adding more tables and data makes it worse. Also attached are images of different queries run through the REST API to show the execution time differences. The times were collected and printed via the /timings option in the REST API.

CREATE TABLE table_1 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_1
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_2 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_2
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_3 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_3
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_4 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_4
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_5 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_5
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_6 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_6
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_7 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_7
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_8 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_8
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_9 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_9
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;

CREATE TABLE table_10 (
  timestamp TIMESTAMP,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO table_10
    SELECT
        timestamp_sequence('2024-01-01T00:00:00', 1000000L) timestamp,
        rnd_long() state
    FROM long_sequence(86400) x;


CREATE TABLE symbol_table (
  timestamp TIMESTAMP,
  id SYMBOL capacity 256 CACHE,
  state LONG
) timestamp (timestamp) PARTITION BY DAY WAL;

INSERT INTO symbol_table
    SELECT timestamp, CAST(1 AS SYMBOL), state FROM table_1;

INSERT INTO symbol_table
    SELECT timestamp, CAST(2 AS SYMBOL), state FROM table_2;

INSERT INTO symbol_table
    SELECT timestamp, CAST(3 AS SYMBOL), state FROM table_3;

INSERT INTO symbol_table
    SELECT timestamp, CAST(4 AS SYMBOL), state FROM table_4;

INSERT INTO symbol_table
    SELECT timestamp, CAST(5 AS SYMBOL), state FROM table_5;

INSERT INTO symbol_table
    SELECT timestamp, CAST(6 AS SYMBOL), state FROM table_6;

INSERT INTO symbol_table
    SELECT timestamp, CAST(7 AS SYMBOL), state FROM table_7;

INSERT INTO symbol_table
    SELECT timestamp, CAST(8 AS SYMBOL), state FROM table_8;

INSERT INTO symbol_table
    SELECT timestamp, CAST(9 AS SYMBOL), state FROM table_9;

INSERT INTO symbol_table
    SELECT timestamp, CAST(10 AS SYMBOL), state FROM table_10;
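
With the tables above populated, the two query shapes being compared might look roughly like the sketch below. The exact queries behind the measurements appear only in the attached images, so everything here is an assumption for illustration: the 1m SAMPLE BY interval, the use of last(), the aliases t1..t3 and state_1..state_3, and the choice to join only three of the ten tables. Any SAMPLE BY wrapped around the ASOF JOIN in the original measurements is left out.

```sql
-- Variant A (single table with an id symbol): one row per time bucket and id.
-- The 1m interval and last() aggregate are illustrative assumptions.
SELECT timestamp, id, last(state) AS state
FROM symbol_table
SAMPLE BY 1m;

-- Variant B (one table per id): one wide row per timestamp of table_1,
-- with the most recent state from each other table at or before that time.
-- Only three tables are shown; the same pattern would extend to table_10.
SELECT
    t1.timestamp,
    t1.state AS state_1,
    t2.state AS state_2,
    t3.state AS state_3
FROM table_1 t1
ASOF JOIN table_2 t2
ASOF JOIN table_3 t3;
```

Note that the two variants return differently shaped results (long versus wide), which may account for part of any timing difference. Prefixing either statement with EXPLAIN should show the plan QuestDB chooses for it.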

QuestDB version:

8.1.0

OS, in case of Docker specify Docker and the Host OS:

Ubuntu 22.04.4 LTS Docker

File System, in case of Docker specify Host File System:

ext4

Full Name:

Nick Guerra

Affiliation:

Kronus Engineering

Have you followed Linux, MacOs kernel configuration steps to increase Maximum open files and Maximum virtual memory areas limit?

  • Yes, I have

Additional context

[3 attached images: REST API execution timings for the queries]

Labels

  • Performance: Performance improvements
  • SQL: Issues or changes relating to SQL execution