This repository was archived by the owner on May 9, 2024. It is now read-only.

Joins fail in heterogeneous mode #577

@kurapov-peter

To reproduce the join failure that occurs with enable_heterogeneous=True, run the following Python script:

import pyhdk 

config = pyhdk.buildConfig(enable_heterogeneous=True,
                           force_heterogeneous_distribution=False)
pyhdk.initLogger(log_severity="DEBUG2")
storage = pyhdk.storage.ArrowStorage(1)
data_mgr = pyhdk.storage.DataMgr(config)
data_mgr.registerDataProvider(storage)
calcite = pyhdk.sql.Calcite(storage, config)
executor = pyhdk.Executor(data_mgr, config)

table_1_name = "taxi"
table_2_name = "numbers"
# Assuming you are in hdk/examples/
storage.importCsvFile("../omniscidb/Tests/ArrowStorageDataFiles/taxi_sample_header.csv", table_1_name, pyhdk.storage.TableOptions(5))
storage.importCsvFile("../omniscidb/Tests/ArrowStorageDataFiles/numbers_header.csv", table_2_name, pyhdk.storage.TableOptions(2))

# Perfect hash table OneToOne
sql = f"SELECT * FROM {table_1_name} a JOIN {table_1_name} b ON a.trip_id = b.trip_id"
# sql = f"SELECT * FROM {table_1_name} a JOIN {table_2_name} b ON a.trip_id = b.col1"

ra = calcite.process(sql)
rel_alg_executor = pyhdk.sql.RelAlgExecutor(executor, storage, data_mgr, ra)
res = rel_alg_executor.execute().to_arrow()

A simple hash join on the primary key, performed via a perfect hash table, crashes. The gdb output is not very informative about the location, but the crash seems to be nullptr-related:

Thread 1 "python3" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352709952) at ./nptl/pthread_kill.c:44
...
#8  0x00007fff5ce5f0ae in VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*) ()
   from .../anaconda3/envs/omnisci-dev/lib/jvm/lib/server/libjvm.so
#9  0x00007fff5cd06d69 in JVM_handle_linux_signal ()
   from .../anaconda3/envs/omnisci-dev/lib/jvm/lib/server/libjvm.so
#10 <signal handler called>
#11 0x00007ffff7e3e354 in ?? ()
#12 0x00007ffff7e3e2a0 in ?? ()
#13 0x0000555557cd67d0 in ?? ()
#14 0x0000555557be5060 in ?? ()
#15 0x0000000000000002 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000000 in ?? ()

The last lines of the DEBUG2 log show (shortened):

2023-07-12T08:08:41.525587 1 1573099 0 0 NvidiaKernel.cpp:154 Generated GPU binary code size: 469152 bytes
2023-07-12T08:08:41.526317 1 1573099 0 0 Execute.cpp:2881 Launching 1 kernels for query on: 
2023-07-12T08:08:41.526336 1 1573099 0 0 Execute.cpp:2883 	0 &CPU.
2023-07-12T08:08:41.526623 2 1573099 0 0 Execute.cpp:3476 bool(ra_exe_unit.union_all)=false ra_exe_unit.input_descs=(InputDescriptor(table_id(1),nest_level(0)) InputDescriptor(table_id(1),nest_level(1))) ra_exe_unit.input_col_descs=(InputColDescriptor(table_id=1, nest_level=0, col_id=1000) InputColDescriptor(table_id=1, nest_level=0, col_id=1001) ... MANY COL IDS ... 
ra_exe_unit.scan_limit=0 num_rows=((20 20)) frag_offsets=((0 0)) query_exe_context->query_buffers_->num_rows_=-1 query_exe_context->query_mem_desc_.getEntryCount()=1 device_id=0 outer_table_id=-1 scan_limit=-1 start_rowid=0 num_tables=2

The same error happens when we try to join on a different table (you can use the commented sql).


Interestingly, in an IPython notebook the kernel sometimes crashes with the following last log lines:

2023-07-12T08:20:16.399125 W 1577588 0 0 Backend.cpp:833 Failed to generate PTX: NVVM IR ParseError: generatePTX: invalid redefinition of function 'pi'
declare double @pi();
               ^
. Switching to CPU execution target.
2023-07-12T08:20:16.399529 F 1577588 0 0 RelAlgExecutor.cpp:433 Check failed: co.device_type == ExecutorDeviceType::GPU 
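
When scanning longer DEBUG2 logs for lines like these, a small filter can help. The sketch below assumes, based only on the sample lines in this report, that the second whitespace-separated field is the severity (with `W` and `F` marking warnings and fatal errors) and that the sixth field is the `file:line` location. The meaning of the remaining fields is not assumed. `split_log_line` and `warnings_and_fatals` are hypothetical helpers, not part of HDK:

```python
def split_log_line(line):
    """Split one log line into a record, or return None for continuation lines."""
    parts = line.split(None, 6)
    if len(parts) < 7:
        # Too few fields: treat as a continuation of the previous message.
        return None
    timestamp, severity, _a, _b, _c, location, message = parts
    return {
        "timestamp": timestamp,
        "severity": severity,  # assumed position of the severity field
        "location": location,  # assumed to be source file and line
        "message": message.strip(),
    }


def warnings_and_fatals(lines):
    """Keep only records whose assumed severity field is W or F."""
    records = (split_log_line(line) for line in lines)
    return [r for r in records if r is not None and r["severity"] in ("W", "F")]
```

Applied to the five lines above, this keeps the `Backend.cpp:833` warning and the `RelAlgExecutor.cpp:433` check failure and skips the continuation lines of the NVVM error.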
