[WIN] Plasticc benchmark crashes on the original plasticc dataset on HDK tasks execution #581

Apparently the plasticc benchmark no longer completes successfully when it is run on the original plasticc dataset instead of the synthetic one. This is true for both HDK 0.6 and HDK 0.7. On some systems execution ends silently; on others it fails with this error message:
```
[2023-07-13 17:34:49.912154] [0x00005874] [info]    0 71 BufferMgr.cpp:720 Check failed: buffer_it->second->buffer
```
Debugging shows that the failure happens on the line that triggers HDK execution: `df_meta.shape  # to trigger real execution`.

You can reproduce the problem by checking out the benchmarks repo: https://github.com/gshimansky/data-science-processing-workload

To execute the benchmark on the original dataset, download `test_set.csv`, `training_set.csv`, `test_set_metadata.csv` and `training_set_metadata.csv` from the Modin datasets S3 bucket `s3://modin-datasets/plasticc`. You can execute `benchmarks/plasticc.py` directly like this:
```
set MODIN_STORAGE_FORMAT=hdk
set MODIN_ENGINE=native
set MODIN_EXPERIMENTAL=true
python benchmarks/plasticc.py training_set.csv test_set.csv training_set_metadata.csv test_set_metadata.csv
```
Alternatively, rename these files to `plasticc_training_set.csv`, `plasticc_test_set.csv`, `plasticc_training_set_metadata.csv` and `plasticc_test_set_metadata.csv` (that is, add a `plasticc_` prefix to each name) and run `launcher.py` with the `-ru` (reuse) option:
```
python launcher.py -m plasticc -ru --hdk
```
With `-ru` the launcher skips the generation stage and reuses the dataset files already present in the current directory.
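
For anyone reproducing this from a script instead of a `cmd` prompt, here is a minimal Python sketch of both routes described above. It only restates the environment variables, file names and command lines from this report; the helper names (`build_env`, `direct_command`, `add_plasticc_prefix`, `launcher_command`) are hypothetical and not part of the benchmarks repo. It assumes it is run from the repo root and that the four CSV files have already been downloaded.

```python
import os
import sys
from pathlib import Path

# Environment settings copied from the `set` commands in this report.
HDK_SETTINGS = {
    "MODIN_STORAGE_FORMAT": "hdk",
    "MODIN_ENGINE": "native",
    "MODIN_EXPERIMENTAL": "true",
}

# Original dataset file names, in the argument order used in this report.
DATASET_FILES = [
    "training_set.csv",
    "test_set.csv",
    "training_set_metadata.csv",
    "test_set_metadata.csv",
]


def build_env():
    """Return a copy of the current environment with the HDK settings applied."""
    env = dict(os.environ)
    env.update(HDK_SETTINGS)
    return env


def direct_command(data_dir="."):
    """Command line for running benchmarks/plasticc.py directly."""
    files = [str(Path(data_dir) / name) for name in DATASET_FILES]
    return [sys.executable, "benchmarks/plasticc.py", *files]


def add_plasticc_prefix(data_dir="."):
    """Rename the dataset files to the plasticc_* names that `-ru` reuses.

    Files that are missing, or whose target name already exists, are skipped.
    Returns the list of new file names that were actually created.
    """
    data_dir = Path(data_dir)
    renamed = []
    for name in DATASET_FILES:
        source = data_dir / name
        target = data_dir / f"plasticc_{name}"
        if source.exists() and not target.exists():
            source.rename(target)
            renamed.append(target.name)
    return renamed


def launcher_command():
    """Command line for the launcher route with dataset reuse."""
    return [sys.executable, "launcher.py", "-m", "plasticc", "-ru", "--hdk"]
```

To run the direct route, pass the pieces to `subprocess.run(direct_command(), env=build_env())`. For the launcher route, call `add_plasticc_prefix()` first and then `subprocess.run(launcher_command())`.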
