Commit de2d633

pdabre12 (Pratik Joseph Dabre) authored and Pratik Joseph Dabre committed
Create RFC-0009-native-tpcds-connector.md
1 parent 9d62ffe commit de2d633

1 file changed

+46
-0
lines changed

RFC-0009-native-tpcds-connector.md

@@ -0,0 +1,46 @@
# **RFC-0009 for Presto**

## Presto - Native TPC-DS connector

Proposers

* Pratik Joseph Dabre
* Pramod Satya

## [Related Issues]

Related issues: https://github.com/prestodb/presto/issues/22361

Related PRs: https://github.com/prestodb/presto/pull/23067

## Summary

This RFC proposes a native TPC-DS connector capable of generating in-memory data on the fly.

## Background

Currently, Presto does not have a native implementation of the TPC-DS connector. This RFC proposes adding a new TPC-DS connector that can be used as a Presto - Native catalog.

### [Optional] Goals

1. Add a TPC-DS connector to generate TPC-DS data in Presto - Native.
2. Write end-to-end tests in Presto - Native that use TPC-DS tables, and conduct microbenchmarks in Velox.

### [Optional] Non-goals

## Proposed Implementation

The Presto - Native TPC-DS connector will be a wrapper around dsdgen, the data generator written in C and distributed by the TPC organization. This means our implementation must behave exactly like the C implementation. DuckDB already has its own TPC-DS connector, for which the C files were wrapped in C++ files; we are going to use these C++ files in our implementation.

In the C++ implementation, there are two types of tables: source tables and target tables used for generation. Source table files are prefixed with "s_", while target table files are prefixed with "w_". For instance, there may be files like "s_call_center.c" and "w_call_center.c". Source tables appear to be used only when "dsdgen" is run with an update flag, though the exact function of this flag and the purpose of the source tables have not yet been explored. Currently, our focus is solely on implementing functionality for the target tables (the "w_" tables).

In the target table files prefixed with "w_", there are helper functions, named exactly "append_row_start" and "append_row_end", that assist in row generation and that we need to implement. Depending on the schema of the table, there are also "append_" functions, one for each data type to be appended.
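To make the role of these helpers more concrete, here is a minimal C++ sketch of what connector-side callbacks could look like. Everything in it is an assumption made for illustration: `TableAppender`, `Value`, and the function signatures are hypothetical and are not the signatures used by dsdgen, DuckDB, or Velox. Only the names "append_row_start" and "append_row_end" and the "append_" prefix come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical cell value: null, integer, or string.
using Value = std::variant<std::monostate, int64_t, std::string>;

// Hypothetical per-table state. A real connector would more likely
// write into columnar vectors than into a vector of rows.
struct TableAppender {
  std::size_t numColumns;                 // column count from the table schema
  std::vector<std::vector<Value>> rows;   // completed rows
  std::vector<Value> current;             // row under construction
  bool inRow = false;                     // true between row_start and row_end
};

// Called before the first column of a row is appended.
inline void append_row_start(TableAppender& t) {
  assert(!t.inRow);
  t.current.clear();
  t.current.reserve(t.numColumns);
  t.inRow = true;
}

// One "append_" function per data type that appears in the schema.
inline void append_integer(TableAppender& t, int64_t v) {
  assert(t.inRow);
  t.current.emplace_back(v);
}

inline void append_varchar(TableAppender& t, const char* v) {
  assert(t.inRow);
  if (v == nullptr) {
    t.current.emplace_back(std::monostate{});  // treat a null pointer as SQL NULL
  } else {
    t.current.emplace_back(std::string(v));
  }
}

// Called after the last column; checks the row is complete and stores it.
inline void append_row_end(TableAppender& t) {
  assert(t.inRow && t.current.size() == t.numColumns);
  t.rows.push_back(std::move(t.current));
  t.current.clear();
  t.inRow = false;
}
```

In this sketch, a generated row is a call to `append_row_start`, then one `append_` call per column in schema order, then `append_row_end`. The assertions are only there to show the ordering contract the sketch assumes.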

A new TPC-DS config, `tpcds.toggle-char-to-varchar`, will be added to convert char columns to varchar, addressing the lack of support for the char data type in Presto - Native. This config allows char columns to be switched to varchar when required, ensuring consistency between Presto - Java and Presto - Native.
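As a rough illustration of the toggle, the sketch below maps a declared column type to the type the connector would expose. It is a sketch under stated assumptions, not the proposed implementation: `Kind`, `ColumnType`, and `resolveType` are hypothetical names, and carrying the declared length over to the varchar type is an assumption that the text above does not specify.

```cpp
#include <cassert>

// Hypothetical column type descriptor; not a Presto or Velox type.
enum class Kind { Integer, Char, Varchar };

struct ColumnType {
  Kind kind;
  int length;  // declared length for Char/Varchar, unused otherwise
};

// Returns the type to expose for a TPC-DS column.
// toggleCharToVarchar stands for the value of tpcds.toggle-char-to-varchar.
inline ColumnType resolveType(ColumnType declared, bool toggleCharToVarchar) {
  assert(declared.length >= 0);
  if (toggleCharToVarchar && declared.kind == Kind::Char) {
    // Assumption: the varchar keeps the declared char length.
    return ColumnType{Kind::Varchar, declared.length};
  }
  return declared;  // all other types, or toggle off: unchanged
}
```

With the flag set, a `char(16)` column would be reported as `varchar(16)` in this sketch; with the flag unset, the declared type is returned as-is.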

## Adoption Plan

## Test Plan

Native end-to-end tests are added in https://github.com/prestodb/presto/pull/23067.
Future enhancements will include adding SpeedTest and ConnectorTest to the Velox repository.
