Description
We have supported loading sparse data in the index-value form (e.g. `0:3.2 3:-4.1 7:0.5`) in #2551 and #2611, but there are still some problems with the `TO EXPLAIN` clause on PAI.
When we use PAI to run the `TO EXPLAIN` clause, the current implementation proceeds as follows:
- Go: create an ODPS table to hold the SHAP explanation values.
- Python: write the SHAP explanation values into the table created on the Go side.
If there is a column named `c` whose cell values are in the form `0:3.2 3:-4.1 7:0.5`, we should treat the column `c` as containing more than one feature. The result table for the SHAP explanation data should then contain multiple columns named `c_0`, `c_1`, `c_2`, etc., each storing one feature value from the original column `c`. To create the columns `c_0`, `c_1`, `c_2`, etc. in the result table on the Go side, we need to know the data length of the original column `c`. That is to say, we need to know the feature derivation result of training on the Go side.
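The dependency on the data length can be illustrated with a minimal Python sketch. The helper names `parse_sparse_cell`, `expand_column_names` and `to_dense` are hypothetical and not part of SQLFlow, and the length `8` in the usage note is an assumed value that would normally come from feature derivation. Note that a cell alone cannot tell us the length: `0:3.2 3:-4.1 7:0.5` only shows that the length is at least 8.

```python
from typing import Dict, List


def parse_sparse_cell(cell: str) -> Dict[int, float]:
    """Parse an index-value cell such as "0:3.2 3:-4.1 7:0.5" into {index: value}."""
    pairs = {}
    for item in cell.split():
        index, value = item.split(":", 1)
        pairs[int(index)] = float(value)
    return pairs


def expand_column_names(column: str, length: int) -> List[str]:
    """Return the result-table column names c_0 ... c_{length-1} (hypothetical naming helper)."""
    return ["{}_{}".format(column, i) for i in range(length)]


def to_dense(cell: str, length: int) -> List[float]:
    """Expand a sparse cell into a dense list; the length must be supplied by the caller."""
    dense = [0.0] * length
    for index, value in parse_sparse_cell(cell).items():
        if index >= length:
            raise ValueError("index {} exceeds data length {}".format(index, length))
        dense[index] = value
    return dense
```

For example, with an assumed length of 8, `expand_column_names("c", 8)` yields `c_0` through `c_7`, and `to_dense("0:3.2 3:-4.1 7:0.5", 8)` places the three values at positions 0, 3 and 7 with zeros elsewhere.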
However, in the current implementation, the Go side cannot get the feature derivation result of training when the submitter is PAI. Currently, we obtain the feature derivation result of training on the Python side:
sqlflow/go/codegen/pai/template_xgboost.go
Lines 108 to 114 in 483b867
Therefore, we may need to create the ODPS table on the Python side. This means moving the table creation code from the Go side to the Python side during the refactoring process.
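As a rough sketch of what table creation on the Python side could look like, the snippet below builds a `CREATE TABLE` statement from the per-column data lengths. Everything here is an assumption for illustration: the function names, the `feature_lengths` mapping (standing in for the feature derivation result), the `DOUBLE` column type, and the rule that a length-1 column keeps its original name. Executing the statement against ODPS is intentionally left out.

```python
from typing import Dict, List


def explain_result_columns(feature_lengths: Dict[str, int]) -> List[str]:
    """Expand each original column into its result-table column names.

    feature_lengths maps a column name to its data length; this mapping is a
    hypothetical stand-in for the feature derivation result.
    """
    columns = []
    for name, length in feature_lengths.items():
        if length == 1:
            # Assumption: a single-feature column keeps its original name.
            columns.append(name)
        else:
            columns.extend("{}_{}".format(name, i) for i in range(length))
    return columns


def create_table_sql(table: str, feature_lengths: Dict[str, int]) -> str:
    """Build the CREATE TABLE statement for the SHAP explanation result table."""
    fields = ", ".join(
        "{} DOUBLE".format(column)
        for column in explain_result_columns(feature_lengths)
    )
    return "CREATE TABLE IF NOT EXISTS {} ({});".format(table, fields)
```

For instance, `create_table_sql("my_db.explain_result", {"age": 1, "c": 3})` produces a statement with the columns `age`, `c_0`, `c_1` and `c_2`. Because the Python side already holds the feature derivation result, it can generate this statement without any extra information from the Go side.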