Support Explanation of XGBoost Model with Index-Value Sparse Data #2645

Loading sparse data in the index-value form (e.g. `0:3.2 3:-4.1 7:0.5`) is already supported by https://github.com/sql-machine-learning/sqlflow/pull/2551 and https://github.com/sql-machine-learning/sqlflow/pull/2611, but the `TO EXPLAIN` clause still has some problems on PAI.

When a `TO EXPLAIN` clause runs on PAI, the current implementation proceeds as follows:

- Go: create an ODPS table to receive the SHAP explanation values.
- Python: write the SHAP explanation values into the table created on the Go side.

If a column named `c` holds cells in the form `0:3.2 3:-4.1 7:0.5`, it should be treated as containing more than one feature. The result table that stores the SHAP explanation data should then contain one column per feature, named `c_0`, `c_1`, `c_2`, etc., each holding the value for one feature of the original column `c`. To create these columns in the result table on the Go side, we need to know the data length of the original column `c`. That is to say, **we need to know the feature derivation result of training on the Go side**.
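The column expansion described above can be sketched in Python as follows. This is only an illustration, not SQLFlow's actual code: the helper names `parse_index_value` and `expand_column_names` are hypothetical, the length `8` is assumed to come from the feature derivation result, and filling missing indices with `0.0` is an assumption.

```python
from typing import List


def parse_index_value(cell: str, length: int) -> List[float]:
    """Expand an index-value cell such as "0:3.2 3:-4.1 7:0.5" into a dense list.

    `length` is the data length of the column; indices that do not appear
    in the cell are filled with 0.0 (an assumption made for this sketch).
    """
    dense = [0.0] * length
    for item in cell.split():
        index, value = item.split(":", 1)
        dense[int(index)] = float(value)
    return dense


def expand_column_names(column: str, length: int) -> List[str]:
    """Return the per-feature column names c_0, c_1, ... for column `c`."""
    return [f"{column}_{i}" for i in range(length)]


# Hypothetical usage: the length 8 would come from feature derivation.
dense = parse_index_value("0:3.2 3:-4.1 7:0.5", 8)
names = expand_column_names("c", 8)
print(dict(zip(names, dense)))
```

Neither helper can run without `length`, which is exactly the information the Go side is missing.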

However, in the current implementation, the Go side cannot get the feature derivation result of training when the submitter is PAI. Currently it is only obtained on the Python side, via `model.load_metas`:

https://github.com/sql-machine-learning/sqlflow/blob/483b8676cf93f373d5073d84b0bee311bb122012/go/codegen/pai/template_xgboost.go#L108-L114

	(estimator,
	model_params,
	train_params,
	feature_field_meta,
	feature_column_names,
	label_field_meta,
	feature_column_code) = model.load_metas("{{.OSSModelDir}}", "xgboost_model_desc")

Therefore, we may need to create the ODPS table on the Python side. This means moving the table creation code from the Go side to the Python side during the refactoring process.
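As a rough sketch of what creating the table on the Python side could look like, the snippet below builds a `CREATE TABLE` statement from the loaded feature metadata. It rests on several assumptions: the keys `name`, `is_sparse` and `shape` are a hypothetical layout for each `feature_field_meta` entry, the table name `explain_result` and the `DOUBLE` column type are illustrative, and actually submitting the statement to ODPS is left out.

```python
from typing import Dict, List


def result_column_names(feature_field_meta: List[Dict]) -> List[str]:
    """Expand sparse columns into c_0, c_1, ...; keep other columns as-is.

    The metadata keys used here ("name", "is_sparse", "shape") are an
    assumed layout for this sketch, not a confirmed SQLFlow schema.
    """
    names = []
    for meta in feature_field_meta:
        if meta.get("is_sparse"):
            length = meta["shape"][0]
            names.extend(f"{meta['name']}_{i}" for i in range(length))
        else:
            names.append(meta["name"])
    return names


def create_table_sql(table: str, feature_field_meta: List[Dict]) -> str:
    """Build the DDL for the SHAP result table (one DOUBLE column per feature)."""
    columns = ", ".join(
        f"{name} DOUBLE" for name in result_column_names(feature_field_meta))
    return f"CREATE TABLE IF NOT EXISTS {table} ({columns});"


# Hypothetical metadata: one dense column and one sparse column of length 3.
example_meta = [
    {"name": "age", "is_sparse": False, "shape": [1]},
    {"name": "c", "is_sparse": True, "shape": [3]},
]
sql = create_table_sql("explain_result", example_meta)
print(sql)
```

Because the metadata is already available where the SHAP values are computed, the column list and the written values would come from the same source.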
