Support Explanation of XGBoost Model with Index-Value Sparse Data #2645

Open
@sneaxiy

Description

Loading sparse data in the index-value form (e.g. 0:3.2 3:-4.1 7:0.5) has been supported since #2551 and #2611, but the TO EXPLAIN clause still has problems with such data on PAI.
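
For reference, the index-value form can be expanded into a dense row as in the minimal sketch below. The helper name parse_index_value and the choice of 0.0 for missing indices are assumptions made for illustration, not the loader added in #2551 or #2611.

```python
def parse_index_value(cell, length):
    """Expand a cell such as "0:3.2 3:-4.1 7:0.5" into a dense list.

    `length` is the total number of features in the column; indices that
    do not appear in the cell are filled with 0.0 (an assumption).
    """
    dense = [0.0] * length
    for pair in cell.split():
        index, value = pair.split(":")
        dense[int(index)] = float(value)
    return dense


dense = parse_index_value("0:3.2 3:-4.1 7:0.5", 8)
```

Note that the cell alone does not reveal the column length: the largest index seen in one row (7 here) is only a lower bound, which is why the length has to come from the feature derivation result.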

When the TO EXPLAIN clause runs on PAI, the current implementation proceeds in two steps:

  • Go: create an ODPS table that will hold the SHAP explanation values.
  • Python: write the SHAP explanation values into the table created on the Go side.

If a column named c holds cell values in the form 0:3.2 3:-4.1 7:0.5, the column c should be treated as containing more than one feature. The result table for the SHAP explanation data should then contain multiple columns named c_0, c_1, c_2, etc., each storing the value for one feature of the original column c. To create these columns in the result table on the Go side, we need to know the data length of the original column c. That is, we need the feature derivation result of training on the Go side.
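
The column expansion described above could look like the following sketch. The function name expanded_column_names and the (name, length) input format are hypothetical; in particular, using None to mark an ordinary single-value column is an assumption, not the structure of the real feature derivation result.

```python
def expanded_column_names(columns):
    """Return the result-table column names for a list of (name, length).

    A length of None marks an ordinary single-value column (assumption);
    an integer length marks an index-value column that expands to
    name_0 ... name_{length-1}.
    """
    names = []
    for name, length in columns:
        if length is None:
            names.append(name)
        else:
            names.extend("%s_%d" % (name, i) for i in range(length))
    return names


result_columns = expanded_column_names([("age", None), ("c", 3)])
```

Here the column c with data length 3 becomes c_0, c_1 and c_2, while age is kept as a single column.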

However, when the submitter is PAI, the current implementation cannot get the feature derivation result of training on the Go side. It is currently obtained only on the Python side:

(estimator,
 model_params,
 train_params,
 feature_field_meta,
 feature_column_names,
 label_field_meta,
 feature_column_code) = model.load_metas("{{.OSSModelDir}}", "xgboost_model_desc")

Therefore, we may need to create the ODPS table on the Python side, which means moving the table creation code from the Go side to the Python side during the refactoring process.
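
As a rough illustration of what Python-side table creation might involve, the sketch below only builds the DDL string. The helper name build_create_table_sql, the table name, and the DOUBLE column type are all assumptions; actually executing the statement is left to whichever ODPS client the Python side already uses.

```python
def build_create_table_sql(table_name, column_names, column_type="DOUBLE"):
    """Build a CREATE TABLE statement for the SHAP result table.

    Every column gets the same type, DOUBLE by default (an assumption
    about how the SHAP values are stored).
    """
    columns = ", ".join("%s %s" % (name, column_type) for name in column_names)
    return "CREATE TABLE IF NOT EXISTS %s (%s);" % (table_name, columns)


# Hypothetical table and column names, for illustration only.
sql = build_create_table_sql("my_project.explain_result", ["c_0", "c_1", "c_2"])
```

Because the column names are only known once the feature metadata has been loaded, building the statement in the same Python process avoids passing the derivation result back to Go.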
