Zq/add specified autocompare #785

Draft · wants to merge 30 commits into main · showing changes from 29 commits
15 changes: 0 additions & 15 deletions .github/workflows/main.yml
@@ -353,21 +353,6 @@ jobs:
          source scripts/ci/ascend/ci_ascend_env.sh
          bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
            || ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )
-
-  Build-Ascend-910b-with-autocompare:
-    name: Build-dipu-ascend-910b-with-autocompare
-    needs: [Build-PyTorch-For-Ascend-910b]
-    runs-on: tps-ascend-ci-910b
-    steps:
-      - name: Build dipu
-        run: |
-          set -ex
-          export USE_COVERAGE=ON
-          export USE_AUTOCOMPARE=ON
-          cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && cp -R source ${GITHUB_JOB} && cd ${GITHUB_JOB}/dipu
-          source scripts/ci/ascend/ci_ascend_env.sh
-          bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
-            || ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )

  Test-Ascend-910b:
    name: Test-dipu-ascend-910b
50 changes: 33 additions & 17 deletions dipu/QuickStart.md
@@ -158,9 +158,10 @@ sh ./tests/python/run_tests.sh

### Operator library extension features

-#### Operator Fallback
+#### The operator Fallback feature

-Fallback a given operator:
+Fallback means using an operator's CPU implementation instead of its device implementation.
+Fallback a given operator:

```bash
export DIPU_FORCE_FALLBACK_OPS_LIST=add.out,conv2d
```

@@ -181,20 +182,13 @@
```bash
export DIPU_FORCE_FALLBACK_OPS_LIST='.*'
python -c "import torch_dipu"
```

-#### Introduction to the automatic operator precision comparison feature
-
-This feature is disabled by default; to use it, you need to enable it and rebuild DIPU.
-
-It can be enabled by setting the environment variable USE_AUTOCOMPARE=ON, after which DIPU must be recompiled:
-
-```shell
-export USE_AUTOCOMPARE=ON
-```
-
-The above enables automatic precision comparison for all operators. To compare only specific operators, add `autocompare: True` to the corresponding entries in the configuration file (e.g. `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`).
+#### The automatic operator precision comparison feature
+
+The automatic precision comparison feature (autocompare) verifies the correctness of operator results: device arguments are copied to the CPU, and the CPU and device results are compared to decide whether the precision is acceptable. A usage example:

```shell
-$ unset DIPU_FORCE_FALLBACK_OPS_LIST # mainly to ensure the ops being compared are not force-fallbacked to cpu; optional
+$ unset DIPU_FORCE_FALLBACK_OPS_LIST # mainly to ensure the ops being compared are not force-fallbacked to CPU; optional
+$ export DIPU_AUTOCOMPARE_OPS_LIST=add.out # enable autocompare for the add.out operator
$ python
>>> import torch
>>> import torch_dipu
```

@@ -220,11 +214,33 @@
```shell
autocompare: add.out other: allclose
>>>
```

-As shown, the CPU result is `allclose` to the device result, and information such as the `shape` and `dtype` of the CPU and device results is also printed. In particular, note the following issues:
+As shown, the output includes information such as the `shape`, `stride`, and `dtype` of the CPU and device results; the final result is that both `self` and `out` are `allclose` between CPU and device.
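
To make the mechanism concrete, here is a minimal illustrative sketch of what such a comparison amounts to for `add.out`; it is not the generated wrapper code, and the helper `autocompare_add_out` is a hypothetical name:

```python
import torch
import torch_dipu  # assumed importable; registers the DIPU device backend

def autocompare_add_out(self_dev, other_dev):
    # Hypothetical sketch: copy the device arguments to CPU first, so an
    # unexpected in-place modification of `self` can be detected afterwards.
    self_cpu, other_cpu = self_dev.cpu(), other_dev.cpu()

    out_dev = torch.add(self_dev, other_dev)  # device result
    out_cpu = torch.add(self_cpu, other_cpu)  # CPU reference result

    # Compare the outputs, and also check that `self` was not modified
    # in place (see the NOTE list further below).
    print("out  allclose:", torch.allclose(out_dev.cpu(), out_cpu))
    print("self allclose:", torch.allclose(self_dev.cpu(), self_cpu))
    return out_dev
```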

##### Configuring automatic precision comparison

Autocompare is disabled by default and is controlled through the environment variable `DIPU_AUTOCOMPARE_OPS_LIST`. Before enabling it, you must unset `DIPU_FORCE_FALLBACK_OPS_LIST`.

- Setting `DIPU_AUTOCOMPARE_OPS_LIST='.*'` enables global precision comparison; in this case every operator that is invoked will be compared.

```shell
# enable autocompare globally for all operators
export DIPU_AUTOCOMPARE_OPS_LIST='.*'
```

- `DIPU_AUTOCOMPARE_OPS_LIST` can also be set to enable precision comparison for specific operators; regular-expression matching is supported, and multiple operators may be listed (see the review discussion and the sketch below for how patterns are matched). Operator names can be found in [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml).

```shell
# enable autocompare for operators matching add.*?
export DIPU_AUTOCOMPARE_OPS_LIST=add.*?

# enable autocompare for the add.out and sub.out operators
export DIPU_AUTOCOMPARE_OPS_LIST="add.out, sub.out"
```

A review thread on the `add.*?` line above:

Collaborator: I am still puzzled: what kind of match is `add.*?`?

Collaborator (author): `add.*?` is a non-greedy match for anything starting with `add`, so the match result is just `add`. For example, for "add.out, add.scalar" the match results are "add" and "add"; as long as the match result is non-empty, `isOpMatch` returns True.

Collaborator: So the question mark only affects the match result, right? In that case it is redundant here?
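
A minimal sketch of the matching behavior described in this thread, with `is_op_match` as a hypothetical stand-in for the `isOpMatch` logic mentioned above:

```python
import re

def is_op_match(op_name: str, pattern: str) -> bool:
    # A non-empty match at the start of the op name counts as a hit.
    m = re.match(pattern, op_name)
    return m is not None and m.group() != ""

# The non-greedy "add.*?" matches the shortest possible string: just "add".
print(re.match(r"add.*?", "add.out").group())     # add
print(re.match(r"add.*?", "add.scalar").group())  # add

# Since only non-emptiness matters, the greedy "add.*" would select
# exactly the same operators, which is why the "?" is redundant here.
print(is_op_match("add.out", r"add.*?"))  # True
print(is_op_match("sub.out", r"add.*?"))  # False
```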

NOTE:

-1. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic precision comparison for *backward*. If model precision does not align, these operators can first be fallbacked to CPU as needed to localize the problem
-2. Operators involving random-number generation (configured with `autocompare:False` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`) are not autocompared, since the result would always be `not_allclose`
-3. Inputs are checked to ensure the operator's inputs are not modified unexpectedly
+1. Some operators do not support automatic precision comparison; see [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml), where an `autocompare` entry of `disable` marks an unsupported operator. You can also set an operator's `autocompare` entry to `disable` in `diopi_functions.yaml` to turn the comparison off for it
+2. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic precision comparison for *backward*. If model precision does not align, these operators can first be fallbacked to CPU as needed to localize the problem
+3. The input argument (`self`) is checked to ensure the operator does not modify it unexpectedly

#### Capturing operator arguments

61 changes: 15 additions & 46 deletions dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py
@@ -11,7 +11,6 @@
    op_register_template_content,
    custom_autograd_template_content,
    autocompare_template_content,
-    op_with_custom_fallback_register_template_content,
)


@@ -458,6 +457,9 @@ def create_call_aten_cpu_cpp_function_code_from_config(fun_config):
    opname = re.sub("\.correction", "", opname)
    opname = re.sub("\.input", "", opname)
    opname = re.sub("\.dim_IntList", "", opname)
+    opname = re.sub("\.dim", "", opname)
+    opname = re.sub("\.mode", "", opname)

    opname = opname.replace(".", "_")
    opname = opname.split(".")[0]
    if opname[-1] == "_" and len(get_function_return_param_from_schema(schema)) > 0:
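
As an aside, the two added substitutions extend the suffix-stripping chain that maps a schema overload name to its base ATen CPU function. A minimal sketch of the effect, assuming the same `re.sub` order as above:

```python
import re

def normalize_opname(opname: str) -> str:
    # Order matters: ".dim_IntList" must be stripped before the
    # shorter ".dim" pattern would otherwise match inside it.
    for suffix in (r"\.correction", r"\.input", r"\.dim_IntList", r"\.dim", r"\.mode"):
        opname = re.sub(suffix, "", opname)
    return opname.replace(".", "_")

print(normalize_opname("max.dim"))          # max (newly handled by this PR)
print(normalize_opname("sum.dim_IntList"))  # sum
```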
@@ -673,10 +675,6 @@ def create_optional_generator_process_code(arg_name):

op_register_template = CodeTemplate(op_register_template_content)

-op_with_custom_fallback_register_template = CodeTemplate(
-    op_with_custom_fallback_register_template_content
-)

custom_autograd_template = CodeTemplate(custom_autograd_template_content)

autocompare_template = CodeTemplate(autocompare_template_content)
@@ -906,7 +904,7 @@ def functions_code_gen(fun_config):
        fbody += custom_autograd_function_code
        fun_name = wrapper_fun_name

-    if fun_config.get("autocompare", False) in [True, "True"] and fun_config.get(
+    if fun_config.get("autocompare") not in [False] and fun_config.get(
        "register_op", True
    ) in [True, "True"]:
        auto_compare_fun_name = fun_name + "_autocompare"
@@ -940,40 +938,17 @@
            ],
        )
        fbody += autocompare_code
        fun_name = auto_compare_fun_name

-    if fun_config.get("custom_fallback", False) in ["False", False]:
-        register_body = op_register_template.substitute(
-            register_name=[get_op_name_from_schema(fun_config["schema"])],
-            aten_fun_name=["dipu::native::" + fun_name],
-            diopi_fun_name=[
-                get_fun_name_from_cppsignature(diopi_interface).replace(
-                    "diopi", "::diopi"
-                )
-            ],
-        )
-    else:
-        register_body = op_with_custom_fallback_register_template.substitute(
-            register_name=[get_op_name_from_schema(fun_config["schema"])],
-            aten_fun_name=["dipu::native::" + fun_name],
-            diopi_fun_name=[
-                get_fun_name_from_cppsignature(diopi_interface).replace(
-                    "diopi", "::diopi"
-                )
-            ],
-            force_fallback=[
-                (
-                    "false"
-                    if fun_config.get("force_fallback", False) in [False, "False"]
-                    else "true"
-                )
-            ],
-            fallbackFunc=[
-                "dipu::native::"
-                + "custom_fallback_"
-                + fun_name.replace("_autocompare", "")
-            ],
-        )

+    # generate the op_register code
+    register_body = op_register_template.substitute(
+        register_name=[get_op_name_from_schema(fun_config["schema"])],
+        aten_fun_name=["dipu::native::" + fun_name],
+        diopi_fun_name=[
+            get_fun_name_from_cppsignature(diopi_interface).replace("diopi", "::diopi")
+        ],
+        custom_fallback_config=str(fun_config.get("custom_fallback", False)).lower(),
+        autocompare_config=str(fun_config.get("autocompare", True)).lower(),
+    )
    return fbody, register_body
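
For reference, the two `str(...).lower()` calls above mean that an operator with no `autocompare` entry in diopi_functions.yaml registers with `autocompare_config="true"`, while `autocompare: disable` flows through as `"disable"`. A minimal sketch mirroring that flattening:

```python
def flatten_flags(fun_config: dict) -> dict:
    # Mirrors the substitution above: config values become lowercase
    # strings for the generated C++ registration code to consume.
    return {
        "custom_fallback_config": str(fun_config.get("custom_fallback", False)).lower(),
        "autocompare_config": str(fun_config.get("autocompare", True)).lower(),
    }

print(flatten_flags({}))                          # {'custom_fallback_config': 'false', 'autocompare_config': 'true'}
print(flatten_flags({"autocompare": "disable"}))  # autocompare_config: 'disable'
print(flatten_flags({"custom_fallback": True}))   # custom_fallback_config: 'true'
```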


@@ -1039,12 +1014,6 @@ def parse_args():
        type=boolean_string,
        help="whether generate code that prints op args",
    )
-    parser.add_argument(
-        "--autocompare",
-        default=False,
-        type=boolean_string,
-        help="whether generate code that compare device calculation results with cpu calculation results",
-    )
    parser.add_argument(
        "--fun_config_dict",
        type=json.loads,
13 changes: 6 additions & 7 deletions dipu/scripts/autogen_diopi_wrapper/autogen_wrapped_code.sh
@@ -5,17 +5,16 @@
DIPU_DIR=$(readlink -f $(dirname $(readlink -f "$0"))/../..)
AUTOGEN_DIOPI_WRAPPER=$DIPU_DIR/scripts/autogen_diopi_wrapper

-USE_AUTOCOMPARE=${1:-OFF}
-UsedVendor=${2:-cuda}
-Torch_VERSION=${3:-2.1.0}
-GENERATED_KERNELS_SCRIPT=${4:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
-GENERATED_KERNELS_CONFIG=${5:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
-GENERATED_KERNELS=${6:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}
+UsedVendor=${1:-cuda}
+Torch_VERSION=${2:-2.1.0}
+GENERATED_KERNELS_SCRIPT=${3:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
+GENERATED_KERNELS_CONFIG=${4:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
+GENERATED_KERNELS=${5:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}

GENERATED_KERNELS_VENDOR=${DIPU_DIR}/third_party/DIOPI/impl/${UsedVendor}/convert_config.yaml

PYTHON_CMD="python3 ${GENERATED_KERNELS_SCRIPT} --out=${GENERATED_KERNELS} --config=${GENERATED_KERNELS_CONFIG} \
---autocompare=${USE_AUTOCOMPARE} --print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
+--print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
--fun_config_dict='{\"current_device\":\"${UsedVendor}\",\"current_torch_ver\":\"${Torch_VERSION}\"}'"

if [ -f "$GENERATED_KERNELS_VENDOR" ]; then