Zq/add specified autocompare2 #794

Merged
41 commits

- 60eb031 craft (NeosZhang, Apr 17, 2024)
- 76d96f6 draft (NeosZhang, Apr 18, 2024)
- 63a4ff7 fix (NeosZhang, Apr 18, 2024)
- 538cf2f fix (NeosZhang, Apr 18, 2024)
- 5348043 update readme (NeosZhang, Apr 19, 2024)
- 6d8a8e3 fix md lint (NeosZhang, Apr 19, 2024)
- 4790315 fix cpp lint (NeosZhang, Apr 19, 2024)
- ba73de3 fix readme (NeosZhang, Apr 19, 2024)
- e97db80 fix lint (NeosZhang, Apr 19, 2024)
- 8e95f2f fix (NeosZhang, Apr 19, 2024)
- 1ba0e04 rm autcompare CI (NeosZhang, Apr 19, 2024)
- f27200b add copyright (NeosZhang, Apr 19, 2024)
- e534ce2 fix clang-format (NeosZhang, Apr 19, 2024)
- f2d7ff9 fix clang-tidy (NeosZhang, Apr 19, 2024)
- 5598c1b fix lint (NeosZhang, Apr 22, 2024)
- 3c63c9f Update dipu/QuickStart.md (NeosZhang, Apr 22, 2024)
- 5cdba1a remove ENV USE_GLOBAL_AUTOCOMPARE (NeosZhang, Apr 22, 2024)
- 9d9923e fix (NeosZhang, Apr 22, 2024)
- 458233c fix (NeosZhang, Apr 22, 2024)
- 389a267 fix (NeosZhang, Apr 23, 2024)
- c8dbb34 fix readme (NeosZhang, Apr 23, 2024)
- 8d40cf9 fix (NeosZhang, Apr 24, 2024)
- d5a38d0 add directMemCopyH2H (NeosZhang, Apr 24, 2024)
- bc8bf8e ceclear (NeosZhang, Apr 25, 2024)
- ade8db5 reformat func register (NeosZhang, Apr 25, 2024)
- 9e18d08 fix clang-format (NeosZhang, Apr 25, 2024)
- e66a97f fix (NeosZhang, Apr 25, 2024)
- d2b3038 fix (NeosZhang, Apr 25, 2024)
- 7061272 fix (NeosZhang, Apr 25, 2024)
- 2083444 fix (NeosZhang, Apr 25, 2024)
- 9f586ba test (NeosZhang, Apr 25, 2024)
- c8d9a25 fix (NeosZhang, Apr 25, 2024)
- 655332a fix test_fallback (NeosZhang, Apr 26, 2024)
- 87497aa fix py-lint (NeosZhang, Apr 26, 2024)
- b1cf1f4 fix (NeosZhang, Apr 26, 2024)
- f550add fix const_var name (NeosZhang, Apr 26, 2024)
- 94a4000 fix register macro name, use CUSTOMFALLBACK, instead of FALLBACK" (NeosZhang, Apr 26, 2024)
- f624f36 fix comment (NeosZhang, Apr 26, 2024)
- 95ac914 fix (NeosZhang, Apr 26, 2024)
- d575f7f fix autocompare for _amp_foreach_non_finite_check_and_unscale_ (NeosZhang, Apr 26, 2024)
- 43e2e84 fix const var name with Google style (NeosZhang, Apr 26, 2024)

15 changes: 0 additions & 15 deletions .github/workflows/main.yml
@@ -353,21 +353,6 @@ jobs:
source scripts/ci/ascend/ci_ascend_env.sh
bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
|| ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )

Build-Ascend-910b-with-autocompare:
name: Build-dipu-ascend-910b-with-autocompare
needs: [Build-PyTorch-For-Ascend-910b]
runs-on: tps-ascend-ci-910b
steps:
- name: Build dipu
run: |
set -ex
export USE_COVERAGE=ON
export USE_AUTOCOMPARE=ON
cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && cp -R source ${GITHUB_JOB} && cd ${GITHUB_JOB}/dipu
source scripts/ci/ascend/ci_ascend_env.sh
bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
|| ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )

Test-Ascend-910b:
name: Test-dipu-ascend-910b

50 changes: 33 additions & 17 deletions dipu/QuickStart.md
@@ -158,9 +158,10 @@ sh ./tests/python/run_tests.sh

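### Operator Library Extensions

#### Operator Fallback
#### Operator Fallback Feature

Fall back the specified operators:
Fallback means using an operator's CPU implementation instead of its device implementation.
Fall back the specified operators:

```bash
export DIPU_FORCE_FALLBACK_OPS_LIST=add.out,conv2d
@@ -181,20 +182,13 @@ export DIPU_FORCE_FALLBACK_OPS_LIST='.*'
python -c "import torch_dipu"
```

#### Introduction to Automatic Operator Accuracy Comparison
#### Automatic Operator Accuracy Comparison

This feature is disabled by default. To use it, enable it and rebuild DIPU.

Enable it by setting the environment variable USE_AUTOCOMPARE=ON, then rebuild DIPU.

```shell
export USE_AUTOCOMPARE=ON
```

The method above enables automatic accuracy comparison for all operators. To compare only specific operators, add `autocompare: True` to those operators in the relevant configuration file (e.g. `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`).
Automatic operator accuracy comparison (autocompare) checks that operators compute correct results: it copies the device arguments to the CPU and compares the CPU and device results to decide whether the accuracy meets the requirement. Here is an example of using autocompare:

```shell
$ unset DIPU_FORCE_FALLBACK_OPS_LIST # mainly ensures the compared operators are not forced to fall back to cpu; optional
$ unset DIPU_FORCE_FALLBACK_OPS_LIST # mainly ensures the compared operators are not forced to fall back to CPU; optional
$ export DIPU_AUTOCOMPARE_OPS_LIST=add.out # enable autocompare for the add.out operator
$ python
>>> import torch
>>> import torch_dipu
@@ -220,11 +214,33 @@ autocompare: add.out other: allclose
>>>
```

As shown, the CPU and device results are `allclose`, and the output also shows information such as the `shape` and `dtype` of the CPU and device results. In particular, note the following:
As shown, the output includes the `shape`, `stride`, `dtype`, and other information for the CPU and device results; the final result is that both `self` and `out` are `allclose` between the CPU and the device.

##### Configuring Automatic Operator Accuracy Comparison

Automatic operator accuracy comparison is disabled by default and is controlled by the environment variable `DIPU_AUTOCOMPARE_OPS_LIST`. `DIPU_FORCE_FALLBACK_OPS_LIST` must be unset before enabling it.

- Set `DIPU_AUTOCOMPARE_OPS_LIST='.*'` to enable global accuracy comparison; every operator that is called is then compared.

```shell
# Enable automatic accuracy comparison for all operators
export DIPU_AUTOCOMPARE_OPS_LIST='.*'
```

- Set `DIPU_AUTOCOMPARE_OPS_LIST` to enable automatic accuracy comparison for specific operators. Regular expressions are supported, and multiple operators can be listed. For operator names, see [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml).

```shell
# Compare operators matching add.*?
export DIPU_AUTOCOMPARE_OPS_LIST=add.*?
# Compare the add.out and sub.out operators
export DIPU_AUTOCOMPARE_OPS_LIST="add.out, sub.out"
```

NOTE:

1. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic accuracy comparison of the *backward* pass. If model accuracy does not match, first fall these operators back to the CPU as needed to locate the problem.
2. Random-number-generation operators (configured with `autocompare:False` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`) are not autocompared, because their results are always `not_allclose`.
3. The inputs are checked to make sure the operator does not modify them unexpectedly.
1. Some operators do not support automatic accuracy comparison. Check [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml): operators whose `autocompare` option is `disable` do not support it. You can also edit `diopi_functions.yaml` and set an operator's `autocompare` option to `disable` to turn off automatic comparison for it.
2. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic accuracy comparison of the *backward* pass. If model accuracy does not match, first fall these operators back to the CPU as needed to locate the problem.
3. The input argument (`self`) is checked to make sure the operator does not modify its input unexpectedly.

#### Capturing Operator Arguments
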
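The operator-list matching described in the QuickStart can be modeled in a few lines of Python. This is a hypothetical sketch, not DIPU's implementation (the real matching is C++, presumably behind the `OpRegexMatch.hpp` header included later in this diff, whose contents are not shown). It assumes entries are comma-separated, surrounding spaces are ignored, and each entry must match the whole operator name; in a real run the input string would come from `DIPU_AUTOCOMPARE_OPS_LIST`.

```python
import re


def parse_ops_list(value):
    """Split a comma-separated operator list into compiled regular expressions.

    Assumption: surrounding spaces are ignored, so "add.out, sub.out"
    yields the two patterns "add.out" and "sub.out".
    """
    return [re.compile(item.strip()) for item in value.split(",") if item.strip()]


def should_autocompare(op_name, patterns):
    """Return True if any pattern matches the whole name (full-match is an assumption)."""
    return any(pattern.fullmatch(op_name) for pattern in patterns)


patterns = parse_ops_list("add.out, sub.out")
for name in ["add.out", "sub.out", "mul.out"]:
    print(name, should_autocompare(name, patterns))
# prints:
# add.out True
# sub.out True
# mul.out False
```

Because `.` is a regex metacharacter, under this full-match assumption `add.*?` also matches a name such as `addmm.out`, and `add.out` would match `addxout`; writing `add\.out` restricts an entry to the literal name.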

142 changes: 98 additions & 44 deletions dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py
@@ -8,10 +8,12 @@
from diopi_wrapper_template import (
diopi_wrapper_file_template_content,
diopi_wrapper_function_template_content,
op_register_template_content,
op_no_customfallback_with_autocompare_register_template_content,
op_no_customfallback_no_autocompare_register_template_content,
custom_autograd_template_content,
autocompare_template_content,
op_with_custom_fallback_register_template_content,
op_with_customfallback_with_autocompare_register_template_content,
op_with_customfallback_no_autocompare_register_template_content,
)


@@ -671,10 +673,20 @@ def create_optional_generator_process_code(arg_name):

fun_template = CodeTemplate(diopi_wrapper_function_template_content)

op_register_template = CodeTemplate(op_register_template_content)
op_no_customfallback_with_autocompare_register_template = CodeTemplate(
op_no_customfallback_with_autocompare_register_template_content
)

op_no_customfallback_no_autocompare_register_template = CodeTemplate(
op_no_customfallback_no_autocompare_register_template_content
)

op_with_customfallback_with_autocompare_register_template = CodeTemplate(
op_with_customfallback_with_autocompare_register_template_content
)

op_with_custom_fallback_register_template = CodeTemplate(
op_with_custom_fallback_register_template_content
op_with_customfallback_no_autocompare_register_template = CodeTemplate(
op_with_customfallback_no_autocompare_register_template_content
)

custom_autograd_template = CodeTemplate(custom_autograd_template_content)
@@ -906,7 +918,7 @@ def functions_code_gen(fun_config):
fbody += custom_autograd_function_code
fun_name = wrapper_fun_name

if fun_config.get("autocompare", False) in [True, "True"] and fun_config.get(
if fun_config.get("autocompare") not in ["disable"] and fun_config.get(
"register_op", True
) in [True, "True"]:
auto_compare_fun_name = fun_name + "_autocompare"
@@ -940,40 +952,88 @@ def functions_code_gen(fun_config):
],
)
fbody += autocompare_code
fun_name = auto_compare_fun_name

if fun_config.get("custom_fallback", False) in ["False", False]:
register_body = op_register_template.substitute(
register_name=[get_op_name_from_schema(fun_config["schema"])],
aten_fun_name=["dipu::native::" + fun_name],
diopi_fun_name=[
get_fun_name_from_cppsignature(diopi_interface).replace(
"diopi", "::diopi"
)
],

# generate the OP_register code
# case 1: custom_fallback=False and autocompare not disabled
register_body = ""
if fun_config.get("custom_fallback", False) in ["False", False] and fun_config.get(
"autocompare", True
) in ["True", True]:
register_body = (
op_no_customfallback_with_autocompare_register_template.substitute(
register_name=[get_op_name_from_schema(fun_config["schema"])],
aten_fun_name=["dipu::native::" + fun_name],
diopi_fun_name=[
get_fun_name_from_cppsignature(diopi_interface).replace(
"diopi", "::diopi"
)
],
)
)
else:
register_body = op_with_custom_fallback_register_template.substitute(
register_name=[get_op_name_from_schema(fun_config["schema"])],
aten_fun_name=["dipu::native::" + fun_name],
diopi_fun_name=[
get_fun_name_from_cppsignature(diopi_interface).replace(
"diopi", "::diopi"
)
],
force_fallback=[
(
"false"
if fun_config.get("force_fallback", False) in [False, "False"]
else "true"
)
],
fallbackFunc=[
"dipu::native::"
+ "custom_fallback_"
+ fun_name.replace("_autocompare", "")
],

# case2: custom_fallback=False and autocompare=disabled
elif fun_config.get("custom_fallback", False) in [
"False",
False,
] and fun_config.get("autocompare") in ["disable"]:
register_body = (
op_no_customfallback_no_autocompare_register_template.substitute(
register_name=[get_op_name_from_schema(fun_config["schema"])],
aten_fun_name=["dipu::native::" + fun_name],
diopi_fun_name=[
get_fun_name_from_cppsignature(diopi_interface).replace(
"diopi", "::diopi"
)
],
)
)
# case3: custom_fallback=True and autocompare not disabled
elif fun_config.get("custom_fallback", False) in ["True", True] and fun_config.get(
"autocompare", True
) in ["True", True]:
register_body = (
op_with_customfallback_with_autocompare_register_template.substitute(
register_name=[get_op_name_from_schema(fun_config["schema"])],
aten_fun_name=["dipu::native::" + fun_name],
diopi_fun_name=[
get_fun_name_from_cppsignature(diopi_interface).replace(
"diopi", "::diopi"
)
],
force_fallback=[
(
"false"
if fun_config.get("force_fallback", False) in [False, "False"]
else "true"
)
],
fallbackFunc=["dipu::native::" + "custom_fallback_" + fun_name],
)
)
# case4: custom_fallback=True and autocompare disabled
elif fun_config.get("custom_fallback", False) in ["True", True] and fun_config.get(
"autocompare", True
) in ["disable"]:
register_body = (
op_with_customfallback_no_autocompare_register_template.substitute(
register_name=[get_op_name_from_schema(fun_config["schema"])],
aten_fun_name=["dipu::native::" + fun_name],
diopi_fun_name=[
get_fun_name_from_cppsignature(diopi_interface).replace(
"diopi", "::diopi"
)
],
force_fallback=[
(
"false"
if fun_config.get("force_fallback", False) in [False, "False"]
else "true"
)
],
fallbackFunc=["dipu::native::" + "custom_fallback_" + fun_name],
)
)

return fbody, register_body


@@ -1039,12 +1099,6 @@ def parse_args():
type=boolean_string,
help="whether generate code that prints op args",
)
parser.add_argument(
"--autocompare",
default=False,
type=boolean_string,
help="whether generate code that compare device calculation results with cpu calculation results",
)
parser.add_argument(
"--fun_config_dict",
type=json.loads,
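The four `register_body` branches in `functions_code_gen` pick one of four templates from two flags. The sketch below restates that selection as a lookup table, using the same accepted values as the diff: `custom_fallback` is `True`/`"True"` or `False`/`"False"` (default `False`), and `autocompare` is absent or `True`/`"True"` (enabled) or `"disable"`. `select_register_template` and `RegisterTemplateError` are hypothetical names. One behavior the table makes visible: in the diff, an `autocompare` value outside those sets (for example the `autocompare:False` that the removed QuickStart note mentions for random-number operators) matches none of the four branches and leaves `register_body` as `""`; this sketch raises an error instead of returning an empty result.

```python
# Template names as they appear in the diff; the table-driven selection is a sketch.
_REGISTER_TEMPLATES = {
    # (custom_fallback, autocompare_enabled) -> template name
    (False, True): "op_no_customfallback_with_autocompare_register_template",
    (False, False): "op_no_customfallback_no_autocompare_register_template",
    (True, True): "op_with_customfallback_with_autocompare_register_template",
    (True, False): "op_with_customfallback_no_autocompare_register_template",
}


class RegisterTemplateError(ValueError):
    """Hypothetical error for config values that none of the four branches accept."""


def _custom_fallback(fun_config):
    # Same accepted spellings and default as fun_config.get("custom_fallback", False).
    value = fun_config.get("custom_fallback", False)
    if value in (True, "True"):
        return True
    if value in (False, "False"):
        return False
    raise RegisterTemplateError(f"unexpected custom_fallback: {value!r}")


def _autocompare_enabled(fun_config):
    # A missing key counts as enabled, as with fun_config.get("autocompare", True).
    value = fun_config.get("autocompare", True)
    if value in (True, "True"):
        return True
    if value == "disable":
        return False
    raise RegisterTemplateError(f"unexpected autocompare: {value!r}")


def select_register_template(fun_config):
    """Return the register template name for one diopi_functions.yaml entry."""
    key = (_custom_fallback(fun_config), _autocompare_enabled(fun_config))
    return _REGISTER_TEMPLATES[key]


print(select_register_template({"custom_fallback": True, "autocompare": "disable"}))
# prints: op_with_customfallback_no_autocompare_register_template
```

Keeping the two flags as independent lookups means a new accepted spelling is handled in one helper rather than repeated across four `elif` conditions.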
13 changes: 6 additions & 7 deletions dipu/scripts/autogen_diopi_wrapper/autogen_wrapped_code.sh
@@ -5,17 +5,16 @@
DIPU_DIR=$(readlink -f $(dirname $(readlink -f "$0"))/../..)
AUTOGEN_DIOPI_WRAPPER=$DIPU_DIR/scripts/autogen_diopi_wrapper

USE_AUTOCOMPARE=${1:-OFF}
UsedVendor=${2:-cuda}
Torch_VERSION=${3:-2.1.0}
GENERATED_KERNELS_SCRIPT=${4:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
GENERATED_KERNELS_CONFIG=${5:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
GENERATED_KERNELS=${6:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}
UsedVendor=${1:-cuda}
Torch_VERSION=${2:-2.1.0}
GENERATED_KERNELS_SCRIPT=${3:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
GENERATED_KERNELS_CONFIG=${4:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
GENERATED_KERNELS=${5:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}

GENERATED_KERNELS_VENDOR=${DIPU_DIR}/third_party/DIOPI/impl/${UsedVendor}/convert_config.yaml

PYTHON_CMD="python3 ${GENERATED_KERNELS_SCRIPT} --out=${GENERATED_KERNELS} --config=${GENERATED_KERNELS_CONFIG} \
--autocompare=${USE_AUTOCOMPARE} --print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
--print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
--fun_config_dict='{\"current_device\":\"${UsedVendor}\",\"current_torch_ver\":\"${Torch_VERSION}\"}'"

if [ -f "$GENERATED_KERNELS_VENDOR" ]; then
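With `USE_AUTOCOMPARE` dropped, the positional parameters of `autogen_wrapped_code.sh` shift left by one: the vendor is now `$1` and the torch version `$2`, so a caller that still passes `OFF`/`ON` first would have it read as the vendor. The script forwards those two values through `--fun_config_dict`, which `parse_args()` decodes with `type=json.loads`. The sketch below models only that option with a reduced, hypothetical parser (not the script's full `parse_args()`) and builds the JSON with `json.dumps` rather than hand-escaped quotes.

```python
import argparse
import json


def build_parser():
    """Reduced, hypothetical stand-in for parse_args(): only --fun_config_dict."""
    parser = argparse.ArgumentParser()
    # As in the diff, the option value is decoded with json.loads.
    parser.add_argument("--fun_config_dict", type=json.loads, default={})
    return parser


def fun_config_arg(vendor, torch_ver):
    """Build the --fun_config_dict argument with json.dumps instead of manual escaping."""
    payload = {"current_device": vendor, "current_torch_ver": torch_ver}
    return "--fun_config_dict=" + json.dumps(payload)


# Vendor and torch version, i.e. what the script now reads from $1 and $2.
args = build_parser().parse_args([fun_config_arg("ascend", "2.1.0")])
print(args.fun_config_dict)
# prints: {'current_device': 'ascend', 'current_torch_ver': '2.1.0'}
```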
7 changes: 1 addition & 6 deletions dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml
@@ -2748,7 +2748,6 @@

# this copy_ aten op may use both diopiCastDtype and diopiCopyInp. it's a proxy/composite op
- schema: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)
autocompare: disable
dummy_call_diopi: True
custom_fallback: True
device: [cuda, camb, ascend, droplet, supa, kunlunxin]
@@ -2760,16 +2759,14 @@

# vendor who has no fully implemented diopi and proper fallback DIPUCopy sub-class
- schema: copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)
autocompare: disable
custom_fallback: True
dummy_call_diopi: True
custom_code_at_the_beginning: |
return custom_fallback_dipu_copy_(self, src, non_blocking);
device: [topsrider]
interface: diopiCopyInp(ctx, src, self)

- schema: _amp_foreach_non_finite_check_and_unscale_(at::TensorList self, Tensor(b!) found_inf, Tensor inv_scale) -> void
autocompare: disable
- schema: _amp_foreach_non_finite_check_and_unscale_(at::TensorList self, Tensor(b!) found_inf, Tensor inv_scale) -> ()
custom_fallback: True
custom_code_at_the_beginning: |
std::vector<diopiTensorHandle_t> diopiTensorHandles(self.size(), nullptr);
@@ -2780,8 +2777,6 @@
});
// NOLINTEND(cppcoreguidelines-pro-type-const-cast)
interface: diopiAmpForeachNonFiniteCheckAndUnscaleInp(ctx, diopiTensorHandles.data(), static_cast<int64_t>(self.size()), found_inf, inv_scale)
# TODO(someone): fix this issue when `autocompare` is on
autocompare: disable

- schema: _amp_update_scale_(Tensor(a!) self, Tensor(b!) growth_tracker, Tensor found_inf, float scale_growth_factor, float scale_backoff_factor, int growth_interval) -> Tensor(a!)
custom_fallback: True
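Each `diopi_functions.yaml` entry is keyed by its `schema`, and the generator derives the registered name from it with `get_op_name_from_schema`, whose body is not part of this diff. That registered name (e.g. `add.out`) is presumably what `DIPU_AUTOCOMPARE_OPS_LIST` is matched against. A minimal sketch, assuming the name is the text before the first `(`: the hypothetical helper below does that and lists which entries opt out with `autocompare: disable`. Entries are plain dicts to avoid a YAML dependency; `my_op.out` is made up.

```python
def op_name_from_schema(schema):
    """Hypothetical stand-in for get_op_name_from_schema: the text before the first '('."""
    return schema.split("(", 1)[0].strip()


def autocompare_disabled(entries):
    """Operator names of entries whose autocompare option is the string "disable"."""
    return [
        op_name_from_schema(entry["schema"])
        for entry in entries
        if entry.get("autocompare") == "disable"
    ]


# The first two schemas are copied from the YAML diff; my_op.out is illustrative only.
entries = [
    {
        "schema": "copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)",
        "custom_fallback": True,
    },
    {
        "schema": "_amp_foreach_non_finite_check_and_unscale_(at::TensorList self, "
        "Tensor(b!) found_inf, Tensor inv_scale) -> ()",
        "custom_fallback": True,
    },
    {
        "schema": "my_op.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)",
        "autocompare": "disable",
    },
]

print([op_name_from_schema(entry["schema"]) for entry in entries])
# prints: ['copy_', '_amp_foreach_non_finite_check_and_unscale_', 'my_op.out']
print(autocompare_disabled(entries))
# prints: ['my_op.out']
```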
17 changes: 13 additions & 4 deletions dipu/scripts/autogen_diopi_wrapper/diopi_wrapper_template.py
@@ -50,6 +50,7 @@
#include "csrc_dipu/aten/ops/DIPUCopy.hpp"
#include "csrc_dipu/aten/ops/NodispatchUtils.hpp"
#include "csrc_dipu/aten/ops/OpUtils.hpp"
#include "csrc_dipu/aten/ops/OpRegexMatch.hpp"
#include "csrc_dipu/base/basedef.h"
#include "csrc_dipu/diopirt/diopirt_impl.h"
#include "csrc_dipu/profiler/profiler.h"
@@ -127,12 +128,20 @@
}
"""

op_register_template_content = """
DIOPI_ATEN_FUNC("$register_name", $diopi_fun_name, $aten_fun_name);
op_no_customfallback_with_autocompare_register_template_content = """
NO_CUSTOMFALLBACK_WITH_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $aten_fun_name);
"""

op_with_custom_fallback_register_template_content = """
DIOPI_ATEN_FUNC_CUSTOM_FALLBACK("$register_name", $diopi_fun_name, $force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);
op_no_customfallback_no_autocompare_register_template_content = """
NO_CUSTOMFALLBACK_NO_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $aten_fun_name);
"""

op_with_customfallback_with_autocompare_register_template_content = """
WITH_CUSTOMFALLBACK_WITH_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);
"""

op_with_customfallback_no_autocompare_register_template_content = """
WITH_CUSTOMFALLBACK_NO_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);
"""

custom_autograd_template_content = """
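Each register template above is a single C++ macro call with `$`-placeholders that `autogen_diopi_wrapper.py` fills through `CodeTemplate(...).substitute(...)`, passing each value as a one-element list. The same placeholder syntax works with Python's standard `string.Template`, which the sketch below uses in place of `CodeTemplate` (it does not reproduce `CodeTemplate`'s list handling). Only the macro text is copied from the diff; the argument values (`::diopiAdd`, `dipu::native::dipu_add_out`, `dipu::native::dipu_copy_`) are illustrative. The fallback name follows the diff's `"dipu::native::" + "custom_fallback_" + fun_name` rule.

```python
from string import Template

# Macro text copied from the diff; the values substituted below are illustrative.
NO_FALLBACK_WITH_AUTOCOMPARE = Template(
    'NO_CUSTOMFALLBACK_WITH_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, $aten_fun_name);'
)
WITH_FALLBACK_WITH_AUTOCOMPARE = Template(
    'WITH_CUSTOMFALLBACK_WITH_AUTOCOMPARE_REGISTER("$register_name", $diopi_fun_name, '
    "$force_fallback /*whether force fallback*/, $aten_fun_name, $fallbackFunc);"
)

no_fallback_line = NO_FALLBACK_WITH_AUTOCOMPARE.substitute(
    register_name="add.out",
    diopi_fun_name="::diopiAdd",
    aten_fun_name="dipu::native::dipu_add_out",
)

fun_name = "dipu_copy_"  # illustrative wrapper name
fallback_line = WITH_FALLBACK_WITH_AUTOCOMPARE.substitute(
    register_name="copy_",
    diopi_fun_name="::diopiCopyInp",
    force_fallback="false",
    aten_fun_name="dipu::native::" + fun_name,
    fallbackFunc="dipu::native::" + "custom_fallback_" + fun_name,
)

print(no_fallback_line)
print(fallback_line)
```

`string.Template.substitute` raises `KeyError` if a placeholder is left unfilled, which is a cheap guard against forgetting `force_fallback` or `fallbackFunc` for the custom-fallback variants.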
6 changes: 0 additions & 6 deletions dipu/scripts/ci/ascend/ci_ascend_script.sh
@@ -12,19 +12,13 @@ function build_diopi_lib() {
function config_dipu_ascend_cmake() {
mkdir -p build && cd ./build
cmake_args="-DCMAKE_BUILD_TYPE=Release -DDEVICE=ascend -DWITH_DIOPI_LIBRARY=DISABLE"
if [ -n "$USE_AUTOCOMPARE" ]; then
cmake_args+=" -DUSE_AUTOCOMPARE=${USE_AUTOCOMPARE}"
fi
cmake ../ $cmake_args
cd ../
}

function config_all_ascend_cmake() {
mkdir -p build && cd ./build
cmake_args="-DCMAKE_BUILD_TYPE=Release -DDEVICE=ascend -DENABLE_COVERAGE=${USE_COVERAGE} -DWITH_DIOPI=INTERNAL"
if [ -n "$USE_AUTOCOMPARE" ]; then
cmake_args+=" -DUSE_AUTOCOMPARE=${USE_AUTOCOMPARE}"
fi
cmake ../ $cmake_args
cd ../
}