Zq/add specified autocompare #785

Draft · wants to merge 30 commits into main · showing changes from 29 commits
15 changes: 0 additions & 15 deletions .github/workflows/main.yml
@@ -353,21 +353,6 @@ jobs:
          source scripts/ci/ascend/ci_ascend_env.sh
          bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
            || ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )
-
-  Build-Ascend-910b-with-autocompare:
-    name: Build-dipu-ascend-910b-with-autocompare
-    needs: [Build-PyTorch-For-Ascend-910b]
-    runs-on: tps-ascend-ci-910b
-    steps:
-      - name: Build dipu
-        run: |
-          set -ex
-          export USE_COVERAGE=ON
-          export USE_AUTOCOMPARE=ON
-          cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && cp -R source ${GITHUB_JOB} && cd ${GITHUB_JOB}/dipu
-          source scripts/ci/ascend/ci_ascend_env.sh
-          bash scripts/ci/ascend/ci_ascend_script.sh build_dipu \
-            || ( cd ${DEEPLINK_PATH}/${GITHUB_RUN_NUMBER}/ && rm -rf ${GITHUB_JOB} && exit 1 )

  Test-Ascend-910b:
    name: Test-dipu-ascend-910b
50 changes: 33 additions & 17 deletions dipu/QuickStart.md
@@ -158,9 +158,10 @@ sh ./tests/python/run_tests.sh

### Operator library extension features

-#### Operator Fallback
+#### The operator Fallback feature

-Fallback a given operator:
+Fallback means using an operator's CPU implementation instead of its device implementation.
+Fallback a given operator:

```bash
export DIPU_FORCE_FALLBACK_OPS_LIST=add.out,conv2d
```

@@ -181,20 +182,13 @@
```bash
export DIPU_FORCE_FALLBACK_OPS_LIST='.*'
python -c "import torch_dipu"
```

-#### Introduction to the automatic operator precision comparison feature
-
-This feature is disabled by default; to use it, you need to enable it and rebuild DIPU.
-
-It can be enabled by setting the environment variable USE_AUTOCOMPARE=ON, after which DIPU must be recompiled:
-
-```shell
-export USE_AUTOCOMPARE=ON
-```
-
-The above enables automatic precision comparison for all operators. To compare only specific operators, add `autocompare: True` to the corresponding entries in the configuration file (e.g. `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`).
+#### The automatic operator precision comparison feature
+
+The automatic precision comparison feature (autocompare) verifies the correctness of operator results: device arguments are copied to the CPU, and the CPU and device results are compared to decide whether the precision is acceptable. A usage example:

```shell
-$ unset DIPU_FORCE_FALLBACK_OPS_LIST # mainly to ensure the ops being compared are not force-fallbacked to cpu; optional
+$ unset DIPU_FORCE_FALLBACK_OPS_LIST # mainly to ensure the ops being compared are not force-fallbacked to CPU; optional
+$ export DIPU_AUTOCOMPARE_OPS_LIST=add.out # enable autocompare for the add.out operator
$ python
>>> import torch
>>> import torch_dipu
```

@@ -220,11 +214,33 @@
```shell
autocompare: add.out other: allclose
>>>
```

-As shown, the CPU result is `allclose` to the device result, and information such as the `shape` and `dtype` of the CPU and device results is also printed. In particular, note the following issues:
+As shown, the output includes information such as the `shape`, `stride`, and `dtype` of the CPU and device results; the final result is that both `self` and `out` are `allclose` between CPU and device.
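
To make the mechanism concrete, here is a minimal illustrative sketch of what such a comparison amounts to for `add.out`; it is not the generated wrapper code, and the helper `autocompare_add_out` is a hypothetical name:

```python
import torch
import torch_dipu  # assumed importable; registers the DIPU device backend

def autocompare_add_out(self_dev, other_dev):
    # Hypothetical sketch: copy the device arguments to CPU first, so an
    # unexpected in-place modification of `self` can be detected afterwards.
    self_cpu, other_cpu = self_dev.cpu(), other_dev.cpu()

    out_dev = torch.add(self_dev, other_dev)  # device result
    out_cpu = torch.add(self_cpu, other_cpu)  # CPU reference result

    # Compare the outputs, and also check that `self` was not modified
    # in place (see the NOTE list further below).
    print("out  allclose:", torch.allclose(out_dev.cpu(), out_cpu))
    print("self allclose:", torch.allclose(self_dev.cpu(), self_cpu))
    return out_dev
```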

##### Configuring automatic precision comparison

Autocompare is disabled by default and is controlled through the environment variable `DIPU_AUTOCOMPARE_OPS_LIST`. Before enabling it, you must unset `DIPU_FORCE_FALLBACK_OPS_LIST`.

- Setting `DIPU_AUTOCOMPARE_OPS_LIST='.*'` enables global precision comparison; in this case every operator that is invoked will be compared.

```shell
# enable autocompare globally for all operators
export DIPU_AUTOCOMPARE_OPS_LIST='.*'
```

- `DIPU_AUTOCOMPARE_OPS_LIST` can also be set to enable precision comparison for specific operators; regular-expression matching is supported, and multiple operators may be listed (see the review discussion and the sketch below for how patterns are matched). Operator names can be found in [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml).

```shell
# enable autocompare for operators matching add.*?
export DIPU_AUTOCOMPARE_OPS_LIST=add.*?

# enable autocompare for the add.out and sub.out operators
export DIPU_AUTOCOMPARE_OPS_LIST="add.out, sub.out"
```

A review thread on the `add.*?` line above:

Collaborator: I am still puzzled: what kind of match is `add.*?`?

Collaborator (author): `add.*?` is a non-greedy match for anything starting with `add`, so the match result is just `add`. For example, for "add.out, add.scalar" the match results are "add" and "add"; as long as the match result is non-empty, `isOpMatch` returns True.

Collaborator: So the question mark only affects the match result, right? In that case it is redundant here?
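
A minimal sketch of the matching behavior described in this thread, with `is_op_match` as a hypothetical stand-in for the `isOpMatch` logic mentioned above:

```python
import re

def is_op_match(op_name: str, pattern: str) -> bool:
    # A non-empty match at the start of the op name counts as a hit.
    m = re.match(pattern, op_name)
    return m is not None and m.group() != ""

# The non-greedy "add.*?" matches the shortest possible string: just "add".
print(re.match(r"add.*?", "add.out").group())     # add
print(re.match(r"add.*?", "add.scalar").group())  # add

# Since only non-emptiness matters, the greedy "add.*" would select
# exactly the same operators, which is why the "?" is redundant here.
print(is_op_match("add.out", r"add.*?"))  # True
print(is_op_match("sub.out", r"add.*?"))  # False
```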

NOTE:

-1. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic precision comparison for *backward*. If model precision does not align, these operators can first be fallbacked to CPU as needed to localize the problem
-2. Operators involving random-number generation (configured with `autocompare:False` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml`) are not autocompared, since the result would always be `not_allclose`
-3. Inputs are checked to ensure the operator's inputs are not modified unexpectedly
+1. Some operators do not support automatic precision comparison; see [diopi_functions.yaml](https://github.com/DeepLink-org/deeplink.framework/blob/main/dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml), where an `autocompare` entry of `disable` marks an unsupported operator. You can also set an operator's `autocompare` entry to `disable` in `diopi_functions.yaml` to turn the comparison off for it
+2. Operators configured with `autograd:True` in `dipu/scripts/autogen_diopi_wrapper/diopi_functions.yaml` (`cross_entropy_loss`, `conv2d`, `dropout`, `dropout_`, `linear`) do not yet support automatic precision comparison for *backward*. If model precision does not align, these operators can first be fallbacked to CPU as needed to localize the problem
+3. The input argument (`self`) is checked to ensure the operator does not modify it unexpectedly

#### Capturing operator arguments

61 changes: 15 additions & 46 deletions dipu/scripts/autogen_diopi_wrapper/autogen_diopi_wrapper.py
@@ -11,7 +11,6 @@
    op_register_template_content,
    custom_autograd_template_content,
    autocompare_template_content,
-    op_with_custom_fallback_register_template_content,
)


@@ -458,6 +457,9 @@ def create_call_aten_cpu_cpp_function_code_from_config(fun_config):
    opname = re.sub("\.correction", "", opname)
    opname = re.sub("\.input", "", opname)
    opname = re.sub("\.dim_IntList", "", opname)
+    opname = re.sub("\.dim", "", opname)
+    opname = re.sub("\.mode", "", opname)

    opname = opname.replace(".", "_")
    opname = opname.split(".")[0]
    if opname[-1] == "_" and len(get_function_return_param_from_schema(schema)) > 0:
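
As an aside, the two added substitutions extend the suffix-stripping chain that maps a schema overload name to its base ATen CPU function. A minimal sketch of the effect, assuming the same `re.sub` order as above:

```python
import re

def normalize_opname(opname: str) -> str:
    # Order matters: ".dim_IntList" must be stripped before the
    # shorter ".dim" pattern would otherwise match inside it.
    for suffix in (r"\.correction", r"\.input", r"\.dim_IntList", r"\.dim", r"\.mode"):
        opname = re.sub(suffix, "", opname)
    return opname.replace(".", "_")

print(normalize_opname("max.dim"))          # max (newly handled by this PR)
print(normalize_opname("sum.dim_IntList"))  # sum
```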
@@ -673,10 +675,6 @@ def create_optional_generator_process_code(arg_name):

op_register_template = CodeTemplate(op_register_template_content)

-op_with_custom_fallback_register_template = CodeTemplate(
-    op_with_custom_fallback_register_template_content
-)

custom_autograd_template = CodeTemplate(custom_autograd_template_content)

autocompare_template = CodeTemplate(autocompare_template_content)
@@ -906,7 +904,7 @@ def functions_code_gen(fun_config):
        fbody += custom_autograd_function_code
        fun_name = wrapper_fun_name

-    if fun_config.get("autocompare", False) in [True, "True"] and fun_config.get(
+    if fun_config.get("autocompare") not in [False] and fun_config.get(
        "register_op", True
    ) in [True, "True"]:
        auto_compare_fun_name = fun_name + "_autocompare"
@@ -940,40 +938,17 @@
            ],
        )
        fbody += autocompare_code
        fun_name = auto_compare_fun_name

-    if fun_config.get("custom_fallback", False) in ["False", False]:
-        register_body = op_register_template.substitute(
-            register_name=[get_op_name_from_schema(fun_config["schema"])],
-            aten_fun_name=["dipu::native::" + fun_name],
-            diopi_fun_name=[
-                get_fun_name_from_cppsignature(diopi_interface).replace(
-                    "diopi", "::diopi"
-                )
-            ],
-        )
-    else:
-        register_body = op_with_custom_fallback_register_template.substitute(
-            register_name=[get_op_name_from_schema(fun_config["schema"])],
-            aten_fun_name=["dipu::native::" + fun_name],
-            diopi_fun_name=[
-                get_fun_name_from_cppsignature(diopi_interface).replace(
-                    "diopi", "::diopi"
-                )
-            ],
-            force_fallback=[
-                (
-                    "false"
-                    if fun_config.get("force_fallback", False) in [False, "False"]
-                    else "true"
-                )
-            ],
-            fallbackFunc=[
-                "dipu::native::"
-                + "custom_fallback_"
-                + fun_name.replace("_autocompare", "")
-            ],
-        )

+    # generate the op_register code
+    register_body = op_register_template.substitute(
+        register_name=[get_op_name_from_schema(fun_config["schema"])],
+        aten_fun_name=["dipu::native::" + fun_name],
+        diopi_fun_name=[
+            get_fun_name_from_cppsignature(diopi_interface).replace("diopi", "::diopi")
+        ],
+        custom_fallback_config=str(fun_config.get("custom_fallback", False)).lower(),
+        autocompare_config=str(fun_config.get("autocompare", True)).lower(),
+    )
    return fbody, register_body
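
For reference, the two `str(...).lower()` calls above mean that an operator with no `autocompare` entry in diopi_functions.yaml registers with `autocompare_config="true"`, while `autocompare: disable` flows through as `"disable"`. A minimal sketch mirroring that flattening:

```python
def flatten_flags(fun_config: dict) -> dict:
    # Mirrors the substitution above: config values become lowercase
    # strings for the generated C++ registration code to consume.
    return {
        "custom_fallback_config": str(fun_config.get("custom_fallback", False)).lower(),
        "autocompare_config": str(fun_config.get("autocompare", True)).lower(),
    }

print(flatten_flags({}))                          # {'custom_fallback_config': 'false', 'autocompare_config': 'true'}
print(flatten_flags({"autocompare": "disable"}))  # autocompare_config: 'disable'
print(flatten_flags({"custom_fallback": True}))   # custom_fallback_config: 'true'
```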


@@ -1039,12 +1014,6 @@ def parse_args():
        type=boolean_string,
        help="whether generate code that prints op args",
    )
-    parser.add_argument(
-        "--autocompare",
-        default=False,
-        type=boolean_string,
-        help="whether generate code that compare device calculation results with cpu calculation results",
-    )
    parser.add_argument(
        "--fun_config_dict",
        type=json.loads,
13 changes: 6 additions & 7 deletions dipu/scripts/autogen_diopi_wrapper/autogen_wrapped_code.sh
@@ -5,17 +5,16 @@
DIPU_DIR=$(readlink -f $(dirname $(readlink -f "$0"))/../..)
AUTOGEN_DIOPI_WRAPPER=$DIPU_DIR/scripts/autogen_diopi_wrapper

-USE_AUTOCOMPARE=${1:-OFF}
-UsedVendor=${2:-cuda}
-Torch_VERSION=${3:-2.1.0}
-GENERATED_KERNELS_SCRIPT=${4:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
-GENERATED_KERNELS_CONFIG=${5:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
-GENERATED_KERNELS=${6:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}
+UsedVendor=${1:-cuda}
+Torch_VERSION=${2:-2.1.0}
+GENERATED_KERNELS_SCRIPT=${3:-$AUTOGEN_DIOPI_WRAPPER/autogen_diopi_wrapper.py}
+GENERATED_KERNELS_CONFIG=${4:-$AUTOGEN_DIOPI_WRAPPER/diopi_functions.yaml}
+GENERATED_KERNELS=${5:-$DIPU_DIR/torch_dipu/csrc_dipu/aten/ops/AutoGenedKernels.cpp}

GENERATED_KERNELS_VENDOR=${DIPU_DIR}/third_party/DIOPI/impl/${UsedVendor}/convert_config.yaml

PYTHON_CMD="python3 ${GENERATED_KERNELS_SCRIPT} --out=${GENERATED_KERNELS} --config=${GENERATED_KERNELS_CONFIG} \
---autocompare=${USE_AUTOCOMPARE} --print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
+--print_op_arg=True --use_diopi_adapter=False --print_func_call_info=True \
--fun_config_dict='{\"current_device\":\"${UsedVendor}\",\"current_torch_ver\":\"${Torch_VERSION}\"}'"

if [ -f "$GENERATED_KERNELS_VENDOR" ]; then