refactor(dipu): move some env vars to environs.hpp #911

Open: wants to merge 9 commits into base: main
27 changes: 27 additions & 0 deletions dipu/torch_dipu/csrc_dipu/base/environ.hpp
@@ -86,12 +86,39 @@ T getEnvOrDefault(const char* env_var, U&& default_value,
// applyDelayedRegister() is called.
DIPU_ENV_VAR(immediateRegisterOp, "DIPU_IMMEDIATE_REGISTER_OP", bool, false);
inline const std::string kTorchAllocatorName = "TORCH";

// Determine the name of the host memory cache algorithm
// based on the current environment configuration.
Comment on lines +90 to +91
Collaborator

Suggested change
-// Determine the name of the host memory cache algorithm
-// based on the current environment configuration.
+// Host's memory caching algorithm.
+// Candidates: TORCH, BF, BS, RAW

DIPU_ENV_VAR(hostMemCachingAlgorithm, "DIPU_HOST_MEMCACHING_ALGORITHM",
             std::string, kTorchAllocatorName);

// Used to specify the name of the device memory cache algorithm.
Collaborator

Suggested change
-// Used to specify the name of the device memory cache algorithm.
+// Devices' memory caching algorithm.
+// Candidates: TORCH, BF, BS, RAW

DIPU_ENV_VAR(deviceMemCachingAlgorithm, "DIPU_DEVICE_MEMCACHING_ALGORITHM",
             std::string, kTorchAllocatorName);
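
Note on the candidate names: this diff does not show how the algorithm string is consumed, so the following is only a minimal sketch of how the four candidates listed in the review comment (TORCH, BF, BS, RAW) could be checked up front. The enum `CachingAlgorithm` and the function `parseCachingAlgorithm` are hypothetical names, and the case-sensitive match is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical enum; only the four candidate names come from the review comment.
enum class CachingAlgorithm { Torch, BF, BS, Raw };

// Maps a configured name to an enumerator.
// Returns std::nullopt for any name outside the candidate list.
// Assumption: names are matched case-sensitively.
inline std::optional<CachingAlgorithm> parseCachingAlgorithm(
    std::string_view name) {
  if (name == "TORCH") return CachingAlgorithm::Torch;
  if (name == "BF") return CachingAlgorithm::BF;
  if (name == "BS") return CachingAlgorithm::BS;
  if (name == "RAW") return CachingAlgorithm::Raw;
  return std::nullopt;
}
```

Rejecting unknown names early would turn a typo in the environment variable into an immediate error instead of a silent fallback; whether that is desirable here is a design choice for the maintainers.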

// Used to configure and initialize an instance of an object
// "CachingAllocatorConfig".
Comment on lines +99 to +100
Collaborator

Suggested change
-// Used to configure and initialize an instance of an object
-// "CachingAllocatorConfig".
+// Same as TORCH_ALLOCATOR_CONF in PyTorch.
+// Only works with TORCH caching algorithm.

DIPU_ENV_VAR(torchAllocatorConf, "DIPU_TORCH_ALLOCATOR_CONF", std::string, "");

// maxExtendSize is used to limit the maximum size of an extension
// in the memory allocation in function of "extend()".
DIPU_ENV_VAR(maxExtendSize, "DIPU_MAX_EXTEND_SIZE", std::size_t, 1024);

// Configure a value to limit the maximum length of the asynchronous resource
// pool to avoid resource leakage and optimize resource management.
inline const std::size_t kDefaultMaxAsyncResourcePoolLength = 96;
Collaborator

Suggested change
-inline const std::size_t kDefaultMaxAsyncResourcePoolLength = 96;
+constexpr std::size_t kDefaultMaxAsyncResourcePoolLength = 96;

Collaborator

Some places define a named constant while others write the default value inline. I suggest making this consistent.

DIPU_ENV_VAR(maxAsyncResourcePoolLength, "DIPU_MAX_ASYNC_RESOURCE_POOL_LENGTH",
             std::size_t, kDefaultMaxAsyncResourcePoolLength);
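
Note on the two review comments about this constant: one way to address both is to give every default a named constant of the same form. The sketch below is an illustration, not the change the reviewers asked for verbatim. `kDefaultMaxExtendSize` is a hypothetical name for the literal 1024 used with DIPU_MAX_EXTEND_SIZE, and `sketch` is a placeholder namespace. It uses `inline constexpr` because a plain namespace-scope `constexpr` variable in a header has internal linkage, so each translation unit gets its own copy, whereas an `inline constexpr` variable (C++17) names one shared entity.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical name for the literal default used with DIPU_MAX_EXTEND_SIZE.
inline constexpr std::size_t kDefaultMaxExtendSize = 1024;

// Same name and value as in the diff, declared in the same form as above.
inline constexpr std::size_t kDefaultMaxAsyncResourcePoolLength = 96;

// Both defaults are usable in constant expressions.
static_assert(kDefaultMaxExtendSize == 1024, "default taken from the diff");
static_assert(kDefaultMaxAsyncResourcePoolLength == 96,
              "default taken from the diff");

}  // namespace sketch
```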

// Control whether to force the use of back-off mode for P2P copy operation
// between Ascend chips.
Comment on lines +113 to +114
Collaborator

Suggested change
-// Control whether to force the use of back-off mode for P2P copy operation
-// between Ascend chips.
+// Whether to force the use of back-off mode for P2P copy operation between
+// Ascend chips.

DIPU_ENV_VAR(forceFallbackP2pCopybetweenascends,
             "DIPU_FORCE_FALLBACK_ASCEND_P2P_COPY", bool, false);

// Configure a numerical value to control the device 's affinity settings
// on the CPU to optimize thread scheduling during concurrent execution.
Comment on lines +118 to +119
Collaborator

Suggested change
-// Configure a numerical value to control the device 's affinity settings
-// on the CPU to optimize thread scheduling during concurrent execution.
+// Devices' CPU affinity settings.
+// >0: specifies the number of adjacent CPU cores bound to each device
+// =0: auto determine
+// <0: affinity disabled

DIPU_ENV_VAR(affinityCpuAffinit, "DIPU_CPU_AFFINITY", int, 0);

#undef DIPU_ENV_VAR

} // namespace dipu::environ
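
Note on `DIPU_ENV_VAR`: the macro definition lies outside this hunk, so the following is only a sketch of how a macro with this call shape could define an accessor that is read once and then cached. `SKETCH_ENV_VAR`, the two-argument `sketch::getEnvOrDefault`, and the `SKETCH_UNSET_*` variable names are assumptions made for illustration. The real `getEnvOrDefault` takes further parameters that are not visible here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical helper: reads `env_var` and converts it to T, falling back to
// `default_value` when the variable is unset or cannot be parsed.
template <typename T>
T getEnvOrDefault(const char* env_var, const T& default_value) {
  const char* raw = std::getenv(env_var);
  if (raw == nullptr) {
    return default_value;
  }
  if constexpr (std::is_same_v<T, std::string>) {
    return std::string(raw);
  } else {
    std::istringstream stream(raw);
    T parsed{};
    return (stream >> parsed) ? parsed : default_value;
  }
}

// Hypothetical macro: defines an accessor whose value is computed on the
// first call and reused afterwards (function-local static).
// It must be expanded inside namespace sketch so the helper is found.
#define SKETCH_ENV_VAR(name, env, type, default_value)                   \
  inline const type& name() {                                            \
    static const type value = getEnvOrDefault<type>(env, default_value); \
    return value;                                                        \
  }

// Placeholder variable names, chosen so that they are normally unset.
SKETCH_ENV_VAR(maxExtendSize, "SKETCH_UNSET_MAX_EXTEND_SIZE", std::size_t, 1024)
SKETCH_ENV_VAR(allocatorName, "SKETCH_UNSET_ALLOCATOR_NAME", std::string,
               "TORCH")

}  // namespace sketch
```

With an accessor of this kind, a call site such as `environ::maxExtendSize()` no longer repeats the variable name and default, which is the duplication this PR removes.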

@@ -7,6 +7,7 @@
 #include <utility>
 #include <vector>

+#include "csrc_dipu/base/environ.hpp"
 #include "csrc_dipu/utils/env.hpp"

 #include "DIPUCachingAllocator.h"
@@ -15,8 +16,7 @@
 namespace dipu {

 // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
-const size_t kMaxExtendSize = get_env_or_default("DIPU_MAX_EXTEND_SIZE", 1024)
-                              << 20U;
+const size_t kMaxExtendSize = environ::maxExtendSize() << 20U;

 class BFCachingAllocatorImpl {
  public:
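
Note on `<< 20U`: shifting left by 20 bits multiplies by 2^20, which suggests that DIPU_MAX_EXTEND_SIZE is given in MiB and converted to bytes here. A small sketch of that conversion, with `mibToBytes` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Converts a size given in MiB to bytes; equivalent to multiplying by 2^20.
constexpr std::size_t mibToBytes(std::size_t mib) { return mib << 20U; }

static_assert(mibToBytes(1) == 1048576, "1 MiB is 2^20 bytes");
static_assert(mibToBytes(1024) == 1073741824, "1024 MiB is 1 GiB");
```

Under that reading, the default of 1024 corresponds to a 1 GiB limit.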

@@ -25,10 +25,9 @@ namespace dipu {
 // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
 std::mutex DIPURawDeviceAllocator::mutex_;

-constexpr size_t kDefaultMaxAsyncResourcePoolLength = 96;
 // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
-const size_t kMaxAsyncResourcePoolLength = get_env_or_default(
-    "DIPU_MAX_ASYNC_RESOURCE_POOL_LENGTH", kDefaultMaxAsyncResourcePoolLength);
+const size_t kMaxAsyncResourcePoolLength =
+    environ::maxAsyncResourcePoolLength();

 namespace {

@@ -6,6 +6,7 @@

 #include <c10/util/Exception.h>

+#include "csrc_dipu/base/environ.hpp"
 #include "csrc_dipu/runtime/core/DIPUEventPool.h"
 #include "csrc_dipu/runtime/core/allocator/allocator_metrics.h"
 #include "csrc_dipu/runtime/device/basedef.h"
@@ -64,7 +65,8 @@ deviceId_t current_device() {
 }

 void setCpuAffinity(const int device) {
-  static int affinity = get_env_or_default("DIPU_CPU_AFFINITY", 0);
+  static int affinity = environ::affinityCpuAffinit();
+
   if (affinity < 0) {
     return;
   }
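
Note on the affinity value: the rest of `setCpuAffinity` is collapsed in this view, so the following only illustrates the three cases described in the suggested comment for DIPU_CPU_AFFINITY (>0, =0, <0). `coresForDevice` and `CoreRange` are hypothetical names, and the even split used for the auto case (=0) is an assumption, not necessarily what DIPU does.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Half-open range [first, second) of CPU core indices.
using CoreRange = std::pair<int, int>;

// Hypothetical helper following the semantics in the review comment:
//   affinity > 0: that many adjacent cores are bound to each device
//   affinity = 0: auto (ASSUMED here: split all cores evenly across devices)
//   affinity < 0: affinity disabled, nothing is bound
inline std::optional<CoreRange> coresForDevice(int affinity, int device,
                                               int deviceCount,
                                               int totalCores) {
  if (affinity < 0 || device < 0 || deviceCount <= 0 || totalCores <= 0) {
    return std::nullopt;
  }
  const int perDevice = affinity > 0 ? affinity : totalCores / deviceCount;
  if (perDevice == 0) {
    return std::nullopt;  // fewer cores than devices in auto mode
  }
  const int begin = device * perDevice;
  const int end = begin + perDevice;
  if (end > totalCores) {
    return std::nullopt;  // the requested range does not fit on this host
  }
  return CoreRange(begin, end);
}
```

For example, under these assumptions `coresForDevice(4, 2, 8, 64)` yields the range [8, 12), and any negative `affinity` yields no binding, matching the early return visible in the hunk.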
4 changes: 2 additions & 2 deletions dipu/torch_dipu/csrc_dipu/vendor/ascend/deviceimpl.cpp
@@ -10,6 +10,7 @@
 #include <c10/util/Exception.h>
 #include <torch/csrc/distributed/c10d/Work.hpp>

+#include "csrc_dipu/base/environ.hpp"
 #include "csrc_dipu/runtime/device/basedef.h"
 #include "csrc_dipu/utils/env.hpp"
 #include <csrc_dipu/common.h>
@@ -30,8 +31,7 @@ using AscendDeviceId = int32_t;

 namespace {

-const bool forceFallbackP2PCopy =
-    get_env_or_default("DIPU_FORCE_FALLBACK_ASCEND_P2P_COPY", false);
+const bool forceFallbackP2PCopy = environ::forceFallbackP2pCopybetweenascends();

 class NpuP2PInfo {
   enum class P2pStatus : int8_t {