Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(dipu): add static_value_array and simplify code #920

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion dipu/torch_dipu/csrc_dipu/base/DIPUGlobals.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,7 @@ void releaseAllResourcesImpl() {
called = true;
releaseAllGenerator();
releaseAllDeviceMem();
releaseAllEvent();
event_pool_clear();
devproxy::finalizeVendor();
}

Expand Down
4 changes: 4 additions & 0 deletions dipu/torch_dipu/csrc_dipu/base/basedef.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,3 +35,7 @@ const auto DIPU_BACKEND_SPARSE_TYPE =
const auto DICL_BACKEND_NAME = "dicl";

} // namespace dipu

namespace dipu {
auto inline constexpr kMaxDeviceNumber = 16;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥要新建个 ns, 这和上面的不是一样的吗?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我当时是看到两边命名风格不一样,所以隔开了一下

}
53 changes: 53 additions & 0 deletions dipu/torch_dipu/csrc_dipu/base/utility.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
#pragma once

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是否应该放到 utils/ 目录下

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我也在犹豫放哪里,有没有啥规范或者惯例,还有就是文件名字有没有更好的

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

附议

#include <array>
#include <utility>

namespace dipu {

template <typename T, typename S>
struct make_static_value_array {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

要是写新项目,我肯定不让你用下划线类名


template <typename T, std::size_t... I>
struct make_static_value_array<T, std::index_sequence<I...>> {
using value_type = std::decay_t<decltype(T::template value<0>)>;
using type = std::array<value_type, sizeof...(I)>;
type static inline constexpr value{T::template value<I>...};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static inline constexpr 为啥要放到 type 后面? 这是啥规范, 我咋觉得 放到前面更容易懂?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是个人习惯,C++ 的修饰词太多了,auto、static、inline、constexpr、const、thread_local……通常来说我就习惯写成“类型+其它类型标识”的风格,例如“char const *” 这样。

有个好处是“从右往左”看时顺序非常清晰,比如这种情况:char const * const。能很清楚知道最右侧 const 是修饰左边的 char const *,而左侧的 const 只修饰 char

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

类内定义方法和非特化模板函数都是隐式 inline,不需要加,下面也有几个 case 不重复提了

Copy link
Collaborator Author

@wiryls wiryls Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里如果不写 inline,编译器就会要求这里只声明,需要在类外定义静态成员变量,否则违反 odr

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

哦哦 static 不是方法,下面几个应该是

};

// Create an array of static values. Each type T should provide a static
// 'value()' function to return a reference of the underlying static variable.
//
// In this way, value can be lazily initialized.
template <typename T, std::size_t N>
// Note: in C++17 "inline constexpr" (without "static") is necessary to make
// sure there is only one entity.
auto inline constexpr make_static_value_array_v =
make_static_value_array<T, std::make_index_sequence<N>>::value;

// A CRTP helper to simplify access method of static value holder.
//
// Usage:
//
// ```
// struct integer : static_value_array<foo, 16> {
// template <std::size_t I> auto static value() -> int & {
// auto static instance = 0;
// return instance;
// }
// };
//
// Then we can use `integer::get(1)` to access `integer::value<1>()`.
template <typename T, std::size_t N>
struct static_value_array {
auto static array() -> decltype(auto) {
auto static constexpr instance = make_static_value_array_v<T, N>;
return instance;
}
template <typename... A>
auto static get(std::size_t index, A&&... args) -> decltype(auto) {
return array()[index](std::forward<A>(args)...);
}
};

} // namespace dipu
127 changes: 51 additions & 76 deletions dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEventPool.cpp
Original file line number Diff line number Diff line change
@@ -1,102 +1,77 @@
#include "DIPUEventPool.h"

#include <deque>
#include <functional>
#include <iostream>
#include <mutex>

namespace dipu {
#include "csrc_dipu/base/basedef.h"
#include "csrc_dipu/base/utility.h"
#include "csrc_dipu/runtime/device/deviceapis.h"

template <typename T>
class EventPool final {
protected:
std::deque<T> event_pool_;
unsigned int allocate_num_ = 0;
namespace {

std::function<void(T&)> allocator_;
std::function<void(T&)> deleter_;
using mutex_t = std::recursive_mutex;
mutex_t event_mutex_;
class EventPool {
std::deque<dipu::deviceEvent_t> pool;
std::recursive_mutex mutex;

public:
EventPool(const std::function<void(T&)>& allocator,
const std::function<void(T&)>& deleter)
: allocator_(allocator), deleter_(deleter) {}

EventPool(const EventPool&) = delete;
EventPool(EventPool&&) = delete;
EventPool& operator=(const EventPool&) = delete;
EventPool& operator=(EventPool&&) = delete;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有锁作为成员,EventPool 自动变得无法拷贝和移动,所以我都删掉了


~EventPool() = default;

void release() {
std::lock_guard<mutex_t> _(event_mutex_);
for (auto& event : event_pool_) {
deleter_(event);
allocate_num_--;
}
event_pool_.clear();
}

void get(T& event) {
bool need_allocator = false;
void acquire(dipu::deviceEvent_t& event) {
{
std::lock_guard<mutex_t> _(event_mutex_);
if (event_pool_.empty()) {
need_allocator = true;
} else {
event = event_pool_.back();
event_pool_.pop_back();
std::scoped_lock _(mutex);
if (!pool.empty()) {
event = pool.back();
pool.pop_back();
return;
}
}
if (need_allocator) {
allocator_(event);
}

dipu::devapis::createEvent(&event);
}

void release(dipu::deviceEvent_t& event) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

新的名字确实比原来的规范

std::scoped_lock _(mutex);
pool.emplace_back(event);
}

void restore(T& event) {
std::lock_guard<mutex_t> _(event_mutex_);
event_pool_.emplace_back(event);
void clear() { // should it called inside destructor?
std::scoped_lock _(mutex);
for (auto& event : pool) {
dipu::devapis::destroyEvent(event);
}
pool.clear();
}
};

EventPool<deviceEvent_t>* getEventPool() {
const int index = devproxy::current_device();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我是觉得 event pool 不应该调用 devproxy 命名空间的方法,所以把它移到外面了

// GlobalEventPool for different cards , construct when really needed
#define dispatch_event_pool(device_id) \
if (index == (device_id)) { \
static EventPool<deviceEvent_t> gDIPUEventPool( \
[](deviceEvent_t& event) { devapis::createEvent(&event); }, \
[](deviceEvent_t& event) { devapis::destroyEvent(event); }); \
return &gDIPUEventPool; \
struct StaticEventPoolArray
: dipu::static_value_array<StaticEventPoolArray, dipu::kMaxDeviceNumber> {
template <std::size_t I>
inline static auto& value() {
auto static instance = EventPool();
return instance;
}
};

dispatch_event_pool(0);
dispatch_event_pool(1);
dispatch_event_pool(2);
dispatch_event_pool(3);
dispatch_event_pool(4);
dispatch_event_pool(5);
dispatch_event_pool(6);
dispatch_event_pool(7);
dispatch_event_pool(8);
dispatch_event_pool(9);
dispatch_event_pool(10);
dispatch_event_pool(11);
dispatch_event_pool(12);
dispatch_event_pool(13);
dispatch_event_pool(14);
dispatch_event_pool(15);
TORCH_CHECK(false, "support up to 16 cards");
auto event_pool(int index) -> EventPool& {
TORCH_CHECK(0 <= index and index < dipu::kMaxDeviceNumber,
"device index out of range");
return StaticEventPoolArray::get(index);
}

void getEventFromPool(deviceEvent_t& event) { getEventPool()->get(event); }
} // namespace

namespace dipu {

void event_pool_acquire(int index, deviceEvent_t& event) {
event_pool(index).acquire(event);
}

void restoreEventToPool(deviceEvent_t& event) {
getEventPool()->restore(event);
void event_pool_release(int index, deviceEvent_t& event) {
event_pool(index).release(event);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

参数加了 index, 上下层的职责划分也比原来明确些。

}

void releaseAllEvent() { getEventPool()->release(); }
void event_pool_clear() {
for (auto& getter : StaticEventPoolArray::array()) {
getter().clear();
}
}

} // namespace dipu
8 changes: 3 additions & 5 deletions dipu/torch_dipu/csrc_dipu/runtime/core/DIPUEventPool.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,8 @@

namespace dipu {

void getEventFromPool(deviceEvent_t& event);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其实原来的驼峰也没哈问题,真没必要都改成下划线,纯个人习惯问题。。。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实,这里是习惯了

void restoreEventToPool(deviceEvent_t& event);

void releaseAllEvent();
void event_pool_acquire(int index, deviceEvent_t& event);
void event_pool_release(int index, deviceEvent_t& event);
void event_pool_clear();

} // namespace dipu
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
#include <c10/core/Device.h>
#include <c10/util/flat_hash_map.h>

#include "csrc_dipu/base/basedef.h"
#include "csrc_dipu/base/utility.h"
#include "csrc_dipu/runtime/core/DIPUEvent.h"

#include "DIPUAsyncResourcePool.h"
Expand Down Expand Up @@ -213,48 +215,33 @@ struct RawAllocator<at::DeviceType::CPU> {
using type = DIPURawHostAllocator;
};

template <typename AllocatorImpl, class AsyncMemPoolImpl, int device_id>
c10::Allocator* get_allocator_impl(c10::Allocator* raw_allocator) {
// Construct when really needed
// async_mem_pool is used when cache_allocator being destructed so it should
// be destructed after cache_allocator
static AsyncMemPoolImpl async_mem_pool;
static AllocatorImpl cache_allocator;
static int n = [&]() {
cache_allocator.set_raw_allocator(raw_allocator);
cache_allocator.set_async_mem_pool(&async_mem_pool);
return 0;
}();
return &cache_allocator;
}

template <class AllocatorImpl, class AsyncMemPoolImpl>
c10::Allocator* get_allocator(int device_id, c10::Allocator* raw_allocator) {
#define DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(id) \
if (device_id == (id)) { \
return get_allocator_impl<AllocatorImpl, AsyncMemPoolImpl, id>( \
raw_allocator); \
template <typename AllocatorImpl, class AsyncMemPoolImpl>
struct StaticAllocatorArray
: dipu::static_value_array<
StaticAllocatorArray<AllocatorImpl, AsyncMemPoolImpl>,
dipu::kMaxDeviceNumber> {
template <std::size_t I>
inline static c10::Allocator* value(c10::Allocator* raw_allocator) {
// Construct when really needed
// async_mem_pool is used when cache_allocator being destructed so it should
// be destructed after cache_allocator
static AsyncMemPoolImpl async_mem_pool;
static AllocatorImpl cache_allocator;
static int n = [&]() {
cache_allocator.set_raw_allocator(raw_allocator);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 precondition 好强啊,没有任何警示吗

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这块没想到怎么改合适……索性直接复制之前的实现

cache_allocator.set_async_mem_pool(&async_mem_pool);
return 0;
}();
return &cache_allocator;
}
};

DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(0);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(1);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(2);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(3);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(4);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(5);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(6);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(7);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(8);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(9);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(10);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(11);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(12);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(13);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(14);
DIPU_ALLOCATOR_DISPATCH_DEVICE_ID(15);
TORCH_CHECK(false, "support up to 16 cards");
template <class Allocator, class MemPool>
c10::Allocator* get_allocator(int index, c10::Allocator* raw_allocator) {
TORCH_CHECK(0 <= index and index < dipu::kMaxDeviceNumber,
"device index out of range");
return StaticAllocatorArray<Allocator, MemPool>::get(index, raw_allocator);
}
#undef DIPU_ALLOCATOR_DISPATCH_DEVICE_ID

#define DIPU_REGISTER_ALLOCATOR(name, device_type, CachingAllocator, \
algorithm, priority) \
Expand Down
10 changes: 8 additions & 2 deletions dipu/torch_dipu/csrc_dipu/runtime/devproxy/deviceproxy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -172,9 +172,15 @@ bool isStreamEmpty(deviceStream_t stream) {
// device event related
// =====================

void createEvent(deviceEvent_t* event) { return getEventFromPool(*event); }
void createEvent(deviceEvent_t* event) {
auto index = current_device();
return event_pool_acquire(index, *event);
}

void destroyEvent(deviceEvent_t event) { return restoreEventToPool(event); }
void destroyEvent(deviceEvent_t event) {
auto index = current_device();
return event_pool_release(index, event);
}

void waitEvent(deviceEvent_t event) { return devapis::waitEvent(event); }

Expand Down