-
Notifications
You must be signed in to change notification settings - Fork 46
feat: support continuous kvcache using virtual memory. #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: support continuous kvcache using virtual memory. #92
Conversation
425339a
to
62fb1c1
Compare
6f37ec9
to
3796d69
Compare
3796d69
to
df37855
Compare
|
||
private: | ||
// these two pointers must be one null and one non-null | ||
BlockManagerPool* block_manager_pool_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you build a ContinousBlockManagerPool class inherits from BlockManagerPool? In this way, this Client is not needed
options[1].num_kv_heads()); | ||
xtensor_options_vec.mutable_value_options()->set_head_size( | ||
options[1].head_size()); | ||
xtensor_options_vec.mutable_value_options()->set_max_context_len( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call function allocate_continuous_kv_cache
directly?
@@ -67,6 +67,10 @@ class BatchInputBuilder { | |||
uint32_t n_kv_cache_tokens, | |||
uint32_t seq_len, | |||
uint32_t q_seq_len); | |||
void setup_continuous_kv_cache_info(Sequence* sequence, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function setup_continuous_kv_cache_info
is not used ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be used in func process_single_sequence
, but i forgot...
} | ||
|
||
private: | ||
torch::Tensor key_cache_; | ||
torch::Tensor value_cache_; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe it's better create a new class like class ContinuousKVcache
.
Anyway, it's ok currently. :) ignore.
// TODO: refine this | ||
seq_id = multi_layer_kv_xtensor_.first->allocate_seq_id(); | ||
seq_id = multi_layer_kv_xtensor_.second->allocate_seq_id(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no returned value ?
|
||
namespace xllm { | ||
|
||
bool PageManagerClient::allocate(int32_t& seq_id, size_t num_tokens) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PageManagerClient
is just the wrapper of page_manager_
, could we use page_manager_
directly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's like WorkerImpl
and WorkerClient
, i think WorkerClient
is used for single node serving, so i use PageManagerClient
for single node serving.
@@ -119,6 +125,9 @@ class Engine { | |||
// block manager | |||
std::unique_ptr<BlockManagerPool> block_manager_pool_; | |||
|
|||
// page manager | |||
std::unique_ptr<PageManagerPool> page_manager_pool_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we create a base ManagerPool
class, BlockManagerPool
and PageManagerPool
can inherit from the base class. Then here we can only define a member std::unique_ptr<ManagerPool> manager_pool_;
#include "util/net.h" | ||
|
||
namespace xllm { | ||
void PageManagerServer::create_server(const page::Options& options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we'd better put these files here to their respective directories according to their functions.
like: page_manager_pool.h
to framework/block/
, page_manager_server.h
to core/distribute_runtime
, service
... etc. we can add a subdirectory under these directories like page/
.
Anyway, this can be refactored later, currently is ok.
namespace xllm { | ||
|
||
KVCacheManagerClient::KVCacheManagerClient(BlockManagerPool* block_manager_pool) | ||
: block_manager_pool_(block_manager_pool) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, after we create a base class ManagerPool
, there will be no need to determine whether it is a block
or a page
. We can directly handle it with the base class pointer
. And the KVCacheManagerClient
will also become unnecessary.
No description provided.