Skip to content

Conversation

XuZhang99
Copy link
Collaborator

No description provided.

@XuZhang99 XuZhang99 force-pushed the feature/continuous_kvcache branch 2 times, most recently from 425339a to 62fb1c1 Compare September 4, 2025 13:28
@XuZhang99 XuZhang99 changed the title feat: support continuous kvcache using virtual memory. [WIP] feat: support continuous kvcache using virtual memory. Sep 4, 2025
@XuZhang99 XuZhang99 changed the title [WIP] feat: support continuous kvcache using virtual memory. feat: support continuous kvcache using virtual memory. Sep 4, 2025
@XuZhang99 XuZhang99 force-pushed the feature/continuous_kvcache branch 5 times, most recently from 6f37ec9 to 3796d69 Compare September 9, 2025 06:58
@XuZhang99 XuZhang99 force-pushed the feature/continuous_kvcache branch from 3796d69 to df37855 Compare September 9, 2025 07:03

private:
// these two pointers must be one null and one non-null
BlockManagerPool* block_manager_pool_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you build a ContinousBlockManagerPool class inherits from BlockManagerPool? In this way, this Client is not needed

options[1].num_kv_heads());
xtensor_options_vec.mutable_value_options()->set_head_size(
options[1].head_size());
xtensor_options_vec.mutable_value_options()->set_max_context_len(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call function allocate_continuous_kv_cache directly?

@@ -67,6 +67,10 @@ class BatchInputBuilder {
uint32_t n_kv_cache_tokens,
uint32_t seq_len,
uint32_t q_seq_len);
void setup_continuous_kv_cache_info(Sequence* sequence,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function setup_continuous_kv_cache_info is not used ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be used in func process_single_sequence, but i forgot...

}

private:
torch::Tensor key_cache_;
torch::Tensor value_cache_;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe it's better create a new class like class ContinuousKVcache.

Anyway, it's ok currently. :) ignore.

// TODO: refine this
seq_id = multi_layer_kv_xtensor_.first->allocate_seq_id();
seq_id = multi_layer_kv_xtensor_.second->allocate_seq_id();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no returned value ?


namespace xllm {

bool PageManagerClient::allocate(int32_t& seq_id, size_t num_tokens) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PageManagerClient is just the wrapper of page_manager_, could we use page_manager_ directly ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's like WorkerImpl and WorkerClient, i think WorkerClient is used for single node serving, so i use PageManagerClient for single node serving.

@@ -119,6 +125,9 @@ class Engine {
// block manager
std::unique_ptr<BlockManagerPool> block_manager_pool_;

// page manager
std::unique_ptr<PageManagerPool> page_manager_pool_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a base ManagerPool class, BlockManagerPool and PageManagerPool can inherit from the base class. Then here we can only define a member std::unique_ptr<ManagerPool> manager_pool_;

#include "util/net.h"

namespace xllm {
void PageManagerServer::create_server(const page::Options& options,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we'd better put these files here to their respective directories according to their functions.
like: page_manager_pool.h to framework/block/, page_manager_server.h to core/distribute_runtime, service ... etc. we can add a subdirectory under these directories like page/.

Anyway, this can be refactored later, currently is ok.

namespace xllm {

KVCacheManagerClient::KVCacheManagerClient(BlockManagerPool* block_manager_pool)
: block_manager_pool_(block_manager_pool) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, after we create a base class ManagerPool, there will be no need to determine whether it is a block or a page. We can directly handle it with the base class pointer. And the KVCacheManagerClient will also become unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants