Skip to content
This repository was archived by the owner on May 15, 2025. It is now read-only.

Conversation

vMaroon
Copy link
Member

@vMaroon vMaroon commented May 4, 2025

Summary

Due to the urgency, I completed @oglok's work #48.
Refer to his PR for initial context.

Changes:

  • Replaced PrefixStore with an LRU cache-map based data-structure

  • Implemented the missing functionalities such as the utilization of PostResponsePlugin to update store

  • Pending E2E tests

}

// AddEntry adds a new entry to the prefix store.
func (s *PrefixStore) AddEntry(modelName string, prompt string, pod *types.NamespacedName) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better if we have an interface to do that, and we can use different data types

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of data types would you envision?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash-table, Redix, and TRIE(later we can compare which one is better)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have an interface when there's at least one more data-structure.

Comment on lines 100 to 113
return indexerScoresToNormalizedScoredPods(pods, scores)
if len(scores) == 0 {
loggerDebug.Info("No scores found for pods")
return nil
}

podToKey := func(pod types.Pod) (string, bool) {
metricsPod := pod.GetPod()
if metricsPod == nil {
return "", false
}
return metricsPod.Address, true
}

return indexedScoresToNormalizedScoredPods(pods, podToKey, scores)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd not change the kvcache scorer in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in a comment below.

pdFilterEnablementEnvVar = "ENABLE_PD_FILTER"
prefixScorerEnablementEnvVar = "ENABLE_PREFIX_AWARE_SCORER"

pdFilterEnablementEnvVar = "ENABLE_PD_FILTER"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: bunch with rest of env-vars, no need for extra empty lines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepted.

Comment on lines +105 to +90
ctx := context.Background()
loggerDebug := log.FromContext(ctx).WithName("scheduler_config").V(logutil.DEBUG)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems off?
should you be getting the logger fields/configuration from an existing context? If you create a new context, it won't have any existing fields inherited from context

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but this entire configuration is off. Propagating context here roots it deeper into the codebase, I'd prefer living with the current state until fully refactored. Does that sound ok?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this change (in kvcache-aware) be part of the prefix PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to minimize some code re-use, but it could be that both of the scorers can share a ContextAwareScorer base or something, as they almost behave 1:1. Since both you and @oglok raised this, I'll revert this change and make it in another PR.

limitations under the License.
*/

package scorer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the package name is scorer and is used when referring to the objects outside the package, consider dropping the Scorer from function and other variables:
for example: scorer.PrefixAware instead of scorer.PrefixAwareScorer, scorer.NewPrefixAware instead of scorer.NewPrefixAwareScorer etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer this to a followup PR since this is relevant to other scorers too.

// NewPrefixAwareScorer creates a new PrefixAwareScorer with the given
// PrefixStoreConfig. If the config is nil, default is used.
func NewPrefixAwareScorer(config *PrefixStoreConfig) *PrefixAwareScorer {
return &PrefixAwareScorer{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other functions assume it is not nil (e.g., L57 below), suggest checking it here (change func signature to also return an error)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter is PrefixStoreConfig and not PrefixStore, which can be nil in its use. I think you missed the Config part.

Pods *lru.Cache[types.NamespacedName, time.Time] //TODO: implement Pod eviction based on staleness
}

// PrefixStore is an in-memory prefix-to-block cache with xxhash keys and LRU
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// PrefixStore is an in-memory prefix-to-block cache with xxhash keys and LRU
// PrefixStore is an in-memory prefix-to-block cache with hash keys and LRU

Why commit to xxhash at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to? I saw it has nice performance, and it was used in a reference implementation I saw.

// Chunk the text into blocks and populate the cache
for start := 0; start < len(prompt); start += s.blockSize {
end := start + s.blockSize
if end > len(prompt) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is guaranteed to miss on the next iteration. You're using a partial block size and the next iteration will add more bytes/runes up to a full block which won't match.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of chunking hits only 1:1 indeed, and it's arguably useless. Dropping such chunks.

}

// Compute the hash for the current block
digest := xxhash.New()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: why xxhash? Speed, collision resistance, ..?
I think it is 64b so using the birthday paradox helps determine if good enough


// Compute the hash for the current block
digest := xxhash.New()
if _, err := digest.WriteString(prompt[start:end]); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this depends on the block content alone.
Should you take into account the hash of the previous block?

}

// FindMatchingPods finds all pods that match the given prompt and model name.
// It returns a map of pods and the number of blocks they match.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: can a match start mid-prompt or are you considering only full prefix (from pos 0)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, only full prefix. We may want to introduce different strategies but right now only full prefix is relevant.

@vMaroon vMaroon merged commit 09f7448 into neuralmagic:dev May 5, 2025
1 check passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants