-
Notifications
You must be signed in to change notification settings - Fork 3
Implemented PrefixAwareScorer Based On Ricardo's Work #118
Conversation
} | ||
|
||
// AddEntry adds a new entry to the prefix store. | ||
func (s *PrefixStore) AddEntry(modelName string, prompt string, pod *types.NamespacedName) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better if we have an interface to do that, and we can use different data types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of data types would you envision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash-table, Redix, and TRIE(later we can compare which one is better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have an interface when there's at least one more data-structure.
return indexerScoresToNormalizedScoredPods(pods, scores) | ||
if len(scores) == 0 { | ||
loggerDebug.Info("No scores found for pods") | ||
return nil | ||
} | ||
|
||
podToKey := func(pod types.Pod) (string, bool) { | ||
metricsPod := pod.GetPod() | ||
if metricsPod == nil { | ||
return "", false | ||
} | ||
return metricsPod.Address, true | ||
} | ||
|
||
return indexedScoresToNormalizedScoredPods(pods, podToKey, scores) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd not change the kvcache scorer in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in a comment below.
pkg/epp/scheduling/local_config.go
Outdated
pdFilterEnablementEnvVar = "ENABLE_PD_FILTER" | ||
prefixScorerEnablementEnvVar = "ENABLE_PREFIX_AWARE_SCORER" | ||
|
||
pdFilterEnablementEnvVar = "ENABLE_PD_FILTER" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: bunch with rest of env-vars, no need for extra empty lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepted.
ctx := context.Background() | ||
loggerDebug := log.FromContext(ctx).WithName("scheduler_config").V(logutil.DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems off?
should you be getting the logger fields/configuration from an existing context? If you create a new context, it won't have any existing fields inherited from context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but this entire configuration is off. Propagating context here roots it deeper into the codebase, I'd prefer living with the current state until fully refactored. Does that sound ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this change (in kvcache-aware) be part of the prefix PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to minimize some code re-use, but it could be that both of the scorers can share a ContextAwareScorer
base or something, as they almost behave 1:1. Since both you and @oglok raised this, I'll revert this change and make it in another PR.
limitations under the License. | ||
*/ | ||
|
||
package scorer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the package name is scorer
and is used when referring to the objects outside the package, consider dropping the Scorer
from function and other variables:
for example: scorer.PrefixAware
instead of scorer.PrefixAwareScorer
, scorer.NewPrefixAware
instead of scorer.NewPrefixAwareScorer
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer this to a followup PR since this is relevant to other scorers too.
// NewPrefixAwareScorer creates a new PrefixAwareScorer with the given | ||
// PrefixStoreConfig. If the config is nil, default is used. | ||
func NewPrefixAwareScorer(config *PrefixStoreConfig) *PrefixAwareScorer { | ||
return &PrefixAwareScorer{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other functions assume it is not nil (e.g., L57 below), suggest checking it here (change func signature to also return an error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter is PrefixStoreConfig
and not PrefixStore
, which can be nil in its use. I think you missed the Config
part.
Pods *lru.Cache[types.NamespacedName, time.Time] //TODO: implement Pod eviction based on staleness | ||
} | ||
|
||
// PrefixStore is an in-memory prefix-to-block cache with xxhash keys and LRU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// PrefixStore is an in-memory prefix-to-block cache with xxhash keys and LRU | |
// PrefixStore is an in-memory prefix-to-block cache with hash keys and LRU |
Why commit to xxhash
at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to? I saw it has nice performance, and it was used in a reference implementation I saw.
// Chunk the text into blocks and populate the cache | ||
for start := 0; start < len(prompt); start += s.blockSize { | ||
end := start + s.blockSize | ||
if end > len(prompt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is guaranteed to miss on the next iteration. You're using a partial block size and the next iteration will add more bytes/runes up to a full block which won't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of chunking hits only 1:1 indeed, and it's arguably useless. Dropping such chunks.
} | ||
|
||
// Compute the hash for the current block | ||
digest := xxhash.New() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: why xxhash? Speed, collision resistance, ..?
I think it is 64b so using the birthday paradox helps determine if good enough
|
||
// Compute the hash for the current block | ||
digest := xxhash.New() | ||
if _, err := digest.WriteString(prompt[start:end]); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this depends on the block content alone.
Should you take into account the hash of the previous block?
} | ||
|
||
// FindMatchingPods finds all pods that match the given prompt and model name. | ||
// It returns a map of pods and the number of blocks they match. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: can a match start mid-prompt or are you considering only full prefix (from pos 0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, only full prefix. We may want to introduce different strategies but right now only full prefix is relevant.
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega De Soto <[email protected]>
Signed-off-by: Ricardo Noriega <[email protected]>
Signed-off-by: Maroon Ayoub <[email protected]>
Summary
Due to the urgency, I completed @oglok's work #48.
Refer to his PR for initial context.
Changes:
Replaced PrefixStore with an LRU cache-map based data-structure
Implemented the missing functionalities such as the utilization of
PostResponsePlugin
to update storePending E2E tests