Implemented PrefixAwareScorer Based On Ricardo's Work #118

vMaroon · 2025-05-04T22:02:13Z

Summary

Due to the urgency, I completed @oglok's work #48.
Refer to his PR for initial context.

Changes:

Replaced PrefixStore with a data structure based on an LRU cache map
Implemented the missing functionality, such as using PostResponsePlugin to update the store
Pending E2E tests

kfirtoledo · 2025-05-05T07:36:37Z

pkg/epp/scheduling/plugins/scorer/prefix_store.go

+}
+
+// AddEntry adds a new entry to the prefix store.
+func (s *PrefixStore) AddEntry(modelName string, prompt string, pod *types.NamespacedName) error {


It would be better to have an interface for this, so that we can use different data structures.

What kind of data structures would you envision?

Hash table, radix tree, and trie (later we can compare which one is better).

I think we should add an interface once there is at least one more data structure.
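
As a rough sketch of the interface idea discussed in this thread, the store operations could sit behind a small interface with a pluggable backend. The interface name PrefixIndex, the PodName stand-in for types.NamespacedName, the FindMatchingPods signature, and the toy map backend are all assumptions made for illustration; the PR itself defines these operations as methods on the concrete PrefixStore type.

```go
package main

import "fmt"

// PodName is a hypothetical stand-in for types.NamespacedName.
type PodName struct {
	Namespace, Name string
}

// PrefixIndex is a hypothetical interface; a hash table, radix tree, or trie
// could each provide their own implementation.
type PrefixIndex interface {
	AddEntry(modelName, prompt string, pod PodName) error
	FindMatchingPods(prompt, modelName string) map[PodName]int
}

// mapIndex is a toy hash-table backend that indexes whole prompts only.
type mapIndex struct {
	entries map[string]map[PodName]struct{}
}

func newMapIndex() *mapIndex {
	return &mapIndex{entries: make(map[string]map[PodName]struct{})}
}

// AddEntry records that pod has served prompt for modelName.
func (m *mapIndex) AddEntry(modelName, prompt string, pod PodName) error {
	if modelName == "" || prompt == "" {
		return fmt.Errorf("model name and prompt must be non-empty")
	}
	key := modelName + "\x00" + prompt
	if m.entries[key] == nil {
		m.entries[key] = make(map[PodName]struct{})
	}
	m.entries[key][pod] = struct{}{}
	return nil
}

// FindMatchingPods returns every pod that has seen exactly this prompt.
func (m *mapIndex) FindMatchingPods(prompt, modelName string) map[PodName]int {
	out := make(map[PodName]int)
	for pod := range m.entries[modelName+"\x00"+prompt] {
		out[pod] = 1
	}
	return out
}

func main() {
	var idx PrefixIndex = newMapIndex()
	pod := PodName{Namespace: "default", Name: "pod-a"}
	if err := idx.AddEntry("model", "hello", pod); err != nil {
		panic(err)
	}
	fmt.Println(idx.FindMatchingPods("hello", "model"))
}
```

Callers would depend only on PrefixIndex, so a second backend could be swapped in and benchmarked without touching the scorer.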

oglok · 2025-05-05T10:18:48Z

pkg/epp/scheduling/plugins/scorer/kvcache-aware-scorer.go

-	return indexerScoresToNormalizedScoredPods(pods, scores)
+	if len(scores) == 0 {
+		loggerDebug.Info("No scores found for pods")
+		return nil
+	}
+
+	podToKey := func(pod types.Pod) (string, bool) {
+		metricsPod := pod.GetPod()
+		if metricsPod == nil {
+			return "", false
+		}
+		return metricsPod.Address, true
+	}
+
+	return indexedScoresToNormalizedScoredPods(pods, podToKey, scores)


I'd not change the kvcache scorer in this PR.

Addressed in a comment below.

elevran · 2025-05-05T10:03:02Z

pkg/epp/scheduling/local_config.go

-	pdFilterEnablementEnvVar        = "ENABLE_PD_FILTER"
+	prefixScorerEnablementEnvVar    = "ENABLE_PREFIX_AWARE_SCORER"
+
+	pdFilterEnablementEnvVar = "ENABLE_PD_FILTER"


nit: group this with the rest of the env vars; no need for extra empty lines

elevran · 2025-05-05T10:05:27Z

pkg/epp/scheduling/local_config.go

+	ctx := context.Background()
+	loggerDebug := log.FromContext(ctx).WithName("scheduler_config").V(logutil.DEBUG)


This seems off.
Shouldn't the logger fields/configuration come from an existing context? A newly created context won't inherit any fields from an existing one.

I agree, but this entire configuration is off. Propagating context here roots it deeper into the codebase, I'd prefer living with the current state until fully refactored. Does that sound ok?

elevran · 2025-05-05T10:06:35Z

pkg/epp/scheduling/plugins/scorer/kvcache-aware-scorer.go

Should this change (in kvcache-aware) be part of the prefix PR?

I wanted to minimize code duplication, but it could be that both scorers can share a ContextAwareScorer base or something similar, as they behave almost 1:1. Since both you and @oglok raised this, I'll revert this change and make it in another PR.

elevran · 2025-05-05T10:09:43Z

pkg/epp/scheduling/plugins/scorer/prefix_aware_scorer.go

+limitations under the License.
+*/
+
+package scorer


Since the package name is scorer and it is used when referring to these objects from outside the package, consider dropping the Scorer suffix from function and type names.
For example: scorer.PrefixAware instead of scorer.PrefixAwareScorer, scorer.NewPrefixAware instead of scorer.NewPrefixAwareScorer, etc.

I'll defer this to a followup PR since this is relevant to other scorers too.

elevran · 2025-05-05T10:11:38Z

pkg/epp/scheduling/plugins/scorer/prefix_aware_scorer.go

+// NewPrefixAwareScorer creates a new PrefixAwareScorer with the given
+// PrefixStoreConfig. If the config is nil, default is used.
+func NewPrefixAwareScorer(config *PrefixStoreConfig) *PrefixAwareScorer {
+	return &PrefixAwareScorer{


Other functions assume it is not nil (e.g., L57 below), suggest checking it here (change func signature to also return an error)

The parameter is PrefixStoreConfig and not PrefixStore, which can be nil in its use. I think you missed the Config part.
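
The nil-config behavior described in the doc comment can be sketched roughly as follows. The config fields, the default values, and the DefaultPrefixStoreConfig helper are hypothetical and are not taken from the PR; only the rule "if the config is nil, the default is used" comes from the source.

```go
package main

import "fmt"

// PrefixStoreConfig fields are assumed for illustration only.
type PrefixStoreConfig struct {
	CacheSize int
	BlockSize int
}

// DefaultPrefixStoreConfig is a hypothetical helper with made-up defaults.
func DefaultPrefixStoreConfig() *PrefixStoreConfig {
	return &PrefixStoreConfig{CacheSize: 1000, BlockSize: 256}
}

// PrefixAwareScorer here only holds the resolved config.
type PrefixAwareScorer struct {
	config PrefixStoreConfig
}

// NewPrefixAwareScorer falls back to the defaults when config is nil, so no
// error return is needed for the nil case.
func NewPrefixAwareScorer(config *PrefixStoreConfig) *PrefixAwareScorer {
	if config == nil {
		config = DefaultPrefixStoreConfig()
	}
	// Copy the value so later changes by the caller do not affect the scorer.
	return &PrefixAwareScorer{config: *config}
}

func main() {
	fmt.Println(NewPrefixAwareScorer(nil).config.BlockSize)
}
```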

elevran · 2025-05-05T10:16:12Z

pkg/epp/scheduling/plugins/scorer/prefix_store.go

+	Pods *lru.Cache[types.NamespacedName, time.Time] //TODO: implement Pod eviction based on staleness
+}
+
+// PrefixStore is an in-memory prefix-to-block cache with xxhash keys and LRU


Suggested change

// PrefixStore is an in-memory prefix-to-block cache with xxhash keys and LRU

// PrefixStore is an in-memory prefix-to-block cache with hash keys and LRU

Why commit to xxhash at this point?

Any reason not to? I saw it has nice performance, and it was used in a reference implementation I saw.

elevran · 2025-05-05T10:18:49Z

pkg/epp/scheduling/plugins/scorer/prefix_store.go

+	// Chunk the text into blocks and populate the cache
+	for start := 0; start < len(prompt); start += s.blockSize {
+		end := start + s.blockSize
+		if end > len(prompt) {


I think this is guaranteed to miss on the next iteration. You're using a partial block size and the next iteration will add more bytes/runes up to a full block which won't match.

Indeed, a partial chunk like this only hits on an exact 1:1 match, so it's arguably useless. I'm dropping such chunks.
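
A minimal sketch of chunking that keeps only full blocks and drops the trailing partial one, as agreed above. The function name chunkFullBlocks is hypothetical; like the diff, it slices by bytes rather than runes.

```go
package main

import "fmt"

// chunkFullBlocks splits prompt into consecutive blocks of exactly blockSize
// bytes. A trailing block shorter than blockSize is dropped, because it would
// stop matching as soon as the prompt grows.
func chunkFullBlocks(prompt string, blockSize int) []string {
	if blockSize <= 0 {
		return nil
	}
	blocks := make([]string, 0, len(prompt)/blockSize)
	for start := 0; start+blockSize <= len(prompt); start += blockSize {
		blocks = append(blocks, prompt[start:start+blockSize])
	}
	return blocks
}

func main() {
	// "ij" is a partial block and is not emitted.
	fmt.Println(chunkFullBlocks("abcdefghij", 4)) // prints [abcd efgh]
}
```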

elevran · 2025-05-05T10:20:03Z

pkg/epp/scheduling/plugins/scorer/prefix_store.go

+		}
+
+		// Compute the hash for the current block
+		digest := xxhash.New()


q: why xxhash? Speed, collision resistance, ...?
I think it is 64-bit, so the birthday paradox can help determine whether that is good enough.
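
To make the birthday-paradox argument concrete, here is a small sketch that estimates the chance of at least one collision among n block hashes. It assumes hash values are uniformly distributed over 64 bits, which is an idealization for a non-cryptographic hash and says nothing about adversarial inputs; the function name and the sample sizes are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance of at least one collision among
// n uniformly distributed hash values of the given bit width, using the
// birthday bound p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
func collisionProbability(n float64, bits int) float64 {
	exponent := -n * (n - 1) / math.Pow(2, float64(bits+1))
	// Expm1 keeps precision when the probability is very small.
	return -math.Expm1(exponent)
}

func main() {
	for _, n := range []float64{1e6, 1e9} {
		fmt.Printf("n=%.0e p=%.3e\n", n, collisionProbability(n, 64))
	}
}
```

Whether the resulting probability is acceptable depends on how many blocks the cache is expected to hold and on how costly a wrong routing decision is.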

elevran · 2025-05-05T10:20:48Z

pkg/epp/scheduling/plugins/scorer/prefix_store.go

+
+		// Compute the hash for the current block
+		digest := xxhash.New()
+		if _, err := digest.WriteString(prompt[start:end]); err != nil {


this depends on the block content alone.
Should you take into account the hash of the previous block?
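
One way to do what this comment suggests is to feed the previous block's hash into the next block's hash, so each key identifies the whole prefix up to that block rather than the block content alone. The sketch below is an illustration, not the PR's code: it uses FNV-1a from the Go standard library as a stand-in for xxhash so that it runs without external dependencies, and the function name is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// chainedBlockHashes hashes each block together with the previous block's
// hash, so a block's key depends on everything that came before it.
// FNV-1a stands in for xxhash here.
func chainedBlockHashes(blocks []string) []uint64 {
	hashes := make([]uint64, 0, len(blocks))
	var prev uint64 // the first block is chained to a zero seed
	var buf [8]byte
	for _, block := range blocks {
		h := fnv.New64a()
		binary.LittleEndian.PutUint64(buf[:], prev)
		h.Write(buf[:]) // hash.Hash writes never return an error
		h.Write([]byte(block))
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

func main() {
	a := chainedBlockHashes([]string{"sys ", "foo "})
	b := chainedBlockHashes([]string{"usr ", "foo "})
	// The second block has the same content in both prompts, but its key
	// also reflects the different first block.
	fmt.Println(a[1], b[1])
}
```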

elevran · 2025-05-05T10:22:02Z

pkg/epp/scheduling/plugins/scorer/prefix_store.go

+}
+
+// FindMatchingPods finds all pods that match the given prompt and model name.
+// It returns a map of pods and the number of blocks they match.


q: can a match start mid-prompt or are you considering only full prefix (from pos 0)?

At the moment, only full prefix. We may want to introduce different strategies but right now only full prefix is relevant.
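
A rough sketch of what "only full prefix" matching can look like: for each pod, count how many consecutive blocks match starting at block 0, and stop extending at the first miss. The index layout (block key to pod names), the string pod names, and the function name are assumptions for illustration; the actual FindMatchingPods implementation is not shown in this excerpt.

```go
package main

import "fmt"

// matchFromStart counts, per pod, how many consecutive blocks of the prompt
// match starting at block 0. index maps a block key to the pods that hold it.
func matchFromStart(blockKeys []string, index map[string][]string) map[string]int {
	matches := make(map[string]int)
	for i, key := range blockKeys {
		extended := false
		for _, pod := range index[key] {
			// Only extend pods that matched every earlier block.
			if matches[pod] == i {
				matches[pod] = i + 1
				extended = true
			}
		}
		if !extended {
			break // no pod's prefix reaches this block, so later hits are mid-prompt
		}
	}
	return matches
}

func main() {
	index := map[string][]string{
		"A": {"pod-1", "pod-2"},
		"B": {"pod-1"},
		"C": {"pod-2"}, // pod-2 lacks "B", so this hit is mid-prompt and ignored
	}
	fmt.Println(matchFromStart([]string{"A", "B", "C"}, index)) // prints map[pod-1:2 pod-2:1]
}
```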

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Signed-off-by: Ricardo Noriega <[email protected]>

Signed-off-by: Maroon Ayoub <[email protected]>

vMaroon requested review from oglok, elevran, shmuelk, nirrozenbaum and kfirtoledo May 4, 2025 22:02

vMaroon mentioned this pull request May 4, 2025

Prefix Aware Scorer #48

Merged

kfirtoledo reviewed May 5, 2025

oglok reviewed May 5, 2025

elevran suggested changes May 5, 2025

oglok and others added 8 commits May 5, 2025 17:38

cherry-picked prefix_score

e45e31c

Add prefix store functionality

073069a

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Prefix Aware Scorer

9e30e07

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Add unit tests for prefix store

a481c85

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Add unit tests for prefix aware scorer

53c550d

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

implemented PrefixAwareScorer based on Ricardo's work

d7f20fe

Remove KVcache scorer changes for traceability

b852c92

Signed-off-by: Ricardo Noriega <[email protected]>

addressed review comments

a0e02c0

Signed-off-by: Maroon Ayoub <[email protected]>

vMaroon force-pushed the prefix-aware branch from 8972bd8 to a0e02c0 Compare May 5, 2025 14:39

This was referenced May 5, 2025

[PR #118 Followup] Refactor Scorer Namings #120

Closed

[PR #118 Followup] PrefixStore Block Hashing Fix #121

Closed

vMaroon merged commit 09f7448 into neuralmagic:dev May 5, 2025
1 check passed
