Prefix Aware Scorer #48

oglok · 2025-04-23T12:33:57Z

Co-authored with @vMaroon

This PR implements:

PrefixStore with an LRU cache-map based data-structure
Prefix aware scorer
Unit tests

Copilot

Pull Request Overview

This PR implements a new prefix store based on a radix tree and a prefix-aware scorer to route requests more efficiently. Key changes include:

Adding a PrefixStore and its configuration to scheduler configuration with environmental overrides.
Implementing a new PrefixAwareScorer and integrating it into the scheduler.
Providing unit tests for the new store and scorer components.

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

File	Description
pkg/epp/scheduling/types/types.go	Added a Prompt field to the LLMRequest structure.
pkg/epp/scheduling/scheduler.go	Introduced prefix store configuration and integrated the new scorer.
pkg/epp/scheduling/prefix_store.go	Implements the prefix store with entry addition, lookup, eviction, and maintenance.
pkg/epp/scheduling/prefix_store_test.go	Unit tests for basic prefix operations, constraints, TTL expiration, max entries, and maintenance routine.
pkg/epp/scheduling/prefix_aware_scorer.go	Defines the scorer which uses the prefix store to score pods.
pkg/epp/scheduling/prefix_aware_scorer_test.go	Tests behavior of the new scorer with various input scenarios.

Files not reviewed (1)

go.mod: Language not supported

Copilot · 2025-04-23T12:56:39Z

pkg/epp/scheduling/prefix_store.go

+	// Update LastUsed time for the matched entry
+	entry.LastUsed = time.Now()
+	ps.tree.Insert(matchedPrefix, entry)


Modifying 'entry.LastUsed' under a read lock (RLock) in FindPodForPrefix may lead to a race condition. Consider acquiring the write lock (Lock) when updating mutable state or restructuring the update to ensure thread safety.

Suggested change

-	// Update LastUsed time for the matched entry
-	entry.LastUsed = time.Now()
-	ps.tree.Insert(matchedPrefix, entry)
+	// Update LastUsed time for the matched entry
+	ps.mu.RUnlock() // Release the read lock before acquiring the write lock
+	ps.mu.Lock()
+	entry.LastUsed = time.Now()
+	ps.tree.Insert(matchedPrefix, entry)
+	ps.mu.Unlock()
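
One caveat about swapping locks mid-function: the entry can be evicted or replaced between RUnlock and Lock, and if the function also releases the read lock with a deferred RUnlock (the thread does not show whether it does), the manual RUnlock would unlock twice. A minimal sketch of one alternative is to hold a single exclusive lock across both the lookup and the LastUsed update. The `entry` and `store` types below are hypothetical stand-ins for PrefixEntry and PrefixStore, and a plain map replaces the radix tree so the example has no dependencies.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// entry and store are hypothetical stand-ins for PrefixEntry and PrefixStore.
// A plain map replaces the radix tree to keep the sketch dependency-free.
type entry struct {
	pod      string
	lastUsed time.Time
}

type store struct {
	mu      sync.Mutex
	entries map[string]*entry
}

// findPod returns the pod holding the longest stored prefix of prompt.
// The lookup and the lastUsed update share one critical section, so no
// other goroutine can evict or replace the entry in between.
func (s *store) findPod(prompt string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()

	var best *entry
	bestLen := -1
	for p, e := range s.entries {
		if strings.HasPrefix(prompt, p) && len(p) > bestLen {
			best, bestLen = e, len(p)
		}
	}
	if best == nil {
		return "", false
	}
	best.lastUsed = time.Now()
	return best.pod, true
}

func main() {
	s := &store{entries: map[string]*entry{
		"hello":       {pod: "pod-a"},
		"hello world": {pod: "pod-b"},
	}}
	pod, ok := s.findPod("hello world, again")
	fmt.Println(pod, ok)
}
```

The trade-off is that lookups no longer run concurrently. If read throughput matters, the timestamp could instead be kept in an atomic field so the read lock is enough, but that is a separate design choice.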

rootfs · 2025-04-23T12:58:36Z

pkg/epp/scheduling/scheduler.go

@@ -43,6 +48,10 @@ const (
 	defaultQueueThresholdCritical = 5
 	defaultQueueingThresholdLoRA  = 128
 	defaultLoraAffinityThreshold  = 0.999
+	defaultPrefixStoreMaxEntries  = 1000
+	defaultPrefixStoreMinLen      = 3


this is probably too small :D

oops! testing defaults xD

rootfs · 2025-04-23T12:59:50Z

pkg/epp/scheduling/scheduler.go

+	defaultPrefixStoreMaxEntries  = 1000
+	defaultPrefixStoreMinLen      = 3
+	defaultPrefixStoreMaxLen      = 100
+	defaultPrefixStoreTTLHours    = 24


minutes instead of hours?

I'm not really sure what should be a reasonable TTL for LLM inferencing.

vMaroon · 2025-04-23T14:21:06Z

The dev branch was rebased on upstream main, which includes the new and different Scorer interface. The Scorer pulled from there defines a Score function that works per pod rather than on a group of pods. We need to discuss whether to propose changing that, but it makes sense to first adapt to it.

I'll open a KVCacheAwareScorer PR soon and we can sync.

elevran · 2025-04-27T12:47:05Z

pkg/epp/scheduling/prefix_aware_scorer.go

+// between the request's prompt and stored prefixes. The score is normalized between 0 and 1,
+// where 1 represents the longest matching prefix.
+type PrefixAwareScorer struct {
+	weight      float64


IIRC, weights are extracted to the layer above.

elevran · 2025-04-27T12:48:39Z

pkg/epp/scheduling/prefix_aware_scorer.go

+}
+
+// NewPrefixAwareScorer creates a new PrefixAwareScorer with the given weight and prefix store
+func NewPrefixAwareScorer(weight float64, prefixStore *PrefixStore) Scorer {


What is the rationale for receiving (instead of creating internally) the prefixStore?
Seems like this would be an internal implementation decision?
If so, PrefixStore is likely not an exported type.

elevran · 2025-04-27T12:51:26Z

pkg/epp/scheduling/prefix_aware_scorer.go

+	if !found {
+		logger.V(logging.DEBUG).Info("No matching prefix found, returning zero scores for all pods")
+		// If no matching prefix found, return zero scores for all pods
+		for i, pod := range pods {


consider moving this up to the function initialization?
That way, you'd be assured that you start with 0s for all Pods and would not need to repeat it multiple times.
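
A minimal sketch of that suggestion, assuming a simplified scorer in which pods are plain names and the store lookup has already produced a matched pod. The `scorePods` function and its parameters are hypothetical and do not mirror the PR's actual signature.

```go
package main

import "fmt"

// scorePods is a hypothetical, simplified scorer. matchedPod and found stand
// in for the result of the prefix store lookup.
func scorePods(pods []string, matchedPod string, found bool) map[string]float64 {
	// Every pod starts at zero, so the early return below needs no extra loop.
	scores := make(map[string]float64, len(pods))
	for _, p := range pods {
		scores[p] = 0
	}
	if !found {
		return scores
	}
	// Only raise the score if the matched pod is among the candidates.
	if _, ok := scores[matchedPod]; ok {
		scores[matchedPod] = 1
	}
	return scores
}

func main() {
	pods := []string{"pod-a", "pod-b"}
	fmt.Println(scorePods(pods, "", false))
	fmt.Println(scorePods(pods, "pod-b", true))
}
```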

elevran · 2025-04-27T12:53:08Z

pkg/epp/scheduling/prefix_store.go

+
+// PrefixEntry represents a single entry in the prefix store
+type PrefixEntry struct {
+	PodRef    types.NamespacedName


q: is this how it is defined in PodMetrics.Pod, or would a simple string suffice?

elevran · 2025-04-27T12:54:03Z

pkg/epp/scheduling/prefix_store.go

+)
+
+// PrefixEntry represents a single entry in the prefix store
+type PrefixEntry struct {


q: should the struct and its fields be exported?
At first glance, it seems they should be for internal use only.

elevran · 2025-04-27T13:02:37Z

pkg/epp/scheduling/prefix_store.go

+	errutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/error"
+	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
+)
+


It would be helpful to have a high-level description of the data structure and algorithms, and to tie those to the processing steps of the LLM request (e.g., is the store consulted read-only or also updated on each prompt request, are responses handled, ...)

elevran · 2025-04-27T13:05:07Z

pkg/epp/scheduling/prefix_store.go

+		if entry.PodRef == pod && entry.ModelName == modelName {
+			logger.V(logging.DEBUG).Info("Updating existing entry", "prefix", prefix, "pod", pod.String())
+			entry.LastUsed = time.Now()
+			ps.tree.Insert(prefix, entry)


will this allow multiple hits on a prefix (e.g., both pods A and B hold the prefix) or just one (last writer wins)?
I think we would want to have multiple (e.g., in case other scorers score these Pods differently).

elevran · 2025-04-27T13:07:04Z

pkg/epp/scheduling/prefix_store.go

+
+	// Check total entries limit
+	if ps.tree.Len() >= ps.config.MaxEntries {
+		logger.V(logging.DEBUG).Info("Store at capacity, evicting oldest entry", "currentSize", ps.tree.Len(), "maxSize", ps.config.MaxEntries)


Q: would any entries "below" the evicted node become dangling following the eviction?

elevran · 2025-04-27T13:07:58Z

pkg/epp/scheduling/prefix_store.go

+	if len(prefix) < ps.config.MinPrefixLen {
+		logger.V(logging.DEBUG).Info("Prefix too short", "prefix", prefix, "minLength", ps.config.MinPrefixLen)
+		return types.NamespacedName{}, false
+	}
+
+	if len(prefix) > ps.config.MaxPrefixLen {
+		logger.V(logging.DEBUG).Info("Truncating prefix", "originalLength", len(prefix), "maxLength", ps.config.MaxPrefixLen)
+		prefix = prefix[:ps.config.MaxPrefixLen]
+	}


consider extracting to an isValidPrompt function that can be called from multiple places
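
A possible shape for that helper, based on the two checks in the hunk above. The name `normalizePrefix` and the `config` struct are hypothetical; the helper returns the (possibly truncated) prefix as well as a validity flag, because the original code both rejects and truncates.

```go
package main

import "fmt"

// config mirrors only the two fields used by the checks in the diff.
type config struct {
	MinPrefixLen int
	MaxPrefixLen int
}

// normalizePrefix rejects prefixes shorter than MinPrefixLen and truncates
// those longer than MaxPrefixLen. Lengths are measured in bytes, as len()
// does for strings, so truncation can split a multi-byte UTF-8 character.
func normalizePrefix(prefix string, cfg config) (string, bool) {
	if len(prefix) < cfg.MinPrefixLen {
		return "", false
	}
	if len(prefix) > cfg.MaxPrefixLen {
		prefix = prefix[:cfg.MaxPrefixLen]
	}
	return prefix, true
}

func main() {
	cfg := config{MinPrefixLen: 3, MaxPrefixLen: 5}
	fmt.Println(normalizePrefix("hi", cfg))
	fmt.Println(normalizePrefix("hello world", cfg))
}
```

Both the add path and the lookup path could then call the same function, which keeps the length rules from drifting apart.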

elevran · 2025-04-27T13:08:44Z

pkg/epp/scheduling/prefix_store.go

+	}
+
+	// Use LongestPrefix to find the best match
+	matchedPrefix, val, found := ps.tree.LongestPrefix(prefix)


The current implementation seems to score only the longest prefix and not any of the other pods?
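
As a rough illustration of scoring every pod instead of only the owner of the longest prefix, the sketch below gives each pod the length of its longest matching stored prefix, normalized by the longest match across all pods. The `scoreAll` function and the per-pod prefix lists are hypothetical and are not how the PR stores data.

```go
package main

import (
	"fmt"
	"strings"
)

// scoreAll returns a score in [0, 1] for every pod: 1 for the pod or pods
// with the longest stored prefix of prompt, proportionally less for shorter
// matches, and 0 for pods with no matching prefix.
func scoreAll(prompt string, prefixesByPod map[string][]string) map[string]float64 {
	matchLen := make(map[string]int, len(prefixesByPod))
	longest := 0
	for pod, prefixes := range prefixesByPod {
		matchLen[pod] = 0
		for _, p := range prefixes {
			if strings.HasPrefix(prompt, p) && len(p) > matchLen[pod] {
				matchLen[pod] = len(p)
			}
		}
		if matchLen[pod] > longest {
			longest = matchLen[pod]
		}
	}

	scores := make(map[string]float64, len(matchLen))
	for pod, n := range matchLen {
		scores[pod] = 0
		if longest > 0 {
			scores[pod] = float64(n) / float64(longest)
		}
	}
	return scores
}

func main() {
	scores := scoreAll("abcdef", map[string][]string{
		"pod-a": {"abcd"},
		"pod-b": {"ab"},
		"pod-c": {"zz"},
	})
	fmt.Println(scores["pod-a"], scores["pod-b"], scores["pod-c"])
}
```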

elevran · 2025-04-27T13:11:49Z

@oglok would be great if you can rebase off latest dev for easier review. Most of the 78 modified files are coming in from different PRs and "polluting" your PR

elevran · 2025-04-27T13:21:21Z

pkg/epp/scheduling/session_affinity_scorer.go

Are there downsides to having this feature (including scorer, store and tests) in its own directory?
That would minimize the changes in the existing code base

elevran · 2025-04-27T13:22:30Z

pkg/epp/scheduling/session_affinity_scorer.go

+	}
+}
+
+// ScoreTargets does the actual scoring of the target pods by the session affinity.


It is not clear where/when the store is updated with new prompt requests and replies.
Is this currently handled?

oglok · 2025-04-28T10:02:21Z

Hey @elevran! Thanks for reviewing the PR. I'm working on the rebase and reworking my code, because apparently the scorer interface merged upstream has changed. I'll get your comments in while doing that!

vMaroon · 2025-05-04T22:15:18Z

This PR #118 is based on this work. Feel free to rebase on it so that we can merge both or move it here and merge here. It preserves your contributions.

oglok · 2025-05-05T08:14:41Z

I've added @vMaroon's work here too and changed the last commit message.

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Signed-off-by: Ricardo Noriega <[email protected]>

Signed-off-by: Maroon Ayoub <[email protected]>

oglok changed the title ~~WIP: Prefix Aware Scorer~~ Prefix Aware Scorer Apr 23, 2025

rootfs requested a review from Copilot April 23, 2025 12:55

Copilot AI reviewed Apr 23, 2025

rootfs reviewed Apr 23, 2025

mayabar force-pushed the dev branch from b0d29ec to a80bcfc Compare April 23, 2025 14:13

oglok force-pushed the prefix_scorer branch 2 times, most recently from 7383242 to 037fadd Compare April 26, 2025 09:30

elevran suggested changes Apr 27, 2025

elevran reviewed Apr 27, 2025

oglok force-pushed the prefix_scorer branch 2 times, most recently from b3a2149 to 815a432 Compare April 30, 2025 15:41

vMaroon mentioned this pull request May 4, 2025

Implemented PrefixAwareScorer Based On Ricardo's Work #118

Merged

oglok force-pushed the prefix_scorer branch from 815a432 to c5fd826 Compare May 5, 2025 08:13

oglok force-pushed the prefix_scorer branch 2 times, most recently from 0edd1eb to f3c00aa Compare May 5, 2025 08:16

oglok and others added 8 commits May 5, 2025 17:38

cherry-picked prefix_score

e45e31c

Add prefix store functionality

073069a

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Prefix Aware Scorer

9e30e07

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Add unit tests for prefix store

a481c85

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

Add unit tests for prefix aware scorer

53c550d

Signed-off-by: Ricardo Noriega De Soto <[email protected]>

implemented PrefixAwareScorer based on Ricardo's work

d7f20fe

Remove KVcache scorer changes for traceability

b852c92

Signed-off-by: Ricardo Noriega <[email protected]>

addressed review comments

a0e02c0

Signed-off-by: Maroon Ayoub <[email protected]>

oglok force-pushed the prefix_scorer branch from 57c119d to a0e02c0 Compare May 5, 2025 16:39

vMaroon merged commit 09f7448 into neuralmagic:dev May 5, 2025
1 check passed

oglok mentioned this pull request May 7, 2025

Prefix Aware Routing #42

Closed
