Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheduler Framework Logic: Preview Only, Do Not Merge #1

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

michaelawyu
Copy link
Owner

No description provided.

// EqualCondition compares one condition with another; it ignores the LastTransitionTime and Message fields,
// and will consider the ObservedGeneration values from the two conditions a match if the current
// condition is newer.
func EqualCondition(current, desired *metav1.Condition) bool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there no such util in the existing k8s code base? feel like something useful. I searched in k8s repo but I can't seem to find one though.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it happened back then when fleet networking was under development as well. There isn't one available so eventually we build one of our own.

@@ -32,6 +37,16 @@ type CycleStatePluginReadWriter interface {
type CycleState struct {
// store is a concurrency-safe store (a map).
store sync.Map

// skippedFilterPlugins is a set of Filter plugins that should be skipped in the current scheduling cycle.
skippedFilterPlugins sets.String

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ryan! The reason why generic set is not used is that it is not supported yet with our current dependency version. It was added since 1.26; ours use 1.25. Is it OK if I bump version?

)

// A no-op, dummy plugin which connects to all extension points.
type DummyAllPurposePlugin struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be good to add the var _ Interface = DummyAllPurposePlugin{} to make it clear what interfaces this implements.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we create an interface for Profile? I know there are already interface for its individual functions.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do for static check.

For the interface part, at this moment I couldn't think of any case where multiple implementations of profiles are needed, and there seems to be no need to limit access to this component. Though I don't really have any strong opinion against adding one though.

func ExtractNumOfClustersFromPolicySnapshot(policy *fleetv1beta1.ClusterPolicySnapshot) (int, error) {
numOfClustersStr, ok := policy.Annotations[fleetv1beta1.NumberOfClustersAnnotation]
if !ok {
return 0, fmt.Errorf("cannot find annotation %s", fleetv1beta1.NumberOfClustersAnnotation)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will CRP name help the debug or this error will be wrapped with more info?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller will wrap this with more info I believe.

pkg/scheduler/framework/uniquename/uniquename.go Outdated Show resolved Hide resolved
if diff := maxClusterDecisionCount - len(refreshed); diff > 0 {
current := policy.Status.ClusterDecisions
for _, decision := range current {
if !decision.Selected {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we sure that the cluster here is not part of the already existing bindings? We update the status because we think the status might be out of sync with the binding

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this one is a missight on my end. It is true that we cannot trust any old decisions on policy snapshot. It has been fixed.

// Calculate the batch size.
//
// Note that obsolete bindings are not counted.
state.desiredBatchSize = int(*policy.Spec.Policy.NumberOfClusters) - len(active) - len(creating)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumberOfClusters is optional, what if it is nil?

Copy link
Owner Author

@michaelawyu michaelawyu Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have split the flow for PickAll and PickN for better clarity.

For this specific point, it is a leftover missight from earlier code where it is not yet certain whether numOfClusters will be annotated; it should use the extracted numOfClusters instead. Many apologies.

//
// Note that a race condition may arise here, when a rollout controller attempts to update bindings
// at the same time with the scheduler. An error induced requeue will happen in this case.
if err := f.updateBindings(ctx, toUpdate); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there real race between scheduler/rollout controller in this case as they manage different part of the binding?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ryan! A race could happen if the rollout controller are transitioning a binding from creating stated to active while the scheduler is trying to update the binding at the same time.

//
// This is set to happen after new bindings are created and old bindings are updated, to
// avoid interruptions (deselected then reselected) in a best effort manner.
if err := f.markAsDeletingFor(ctx, toDelete); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as scheduler has the truth state.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The race could that the scheduler would like to mark a creating binding as deleting while the rollout controller is moving the binding to the active state.

// Trimming obsolete and creating bindings alone is not enough, move on to active bindings.

// Sort the bindings by their CreationTimestamps.
sorted := sortByCreationTimestampBindings(active)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that the create time of the binding is the best indicator. What about last update time? Need to think through what's the real life implication of picking one bind vs another.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ryan! The objectmeta does not have a lastUpdateTimestamp, there are only Creation and Deletion timestamps I think. We could use lastTransitionTime, but it is in the condition (which is manipulated by the Dispatcher). The alternative is to add a label of our own to mark lastUpdateTimestamp.

michaelawyu added a commit that referenced this pull request Dec 9, 2024
michaelawyu added a commit that referenced this pull request Jan 7, 2025
* drift detection and takeover implementation #1

* Minor fixes

* Included all changes + added/migrated tests

* Minor fixes

* Minor fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants