Scheduler: Building Block & Control Plane Service #7761

Closed
wants to merge 187 commits into from

Conversation


@cicoyle cicoyle commented May 24, 2024

This PR adds the relevant code for:

  1. a new Scheduler Building Block
  2. a new Scheduler Control Plane Service

This PR follows the official proposal, with one difference: we implemented bidirectional streaming between the Scheduler and the daprd sidecar (the proposal will be updated to reflect this change).

This is an end-to-end solution: an app schedules a job through the daprd sidecar and can perform basic CRUD operations on that job. The sidecar sends the job to the Scheduler control plane service, which stores it in its embedded etcd. At trigger time, the Scheduler sends the job back to the sidecar, which forwards the triggered job to the app.
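The flow described above can be sketched in Go as follows. This is a minimal in-memory illustration, not the code in this PR: `Job`, `Scheduler`, `Sidecar`, and `TriggerDue` are hypothetical names, a plain map stands in for the embedded etcd, and the sketch assumes one-shot jobs that are removed once triggered.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical stand-in for a scheduled job (not the Dapr type).
type Job struct {
	Name    string
	DueUnix int64 // trigger time as Unix seconds
	Data    string
}

// Scheduler keeps jobs in a map that stands in for the embedded etcd store.
type Scheduler struct {
	jobs map[string]Job
}

func NewScheduler() *Scheduler { return &Scheduler{jobs: make(map[string]Job)} }

// Schedule creates or overwrites a job by name.
func (s *Scheduler) Schedule(j Job) { s.jobs[j.Name] = j }

// Get returns the stored job, if any.
func (s *Scheduler) Get(name string) (Job, bool) {
	j, ok := s.jobs[name]
	return j, ok
}

// Delete removes a job by name.
func (s *Scheduler) Delete(name string) { delete(s.jobs, name) }

// TriggerDue hands every job due at or before now to deliver, in name order,
// removes it from the store, and returns how many jobs were triggered.
func (s *Scheduler) TriggerDue(now int64, deliver func(Job)) int {
	var due []string
	for name, j := range s.jobs {
		if j.DueUnix <= now {
			due = append(due, name)
		}
	}
	sort.Strings(due)
	for _, name := range due {
		deliver(s.jobs[name])
		delete(s.jobs, name)
	}
	return len(due)
}

// Sidecar forwards app requests to the scheduler and triggered jobs to the app.
type Sidecar struct {
	sched *Scheduler
	onJob func(Job) // app callback, assumed for illustration
}

func (sc *Sidecar) ScheduleJob(j Job) { sc.sched.Schedule(j) }
func (sc *Sidecar) deliver(j Job)     { sc.onJob(j) }

func main() {
	sched := NewScheduler()
	sidecar := &Sidecar{sched: sched, onJob: func(j Job) {
		fmt.Printf("app received job %q with data %q\n", j.Name, j.Data)
	}}
	sidecar.ScheduleJob(Job{Name: "backup", DueUnix: 100, Data: "nightly"})
	sidecar.ScheduleJob(Job{Name: "report", DueUnix: 200, Data: "weekly"})
	fmt.Println("triggered:", sched.TriggerDue(150, sidecar.deliver))
}
```

In the real system the sidecar and Scheduler are separate processes connected by a stream; the direct method calls here only show the order of hand-offs.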

This includes work to replace the actor reminder subsystem by scheduling actor reminders in the Scheduler control plane service's embedded etcd.

Perf numbers will be provided soon.

Several individuals were involved in making this work happen. Thank you to all 🚀

Issue reference

1.14 release P0
Here is the issue tracking the work. There are a few remaining items that will be completed before the release date.

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

cicoyle and others added 30 commits January 16, 2024 09:12
Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: Deepanshu Agarwal <[email protected]>
Signed-off-by: joshvanl <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
…working. need to fix still

Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
cicoyle and others added 27 commits April 11, 2024 17:08
Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>
Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>
Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>
Streaming btw Scheduler & Sidecar
* Fixes warning “Running http and grpc server on single port. This is not recommended for production.”

Signed-off-by: Elena Kolevska <[email protected]>

* Suffixes data dirs with instance id

Signed-off-by: Elena Kolevska <[email protected]>

* Adds space quota parameter

Signed-off-by: Elena Kolevska <[email protected]>

* Sets default quota to 2GB

Signed-off-by: Elena Kolevska <[email protected]>

* Adds compaction parameters

Signed-off-by: Elena Kolevska <[email protected]>

* Updates helm charts

Signed-off-by: Elena Kolevska <[email protected]>

* Adds namespace to data dir name. Renames etcdID to just ID.

Signed-off-by: Elena Kolevska <[email protected]>

---------

Signed-off-by: Elena Kolevska <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
* able to send triggered job back to app via the app channel from daprd sidecar using both grpc and http protocols

Signed-off-by: Cassandra Coyle <[email protected]>

* change sidecar receiving job to debug level to still validate the scheduler stream

Signed-off-by: Cassandra Coyle <[email protected]>

* grpc test

Signed-off-by: Cassandra Coyle <[email protected]>

* wip

Signed-off-by: Cassandra Coyle <[email protected]>

* some cleanup

Signed-off-by: Cassandra Coyle <[email protected]>

* update test framework grpc app to add the OnJobEventFn and update test to use it. grpc appcallback test passes

Signed-off-by: Cassandra Coyle <[email protected]>

* wip http test

Signed-off-by: Cassandra Coyle <[email protected]>

* added http working test. need to make lint

Signed-off-by: Cassandra Coyle <[email protected]>

* update tests with stub for interface func for triggerJob to app now since its in the app channel interface

Signed-off-by: Cassandra Coyle <[email protected]>

* defer release of ch

Signed-off-by: Cassandra Coyle <[email protected]>

---------

Signed-off-by: Cassandra Coyle <[email protected]>
* go-etcd-cron

Signed-off-by: joshvanl <[email protected]>

* Fix multi-scheduler int test

Signed-off-by: joshvanl <[email protected]>

* Review comments

Signed-off-by: joshvanl <[email protected]>

* Rename schedule app job type to job

Signed-off-by: joshvanl <[email protected]>

---------

Signed-off-by: joshvanl <[email protected]>
* restore test file diff, keep chart changes

Signed-off-by: Cassandra Coyle <[email protected]>

* fix read-only err

Signed-off-by: Cassandra Coyle <[email protected]>

---------

Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
* Bidirectional job trigger & ack.

Adds job ack from scheduler client for when job is finished processing
and can be ticked.

Adds mTLS authorization to scheduler API server.

Adds integration tests for scheduler Jobs and Actor Reminders.

Signed-off-by: joshvanl <[email protected]>

* Review comments & reconnect scheduler int test

Signed-off-by: joshvanl <[email protected]>

* Update go-etcd-cron

Signed-off-by: joshvanl <[email protected]>

* Linting

Signed-off-by: joshvanl <[email protected]>

---------

Signed-off-by: joshvanl <[email protected]>
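
The trigger and ack exchange described in the "Bidirectional job trigger & ack" commit above can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: unbuffered Go channels stand in for the two directions of the stream, the `trigger` and `ack` types and both function names are hypothetical, and "ticked" is taken to mean that the scheduler advances a job only after the matching ack arrives.

```go
package main

import "fmt"

// trigger and ack are hypothetical message types for the two stream directions.
type trigger struct {
	Name string
	Seq  int
}

type ack struct {
	Name string
	Seq  int
}

// runScheduler sends one trigger per job and waits for the matching ack
// before recording the job as ticked. It closes out when it is done.
func runScheduler(names []string, out chan<- trigger, in <-chan ack) []string {
	var ticked []string
	for i, n := range names {
		out <- trigger{Name: n, Seq: i}
		a := <-in
		if a.Name == n && a.Seq == i {
			ticked = append(ticked, n)
		}
	}
	close(out)
	return ticked
}

// runSidecar handles each trigger and acks it once handling has finished.
func runSidecar(in <-chan trigger, out chan<- ack, handle func(string)) {
	for t := range in {
		handle(t.Name)
		out <- ack{Name: t.Name, Seq: t.Seq}
	}
}

func main() {
	triggers := make(chan trigger)
	acks := make(chan ack)
	go runSidecar(triggers, acks, func(name string) {
		fmt.Println("app handled", name)
	})
	ticked := runScheduler([]string{"job-a", "job-b"}, triggers, acks)
	fmt.Println("ticked:", ticked)
}
```

Because the ack is sent only after `handle` returns, the scheduler side never advances a job that the app has not finished processing.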
…ler statefulset (#27)

* Charts: Adds option to use PVC for Scheduler statefulset

Adds optional `dapr_scheduler.cluster.persistentVolumeClaimName` helm
chart values option to change the scheduler data dir volume to use the
referenced PersistentVolumeClaim, rather than an empty dir, making ETCD
data persistent across pod restarts.

Also changes the volume and mount paths so that all schedulers share the
same root mount path, but write to a sub directory of the form
"/<namespace>/<scheduler-id>".

Signed-off-by: joshvanl <[email protected]>

* Update scheduler volume to use volumeClaimTemplate

Signed-off-by: joshvanl <[email protected]>

---------

Signed-off-by: joshvanl <[email protected]>
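
The shared-root layout described in the charts commit above, where each scheduler writes to a sub directory of the form "/<namespace>/<scheduler-id>", could be computed along these lines. The helper name `dataDir`, the root path, and the example namespace and ID are all assumptions made for illustration; the actual mount path is defined by the helm chart.

```go
package main

import (
	"fmt"
	"path"
)

// dataDir joins a shared root mount path with a per-scheduler sub directory
// of the form <root>/<namespace>/<scheduler-id>. Hypothetical helper.
func dataDir(root, namespace, schedulerID string) string {
	return path.Join(root, namespace, schedulerID)
}

func main() {
	// The root, namespace, and ID below are example values only.
	fmt.Println(dataDir("/var/run/data", "default", "scheduler-0"))
}
```

Keeping the namespace and ID in the path lets several schedulers share one volume root without overwriting each other's etcd data.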
Signed-off-by: Cassandra Coyle <[email protected]>
* merge & fix http status code check

Signed-off-by: Cassandra Coyle <[email protected]>

* triggered job e2e test for http app

Signed-off-by: Cassandra Coyle <[email protected]>

* update test iteration nums

Signed-off-by: Cassandra Coyle <[email protected]>

* rm time.sleep -> assert.eventually

Signed-off-by: Cassandra Coyle <[email protected]>

* rm local test changes

Signed-off-by: Cassandra Coyle <[email protected]>

* update test name

Signed-off-by: Cassandra Coyle <[email protected]>

* tweaks

Signed-off-by: Cassandra Coyle <[email protected]>

* grpc e2e works, need to cleanup grpc test

Signed-off-by: Cassandra Coyle <[email protected]>

* rm grpc test and combine into http test. keep both apps tho. need to cleanup local test changes in scheduler_test

Signed-off-by: Cassandra Coyle <[email protected]>

* cleanup local test changes

Signed-off-by: Cassandra Coyle <[email protected]>

* make lint

Signed-off-by: Cassandra Coyle <[email protected]>

* mv things around

Signed-off-by: Cassandra Coyle <[email protected]>

* cleanup

Signed-off-by: Cassandra Coyle <[email protected]>

* thread -> goroutine

Signed-off-by: Cassandra Coyle <[email protected]>

* Update clients.go

Signed-off-by: Cassie Coyle <[email protected]>

* rm commented line

Signed-off-by: Cassandra Coyle <[email protected]>

* Apply suggestions from code review

Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>

* PR review updates

Signed-off-by: Cassandra Coyle <[email protected]>

* review updates. add code todo

Signed-off-by: Cassandra Coyle <[email protected]>

---------

Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>
Co-authored-by: Josh van Leeuwen <[email protected]>
* continuously retry scheduler clients if it fails upon the first try

Signed-off-by: Cassandra Coyle <[email protected]>

* Apply suggestions from code review

Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>

* fix indentation after UI committing

Signed-off-by: Cassandra Coyle <[email protected]>

---------

Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassie Coyle <[email protected]>
Co-authored-by: Josh van Leeuwen <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
Signed-off-by: Cassandra Coyle <[email protected]>
@mikeee mikeee mentioned this pull request May 28, 2024
43 tasks

cicoyle commented May 28, 2024

Sorry for the confusion. I reset my old branch and am closing this PR; I will open another from the feat-dist-scheduler branch instead of my backup.

@cicoyle cicoyle closed this May 28, 2024