update task_resume_workflows to also resume processes in CREATED/RESUMED status #984

Open
wants to merge 5 commits into main from 898-improve-worker-process

Conversation

Contributor

@tjeerddie commented Jul 2, 2025

  • improve start_process to include the logic that is shared by both executors.
  • move threadpool executor functions to its own file.
    • celery: orchestrator/services/celery.py -> orchestrator/services/executors/celery.py
    • thread: orchestrator/services/processes.py -> orchestrator/services/executors/threadpool.py
      • thread_start_process
      • thread_resume_process
      • thread_validate_workflow
      • THREADPOOL_EXECUTION_CONTEXT
  • move can_be_resumed from api/processes.py to services/processes.py.
  • add can_be_resumed check in _celery_set_process_status_resumed so only correct statuses can be changed.
  • add a process status check in thread_resume_process so the stored input state is only retrieved when the process status is SUSPENDED (see the sketch after this list).
  • improve workflow removed error messages.
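
A minimal sketch of the thread_resume_process status check described above — the helper, the signature, and the import paths here are illustrative assumptions, not the PR's actual code:

```python
from orchestrator.db import ProcessTable          # import path assumed
from orchestrator.workflow import ProcessStatus   # import path assumed


def _load_suspended_input(process: ProcessTable) -> list[dict]:
    """Hypothetical helper: fetch the user input that was saved when the process suspended."""
    return []  # placeholder; the real lookup reads the stored process state


def thread_resume_process(process: ProcessTable, user_inputs: list[dict] | None = None) -> None:
    # Only a SUSPENDED process has saved user input to pick up again; for the
    # other resumable statuses the engine continues with an empty input list.
    if user_inputs is None:
        if process.last_status == ProcessStatus.SUSPENDED:
            user_inputs = _load_suspended_input(process)
        else:
            user_inputs = []
    # ... hand the process and inputs over to the workflow engine as before (omitted)
```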

updated testing:

  • add unit tests for threadpool and celery executor functions.
  • add unit tests for resume_process.
  • improve unit tests for start_process.
  • add 2 tests that validate the happy flow of start_process and resume_process.
  • change the API "resume workflow with incorrect status" test to check all incorrect statuses.
  • update task_resume_workflow tests.

Related: #898

@tjeerddie force-pushed the 898-improve-worker-process branch from ff408fb to 1ac7b2d on July 2, 2025 14:57

codspeed-hq bot commented Jul 2, 2025

CodSpeed Performance Report

Merging #984 will not alter performance

Comparing 898-improve-worker-process (7a8f80a) with main (3ab3628)

Summary

✅ 12 untouched benchmarks


codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 77.67857% with 25 lines in your changes missing coverage. Please review.

Project coverage is 84.88%. Comparing base (3ab3628) to head (7a8f80a).

Files with missing lines Patch % Lines
orchestrator/services/tasks.py 17.64% 14 Missing ⚠️
orchestrator/services/processes.py 77.77% 4 Missing ⚠️
orchestrator/workflows/tasks/resume_workflows.py 81.81% 3 Missing and 1 partial ⚠️
orchestrator/services/executors/celery.py 75.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #984      +/-   ##
==========================================
+ Coverage   84.40%   84.88%   +0.47%     
==========================================
  Files         213      214       +1     
  Lines       10359    10376      +17     
  Branches     1019     1020       +1     
==========================================
+ Hits         8744     8808      +64     
+ Misses       1345     1295      -50     
- Partials      270      273       +3     


@Mark90 self-requested a review July 9, 2025 08:28
Comment on lines +33 to +34
created_process_ids = get_process_ids_by_process_statuses([ProcessStatus.CREATED])
resumed_process_ids = get_process_ids_by_process_statuses([ProcessStatus.RESUMED])
Contributor

I'm not completely sure if this is the way to go.

Imagine this scenario:

  • There are separate queues for 1) new workflows (status CREATED) 2) resumed workflows (status RESUMED)
  • Each queue has 50 workflows queued
  • For each queue there are 5 workers churning through queued workflows which they have put on status RUNNING
  • When task_resume_workflows runs it finds 45 processes on CREATED and 45 on RESUMED so it calls resume_process() for each of them

Unless I'm missing something, this would re-queue 90 processes that don't need to be re-queued.

In thread_start_process() you already changed it to set the workflow to RUNNING before executing any step. This means that the chance of the celery worker going AWOL before setting it to RUNNING has become much smaller, so that is already much better. If a worker dies while a process is running, it can be resumed by an engine restart.

So, I don't think we should resume created/resumed workflows in this task.
What do you think?

Contributor Author

I don't think the queue will get clogged, since the 45 re-added processes would all get killed with a no-op: each process already ran before and won't have the correct status to run again.

Contributor

tl;dr

While this can serve as an emergency "I absolutely want to start everything now" procedure, there are potential race conditions. Because of those, this workflow cannot be run automatically; it must (IMO) only be run when there is zero activity on the current workers and all the queues are empty.

Long read

For the CREATED scenario:

  • new process X was previously started and has status CREATED
  • it is present in the New Workflows queue, let's call the task NF_X
  • the NF queue is full so it isn't picked up yet
  • task_resume_workflows runs, sees X having status CREATED and runs processes.resume_process()
    • _celery_resume_process() sets X's status from CREATED to RESUMED
    • it is now also added to the Resume Workflows queue, let's call that task RF_X
  • worker 1 picks up NF_X, asserts status is CREATED, finds out it isn't
    • fails with an error
  • worker 2 picks up RF_X, asserts status is RESUMED, continues
    • thread_resume_process() sets the status from RESUMED to RUNNING

It does not matter if NF_X is picked up before or after RF_X; the resume task will always be the one to go through.

While not pretty given the errors that will be shown, the result is that RF_X runs correctly, so an argument can be made that the ends justify the means.
However, there is a race condition (⚡) because the status update to RESUMED can be done after a worker picked up the CREATED task and set the status to RUNNING. This will (likely?) be corrected by the next step update, but until then it leaves the process in an incorrect status (not to mention the worker dying at this point..)

For the RESUMED scenario:

  • process Y has previously been resumed and has status RESUMED
  • it is present in the Resume Workflows queue as RF_Y_1
  • the RF queue is full so it isn't picked up yet
  • task_resume_workflows runs, sees Y having status RESUMED and runs processes.resume_process()
    • _celery_resume_process() sets Y's status from RESUMED to RESUMED
    • it is now again added to the RF queue as RF_Y_2
  • worker 1 picks up RF_Y_1, asserts status is RESUMED, continues
    • thread_resume_process() sets the status from RESUMED to RUNNING
  • worker 2 picks up RF_Y_2, asserts status is RESUMED, finds out it isn't
    • fails with an error

This is a bit trickier as there are multiple race conditions (⚡).
The first one (the status update from RESUMED to RESUMED) can happen after a worker has picked up the RESUMED task and set the status to RUNNING. Same as for CREATED, this will likely be corrected by the next step update, but it leaves the process in an incorrect status until then.
The second one is worse: there is a sequence of events in which both worker 1 and worker 2 start process Y because they asserted it had status RESUMED. Both of them will start executing steps. We definitely do not want this.

Contributor Author

I think this is fixed by limiting which statuses can be changed to RESUMED by _celery_resume_process. It should only change the status from SUSPENDED, WAITING, FAILED, AWAITING_CALLBACK, API_UNAVAILABLE and INCONSISTENT_DATA, so it can never incorrectly change the status, removing the race condition.
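
A minimal sketch of that guard, assuming the status names listed above; the import paths and exact function shape are assumptions for illustration, not the PR's actual implementation:

```python
from orchestrator.db import ProcessTable          # import path assumed
from orchestrator.workflow import ProcessStatus   # import path assumed

# Statuses from which a process may legitimately be moved to RESUMED.
RESUMABLE_STATUSES = {
    ProcessStatus.SUSPENDED,
    ProcessStatus.WAITING,
    ProcessStatus.FAILED,
    ProcessStatus.AWAITING_CALLBACK,
    ProcessStatus.API_UNAVAILABLE,
    ProcessStatus.INCONSISTENT_DATA,
}


def can_be_resumed(status: ProcessStatus) -> bool:
    return status in RESUMABLE_STATUSES


def _celery_set_process_status_resumed(process: ProcessTable) -> None:
    # Refuse to touch CREATED/RUNNING/RESUMED processes, so this task can never
    # overwrite the status of a process a worker is about to run or already running.
    if not can_be_resumed(process.last_status):
        return
    process.last_status = ProcessStatus.RESUMED
    # ... persist the change and enqueue the resume task as before (omitted)
```

The idea is that, combined with the status assertion the workers already do, any extra queue entries produced by task_resume_workflows become harmless no-ops instead of racing status updates.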

@tjeerddie force-pushed the 898-improve-worker-process branch 2 times, most recently from ecdc192 to 32bd94a on July 14, 2025 13:12
update task_resume_workflows to also resume processes in CREATED/RESUMED status

- improve start_process to include logic that happens in both executors.
- move threadpool executor functions to its own file.
- Add more info about the added graph flow in docs.
- Remove `time_limit` from celery NEW_TASK.
- Change the "workflow cannot be resumed" error to be more specific
- Remove duplicate auth check in `start_process`; it's already done in `create_process`.
- Move the save-input step to resume_process in processes.py; removes an unnecessary state fetch in resume_task/resume_workflow
- add 2 tests that validate the happy flow of start_process and resume_process.
- add unit tests for resume_process
- improve unit tests for start_process
- move can_be_resumed from api/processes.py to services/processes.py.
- improve workflow removed error messages.
- revert resume_process incorrect return type.
- add can_be_resumed check in _celery_set_process_status_resumed so only correct statuses can be changed.
- add process status check in thread_resume_process to only retrieve input state on suspended status.
- change api resume workflow with incorrect status test to check all incorrect statuses
- update task_resume_workflow tests.
@tjeerddie force-pushed the 898-improve-worker-process branch from e4ce69d to 7a8f80a on July 21, 2025 08:13