Skip to content

Comments

[WIP] Fix job scheduling#472

Closed
parkan wants to merge 35 commits intomainfrom
feat/dag-emit-deferred
Closed

[WIP] Fix job scheduling#472
parkan wants to merge 35 commits intomainfrom
feat/dag-emit-deferred

Conversation

@parkan
Copy link
Collaborator

@parkan parkan commented Mar 28, 2025

This was originally titled "[WIP] enable producing DAG cars in inline mode (deferred approach)" but it turns out the problem might have been a lot simpler.

Still validating this but the issue appears to have been that scheduled daggen tasks would not succeed because of incorrect prioritization, where they would try to run before pack jobs and therefore error out. This was discovered while refactoring the scheduling mode, which is explained below. Unclear if inline mode somehow exacerbated the issue, will post an update on Monday.

This PR changes the scheduling semantics, fixes an issue where daggen tasks had higher scheduling priority than pack tasks and therefore tried to run ahead (erroring out), and adds a configurable auto mode which correctly executes the scan-pack-daggen sequence.

  • adds a --auto boolean (default true) flag to prep create, which sequences the scan/pack/daggen tasks automatically
  • aligns semantics of start-pack with start-daggen; previously pack tasks were automatically created by scan in a Ready state, which ran immediately -- now this will only happen if auto is on, otherwise the pack tasks are created in a Paused state and will wait for start-pack to run
  • automatically schedules a daggen task in auto mode (this is currently done in the start-scan handler)
  • re-orders the job type struct such that single-worker processing will run pack tasks before daggen tasks, satisfying the dependency relation

TODO:

  • consider refining the task prioritization such that multi-worker setups respect the dependency relationships correctly, especially in the case of a dedicated daggen worker (e.g. by checking state of dependency tasks and parking current task until they are complete)
  • alternatively, consider scheduling the daggen task from the last pack task instead of at the beginning of the pipeline, however this requires introspecting the full packing task set, especially if they execute out of order or have failures
  • consider alternatives to --auto flag and using start-scan as the kickoff mechanism; an alternative approach may be to introduce a new command (--start-all or similar) however the semantics get messy and require threading a parameter through scan and pack tasks
  • tests, documentation, etc

@parkan
Copy link
Collaborator Author

parkan commented Mar 31, 2025

can confirm that this breaks with multiple workers, working on a fix

@parkan parkan force-pushed the feat/dag-emit-deferred branch from d4d3bbe to 11e4ff4 Compare June 4, 2025 13:06
@parkan
Copy link
Collaborator Author

parkan commented Jun 4, 2025

ok, conflicts resolved to the point where the diff is correct 👍

@ianconsolata
Copy link
Collaborator

@parkan @Sankara-Jefferson what is the status of this PR? Still being worked on, or was this work captured in a different PR? I noticed some force pushes near the end, so I am not sure if this code is still correct.

@parkan
Copy link
Collaborator Author

parkan commented Jul 10, 2025

there is still potential relevance to the ideas in here, but it likely requires review and re-consideration in light of everything we've learned

it might be better to close it for the moment and come back as needed

(the force pushes after mine likely deal with the filter-repo)

@parkan
Copy link
Collaborator Author

parkan commented Jul 10, 2025

actually it looks like anjor's work is likely going to supersede this anyway, closing

@parkan parkan closed this Jul 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants