Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resumeOnRestart: Successful jobs never get restarted #54

Open
christian-schwaderer opened this issue Aug 23, 2024 · 10 comments
Open

resumeOnRestart: Successful jobs never get restarted #54

christian-schwaderer opened this issue Aug 23, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@christian-schwaderer
Copy link

Description

I have some concern regarding the resumeOnRestart option.

Imagine a situation like this:

  • We have a job to run every minute
  • The job successfully runs at 9:00:00
  • nextRunAt is set to 9:01:00
  • lastFinishedAt is set to 9:00:00
  • Now, the server crashes at 9:00:30
  • The Server successfully restarts at 9:01:20
    => Server has missed the run at 9:01:00
    => I expect the job to rerun at 9:02:00

However, if my testing is right, it will never run again because of

{
              lockedAt: { $exists: false },
              lastFinishedAt: { $exists: false },
              nextRunAt: { $lte: now, $ne: null },
            },

in resume-on-restart.ts.
The crucial part is lastFinishedAt: { $exists: false }.
Pulse obviously "thinks": "Oh, this got successfully finished, so we do not have to do anything here".

Code example

No response

Additional context

No response

@christian-schwaderer christian-schwaderer added the bug Something isn't working label Aug 23, 2024
@code-xhyun
Copy link
Contributor

code-xhyun commented Aug 24, 2024

@christian-schwaderer

If the resumeOnRestart option is set to true, it operates according to the following scenario (tested in an actual environment):
If there is a task that runs every minute

  • 09:00:00 execution

    • lastFinishedAt 09:00:00
    • lastRunAt 09:00:00
    • nextRunAt 09:01:00
  • 09:00:30 server crash

  • 09:01:20 server restart

  • At 09:01:20, immediately executes the missed task from 09:01:00

    • lastFinishedAt 09:01:20
    • lastRunAt 09:01:20
    • nextRunAt 09:02:00

Please attach the actual code you tested. I will test that code.

@tcastelli
Copy link

We also tried to test this scenario and I think the key point is that resumeOnRestart WON'T restart the job but the regular polling will do. Since the job finished correctly, resumeOnrestart won't catch that recurring job, but the regular polling will check that nextRun is lower than your current time and it will run and schedule it for every minute from that point in time (I think this is okay for most scenarios, but if you rely on your job having run exactly a certain number of times after a crash the implemented logic is not going to help you.)

@code-xhyun
Copy link
Contributor

@christian-schwaderer @tcastelli

Since we forked the existing agenda library code to create the resumeOnRestart feature, architectural issues emerged from
the fundamental structure. Currently, I lack the practical time needed to modify this structure. I would be grateful if contributors could help resolve this issue together.

@Beelink
Copy link

Beelink commented Nov 19, 2024

@tcastelli What is the correct way to pull all the jobs from mongo and then resume them? I cannot simulate the resumeOnRestart behavior because there is no job.resume() method. Example:

const jobs = await pulse.jobs({});
const promises = jobs.map(async (job) => await job.resume());
await Promise.all(promises);

@jonaszuberbuehler
Copy link

jonaszuberbuehler commented Nov 20, 2024

I am confused: should an already existing job with an repeat interval be restarted after pulse.start() call? I just made a simple hello world, the following job is defined in the DB:

{
    _id: ObjectId('673da4da57ed2f3136b36c47'),
    name: 'welcomeMessage',
    type: 'single',
    attempts: 0,
    backoff: null,
    data: {},
    endDate: null,
    nextRunAt: ISODate('2024-11-20T09:20:17.070Z'),
    priority: 0,
    repeatInterval: '5 seconds',
    repeatTimezone: null,
    shouldSaveResult: false,
    skipDays: null,
    startDate: null,
    finishedCount: 2,
    lastFinishedAt: ISODate('2024-11-20T08:59:16.062Z'),
    runCount: 2
  }

If I re-run the sample code w/o the pulse.define() call, nothing happens after pulse.start(). I thought that was the whole idea of the resumeOnRestart flag. Am I missing out on smth? Or do we have to call define for already existing jobs on every server restart?

@Beelink
Copy link

Beelink commented Nov 20, 2024

@jonaszuberbuehler I was able to solve this issue by switching to agenda.

After server restart fetch all jobs using agenda.jobs() and then do

agenda.define(job.attrs.name, { priority: 10 }, yourCallback)

for each job.

I don't know why exactly the same thing doesn't work in pulsecron.

@jonaszuberbuehler
Copy link

@Beelink Thx, I am coming from Agenda. Your workaround was mentioned on many issues there. I was under the impression that Pulse would fix this (was also mentioned some times). For me it's counterintuitive: the jobs are already there, why should we call define again?

@Beelink
Copy link

Beelink commented Nov 20, 2024

@jonaszuberbuehler I totally agree, but I think this would be hard to implement because, on server restart, we need to find the callback provided. How? The old js context doesn't exist and there is no callback function anymore.

@jonaszuberbuehler
Copy link

@Beelink Valid point, which brings me back to what exactly is the resumeOnRestart flag for?

@Beelink
Copy link

Beelink commented Nov 20, 2024

@jonaszuberbuehler I chose pulse because of this flag and it never worked for me, I was trying to make it work a few times. Tried a few last versions from 1.6.3 to 1.6.6

Same with the disableAutoIndex flag, I don't know what it supposed to do and it never worked for me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants