Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embassy-rp time-driver-impl hangs #3758

Open
robot-rover opened this issue Jan 11, 2025 · 0 comments · May be fixed by #3763
Open

embassy-rp time-driver-impl hangs #3758

robot-rover opened this issue Jan 11, 2025 · 0 comments · May be fixed by #3763

Comments

@robot-rover
Copy link

robot-rover commented Jan 11, 2025

I've been experiencing an intermittent hang. My tasks awaiting an embassy-time Timer stop waking up, while the rest of my tasks continue to function as normal. In addition, causing a new task to await a Timer unfreezes the time driver.

By exposing some of the internals of embassy-rp's time-driver, I was able to see the state of the timer queue after my system had hung:

Debug: TimerDebug {
    current_time: 2637680334, // Result of reading the TIMERRAWH/L register
    fire_time: 1703154155, // ALARM0's fire time
    armed: false, // ALARM0's armed bit in the ARMED register
    enabled: true, // ALARM0's interrupt enable bit in the INTE register
    timer_queue: [
        TaskData {
            state: 1, // The Task's state u8
            loc: 536880272, // The memory location of the TaskHeader
            expires_at: 1703173985,
        },
        TaskData {
            state: 1,
            loc: 536880208,
            expires_at: 1703253822,
        },
        TaskData {
            state: 1,
            loc: 536880448,
            expires_at: 1703154155,
        },
    ],
}

The fact that the alarm was not armed made me think I was looking for a race condition in the timer queue's implementation. I believe the offending code is here, in the IRQ itself:

fn check_alarm(&self) {
let n = 0;
critical_section::with(|cs| {
let alarm = &self.alarms.borrow(cs);
let timestamp = alarm.timestamp.get();
if timestamp <= self.now() {
self.trigger_alarm(cs)
} else {
// Not elapsed, arm it again.
// This can happen if it was set more than 2^32 us in the future.
TIMER.alarm(n).write_value(timestamp as u32);
}
});
// clear the irq
TIMER.intr().write(|w| w.set_alarm(n, true));
}

(Comments beginning with % are mine)

    fn check_alarm(&self) {
        let n = 0;
        critical_section::with(|cs| {
            let alarm = &self.alarms.borrow(cs);
            let timestamp = alarm.timestamp.get();
            // % Lets say that the current timestamp has elapsed
            if timestamp <= self.now() {
                // % Trigger alarm is called, which sets the next alarm
                self.trigger_alarm(cs)
            } else {
                // Not elapsed, arm it again.
                // This can happen if it was set more than 2^32 us in the future.
                TIMER.alarm(n).write_value(timestamp as u32);
            }
        });

        // % set_alarm does check to make sure that the alarm has not
        // % elapsed by the time has finished setting it, but if the alarm was very
        // % nearly about to expire, it could go off here, and the interrupt
        // % is clobbered by the next line

        // clear the irq
        TIMER.intr().write(|w| w.set_alarm(n, true));
    }

I think the fix would be to check that the alarm is still armed before returning from the IRQ, and if it is not, it should circle back and set a new one. I'm very willing to make a PR to address this, but I want to get input on whether maintainer's think this is an issue or not first.

@robot-rover robot-rover linked a pull request Jan 13, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant