@BrianJKoopman

This PR implements signal handling during shutdown operations. This should mitigate issues with commands run from sorunlib that error out and leave either SMuRF streaming or ACU motion (or both) on.

Some things I want to mention:

  • SIGTERM is now always handled just like SIGINT. This makes "Interrupt" and "Terminate" in nextline behave the same: both raise a KeyboardInterrupt. In my opinion, both signals should result in a clean shutdown, and this was the simplest way to accomplish that. In the future I may change SIGTERM handling so that it shuts things down on its own and then calls sys.exit(0) (likely after #231, "Check stream status and stop running streams before operations", or something similar is implemented), but this should work for now.
  • I replaced every direct use of run.smurf.stream('off') in the modules that called it with a wrapper, so that those modules also benefit from the new signal handling.
  • Moving forward, the protect_shutdown decorator should be used on any function that performs shutdown actions. While shutdown is in progress, it catches these signals and just prints them to stdout. If something isn't stopping properly, a SIGKILL can still be sent to exit (likely leaving streams or ACU motion running).
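A minimal sketch of the "treat SIGTERM like SIGINT" idea using Python's standard `signal` module. This is an assumption about how such a mapping can be wired up, not the code from this PR, and `handle_sigterm_as_sigint` is a hypothetical name. `signal.default_int_handler` is the handler Python installs for SIGINT and it raises KeyboardInterrupt, so pointing SIGTERM at it makes both signals unwind the program the same way. `signal.signal()` may only be called from the main thread, and the demo uses `os.kill`, so it assumes a POSIX system.

```python
import os
import signal
import time


def handle_sigterm_as_sigint():
    """Hypothetical helper: make SIGTERM raise KeyboardInterrupt like SIGINT.

    signal.default_int_handler is Python's stock SIGINT handler; it raises
    KeyboardInterrupt, so existing "except KeyboardInterrupt" cleanup paths
    now run for SIGTERM too.
    """
    signal.signal(signal.SIGTERM, signal.default_int_handler)


if __name__ == "__main__":
    handle_sigterm_as_sigint()
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # send ourselves a SIGTERM
        time.sleep(5)  # the Python-level handler fires here at the latest
    except KeyboardInterrupt:
        print("SIGTERM surfaced as KeyboardInterrupt")
```

Reusing the existing KeyboardInterrupt path means no second shutdown code path is needed; the trade-off is that SIGTERM cannot yet do anything different from an interactive interrupt.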

I tested this in a local environment with the following elements:

  • nextline
  • ocs-web
  • ACU agent w/accompanying ACU simulator
  • 2x SMuRF File Emulator agents

I tested sending SIGINT, SIGTERM, and SIGKILL at various points during an example scan block that looked like this:

from nextline import disable_trace
import time

with disable_trace():
    import sorunlib as run
    from ocs.ocs_client import OCSClient
    run.initialize(test_mode=True)

run.acu.set_scan_params(0.5, 0.25)
run.acu.move_to(az=49.1, el=60.0)

################### Detector Setup######################
with disable_trace():
    run.initialize(test_mode=True)
run.smurf.take_bgmap(concurrent=True)
run.smurf.take_noise(concurrent=True, tag='res_check')
run.smurf.iv_curve(concurrent=True,
    iv_kwargs={'run_serially': False, 'cool_wait': 60*5})
run.smurf.bias_dets(concurrent=True)
time.sleep(1)
run.smurf.bias_step(concurrent=True)
run.smurf.take_noise(concurrent=True, tag='bias_check')
#################### Detector Setup Over ####################

# hwp already spinning cw (negative frequency)
run.wait_until('2025-09-04T18:27:00+00:00')
run.acu.set_scan_params(0.8, 0.25)
# scan duration = 0:46:30
run.seq.scan(
    description='Medium priority NE scan',
    stop_time='2025-09-09T23:13:30+00:00',
    width=40.0, az_drift=0,
    subtype='cmb', tag='uid-dad4ce63-cd6e-4591-b8b4-a8e747a242dd-pass-0,49-89',
    min_duration=600,
)
run.acu.move_to(az=49.1, el=60.0)

I am able to interrupt during the run.seq.scan() line and watch the scan stop gracefully. Any additional signals sent (except SIGKILL) are simply printed to stdout. SIGKILL ends things abruptly, with streams still running.

This does not handle the case of streams left running when a schedule is started; that case will still run into the issues described in #229.

Resolves #166.

Otherwise a SIGINT might be timed just right to start most of the streams, but
never stop them.

This simple wrapper was created to help with error handling and testing. We had
similar code elsewhere, so we replace direct uses of run.smurf.stream('off')
with this wrapper.

This can be used to protect shutdown functions that perform cleanup before
exiting. It will prevent accidental interruptions of cleanup code, which in
the past have left SMuRF streams or ACU motion on.
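The protect_shutdown idea described in these commits can be sketched with the standard `signal` module. This is a hypothetical illustration, not sorunlib's implementation: the printed message, `PROTECTED_SIGNALS`, and the `stop_everything` stand-in are assumptions. While the decorated function runs, SIGINT and SIGTERM are printed instead of raised, and the previous handlers are restored afterwards. Like any `signal.signal()` call it must run in the main thread, and the demo assumes POSIX `os.kill`.

```python
import functools
import os
import signal
import time

# Signals that should not interrupt cleanup (SIGKILL cannot be caught at all).
PROTECTED_SIGNALS = (signal.SIGINT, signal.SIGTERM)


def protect_shutdown(func):
    """Sketch: run ``func`` with SIGINT/SIGTERM reported instead of raised.

    The original handlers are restored once ``func`` returns or raises, so
    normal interrupt behaviour resumes after cleanup.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        def report(signum, frame):
            print(f"Received {signal.Signals(signum).name} during shutdown; ignoring.")

        previous = {sig: signal.signal(sig, report) for sig in PROTECTED_SIGNALS}
        try:
            return func(*args, **kwargs)
        finally:
            for sig, handler in previous.items():
                # None means the old handler was not installed from Python,
                # and signal.signal() cannot reinstall it.
                if handler is not None:
                    signal.signal(sig, handler)

    return wrapper


@protect_shutdown
def stop_everything():
    """Hypothetical cleanup, standing in for e.g. run.smurf.stream('off')."""
    os.kill(os.getpid(), signal.SIGINT)  # simulate an impatient Ctrl-C
    time.sleep(0.1)  # an interrupted sleep resumes after report() (PEP 475)
    return "cleanup finished"


if __name__ == "__main__":
    print(stop_everything())
```

Restoring the handlers in `finally` keeps the protection scoped to the cleanup itself: once shutdown finishes, or fails, a Ctrl-C behaves normally again.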
BrianJKoopman force-pushed the koopman/sigint-handling branch from 318c066 to 04c527c on September 10, 2025 15:16
BrianJKoopman merged commit d95f8eb into main on Sep 10, 2025
6 checks passed
BrianJKoopman deleted the koopman/sigint-handling branch on September 10, 2025 15:18

Development

Successfully merging this pull request may close these issues.

KeyboardInterrupt might disrupt seq.scan shutdown
