bscottm
Contributor

@bscottm bscottm commented Mar 17, 2025

Fix race condition in _debug_fwrite_all() where the FILE *f output file can be permanently NULL, causing fwrite() to permanently return zero and resulting in an infinite loop on multithreaded SIMH. This pathology primarily occurs when the main thread calls flush_svc() and one of the Ethernet threads (reader, writer) emits debugging output.

Linux and macOS invoke fsync(). Windows invokes an fsync() wrapper that calls _commit().
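
For illustration only, a minimal sketch of what such a platform split can look like. The helper name debug_stream_sync() is hypothetical and is not taken from the SIMH sources; the only platform calls assumed are fflush(), fileno()/fsync() on POSIX systems and _fileno()/_commit() on Windows.

```c
/* Ask for POSIX declarations (fileno) when compiling in strict ISO C mode. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>

#if defined(_WIN32)
#include <io.h>       /* _commit() */
#else
#include <unistd.h>   /* fsync() */
#endif

/* Hypothetical helper: push buffered stdio data to the OS, then ask the OS
 * to commit it to disk. Returns 0 on success, -1 on failure. */
static int debug_stream_sync(FILE *f)
{
    if (f == NULL)
        return -1;              /* nothing to synchronize */

    if (fflush(f) != 0)         /* stdio buffer -> kernel */
        return -1;

#if defined(_WIN32)
    return _commit(_fileno(f)); /* Windows counterpart of fsync() */
#else
    return fsync(fileno(f));    /* kernel -> disk */
#endif
}
```

The stream is never closed or reopened here, so another thread holding the same FILE * does not observe it going away during a flush.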

Issue traced to _sim_debug_flush() calling sim_set_deboff(0, NULL) to simulate fsync().

@bscottm
Contributor Author

bscottm commented Mar 17, 2025

@pkoning2: Please merge PR #450 so that CI/CD can get past the LTO issue so that this and other builds have some chance at success (barring simulator test failures.)

@pkoning2
Member

I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync?

@bscottm
Contributor Author

bscottm commented Mar 17, 2025

> I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync?

It can still be used for single threaded code, i.e., simulators that don't use AIO.

Also, I don't have a good handle on which platforms SIMH is supposed to support (the makefile seems to support a lot more than just Linux, macOS and Windows... Haiku? SunOS variants?) So, I left it in there as CYA.

Very happy to take it out and rewrite code to assume everyone has fsync(), except for Windows, which needs the diaper.

@bscottm
Contributor Author

bscottm commented Mar 18, 2025

> I'm a bit confused here. You pointed out that the simulated fsync can't be used in multithreaded simh. So why is it still in the code? If it's broken, would it not be more logical to remove it, especially since everyone has fsync?

@pkoning2: It's all just fsync now. If there's a platform out there that needs to be supported (VMS?), an issue is probably the best way to address it. Now back to the ETH_MAC * discussion...

*
* f == NULL: fwrite() returns 0 on Linux and Windows. len never gets decremented
* and an infinite loop ensues (cue "Forever NULL" sung to the tune of Alphaville's
* "Forever Young".)
Member

Why not simply treat length written == 0 as an error? But in my testing, a NULL file pointer causes a segfault (on recent Linux as well as on Mac OS).

Contributor Author

@bscottm bscottm Aug 21, 2025

@pkoning2: I left the "dragon" note as more of a "Don't Do This" warning. I added extra checks to deal with the f == NULL, len_written == 0, and errno != EAGAIN cases. In each case, SIMH outputs a diagnostic message (the "why") and the buffer's contents that weren't written to the debug output.
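
A rough sketch of how those three checks can fit into a write-all loop. This is not the code in the PR: write_all_checked() and MAX_EAGAIN_RETRIES are hypothetical names, the retry bound is an assumption, and the diagnostics go to stderr here rather than through SIMH's own message routines.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_EAGAIN_RETRIES 16   /* assumed bound, not taken from SIMH */

/* Hypothetical helper (not SIMH's _debug_fwrite_all()): write all len bytes
 * of buf to f. Returns 0 on success, -1 after reporting why it gave up. */
static int write_all_checked(FILE *f, const char *buf, size_t len)
{
    int retries = 0;

    assert(buf != NULL || len == 0);

    /* Case 1: no stream at all. Report it instead of calling fwrite(). */
    if (f == NULL) {
        fprintf(stderr, "debug write: NULL stream, %zu bytes dropped: ", len);
        fwrite(buf, 1, len, stderr);
        fputc('\n', stderr);
        return -1;
    }

    while (len > 0) {
        size_t written;

        errno = 0;
        written = fwrite(buf, 1, len, f);

        if (written == 0) {
            /* Case 2: nothing written but EAGAIN, so retry a bounded
             * number of times rather than spinning forever. */
            if (errno == EAGAIN && ++retries <= MAX_EAGAIN_RETRIES) {
                clearerr(f);
                continue;
            }
            /* Case 3: no progress for any other reason. Say why and show
             * the part of the buffer that never reached the output. */
            fprintf(stderr, "debug write: no progress (errno=%d), "
                            "%zu bytes dropped: ", errno, len);
            fwrite(buf, 1, len, stderr);
            fputc('\n', stderr);
            return -1;
        }

        retries = 0;            /* progress was made, reset the retry count */
        buf += written;
        len -= written;
    }
    return 0;
}
```

Every path through the loop either shrinks len or returns, so the loop cannot run forever when fwrite() keeps returning zero.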

@bscottm bscottm force-pushed the debug_flush_loop branch 3 times, most recently from af23c88 to 072fe53 (August 21, 2025 16:28)
Fix race condition in _debug_fwrite_all() where the FILE *f output
file can be permanently NULL, causing fwrite() to permanently return
zero and resulting in an infinite loop on multithreaded SIMH. This
pathology primarily occurs when the main thread calls flush_svc() and
one of the Ethernet threads (reader, writer) emits debugging output.

Linux and macOS platforms invoke fsync() to synchronize file data to
disk. Windows has an fsync() wrapper calling _commit() providing the
same semantics.

Issue traced to _sim_debug_flush() calling sim_set_deboff(0, NULL) to
simulate fsync().