RCORE-2234 Crash in dart due to debug output when app is being torn down #7985
…crash-during-teardown
src/realm/util/logger.cpp
Outdated
void StderrLogger::do_write(std::string&& output)
{
    REALM_ASSERT(m_log_mutex);
    // Lock the mutex to avoid comingling the logger output messages
    std::lock_guard l(*m_log_mutex.get());
    // std::cerr is unbuffered, so no need to flush
    std::cerr << cat.get_name() << " - " << get_level_prefix(level) << message << '\n'; // Throws
    std::cerr << output << std::endl; // Throws
}
This was added so the StderrLogger and the TimestampStderrLogger use the same static mutex when writing to stderr.
src/realm/util/logger.cpp
Outdated
void StderrLogger::do_write(std::string&& output)
{
    REALM_ASSERT(m_log_mutex);
    // Lock the mutex to avoid comingling the logger output messages
something i've never understood is why we need this mutex at all. std::cerr is thread-safe on its own. if the C++ standard library is working correctly, you should never see commingling of logger output messages to stderr.
Removed the mutex - a single << call to std::cerr is thread safe, so the output message is built first and then pushed to stderr as a single string.
to be clear, i meant this as a question rather than a statement. is the reason we added the mutex here because we were seeing interleaving or something else?
There was some interleaving in the past during the tests. For std::cerr, each individual streaming operator call is thread safe, but chained operators are not, allowing context changes to happen before the entire message is printed.
e.g., std::cerr << "string a" << "string b";
==> std::cerr << "string a"; std::cerr << "string b";
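To illustrate the point about chained operators, here is a minimal sketch of the "build first, write once" approach. The helper names format_line and write_line are hypothetical and are not taken from the PR. Note that the C++ standard only promises that concurrent writes to std::cerr are free of data races; treating a single << call as an indivisible write is an observation about common implementations, not a guarantee.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the PR): assemble the whole log line up front,
// so the chained << calls operate on a private stream rather than on std::cerr.
std::string format_line(const std::string& category, const std::string& prefix,
                        const std::string& message)
{
    std::ostringstream out;
    out << category << " - " << prefix << message << '\n';
    return out.str();
}

// Hypothetical helper: hand the finished line to std::cerr in one << call,
// leaving no gap between pieces of the same message for another thread to use.
void write_line(const std::string& line)
{
    std::cerr << line;
}
```

A caller would then do write_line(format_line("Realm", "Debug: ", "hello")) instead of chaining the pieces directly onto std::cerr.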
But unfortunately, macOS TSAN is not happy with writing to std::cerr from two threads, but it is fine with fprintf(), and supposedly it is faster than std::cerr. I updated StderrLogger to use fprintf() instead.
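A rough sketch of what an fprintf()-based write could look like follows. The function name write_to_stream and its signature are assumptions for illustration; the actual StderrLogger change in this PR is not shown in the thread.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the logger's write hook (not the PR's code).
// The text and the trailing newline go out in a single fprintf() call.
// Returns the number of characters written, or a negative value on error.
int write_to_stream(std::FILE* stream, const std::string& output)
{
    // Caveat: "%s" stops at the first embedded '\0' in output.
    return std::fprintf(stream, "%s\n", output.c_str());
}
```

Calling write_to_stream(stderr, "message") from several threads relies on the stdio implementation locking the stream for each call, which POSIX requires.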
src/realm/util/logger.cpp
Outdated
// std::cerr is unbuffered, so no need to flush
std::cerr << cat.get_name() << " - " << get_level_prefix(level) << message << '\n'; // Throws
std::cerr << output << std::endl; // Throws
the comment right above says that you don't need to do std::endl because cerr is unbuffered.
Sorry, bad habit - reverted to '\n'
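For context on why std::endl is redundant here: std::cerr starts out with the unitbuf flag set, so it flushes after every output operation. The sketch below checks that flag and writes a line with '\n' only. Both function names are hypothetical.

```cpp
#include <iostream>
#include <string>

// Hypothetical helper: reports whether std::cerr flushes after each output
// operation. The flag is set by default unless some code has cleared it.
bool cerr_flushes_each_write()
{
    return (std::cerr.flags() & std::ios_base::unitbuf) != 0;
}

// Hypothetical helper: '\n' only appends a newline. std::endl would also call
// flush(), which adds nothing while unitbuf is set.
void write_without_endl(const std::string& text)
{
    std::string line = text;
    line += '\n';
    std::cerr << line;
}
```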
test/object-store/audit.cpp
Outdated
@@ -52,13 +52,6 @@ using namespace std::string_literals;
using Catch::Matchers::StartsWith;
using nlohmann::json;

static auto audit_logger =
why did this need to change?
I was trying to remove the instances of static loggers, but it doesn't matter here - reverted the change
src/realm/util/timestamp_logger.hpp
Outdated
@@ -9,15 +9,15 @@
namespace realm {
namespace util {

class TimestampStderrLogger : public Logger {
class TimestampStderrLogger : public StderrLogger {
It looks like the TimestampStderrLogger is completely unused outside of tests. Can we just remove it?
src/realm/util/logger.cpp
Outdated
void StderrLogger::do_write(std::string&& output)
{
    REALM_ASSERT(m_log_mutex);
When could this assert ever be false?
as a more general high level question - did we get to the bottom of why this was crashing in dart? destruction of statics is undefined behavior, but generally i think we've seen that happen while the process is exiting, which implies dart maybe isn't tearing realm down cleanly since there are threads out there still running while the process exits? do we know what's actually going on here? |
I tried reproducing the issue with several versions of realm-dart, but I was never able to reproduce it in a macos app or in the ios simulator. From the stack trace, however, I believe it is clear that the crash is occurring because the static mutex is no longer valid when |
Tldr - bug fixes should be scoped narrowly, especially if we haven't been able to reproduce the crash. Going out and refactoring things as part of a speculative bugfix is how you make more bugs.
It's pretty disappointing that we never reproduced this despite having a sample project that demonstrates the problem. I think it's a good guess that this static mutex is being destroyed before all the databases have been torn down and that's causing problems, but it would be great to know whether that behavior in itself is a bug or if this is just weird behavior we'll have to work around.
All that said, I think the first approach of using a std::shared_ptr was valid. A simpler solution could have been to just treat this mutex the way we treat the default logger mutex and other static mutexes we really want to live forever by doing
I also think it's fine to keep using std::cerr and have a mutex to synchronize access to it, I just wanted to understand why we needed it. The answer that we were seeing interleaving is totally valid. It looks like it was added during a general CI improvements PR, so I wasn't sure of the context for it being added.
Understood @jbreams - I looked further into how the realm shared lib was being loaded/unloaded. The realm library is loaded the first time the I see that JS also has had some logger crashes (which may be due to the
I was able to reproduce the logger crash yesterday on macos using the Dart SDK 3.4.1, but I have been unable to create a local build of the Dart SDK that is able to compile a macos flutter app to verify the changes in this PR. I can build my flutter app for iOS with my local Dart SDK build, but I haven't been able to reproduce the logger crash on that platform. I've reached out to the SDK team to see if they can help with my flutter app compile issue.
Verified these changes address the crash in Dart - I've also reached out to the JS SDK team to see if this also helps with the logger crash they are seeing as well.
So what's going on? Why is the mutex getting destroyed when it is?
src/realm/util/logger.cpp
Outdated
void StderrLogger::do_log(const LogCategory& cat, Level level, const std::string& message)
{
    static Mutex mutex;
    LockGuard l(mutex);
    std::lock_guard l(m_mutex);
Is there a reason we need to capture the global mutex as a member variable reference? Could the whole change for this ticket just be changing this line from static Mutex mutex; to static auto& mutex = *new Mutex;? Or changing the mutex from a Mutex to a std::mutex as you do above and locking it here without changing any of the headers?
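As a sketch of the pattern suggested above, the snippet below uses a function-local static reference to a heap-allocated std::mutex that is intentionally never deleted. The accessor stderr_mutex() and the function log_line() are hypothetical names for illustration; the suggestion in the thread keeps the static inside do_log() itself.

```cpp
#include <cstdio>
#include <mutex>
#include <string>

// Hypothetical accessor (not from the PR). The mutex is created on first use
// and deliberately leaked: no destructor is registered to run at process exit,
// so threads that are still logging during teardown can keep locking it.
std::mutex& stderr_mutex()
{
    static auto& mutex = *new std::mutex;
    return mutex;
}

// Hypothetical logging function: holds the leaked mutex while the message and
// its newline are written, so lines from different threads do not interleave.
void log_line(const std::string& message)
{
    std::lock_guard<std::mutex> lock(stderr_mutex());
    std::fputs(message.c_str(), stderr);
    std::fputc('\n', stderr);
}
```

The trade-off is a small one-time allocation that is never freed, which leak checkers may report unless it is suppressed.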
I wasn't sure if there was a risk that the static auto& mutex variable could become invalid so I was saving a local reference to the mutex. Fortunately, this should be defined in the .data region of the application, so actually there shouldn't be any risk of it going away. Just verified with a simple test, so I'll remove the extra reference.
Also verified that the minimal set of logger changes works fine with the Dart SDK.
Can you expand a bit on how the variable could become invalid and how taking a reference to it would prevent it from becoming invalid? Do you mean there was a risk the heap-allocated mutex could go out-of-scope and be destroyed without anything calling delete on it? How would taking a reference to it prevent that and not just lead to a dangling reference? How does being in the .data section of the image change things here?
I was referring to the global static reference (s_stderr_logger_mutex) variable being destroyed/going away during the app exit. But, since globals/statics like this are symbols defined in the program memory (.bss or .data sections), it will not go away during the lifetime of the app.
What, How & Why?
The static mutex used to add thread safety to the StderrLogger was sometimes causing a crash on app exit when the mutex was destroyed before the stderr logger shared pointer instance was deallocated.
These changes remove the mutex altogether and use the inherent thread safety in std::cerr to prevent commingling the debug messages.
Fixes #7969
☑️ ToDos
[ ] 🚦 Tests (or not relevant)
[ ] C-API, if public C++ API changed
[ ] bindgen/spec.yml, if public C++ API changed