
master crash in CondVarAny::notify_all #4770

Closed
romange opened this issue Mar 15, 2025 · 3 comments · Fixed by #4789
Assignees: romange
Labels: important (higher priority than the usual ongoing development tasks)

Comments


romange (Collaborator) commented Mar 15, 2025

The stack trace is below. notify_all at dragonfly_connection.cc:1640 notifies the connection fiber that is blocked on pipelining. Both the notifier and the waiting fiber should be on the same thread; however, when inspecting the corefile at WaitQueue::NotifyAll, I see that active->scheduler and cntx->scheduler are not the same.

#3  0x00007ffff7c4527e in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#4  <signal handler called>
#5  boost::intrusive::list_node_traits<void*>::set_next (next=0x0, n=0x0) at /usr/include/boost/intrusive/detail/list_node.hpp:65
#6  boost::intrusive::circular_list_algorithms<boost::intrusive::list_node_traits<void*> >::unlink (this_node=<optimized out>) at /usr/include/boost/intrusive/circular_list_algorithms.hpp:145
#7  boost::intrusive::list_impl<boost::intrusive::mhtraits<util::fb2::detail::Waiter, boost::intrusive::list_member_hook<boost::intrusive::link_mode<(boost::intrusive::link_mode_type)1> >, &util::fb2::detail::Waiter::wait_hook>, unsigned long, false, void>::pop_front_and_dispose<boost::intrusive::detail::null_disposer> (disposer=..., this=0x59e681c0770) at /usr/include/boost/intrusive/list.hpp:353
#8  boost::intrusive::list_impl<boost::intrusive::mhtraits<util::fb2::detail::Waiter, boost::intrusive::list_member_hook<boost::intrusive::link_mode<(boost::intrusive::link_mode_type)1> >, &util::fb2::detail::Waiter::wait_hook>, unsigned long, false, void>::pop_front (this=0x59e681c0770) at /usr/include/boost/intrusive/list.hpp:336
#9  util::fb2::detail::WaitQueue::NotifyAll (this=this@entry=0x59e681c0770, active=0x7ffff0263f00) at /home/dev/projects/dragonfly/helio/util/fibers/detail/wait_queue.cc:46
#10 0x0000555555b7b38d in util::fb2::CondVarAny::notify_all (this=0x59e681c0770) at /home/dev/projects/dragonfly/helio/util/fibers/synchronization.h:167
#11 facade::Connection::AsyncFiber (this=0x59e6a0b2a80) at /home/dev/projects/dragonfly/src/facade/dragonfly_connection.cc:1640
#12 0x0000555555b7b770 in operator() (__closure=<synthetic pointer>) at /home/dev/projects/dragonfly/src/facade/dragonfly_connection.cc:1778
#13 std::__invoke_impl<void, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> > (__f=<synthetic pointer>) at /usr/include/c++/11/bits/invoke.h:61
#14 std::__invoke<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> > (__fn=<synthetic pointer>) at /usr/include/c++/11/bits/invoke.h:96
#15 std::__apply_impl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>, std::tuple<> > (__t=<synthetic pointer>, __f=<synthetic pointer>) at /usr/include/c++/11/tuple:1854
#16 std::apply<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>, std::tuple<> > (__t=<synthetic pointer>, __f=<synthetic pointer>) at /usr/include/c++/11/tuple:1865
#17 util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::run_ (c=..., this=0x7ffff0263f00) at /home/dev/projects/dragonfly/helio/util/fibers/detail/fiber_interface.h:313
#18 operator() (caller=..., __closure=<optimized out>) at /home/dev/projects/dragonfly/helio/util/fibers/detail/fiber_interface.h:295
#19 std::__invoke_impl<boost::context::fiber, util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)>&, boost::context::fiber> (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#20 std::__invoke<util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)>&, boost::context::fiber> (__fn=...)   at /usr/include/c++/11/bits/invoke.h:97
#21 std::invoke<util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)>&, boost::context::fiber> (__fn=...)            at /usr/include/c++/11/functional:98
#22 boost::context::detail::fiber_record<boost::context::fiber, util::fb2::FixedStackAllocator, util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)> >::run (fctx=0x0, this=<optimized out>) at /usr/include/boost/context/fiber_fcontext.hpp:143
romange commented Mar 15, 2025

We should not be using connection migrations, so it is not clear how the async fiber and the connection fiber end up on different threads. The next step is to add verbose logging around connection migrations and see whether it correlates with the crashes.

@romange romange self-assigned this Mar 15, 2025
@romange romange added the important higher priority than the usual ongoing development tasks label Mar 15, 2025

romange commented Mar 17, 2025

Now I see the same crash even when both fibers are on the same thread:

> f 9
> p waiter->cntx_->scheduler_ == active->scheduler
true
> p cntx->scheduler_ == active->scheduler_
true

backtrace:

#4  <signal handler called>
#5  boost::intrusive::list_node_traits<void*>::set_next (next=0x440ae1c0330, n=0x0) at /usr/include/boost/intrusive/detail/list_node.hpp:65
#6  boost::intrusive::circular_list_algorithms<boost::intrusive::list_node_traits<void*> >::unlink (this_node=0x7ffff00f2050) at /usr/include/boost/intrusive/circular_list_algorithms.hpp:154
#7  boost::intrusive::list_impl<boost::intrusive::mhtraits<util::fb2::detail::Waiter, boost::intrusive::list_member_hook<boost::intrusive::link_mode<(boost::intrusive::link_mode_type)1> >, &util::fb2::detail::Waiter::wait_hook>, unsigned long, false, void>::pop_front_and_dispose<boost::intrusive::detail::null_disposer> (disposer=..., this=0x440ae1c0330) at /usr/include/boost/intrusive/list.hpp:355
#8  boost::intrusive::list_impl<boost::intrusive::mhtraits<util::fb2::detail::Waiter, boost::intrusive::list_member_hook<boost::intrusive::link_mode<(boost::intrusive::link_mode_type)1> >, &util::fb2::detail::Waiter::wait_hook>, unsigned long, false, void>::pop_front (this=0x440ae1c0330) at /usr/include/boost/intrusive/list.hpp:338
#9  util::fb2::detail::WaitQueue::NotifyAll (this=this@entry=0x440ae1c0330, active=0x7ffff035af00) at /home/dev/projects/dragonfly/helio/util/fibers/detail/wait_queue.cc:46
#10 0x0000555555b9a9d7 in util::fb2::CondVarAny::notify_all (this=0x440ae1c0330) at /home/dev/projects/dragonfly/helio/util/fibers/synchronization.h:167
#11 facade::Connection::AsyncFiber (this=0x440b00bde80) at /home/dev/projects/dragonfly/src/facade/dragonfly_connection.cc:1642
#12 0x0000555555b9b71f in operator() (__closure=<synthetic pointer>) at /home/dev/projects/dragonfly/src/facade/dragonfly_connection.cc:1780
#13 std::__invoke_impl<void, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> > (__f=<synthetic pointer>) at /usr/include/c++/13/bits/invoke.h:61
#14 std::__invoke<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> > (__fn=<synthetic pointer>) at /usr/include/c++/13/bits/invoke.h:96
#15 std::__apply_impl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>, std::tuple<> > (__t=<synthetic pointer>, __f=<synthetic pointer>) at /usr/include/c++/13/tuple:2302
#16 std::apply<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>, std::tuple<> > (__t=<synthetic pointer>, __f=<synthetic pointer>) at /usr/include/c++/13/tuple:2313
#17 util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::run_ (c=..., this=0x7ffff035af00) at /home/dev/projects/dragonfly/helio/util/fibers/detail/fiber_interface.h:313
#18 operator() (caller=..., __closure=<optimized out>) at /home/dev/projects/dragonfly/helio/util/fibers/detail/fiber_interface.h:295
#19 std::__invoke_impl<boost::context::fiber, util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)>&, boost::context::fiber> (__f=...) at /usr/include/c++/13/bits/invoke.h:61
#20 std::__invoke<util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)>&, boost::context::fiber> (__fn=...)          at /usr/include/c++/13/bits/invoke.h:97
#21 std::invoke<util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::WorkerFiberImpl<util::fb2::FixedStackAllocator>(std::string_view, const boost::context::preallocated&, util::fb2::FixedStackAllocator&&, facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()>&&)::<lambda(util::fb2::detail::WorkerFiberImpl<facade::Connection::LaunchAsyncFiberIfNeeded()::<lambda()> >::FbCntx&&)>&, boost::context::fiber> (__fn=...)            at /usr/include/c++/13/functional:114


romange commented Mar 17, 2025

OK, this is the reason for these crashes: https://github.com/dragonflydb/dragonfly/pull/4482/files#diff-b88cbca1b2c9337ab2a37262f9d516b117f4996da25e2ae173e0e129f2b82b20R312

It is indeed a regression from 1.26.x, @adiholden, and justifies a patch release.

1.27.3, here we go!

romange added a commit that referenced this issue Mar 17, 2025
thread_queue_backpressure is a global array of per thread QueueBackpressure
objects. We referenced these objects incorrectly in 1.27.0-2.

Fixes #4770

Signed-off-by: Roman Gershman <[email protected]>