Description
I'm using Toxiproxy for our external services and we're now getting ready to do a bunch of DB failover work. To better handle our failovers without dropping queries, we've patched ActiveRecord to catch any MySQL errors, perform a reconnect and then try the query again. I can manually confirm this works by kicking off the script below and manually toggling the availability of either the Toxiproxy proxy or the DB server while it runs.
require 'securerandom'

ATTEMPT_COUNT = 300

puts "==> Truncating the users test_db table"
ActiveRecord::Base.connection.execute("truncate test_db.user")

puts "==> Starting to send MySQL queries"
ATTEMPT_COUNT.times do
  sleep 0.1
  hash = SecureRandom.uuid
  begin
    puts "  [#{Time.now.strftime("%T.%L")}] Inserting #{hash}"
    ActiveRecord::Base.connection.execute("INSERT INTO user (first_name, last_name) VALUES ('test', '#{hash}')")
    puts "  Success!"
  rescue StandardError => e
    # Log the failure and keep going so the final tally shows any dropped writes.
    # StandardError (rather than Exception) keeps Ctrl-C working during the run.
    puts "  [#{Time.now.strftime("%T.%L")}] #{e}"
  end
end

# Compare the number of attempted writes against the rows that actually landed.
row_count = ActiveRecord::Base.connection.execute("select * from user").count
puts
puts "Attempted writes: #{ATTEMPT_COUNT}"
puts "DB row count: #{row_count}"
puts "Variance: #{ATTEMPT_COUNT - row_count}"
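For context, the retry behaviour of the patch mentioned above is roughly the following. This is a simplified stand-in rather than the actual patch: `with_reconnect`, `max_attempts`, `wait` and the `FlakyConnection` test double are illustrative names, and the real patch hooks into ActiveRecord and rescues MySQL errors specifically.

```ruby
# Simplified stand-in for the retry patch (illustrative names, not the real patch).
# `connection` is anything that responds to #reconnect!, as ActiveRecord adapters do.
def with_reconnect(connection, max_attempts: 5, wait: 0.1)
  attempts = 0
  begin
    attempts += 1
    yield connection
  rescue StandardError
    # Give up once the attempt budget is spent and re-raise the last error.
    raise if attempts >= max_attempts
    sleep wait
    begin
      connection.reconnect!
    rescue StandardError
      # Server still unreachable; the next attempt will fail and retry again.
    end
    retry
  end
end

# Test double that fails a fixed number of times before succeeding.
class FlakyConnection
  attr_reader :reconnects

  def initialize(failures)
    @failures = failures
    @reconnects = 0
  end

  def execute(sql)
    if @failures > 0
      @failures -= 1
      raise IOError, "server has gone away"
    end
    "ok: #{sql}"
  end

  def reconnect!
    @reconnects += 1
  end
end

conn = FlakyConnection.new(2)
result = with_reconnect(conn, wait: 0.01) { |c| c.execute("SELECT 1") }
puts result
puts "reconnects: #{conn.reconnects}"
```

The important property is that the query keeps retrying until the server answers again, which is why anything that keeps the proxy down for the whole duration of the query will block.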
However, I'm getting a little stuck when it comes to using Toxiproxy to emulate the failover completing. I first tried:
Toxiproxy[:mysql_master].down do
  User.first
end
It seems our patch works a little too well because it sits here waiting for the MySQL server to come back but it never does as the `yield` is still running. I then tried to split the `enable`/`disable` but still had the same results with the following:
Toxiproxy[:mysql_master].disable
User.first
Toxiproxy[:mysql_master].enable
Which leads me to the following questions:
- Could you share how you're using Toxiproxy with things like DB failovers? Is this something you're able to test similarly to my intention, or do you handle it on a per-model basis? I essentially need the proxy to be down for only a short period of time and then re-enabled once that time has passed.
- The only way I could think of having this work would be to pass another argument to `down` (and later `disable`) which would only disable the proxy for a period of time. Is applying a non-blocking timeout to that functionality something you'd consider useful for the library?
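
To make the second question concrete, here is a minimal sketch of the behaviour I have in mind, assuming a background thread is acceptable. `down_for` is a hypothetical helper and not part of the toxiproxy gem; it only relies on the `disable` and `enable` calls used above. `FakeProxy` is a stand-in so the snippet runs without a Toxiproxy server.

```ruby
# Hypothetical helper, not part of the toxiproxy gem.
# `proxy` is anything that responds to #disable and #enable.
# Disables the proxy immediately and re-enables it from a background
# thread after `seconds`, so the caller is not blocked in the meantime.
def down_for(proxy, seconds)
  proxy.disable
  Thread.new do
    begin
      sleep seconds
    ensure
      # Re-enable even if the sleep is interrupted.
      proxy.enable
    end
  end
end

# Stand-in for Toxiproxy[:mysql_master] so the example is self-contained.
class FakeProxy
  attr_reader :enabled

  def initialize
    @enabled = true
  end

  def disable
    @enabled = false
  end

  def enable
    @enabled = true
  end
end

proxy = FakeProxy.new
timer = down_for(proxy, 0.2)
state_during = proxy.enabled   # the proxy is down while the timer runs
# A retrying query such as User.first would be issued here.
timer.join
state_after = proxy.enabled    # the proxy is back once the timer finishes

puts "during: #{state_during}, after: #{state_after}"
```

With the real gem this would presumably be called as `down_for(Toxiproxy[:mysql_master], 2)` followed by `User.first`, though I have not checked whether calling `enable` from a second thread is safe there.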
Thanks!