fix(RHINENG-15555): Fix infinite export when a host is deleted #2236
+43
−14
Overview
This PR is being created to address RHINENG-15555.
When we initiated an export and then deleted a host, the export would never finish and its status would stay "pending" forever. I found 2 reasons why this was happening:

1. `DeletedObjectError`. We didn't catch the DB errors when fetching hosts for export, so when a DB error occurred, we didn't send the failure status to the export service. I'm fixing this by catching the DB errors in the `get_hosts_to_export` function and raising `InventoryException`, which is caught and correctly handled by the `create_export` function. I also added a unit test for this.
2. The change from `db.session.query` to `sqlalchemy.select` when fetching the hosts for export. It turns out that this is significantly slower: exporting 1000 hosts in ephemeral with `db.session.query` takes about 0.3s, while exporting 1000 hosts in ephemeral with `select` takes more than 5 seconds, which is more than 16 times slower. I'll try to explain why below. This means that there is a much higher chance of seeing DB errors (for example when a host is deleted), because the export takes much more time.

I looked at the DB queries we make with both `db.session.query` and `select`. While `db.session.query` makes just a single SQL query, `select` makes 4 additional SQL queries for EVERY host that is being exported. These additional queries don't even make sense, as they query hosts by `null` ids.

Here is what `db.session.query` does:

And here is what `select` does:

The initial query looks exactly the same, but for some reason, `select` makes these 4 additional queries for every single exported host (this is not a complete log; the logs go on like this for a long time). I have no idea why this is happening. Everything I found on the internet suggests that these 2 methods should do essentially the same thing and differ only in how the DB query is constructed in Python. If anyone knows what might cause this, please share. I think that this can be fixed in another Jira, but for now, I just went back to using `db.session.query` to fix the bug.

There were a few other issues that I noticed while working on this:

- In `_handle_export_error`, the request returned HTTP error 400 and complained that the message was empty. This was happening because `InventoryException` didn't have a `__str__` method, so `str(e)` returned an empty string. I added a `__str__` method to `InventoryException` and made it return a string built from the `to_json` method.
- A problem in `_handle_export_response`. This was caused by passing the response text as the status code parameter. This is a nice example of how type-checking can help us avoid creating bugs.
- The `sqldumps` function decorator didn't work for me: RHINENG-15779
- `SQLDump` logs don't look very nice when we use logger to write the messages: RHINENG-15874
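
Two of the fixes described in the overview, translating DB errors in `get_hosts_to_export` and giving `InventoryException` a `__str__` method, can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: `DatabaseError` is a stand-in for whatever DB error the real query raises, the `InventoryException` fields are simplified assumptions, and the `fetch_hosts` and `report_status` callables are hypothetical placeholders for the real query and the call to the export service.

```python
import json


class DatabaseError(Exception):
    """Stand-in (assumption) for the DB error raised when a host is deleted mid-export."""


class InventoryException(Exception):
    """Simplified, hypothetical version of the exception named in this PR."""

    def __init__(self, status=400, title=None, detail=None):
        # No arguments are passed to Exception, so without __str__ below,
        # str(e) would be an empty string.
        super().__init__()
        self.status = status
        self.title = title
        self.detail = detail

    def to_json(self):
        return {"status": self.status, "title": self.title, "detail": self.detail}

    def __str__(self):
        # Give str(e) real content by serializing the to_json() payload.
        return json.dumps(self.to_json())


def get_hosts_to_export(fetch_hosts):
    """Fetch hosts, translating DB errors into InventoryException."""
    try:
        return list(fetch_hosts())
    except DatabaseError as e:
        raise InventoryException(title="DB error", detail=str(e)) from e


def create_export(fetch_hosts, report_status):
    """Run the export and always report a final status."""
    try:
        hosts = get_hosts_to_export(fetch_hosts)
    except InventoryException as e:
        # Report the failure so the export does not stay "pending" forever.
        report_status("failed", str(e))
        return False
    report_status("complete", f"{len(hosts)} hosts exported")
    return True


def deleted_mid_export():
    # Simulates a host disappearing while the export is iterating over results.
    yield {"id": "host-1"}
    raise DatabaseError("host was deleted during export")


statuses = []
create_export(deleted_mid_export, lambda status, message: statuses.append((status, message)))
```

In this sketch the simulated deletion ends with a single "failed" status whose message is the non-empty JSON string produced by `__str__`, rather than an export that never reports back.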