Fix: Prevent additional query when updating new graphs (faster inserts) #146
I'm using GraphDiff in a few projects that perform a lot of batch updates and inserts. Here are my optimizations that made it about 15-20x faster in my use cases. All tests in GraphDiff.Tests and in my own projects run fine. Perhaps Brent or Andreas can have a look at my changes.
Inserting new entities
When updating a new graph (an insert), GraphDiff tries to load the persisted graph. That lookup always returns null and does a lot of unnecessary work: building the predicate expression, building the include strings, and running the actual query against the database. If an entity is new and not persisted yet, there is no point in trying to load it from the database.
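A minimal sketch of the idea, not the code from this PR. It assumes that a new entity can be recognised by an int or long key that still has its default value of 0 (i.e. a store-generated key that has not been assigned yet). `NewEntityCheck` and `LooksUnpersisted` are hypothetical names used only for illustration.

```csharp
using System;

public static class NewEntityCheck
{
    // Hypothetical helper: returns true when the key is an int or long that is
    // still 0, which (under the assumption above) means the entity has never
    // been saved and the database lookup can be skipped.
    public static bool LooksUnpersisted(object keyValue)
    {
        switch (keyValue)
        {
            case int i:
                return i == 0;
            case long l:
                return l == 0L;
            default:
                // string, Guid, composite and other keys: no shortcut,
                // fall back to the normal query.
                return false;
        }
    }
}
```

A caller would check the key first and only run the load query when the helper returns false.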
This optimization is optional for simple int and long primary keys (`GraphDiffConfiguration`) and is automatically ignored in other scenarios (string/guid/composite/etc).
Updating of collections
When updating many entities of the same type, you have to call UpdateGraph in a loop. This results in a lot of queries from `QueryLoader` and slows down large batch updates considerably. As proposed by DixonD-git (Issue https://github.com/refactorthis/GraphDiff/issues/127), many entities can be loaded in a single query to increase performance.
The performance boost was quite a surprise (15-20x on my local machine). I only had to change internal interfaces and classes, while the public interface (`DbContextExtensions`) did not change. Composite keys create predicates like `(KeyA='a1' AND KeyB='b1') OR (KeyA='a2' AND KeyB='b2') OR ...` and single int keys are translated to `...Key in (1,2,3,4,5...)`.
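To illustrate the shape of the composite-key predicate, here is a rough sketch built with `System.Linq.Expressions`. It is not the implementation in this PR: the `Row` entity, its `KeyA`/`KeyB` properties, and the `BatchPredicate` class are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical entity with a two-part key, for illustration only.
public class Row
{
    public string KeyA { get; set; }
    public string KeyB { get; set; }
}

public static class BatchPredicate
{
    // Builds: r => (r.KeyA == a1 && r.KeyB == b1) || (r.KeyA == a2 && r.KeyB == b2) || ...
    public static Expression<Func<Row, bool>> ForKeys(IEnumerable<(string A, string B)> keys)
    {
        ParameterExpression r = Expression.Parameter(typeof(Row), "r");
        Expression body = null;

        foreach (var (a, b) in keys)
        {
            // One AND group per entity key.
            Expression match = Expression.AndAlso(
                Expression.Equal(Expression.Property(r, nameof(Row.KeyA)),
                                 Expression.Constant(a, typeof(string))),
                Expression.Equal(Expression.Property(r, nameof(Row.KeyB)),
                                 Expression.Constant(b, typeof(string))));

            // Chain the groups together with OR.
            body = body == null ? match : Expression.OrElse(body, match);
        }

        // With no keys at all, the predicate matches nothing.
        return Expression.Lambda<Func<Row, bool>>(body ?? Expression.Constant(false), r);
    }
}
```

Such an expression can be passed to a single `Where` call so that all entities are fetched in one round trip. For a single int key, the usual LINQ form is `ids.Contains(e.Id)` over a local collection, which Entity Framework translates to an SQL `IN (...)` clause.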