-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor bulk_create calls #9047
base: develop
Are you sure you want to change the base?
Conversation
What's the purpose of this? |
As the description states - use chunk size by default. It's often unintentionally forgotten. |
Sure, but why do we want this? |
Because we don't want it to be unintentionally forgotten in random places. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #9047 +/- ##
===========================================
+ Coverage 73.43% 73.75% +0.32%
===========================================
Files 419 428 +9
Lines 44351 44523 +172
Branches 3875 3881 +6
===========================================
+ Hits 32569 32839 +270
+ Misses 11782 11684 -98
|
ignore_conflicts: bool = ..., | ||
update_conflicts: bool | None = ..., | ||
update_fields: Sequence[str] | None = ..., | ||
unique_fields: Sequence[str] | None = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be replaced with **kwargs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can, I just wanted it to be a "drop-in" replacement with the same signature. It can't be a pure drop-in, because now we require the first model argument, but this design offers good enough compatibility.
cvat/apps/engine/model_utils.py
Outdated
if "postgresql" in settings.DATABASES["default"]["ENGINE"]: | ||
return db_model.objects.bulk_create(objs, batch_size=batch_size, **kwargs) | ||
else: | ||
# imitate RETURNING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this workaround can be removed now? The only relevant database that this would be used for is SQLite; according to the Django docs, SQLite 3.35 supports returning the primary key; and Ubuntu 22.04 already has SQLite 3.37.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, probably.
cvat/apps/engine/views.py
Outdated
@@ -1153,7 +1153,7 @@ def _append_upload_info_entries(self, client_files: list[dict[str, Any]]): | |||
task_data.client_files.bulk_create([ | |||
ClientFile(file=self._prepare_upload_info_entry(cf['file'].name), data=task_data) | |||
for cf in client_files | |||
]) | |||
], batch_size=settings.DEFAULT_DB_BULK_CREATE_BATCH_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the helper function here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider moving this to cvat/utils
. Doesn't seem like it has much to do with engine
.
cvat/apps/engine/default_settings.py
Outdated
@@ -95,3 +95,5 @@ | |||
MAX_CONSENSUS_REPLICAS = int(os.getenv("CVAT_MAX_CONSENSUS_REPLICAS", 11)) | |||
if MAX_CONSENSUS_REPLICAS < 1: | |||
raise ImproperlyConfigured(f"MAX_CONSENSUS_REPLICAS must be >= 1, got {MAX_CONSENSUS_REPLICAS}") | |||
|
|||
DEFAULT_DB_BULK_CREATE_BATCH_SIZE = int(os.getenv("CVAT_DEFAULT_DB_BULK_CREATE_BATCH_SIZE", 1000)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need to be a setting, and an environment variable? Under what circumstances would one want to adjust this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, it's just a named constant. I can't say why this may require changes so far, so I'm open to removing configurability. Typically, it's better have such performance constants configurable.
cvat/apps/engine/model_utils.py
Outdated
update_conflicts: bool | None = ..., | ||
update_fields: Sequence[str] | None = ..., | ||
unique_fields: Sequence[str] | None = ..., | ||
*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to move this above batch_size
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used API from the documentation (and from the VS code django stub). Do you think the argument order should be changed?
bulk_create(objs, batch_size=None, ignore_conflicts=False, update_conflicts=False, update_fields=None, unique_fields=None)
Quality Gate passedIssues Measures |
Motivation and context
bulk_create()
callsHow has this been tested?
Checklist
develop
branchLicense
Feel free to contact the maintainers if that's a concern.