-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-5869 fix bulk write crash with verbose results #1843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The `bson_iter_t` stores a pointer to the referred data. The payload buffer `mongoc_bulkwrite_t::ops` may be reallocated and invalidate existing iterators. Instead, store integral offsets into the payload.
9338849
to
3f3cb04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the purpose of the id_loc
(formerly id_iter
) data member is ultimately to populate the insertedId
field in the bulk write insert result document by _bulkwriteresult_set_insertresult in _bulkwritereturn_apply_result, which must correlate to the index of a bulk write insert operation.
According to db.collection.bulkWrite Examples (MongoDB Manual) there should be an insertedIds
field in the reply document containing the list of index-id pairs (regardless whether the bulk write operation is ordered or unordered). Can this field be parsed instead, in a manner similar to what is currently being done with the upserted
field to populate the upsertedId
field for an bulk write update operation result? This would avoid the need to retain an iterator to the _id
field of the insert document and its resulting implicit reference semantics.
If the insertedIds
field cannot be used (I could not find discussion/rationale in #1590), can we avoid implicit reference semantics entirely by instead storing an owning copy (via bson_strdup
) of the BSON UTF-8 string (as already done for the .ns
data member)?
That example uses the db.coll.drop()
// Do an insert:
const reply = db.adminCommand({bulkWrite: "admin", ops: [{insert: 0, document: {_id: "foo" } }], nsInfo: [{ns: "test.coll"}] })
print (reply.cursor.firstBatch[0]) // { ok: 1, idx: 0, n: 1 } For upserts, the server is expected to generate the db.coll.drop()
// Do an upsert:
const reply = db.adminCommand({bulkWrite: "admin", errorsOnly: false, ops: [{update: 0, filter: { _id: "foo" }, updateMods: { $set: { x: 1 }}, upsert: true }], nsInfo: [{ns: "test.coll"}] })
print (reply.cursor.firstBatch[0]) // { ok: 1, idx: 0, n: 1, nModified: 0, upserted: { _id: 'foo' } }
Possibly yes, but it may come at a non-neglible performance cost. Reporting inserted mongoc_bulkwrite_t *bw = mongoc_client_bulkwrite_new (client);
ASSERT_OR_PRINT (mongoc_bulkwrite_append_insertone (bw, "db.c", tmp_bson ("{'_id': 'foo'}"), NULL, &error), error);
mongoc_bulkwriteopts_t *bw_opts = mongoc_bulkwriteopts_new ();
mongoc_bulkwriteopts_set_verboseresults (bw_opts, true); // Report insertedId's in result.
mongoc_bulkwritereturn_t bwr = mongoc_bulkwrite_execute (bw, bw_opts);
ASSERT (bwr.res);
const bson_t *insertresults = mongoc_bulkwriteresult_insertresults (bwr.res);
ASSERT_MATCH (insertresults, BSON_STR ({"0" : {"insertedId" : "foo"}}));
I considered a simpler alternative to avoid tracking the |
Interesting bug. There might be a nonzero security impact (address space read) if the returned IDs and input documents were both accessible to untrusted code. One early comment I have is that a copy of ObjectID should actually be smaller than this location reference. I don't think there's a performance argument for referencing instead of copying. Edit: Never mind, I'm forgetting the _id can be an arbitrary type not just an ObjectID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
// `id_loc` locates the "_id" field of an insert document. | ||
struct { | ||
size_t op_start; // Offset in `mongoc_bulkwrite_t::ops` to the BSON for the insert op: { "document": ... } | ||
size_t op_len; // Length of insert op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tempting to suggest narrower types here to save a little cache space etc, but this is already SO much smaller than bson_iter_t :)
Co-authored-by: Ezra Chung <[email protected]>
* do not use `bson_iter_t` to store `_id` location The `bson_iter_t` stores a pointer to the referred data. The payload buffer `mongoc_bulkwrite_t::ops` may be reallocated and invalidate existing iterators. Instead, store integral offsets into the payload. * fix comment describing `ops_byte_len` and `ops_doc_len` --------- Co-authored-by: Ezra Chung <[email protected]>
Summary
Uses integral offsets, rather than a
bson_iter_t
, to refer to "_id" fields in thebulkWrite
payload.Verified with this patch build: https://spruce.mongodb.com/version/6798e244aed6b800073ef361
Background
See CDRIVER-5869 for background.