Skip to content

Commit b147f78

Browse files
authored
Start at operation time implementation (#978)
* RUBY-1369 Collection change stream integration test * RUBY-1369 try_next implementation * RUBY-1369 Move to spec/integration * RUBY-1369 Got try_next working when there are changes * RUBY-1369 Test try_next with no changes * RUBY-1369 Document try_next * RUBY-1369 Change change stream resume logic more to retry once only * RUBY-1369 Apply current resume behavior to try_next * RUBY-1369 Add docstrings for change_stream_resumable? * RUBY-1369 Document how to restart change streams and when they are resumed by the driver automatically * RUBY-1369 failCommand is 4.0+ only * RUBY-1369 These are now non-resumed * RUBY-1369 Repair the logic yet again. The concept of "retrying once" is apparently much more difficult than it first appears * RUBY-1369 Repair this test again and try another way to not have it be stuck in a getmore loop * RUBY-1369 Use timeout-interrupt to stop tests hanging forever on change stream reads * RUBY-1369 Force a successful getMore prior to failing them. I believe change stream drops documents upon a reset without startAtOperationTime, hence right now we can't reliably test initial getMores failing * RUBY-1369 Need to clear fail points only on tests using them if I want to run others on servers < 4.0
1 parent 124568d commit b147f78

File tree

17 files changed

+669
-36
lines changed

17 files changed

+669
-36
lines changed

Gemfile

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,3 +34,9 @@ group :development do
3434
gem 'pry-rescue'
3535
gem 'pry-nav'
3636
end
37+
38+
group :testing do
39+
platforms :mri do
40+
gem 'timeout-interrupt'
41+
end
42+
end

docs/tutorials/ruby-driver-change-streams.txt

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -77,3 +77,24 @@ You can close a change stream by calling the ``#close`` method:
7777
doc = stream.to_enum.next
7878
process(doc)
7979
stream.close
80+
81+
Resuming a Change Stream
82+
------------------------
83+
84+
The driver will automatically retry getMore operations on a change stream
85+
once. Initial aggregation is never retried. In practical terms this means
86+
that, for example:
87+
88+
- Calling ``collection.watch`` will fail if the cluster does not have
89+
enough available nodes to satisfy the ``"majority"`` read preference;
90+
- Once ``collection.watch`` successfully returns, if the cluster subsequently
91+
experiences an election or loses a node, but heals quickly enough,
92+
change stream reads via ``next`` or ``each`` methods will continue
93+
transparently to the application.
94+
95+
If the cluster loses enough nodes to not be able to satisfy the ``"majority"``
96+
read preference and does not heal quickly enough, ``next`` and ``each``
97+
will raise an error. In these cases the application must track, via the
98+
resume token, which documents from the change stream it has processed and
99+
create a new change stream object via the ``watch`` call, passing an
100+
appropriate ``:resume_after`` argument.

lib/mongo/client.rb

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -487,9 +487,10 @@ def start_session(options = {})
487487
# @option options [ Integer ] :batch_size The number of documents to return per batch.
488488
# @option options [ BSON::Document, Hash ] :collation The collation to use.
489489
# @option options [ Session ] :session The session to use.
490-
# @option options [ BSON::Timestamp ] :start_at_cluster_time Only return changes that occurred
491-
# after the specified timestamp. Any command run against the server will return a cluster time
492-
# that can be used here. Only valid in server versions 4.0+.
490+
# @option options [ BSON::Timestamp ] :start_at_operation_time Only return
491+
# changes that occurred at or after the specified timestamp. Any command run
492+
# against the server will return a cluster time that can be used here.
493+
# Only recognized by server versions 4.0+.
493494
#
494495
# @note A change stream only allows 'majority' read concern.
495496
# @note This helper method is preferable to running a raw aggregation with a $changeStream

lib/mongo/collection.rb

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -322,9 +322,10 @@ def aggregate(pipeline, options = {})
322322
# per batch.
323323
# @option options [ BSON::Document, Hash ] :collation The collation to use.
324324
# @option options [ Session ] :session The session to use.
325-
# @option options [ BSON::Timestamp ] :start_at_cluster_time Only return changes that occurred
326-
# after the specified timestamp. Any command run against the server will return a cluster time
327-
# that can be used here. Only valid in server versions 4.0+.
325+
# @option options [ BSON::Timestamp ] :start_at_operation_time Only return
326+
# changes that occurred at or after the specified timestamp. Any command run
327+
# against the server will return a cluster time that can be used here.
328+
# Only recognized by server versions 4.0+.
328329
#
329330
# @note A change stream only allows 'majority' read concern.
330331
# @note This helper method is preferable to running a raw aggregation with

lib/mongo/collection/view/change_stream.rb

Lines changed: 143 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -77,9 +77,10 @@ class ChangeStream < Aggregation
7777
# on new documents to satisfy a change stream query.
7878
# @option options [ Integer ] :batch_size The number of documents to return per batch.
7979
# @option options [ BSON::Document, Hash ] :collation The collation to use.
80-
# @option options [ BSON::Timestamp ] :start_at_cluster_time Only return changes that occurred
81-
# after the specified timestamp. Any command run against the server will return a cluster time
82-
# that can be used here. Only valid in server versions 4.0+.
80+
# @option options [ BSON::Timestamp ] :start_at_operation_time Only
81+
# return changes that occurred at or after the specified timestamp. Any
82+
# command run against the server will return a cluster time that can
83+
# be used here. Only recognized by server versions 4.0+.
8384
#
8485
# @since 2.5.0
8586
def initialize(view, pipeline, changes_for, options = {})
@@ -88,11 +89,19 @@ def initialize(view, pipeline, changes_for, options = {})
8889
@change_stream_filters = pipeline && pipeline.dup
8990
@options = options && options.dup.freeze
9091
@resume_token = @options[:resume_after]
91-
read_with_one_retry { create_cursor! }
92+
create_cursor!
93+
94+
# We send different parameters when we resume a change stream
95+
# compared to when we send the first query
96+
@resuming = true
9297
end
9398

9499
# Iterate through documents returned by the change stream.
95100
#
101+
# This method retries once per error on resumable errors
102+
# (two consecutive errors result in the second error being raised,
103+
# an error which is recovered from resets the error count to zero).
104+
#
96105
# @example Iterate through the stream of documents.
97106
# stream.each do |document|
98107
# p document
@@ -105,20 +114,82 @@ def initialize(view, pipeline, changes_for, options = {})
105114
# @yieldparam [ BSON::Document ] Each change stream document.
106115
def each
107116
raise StopIteration.new if closed?
117+
retried = false
108118
begin
109119
@cursor.each do |doc|
110120
cache_resume_token(doc)
111121
yield doc
112122
end if block_given?
113123
@cursor.to_enum
114-
rescue => e
124+
rescue Mongo::Error => e
125+
if retried || !e.change_stream_resumable?
126+
raise
127+
end
128+
129+
retried = true
130+
# Rerun initial aggregation.
131+
# Any errors here will stop iteration and break out of this
132+
# method
115133
close
116-
if retryable?(e)
134+
create_cursor!
135+
retry
136+
end
137+
end
138+
139+
# Return one document from the change stream, if one is available.
140+
#
141+
# Retries once on a resumable error.
142+
#
143+
# Raises StopIteration if the change stream is closed.
144+
#
145+
# This method will wait up to max_await_time_ms milliseconds
146+
# for changes from the server, and if no changes are received
147+
# it will return nil.
148+
#
149+
# @note This method is experimental and subject to change.
150+
#
151+
# @return [ BSON::Document | nil ] A change stream document.
152+
# @api private
153+
def try_next
154+
raise StopIteration.new if closed?
155+
retried = false
156+
157+
begin
158+
doc = @cursor.try_next
159+
rescue Mongo::Error => e
160+
unless e.change_stream_resumable?
161+
raise
162+
end
163+
164+
if retried
165+
# Rerun initial aggregation.
166+
# Any errors here will stop iteration and break out of this
167+
# method
168+
close
117169
create_cursor!
170+
retried = false
171+
else
172+
# Attempt to retry a getMore once
173+
retried = true
118174
retry
119175
end
120-
raise
121176
end
177+
178+
if doc
179+
cache_resume_token(doc)
180+
end
181+
doc
182+
end
183+
184+
def to_enum
185+
enum = super
186+
enum.send(:instance_variable_set, '@obj', self)
187+
class << enum
188+
def try_next
189+
@obj.try_next
190+
end
191+
end
192+
enum
122193
end
123194

124195
# Close the change stream.
@@ -176,15 +247,30 @@ def for_collection?
176247
end
177248

178249
def cache_resume_token(doc)
250+
# Always record both resume token and operation time,
251+
# in case we get an older or newer server during rolling
252+
# upgrades/downgrades
179253
unless @resume_token = (doc[:_id] && doc[:_id].dup)
180-
raise Error::MissingResumeToken.new
254+
raise Error::MissingResumeToken
181255
end
182256
end
183257

184258
def create_cursor!
259+
# clear the cache because we may get a newer or an older server
260+
# (rolling upgrades)
261+
@start_at_operation_time_supported = nil
262+
185263
session = client.send(:get_session, @options)
186264
server = server_selector.select_server(cluster)
187265
result = send_initial_query(server, session)
266+
if doc = result.replies.first && result.replies.first.documents.first
267+
@start_at_operation_time = doc['operationTime']
268+
else
269+
# The above may set @start_at_operation_time to nil
270+
# if it was not in the document for some reason,
271+
# for consistency set it to nil here as well
272+
@start_at_operation_time = nil
273+
end
188274
@cursor = Cursor.new(view, result, server, disable_retry: true, session: session)
189275
end
190276

@@ -200,6 +286,32 @@ def aggregate_spec(session)
200286

201287
def change_doc
202288
{ fullDocument: ( @options[:full_document] || FULL_DOCUMENT_DEFAULT ) }.tap do |doc|
289+
if resuming?
290+
# We have a resume token once we retrieved any documents.
291+
# However, if the first getMore fails and the user didn't pass
292+
# a resume token we won't have a resume token to use.
293+
# Use start_at_operation time in this case
294+
if @resume_token
295+
# Spec says we need to remove startAtOperationTime if
296+
# one was passed in by user, thus we won't forward it
297+
elsif start_at_operation_time_supported? && @start_at_operation_time
298+
# It is crucial to check @start_at_operation_time_supported
299+
# here - we may have switched to an older server that
300+
# does not support operation times and therefore shouldn't
301+
# try to send one to it!
302+
#
303+
# @start_at_operation_time is already a BSON::Timestamp
304+
doc[:startAtOperationTime] = @start_at_operation_time
305+
else
306+
# Can't resume if we don't have either
307+
raise Mongo::Error::MissingResumeToken
308+
end
309+
else
310+
if options[:start_at_operation_time]
311+
doc[:startAtOperationTime] = time_to_bson_timestamp(
312+
options[:start_at_operation_time])
313+
end
314+
end
203315
doc[:resumeAfter] = @resume_token if @resume_token
204316
doc[:allChangesForCluster] = true if for_cluster?
205317
end
@@ -208,6 +320,29 @@ def change_doc
208320
def send_initial_query(server, session)
209321
initial_query_op(session).execute(server)
210322
end
323+
324+
def time_to_bson_timestamp(time)
325+
if time.is_a?(Time)
326+
seconds = time.to_f
327+
BSON::Timestamp.new(seconds.to_i, ((seconds - seconds.to_i) * 1000000).to_i)
328+
elsif time.is_a?(BSON::Timestamp)
329+
time
330+
else
331+
raise ArgumentError, 'Time must be a Time or a BSON::Timestamp instance'
332+
end
333+
end
334+
335+
def resuming?
336+
!!@resuming
337+
end
338+
339+
def start_at_operation_time_supported?
340+
if @start_at_operation_time_supported.nil?
341+
server = server_selector.select_server(cluster)
342+
@start_at_operation_time_supported = server.description.max_wire_version >= 7
343+
end
344+
@start_at_operation_time_supported
345+
end
211346
end
212347
end
213348
end

lib/mongo/collection/view/change_stream/retryable.rb

Lines changed: 2 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -26,25 +26,13 @@ module Retryable
2626

2727
def read_with_one_retry
2828
yield
29-
rescue => e
30-
if retryable?(e)
29+
rescue Mongo::Error => e
30+
if e.change_stream_resumable?
3131
yield
3232
else
3333
raise(e)
3434
end
3535
end
36-
37-
def retryable?(error)
38-
network_error?(error) || retryable_operation_failure?(error)
39-
end
40-
41-
def network_error?(error)
42-
[ Error::SocketError, Error::SocketTimeoutError].include?(error.class)
43-
end
44-
45-
def retryable_operation_failure?(error)
46-
error.is_a?(Error::OperationFailure) && error.change_stream_resumable?
47-
end
4836
end
4937
end
5038
end

lib/mongo/cursor.rb

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -125,6 +125,46 @@ def each
125125
end
126126
end
127127

128+
# Return one document from the query, if one is available.
129+
#
130+
# Retries once on a resumable error.
131+
#
132+
# This method will wait up to max_await_time_ms milliseconds
133+
# for changes from the server, and if no changes are received
134+
# it will return nil.
135+
#
136+
# @note This method is experimental and subject to change.
137+
#
138+
# @return [ BSON::Document | nil ] A document.
139+
# @api private
140+
def try_next
141+
if @documents.nil?
142+
@documents = process(@initial_result)
143+
# the documents here can be an empty array, hence
144+
# we may end up issuing a getMore in the first try_next call
145+
end
146+
147+
if @documents.empty?
148+
if more?
149+
if exhausted?
150+
kill_cursors
151+
return nil
152+
end
153+
154+
@documents = get_more
155+
end
156+
else
157+
# cursor is closed here
158+
# keep documents as an empty array
159+
end
160+
161+
if @documents
162+
return @documents.shift
163+
end
164+
165+
nil
166+
end
167+
128168
# Get the batch size.
129169
#
130170
# @example Get the batch size.

lib/mongo/database.rb

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -267,9 +267,10 @@ def users
267267
# @option options [ Integer ] :batch_size The number of documents to return per batch.
268268
# @option options [ BSON::Document, Hash ] :collation The collation to use.
269269
# @option options [ Session ] :session The session to use.
270-
# @option options [ BSON::Timestamp ] :start_at_cluster_time Only return changes that occurred
271-
# after the specified timestamp. Any command run against the server will return a cluster time
272-
# that can be used here. Only valid in server versions 4.0+.
270+
# @option options [ BSON::Timestamp ] :start_at_operation_time Only return
271+
# changes that occurred after the specified timestamp. Any command run
272+
# against the server will return a cluster time that can be used here.
273+
# Only recognized by server versions 4.0+.
273274
#
274275
# @note A change stream only allows 'majority' read concern.
275276
# @note This helper method is preferable to running a raw aggregation with a $changeStream

lib/mongo/error.rb

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,19 @@ class Error < StandardError
7070
# @since 2.2.3
7171
CURSOR_NOT_FOUND = 'Cursor not found.'
7272

73+
# Can the change stream on which this error occurred be resumed,
74+
# provided the operation that triggered this error was a getMore?
75+
#
76+
# @example Is the error resumable for the change stream?
77+
# error.change_stream_resumable?
78+
#
79+
# @return [ true, false ] Whether the error is resumable.
80+
#
81+
# @since 2.6.0
82+
def change_stream_resumable?
83+
false
84+
end
85+
7386
# Error label describing commitTransaction errors that may or may not occur again if a commit is
7487
# manually retried by the user.
7588
#
@@ -111,6 +124,7 @@ def add_label(label)
111124

112125
require 'mongo/error/parser'
113126
require 'mongo/error/write_retryable'
127+
require 'mongo/error/change_stream_resumable'
114128
require 'mongo/error/bulk_write_error'
115129
require 'mongo/error/closed_stream'
116130
require 'mongo/error/extra_file_chunk'

0 commit comments

Comments
 (0)