Skip to content

Commit 072098e

Browse files
Document guidelines for physical operator yielding (#15030)
* Document guidelines for physical operator yielding To start a policy of the behavior physical operator streams should have and drive improvements in this area to allow for timely cancellation. Connects to #14036 and related to pull requests such as #14028. * Remove discussion of tokio coop * Move rationale up * TODO ADD LINK * Say block the CPU rather than pin the CPU * Add a caveat to use the right tool for the situation * Improve documentation of yield guidelines Co-authored-by: Mehmet Ozan Kabak <[email protected]> * Fix newlines and whitespace in comment * Add a link to the cancellation benchmark documented in the README * Fix newlines in benchmarks README --------- Co-authored-by: Mehmet Ozan Kabak <[email protected]>
1 parent 6028474 commit 072098e

File tree

2 files changed

+21
-1
lines changed

2 files changed

+21
-1
lines changed

benchmarks/README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -333,7 +333,8 @@ The output of `dfbench` help includes a description of each benchmark, which is
333333

334334
## Cancellation
335335

336-
Test performance of cancelling queries
336+
Test performance of cancelling queries.
337+
337338
Queries in DataFusion should stop executing "quickly" after they are
338339
cancelled (the output stream is dropped).
339340

datafusion/physical-plan/src/execution_plan.rs

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -260,13 +260,32 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
260260
/// used.
261261
/// Thus, [`spawn`] is disallowed, and instead use [`SpawnedTask`].
262262
///
263+
/// To enable timely cancellation, the [`Stream`] that is returned must not
264+
/// block the CPU indefinitely and must yield back to the tokio runtime regularly.
265+
/// In a typical [`ExecutionPlan`], this automatically happens unless there are
266+
/// special circumstances; e.g. when the computational complexity of processing a
267+
/// batch is superlinear. See this [general guideline][async-guideline] for more context
268+
/// on this point, which explains why one should avoid spending a long time without
269+
/// reaching an `await`/yield point in asynchronous runtimes.
270+
/// This can be achieved by manually returning [`Poll::Pending`] and setting up wakers
271+
/// appropriately, or the use of [`tokio::task::yield_now()`] when appropriate.
272+
/// In special cases that warrant manual yielding, determination for "regularly" may be
273+
/// made using a timer (being careful with the overhead-heavy system call needed to
274+
/// take the time), or by counting rows or batches.
275+
///
276+
/// The [cancellation benchmark] tracks some cases of how quickly queries can
277+
/// be cancelled.
278+
///
263279
/// For more details see [`SpawnedTask`], [`JoinSet`] and [`RecordBatchReceiverStreamBuilder`]
264280
/// for structures to help ensure all background tasks are cancelled.
265281
///
266282
/// [`spawn`]: tokio::task::spawn
283+
/// [cancellation benchmark]: https://github.com/apache/datafusion/blob/main/benchmarks/README.md#cancellation
267284
/// [`JoinSet`]: tokio::task::JoinSet
268285
/// [`SpawnedTask`]: datafusion_common_runtime::SpawnedTask
269286
/// [`RecordBatchReceiverStreamBuilder`]: crate::stream::RecordBatchReceiverStreamBuilder
287+
/// [`Poll::Pending`]: std::task::Poll::Pending
288+
/// [async-guideline]: https://ryhl.io/blog/async-what-is-blocking/
270289
///
271290
/// # Implementation Examples
272291
///

0 commit comments

Comments
 (0)