|
| 1 | +- Feature Name: Time Travel Queries |
| 2 | +- Status: draft |
| 3 | +- Start Date: 2016-05-07 |
| 4 | +- Authors: Matt Jibson |
| 5 | +- RFC PR: # |
| 6 | +- Issue PR: [#5963](https://github.com/cockroachdb/cockroach/issues/5963) |
| 7 | + |
| 8 | +# Summary |
| 9 | + |
| 10 | +Add support for `AS OF SYSTEM TIME` to `SELECT` statements to execute |
| 11 | +queries at some past time. Plumb doing things at a timestamp to various |
| 12 | +layers that previously did not need it. |
| 13 | + |
| 14 | +# Motivation |
| 15 | + |
| 16 | +[SQL:2011](https://en.wikipedia.org/wiki/SQL:2011) adds support for time |
| 17 | +travel queries. Our most immediate need for them is to support backups |
| 18 | +at a certain timestamp. (Consider a huge table being backed up. We need |
| 19 | +to page through the data in a way that will be consistent if a connection |
| 20 | +drops and needs to be resumed from where it left off.) |
| 21 | + |
| 22 | +The syntax is: `SELECT * FROM t AS OF SYSTEM TIME '2016-01-01 00:00'`. |
| 23 | +The term `TIME` will be used below as the time travel time. |
| 24 | + |
| 25 | +# Detailed design |
| 26 | + |
| 27 | +SQL queries already create a KV transaction with a timestamp. Hooking |
| 28 | +up the `AS OF SYSTEM TIME` syntax to set the transaction timestamp is |
| 29 | +straightforward. This causes older data in the MVCC table to be fetched. |
| 30 | + |
| 31 | +There are other issues that need to be taken into account. |
| 32 | + |
| 33 | +## Table Schema |
| 34 | + |
| 35 | +Before a SELECT, a table lease is obtained to determine the schema. If the |
| 36 | +schema has changed after `TIME`, we need to instead fetch the old schema |
| 37 | +as well. Since the schemas are stored in the KV store, we can use the same |
| 38 | +technique (setting the KV transaction timestamp) to fetch the old data. |
| 39 | + |
| 40 | +Since this query is known to be readonly, it should not trigger a lease even |
| 41 | +if `TIME` is now and the schema is current. We just need the descriptor at |
| 42 | +`TIME`. |
| 43 | + |
| 44 | +## MVCC Garbage Collection |
| 45 | + |
| 46 | +MVCC currently garbage collects old data after 1 day. If data is requested |
| 47 | +at a time before the oldest value, the MVCC layer currently returns empty |
| 48 | +data (not an error). This will be changed to return an error instead of an |
| 49 | +empty value. The GC will be changed to keep any data required so that it |
| 50 | +can respond to read requests within the GC threshold. For example instead |
| 51 | +of removing data older than 1d, it will keep any data that was valid within |
| 52 | +1d-ago, and discard the rest. |
| 53 | + |
| 54 | +## Timestamp in the Future |
| 55 | + |
| 56 | +The HLC advance itself to the KV timestamp if that timestamp is higher than |
| 57 | +any other time value it has seen. Since `TIME` sets the KV timestamp, this |
| 58 | +would effectively allow queries to advance the HLC of the system to any |
| 59 | +point on the future, which would cause problems. It will thus be an error |
| 60 | +to specify `TIME` greater than the value of `cluster_logical_timestamp()`. |
| 61 | + |
| 62 | +# Unresolved questions |
| 63 | + |
| 64 | +Do we care about the read timestamp cache? Do we care if these statements |
| 65 | +abort other transactions? If we do care does the TODO about the 250ms from |
| 66 | +now offset solve it? |
0 commit comments