-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
RFC: Expand documentation of seek
, position
and related functions
#58024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This commit intentionally omits expanding the documentation of `mark` etc., which although related, is a somewhat different mechanism.
seek,
position` and related functionsseek
, position
and related functions
LGTM. My understanding is that BGZF implementation does not provide |
The `pos` argument should be a value that can be returned by `position(s)`. | ||
Seeking before the first position should throw an error. Seeking after the | ||
final position should throw an error if the stream is not writable. | ||
If the stream is writable, reading will throw an error, and writing will fill |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be okay for read
to return an empty vector instead of throwing an error. That is how IOStream
, and GZipStream
currently behave after seeking past the end.
@@ -190,6 +203,14 @@ end | |||
|
|||
Seek a stream relative to the current position. | |||
|
|||
This is equivalent to `seek(s, position(s) + offset)`, except where seeking would | |||
seek beyond the end of the stream, where `skip` will throw an `EOFError`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to leave the case of skip
beyond the end of the file as implementation-defined behavior.
Allowing skip
to go beyond the end of the stream to match the behavior of IOStream
in TranscodingStreams
is possible, but I think it would require keeping track of a bunch of additional states like zlib does in https://github.com/madler/zlib/blob/5a82f71ed1dfc0bec044d9702463dbdf84ea3b71/gzguts.h#L188-L196, basically making skip lazy and then running any requested skips on every read. https://github.com/madler/zlib/blob/5a82f71ed1dfc0bec044d9702463dbdf84ea3b71/gzread.c#L276-L281
Co-authored-by: Jameson Nash <[email protected]>
This commit intentionally omits expanding the documentation of
mark
etc., which although related, is a somewhat different mechanism.I'm interested in discussion about what the right behaviour is:
Here, I documented
seek
is able to seek past the end of the IO, if the IO is seekable, in which case writing to it will fill in zero bytes. But I don't actually like this. It feels like that API is mixing up the behaviour oftruncate
andseek
. It really feels like this should be an error. It also makes the behaviour ofskip
more awkward, because implementing this by reading a bunch of bytes no longer corresponds to seeking. It also creates a weird mismatch between reading and writing IOs. IMO, the congruence with libc'slseek
is not enough to warrant this. IMO this should throw an exception when the seeking goes out of bounds in either direction.This PR kind of chickens out and doesn't go ahead and say that
position
should return a zero-basedInt
. Having it be more generic is nice for e.g. BGZF's virtualoffset, but it also makes it harder to use. Probably, IOs will not document what theirposition
returns, so in practice, users will just rely on it returning a zero-basedInt
without it being documented.A comment from Jameson suggests
position
's return value should support arithmetic. That makes sense, since it then allowsskip
and relative seeking. However, the only non-integer position type I know of, BGZF's virtual offset, does not support arithmetic. So it feels like documenting this gets the worst of both worlds: It neither enables custom position types, nor do you get the convenience of knowing the return value is a zero-indexedInt
TODO
Closes #57618
Closes #57639