Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type-safe read-only views #12

Open
ptitjes opened this issue Jul 28, 2022 · 18 comments
Open

Type-safe read-only views #12

ptitjes opened this issue Jul 28, 2022 · 18 comments

Comments

@ptitjes
Copy link

ptitjes commented Jul 28, 2022

In most of my use-cases, I need a read-only view of buffers (both Buffer and CompositeBuffer) with:

  • absolute read methods (i.e. random-access read)
  • relative read methods (i.e. sequential read)

It seems from e5l/view-and-refcount and whyoleg/resources that we are taking the route of having reference-counted underlying storages. So the following design proposal is based on this idea.

As a first step, I would propose that we extract a ReadBuffer interface from Buffer containing:

  • var readIndex: Int relative read index,
  • a new val readLimit: Int exclusive read limit index (always equal to writeIndex in Buffer's default implementation),
  • fun read*() relative read methods,
  • fun get*At(...) absolute read methods,
  • fun copyTo*(...) relative and absolute bulk read methods,
  • a new fun duplicate(): ReadBuffer method that returns a duplicate read buffer (with shared underlying storage but independent readIndex),
  • new fun slice(...): ReadBuffer methods that returns sliced duplicates.

Additionally, we would add the following to the Buffer interface:

  • a new val writeLimit: Int for "symmetry" (always equal to capacity)
  • an override override fun duplicate(): Buffer method that returns a duplicate buffer (with shared underlying storage but independent readIndex and writeIndex).
@whyoleg
Copy link
Collaborator

whyoleg commented Jul 28, 2022

While I thinking, that ReadBuffer looks reasonable, can you provide use case for duplicate/slice methods? The main problem with anything like duplicate is that we can spawn several objects, that will share same underlying memory segment. And f.e. after it, we can send some of duplicates to channel, or another function, or anywhere else, while somewhere else exists another duplicate, which can be mutated, and such situations are really hard to debug. That is why, I would think, that something like takeHead or anything else, that splits buffer in non intersection views is much better.

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

@whyoleg Indeed, all my use-cases are always: write something to the buffer, and then make read-only views (possibly used concurrently) of the already wrote part of the buffer. So indeed, I would prefer to have a contract similar to takeHead (not convinced by the name though) that mutates the current buffer so that it does not intersect the returned new head buffer.

We could then do:

val source: Source = // ...
val buffer: Buffer = // ...

val read = source.readFully(buffer) // Or something similar
val head: ReadBuffer = buffer.takeHead(read)
// Pass head to some external api

Also would it be possible to have an absolute variant of takeHead that takes an additional index: Int parameter (with a check such that this.startIndex + index < this.writeIndex? Maybe changing the current takeHead signature to:

fun takeHead(startIndex: Int = readIndex, endIndex: Int = writeIndex): Buffer

@whyoleg
Copy link
Collaborator

whyoleg commented Jul 28, 2022

Im not sure, that it will be possible to have such an absolute variant, as if so it will split buffer in 3 parts (from 0 to start, from start to end, from end to capacity) and returns center part of it, so not sure, that it will be convenient for anyone.

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

Well, then I could always do the following but that is not very elegant IMO:

val head = buffer.takeHead(endIndex)
head.takeHead(startIndex).close()
// Use head and buffer...

Also I still would need to be able to duplicate read-only views, in order to have multiple read views (possibly accessed concurrently) of the same slice of the underlying memory segment.

@whyoleg
Copy link
Collaborator

whyoleg commented Jul 28, 2022

Can you provide better sample (gist, link to repo) on what you are trying to achieve?
I think that I have an idea on what kind of API/operations are needed for your use case.

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

@whyoleg I will cook a more complete example today. Thanks.

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

So let me try to give a more complete example. This is the context of https://github.com/ptitjes/kzmq. Currently, I use ByteArrays everywhere, but, in the future final API, I want to enable buffer reuse. In my implementation of the PUB ZeroMQ socket, I have to send the same data to multiple sockets. There are channels in the middle and also wrappers around the buffers, but if I remove all the noise it boils down to:

interface SocketHandler {
  val input: Source
  val output: Destination
}

val sockets: List<SocketHandler> = // ... not important
val messages: Channel<ReadBuffer> = // ... not important

// At some point, I launch:
launch {
  while (isActive) {
    val message = messages.receive()
    sockets.forEach { socket ->
       val m = message.duplicate()
       launch {
         socket.output.write(m)
         m.close()
       }
    }
    message.close()
  }
}

// When user wants to send some data:
val data: ReadBuffer = // ... buffer given by the user
messages.send(data)

@whyoleg Is my use-case clearer?

@whyoleg
Copy link
Collaborator

whyoleg commented Jul 28, 2022

Yeah, thx! I see how this can be done. RSocket has similar requirements, so I know what you want

Buuuut 😅 , I'm really interested about 'takeHead'/'slice' use cases, this part of design is much harder from my point of view

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

@whyoleg Yeah I forgot to say that the data ReadBuffer, in my example, can itself be obtained from another socket. I receive frames from the socket that have the following shape:

  1. one byte containing flags
  2. 4 bytes containing size
  3. data of the above size

Either, I would:

  1. read 5 bytes to my buffer, and read in my buffer the flags and size
  2. compact the buffer
  3. read the data to my buffer
  4. take the head (from index 0) to size
  5. make this buffer a read view for my users

Or, I could:

  1. read 5 bytes to my buffer, and read in my buffer the flags and size
  2. read the data to my buffer
  3. take the head (from index 5) to size
  4. make this buffer a read view for my users

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

I feel like the read-only view/duplicate read-only view design is intricately linked to the takeHead/slice design:

val head = buffer.takeHead(size)
val readOnlyHead = head.someOperationToMakeAReadView()

Should someOperationToMakeAReadView steal the content of head, as does takeHead?

@whyoleg
Copy link
Collaborator

whyoleg commented Jul 28, 2022

Thx!
So, In my mind it will look somehow like this:

val buffer: Buffer = //retrieved from socket somehow
buffer
  .stealReadOnly() //will return ReadOnlyBuffer instance, which will be view over buffer underlying storage; after this call, `buffer` will be empty, and have no underlying storage
  .use { frame: ReadOnlyBuffer ->
    val flags = frame.readByte()
    val size = frame.readInt()
    val data: ReadOnlyBuffer = frame.readBuffer(size)
    //do anything with it
  }

Where Buffer and ReadOnlyBuffer are unrelated interfaces (not inherited one from another) with same ancestor Readable which defines simple buffer operations.
Buffer in addition has write operations (via Writable interface) and takeHead(index), steal() and stealReadOnly() - which modify underlying storage in way, that no overlapping is possible
ReadOnlyBuffer it addition has readBuffer, getBufferAt and copy - which return zero copy views

I will summarize my idea in PR and will try to create it today-tomorrow.

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

@whyoleg it would eleganty fit my use case. 👍 Thanks

@ptitjes
Copy link
Author

ptitjes commented Jul 28, 2022

Why not renametakeHead to stealHead? That would make the API even more discoverable.

@whyoleg
Copy link
Collaborator

whyoleg commented Jul 31, 2022

API Prototype (no implementation yet) is here: https://github.com/ktorio/ktor-io/tree/whyoleg/read-only-buffer
Still have a lot of todos, on how it will be better to do one or other things
Will wait for @e5l to be available to discuss this way in more detail, may be he have more ideas
P.S. also there is a prototype API there for accessing underlying components of Buffer for interop with exisisting solutions: f.e. to be able to access ByteBuffer or ByteArray from buffer without copying to interacting with platform APIs like javax.crypto, sockets and so on. Still have no idea how to make such an API really safe

@ptitjes
Copy link
Author

ptitjes commented Aug 8, 2022

Hi @whyoleg, sorry for the late answer. I very much like your design proposal.

However, reading your example, I am wondering whether a BytesSource.read() for a TCP socket would return a full TCP frame or not?

@whyoleg
Copy link
Collaborator

whyoleg commented Aug 8, 2022

Overall it's out of scope of this issue, because it can depend on implementation of source, but I would think, that most of the time it will be not single TCP frame, but limited by some buffer inside implementation. But for now, I have no preferences or ideas on this.

@ptitjes
Copy link
Author

ptitjes commented Aug 8, 2022

OK. But then this changes how we do use buffers. For instance, in your example, I cannot be sure that the returned read only buffer contains all the expected data, right?

@whyoleg
Copy link
Collaborator

whyoleg commented Aug 8, 2022

As I know, when you are doing some TCP work, you can't rely on an idea, that all data will be in one TCP frame. Amount of data in it depends a lot on buffers sizes of client and server. That's why we need to send data length when working with TCP. In my example There is no check for availability of data in buffer, yes, real world code should check, if it enough data left in buffer and then: or steal head, and wait more data, or just wait more data, if buffer has enough capacity, or fail if socket closed, and so on.
I haven't yet experimented on how we can work with sockets via this new buffer/source API, but still I think that it's about how source is implemented: may be there will be a possibility to peak buffer, read length, and if it not enough, await more data. But you also should be aware of TCP flow control, and may be even buffer, which leave underneath of source implementation can be limited to some size to not extend up to OOM, if user is slow to read

And of course, if there will be not enough data in buffer, and you want to read buffer of length bigger than exists - we will fail, don't think that silent returning part of a buffer is a good idea here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants