
Discussion: Do we need the bucket argument? #34

Closed
thomastaylor312 opened this issue Jan 3, 2024 · 6 comments

@thomastaylor312
Collaborator

Most of the interfaces in this repo have a bucket argument that is passed to most functions. This seemed a little odd to me as most keyvalue clients I've used don't require this. They generally connect to a specific bucket when opening their connection and then all operations are against that bucket. Manually passing the bucket name on every call feels a little clunky, so before we finalize the interface, I wanted to see if we could get away without using it.
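To make the contrast concrete, here is a minimal Rust sketch of the two shapes being compared: a store where every call names the bucket, versus a handle that is bound to one bucket when it is opened. The type and method names (`Store`, `BoundBucket`, `open`, `get`, `set`) are hypothetical and are not the repo's WIT; an in-memory `HashMap` stands in for a real keyvalue backend.

```rust
use std::collections::HashMap;

/// Shape A (hypothetical): every call passes the bucket name explicitly.
#[derive(Default)]
struct Store {
    buckets: HashMap<String, HashMap<String, Vec<u8>>>,
}

impl Store {
    fn set(&mut self, bucket: &str, key: &str, value: Vec<u8>) {
        self.buckets
            .entry(bucket.to_string())
            .or_default()
            .insert(key.to_string(), value);
    }

    fn get(&self, bucket: &str, key: &str) -> Option<&Vec<u8>> {
        self.buckets.get(bucket)?.get(key)
    }
}

/// Shape B (hypothetical): the bucket is chosen once, at open time,
/// and every later call implicitly targets it.
struct BoundBucket {
    name: String,
    entries: HashMap<String, Vec<u8>>,
}

impl BoundBucket {
    fn open(name: &str) -> Self {
        BoundBucket { name: name.to_string(), entries: HashMap::new() }
    }

    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.entries.get(key)
    }
}

fn main() {
    // Shape A: the bucket name is repeated on every call.
    let mut store = Store::default();
    store.set("users", "alice", b"admin".to_vec());
    assert_eq!(store.get("users", "alice"), Some(&b"admin".to_vec()));

    // Shape B: the bucket name appears once, when the handle is opened.
    let mut users = BoundBucket::open("users");
    users.set("alice", b"admin".to_vec());
    assert_eq!(users.get("alice"), Some(&b"admin".to_vec()));
    println!("{} holds {} key(s)", users.name, users.entries.len());
}
```

Shape B is what most client libraries look like; the open question in this thread is how a component would obtain more than one such handle without embedding vendor-specific connection details.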

My guess at the original reason for having it was to support connecting to multiple DBs/buckets, which is totally valid, as you don't want to have vendor-specific connection strings inside of code (kinda defeats the purpose of a generic interface). If this is the case, there are two questions that fall out of this:

  • How common is the use case of connecting to two entirely different DBs/buckets? Is that something we should even support in a "generic" interface like this one?
  • Is there any way we could still get this functionality without overly complicating the interface?

Curious what people think here.

@Mossaka
Collaborator

Mossaka commented Jan 5, 2024

> How common is the use case of connecting to two entirely different DBs/buckets?

If there is no concept of a bucket, then I have a hard time imagining how we would transfer data from one bucket to another, or aggregate data from multiple stores.
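As a rough illustration of the cross-bucket case, here is a Rust sketch that copies every key from a "cache" bucket into a "durable" bucket. It assumes a design where a component can hold two bucket handles at once; `Bucket`, `keys`, and `copy_all` are hypothetical names, and a `BTreeMap` stands in for each backing store.

```rust
use std::collections::BTreeMap;

/// Hypothetical bucket handle backed by an in-memory map.
#[derive(Default)]
struct Bucket {
    entries: BTreeMap<String, Vec<u8>>,
}

impl Bucket {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }

    fn keys(&self) -> Vec<String> {
        self.entries.keys().cloned().collect()
    }
}

/// Copy every key from `src` into `dst` and return how many keys were copied.
/// This needs both handles open at the same time, which only works if the
/// interface lets a component address more than one bucket.
fn copy_all(src: &Bucket, dst: &mut Bucket) -> usize {
    let mut copied = 0;
    for key in src.keys() {
        if let Some(value) = src.get(&key) {
            dst.set(&key, value);
            copied += 1;
        }
    }
    copied
}

fn main() {
    let mut cache = Bucket::default();
    cache.set("session:1", b"alice".to_vec());
    cache.set("session:2", b"bob".to_vec());

    let mut durable = Bucket::default();
    let n = copy_all(&cache, &mut durable);
    println!("copied {n} keys");
    assert_eq!(durable.get("session:2"), Some(b"bob".to_vec()));
}
```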

> Is there any way we could still get this functionality without overly complicating the interface?

Off the top of my head, one idea is to define bucket as a resource and then place all operations that require a bucket under that resource. But then how do we group operations like we did for atomic, batch and readwrite?
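One possible way to keep the atomic/batch/readwrite grouping while making bucket a resource, sketched in Rust under assumed names: the basic read/write operations are methods on a `Bucket` type, while the optional capability groups stay in separate modules whose functions take the bucket as a borrowed handle (roughly what a `borrow<bucket>` parameter would express in WIT). `Bucket`, `atomic::increment`, and `batch::get_many` are hypothetical; this illustrates the idea rather than describing the interface the repo adopted.

```rust
use std::collections::HashMap;

/// Hypothetical bucket resource: basic read/write operations are methods.
#[derive(Default)]
pub struct Bucket {
    entries: HashMap<String, Vec<u8>>,
}

impl Bucket {
    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }

    pub fn set(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }
}

/// Capability group: atomic operations take the bucket as a borrowed handle.
pub mod atomic {
    use super::Bucket;

    /// Add `delta` to a little-endian u64 counter at `key` and return the new value.
    /// A missing or malformed value is treated as 0.
    pub fn increment(bucket: &mut Bucket, key: &str, delta: u64) -> u64 {
        let current = match bucket.get(key) {
            Some(bytes) if bytes.len() == 8 => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(&bytes);
                u64::from_le_bytes(buf)
            }
            _ => 0,
        };
        let next = current + delta;
        bucket.set(key, next.to_le_bytes().to_vec());
        next
    }
}

/// Capability group: batch operations also take a borrowed handle.
pub mod batch {
    use super::Bucket;

    pub fn get_many(bucket: &Bucket, keys: &[&str]) -> Vec<Option<Vec<u8>>> {
        keys.iter().map(|k| bucket.get(k)).collect()
    }
}

fn main() {
    let mut bucket = Bucket::default();
    atomic::increment(&mut bucket, "hits", 1);
    let hits = atomic::increment(&mut bucket, "hits", 2);
    assert_eq!(hits, 3);

    bucket.set("a", b"1".to_vec());
    let values = batch::get_many(&bucket, &["a", "missing"]);
    assert_eq!(values, vec![Some(b"1".to_vec()), None]);
}
```

The design choice here is that a host which does not support atomics or batching simply would not provide those groups, while the core bucket methods stay the same.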

@thomastaylor312
Collaborator Author

> If there is no concept of a bucket, then I have a hard time imagining how we would transfer data from one bucket to another, or aggregate data from multiple stores.

If I understand this correctly, I think the problem is still how buckets are configured. If a consumer of this wants to connect to 3 different buckets, do they have to configure 3 different connection strings on launch, one per bucket (which opens up a whole other can of worms as to how they configure that)? What if they want to transfer from one type of database to another (imagine moving data from a cache to a more durable store)? Those all feel like particularly thorny questions.

@Mossaka
Collaborator

Mossaka commented Jan 10, 2024

These are great questions!

In the current design, the connection string needed to create a bucket resource is just a string.

For example:

open-bucket: static func(name: string) -> result<bucket, error>;

My original thought was that the name parameter is actually a URL whose scheme is host/platform dependent. Some examples:

  1. A URL with a scheme, such as file:///dir or s3blob://my-bucket
  2. A bucket name for some "database" configured outside of the wasm component. It could just be my-bucket
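A host could interpret the `name` argument along the lines of the two examples above. The Rust sketch below shows one possible interpretation, not a specified behavior: a name containing `://` is split into a scheme and a location, and anything else is treated as the name of a bucket the platform configured outside the component. `BucketRef` and `parse_bucket_name` are hypothetical.

```rust
/// How a host might classify the string passed to `open-bucket` (hypothetical).
#[derive(Debug, PartialEq)]
enum BucketRef<'a> {
    /// A URL-style name such as `s3blob://my-bucket`: the scheme selects the backend.
    Url { scheme: &'a str, location: &'a str },
    /// A bare name such as `my-bucket`: the platform resolves it from its own config.
    Named(&'a str),
}

fn parse_bucket_name(name: &str) -> Result<BucketRef<'_>, String> {
    match name.split_once("://") {
        Some((scheme, _)) if scheme.is_empty() => Err(format!("missing scheme in {name:?}")),
        Some((scheme, location)) => Ok(BucketRef::Url { scheme, location }),
        None if name.is_empty() => Err("empty bucket name".to_string()),
        None => Ok(BucketRef::Named(name)),
    }
}

fn main() {
    assert_eq!(
        parse_bucket_name("s3blob://my-bucket"),
        Ok(BucketRef::Url { scheme: "s3blob", location: "my-bucket" })
    );
    // For `file:///dir`, the location keeps the leading slash of the absolute path.
    assert_eq!(
        parse_bucket_name("file:///dir"),
        Ok(BucketRef::Url { scheme: "file", location: "/dir" })
    );
    assert_eq!(parse_bucket_name("my-bucket"), Ok(BucketRef::Named("my-bucket")));
    assert!(parse_bucket_name("").is_err());
}
```

Nothing forces two hosts to split or interpret the string the same way, which is the interoperability concern raised later in this thread.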

To answer your questions:

> If a consumer of this wants to connect to 3 different buckets, do they have to configure 3 different connection strings on launch per bucket

No. In this case, they just pass three different names to open-bucket, and the consumer component assumes the platform it is running on has configured a connection string to a keyvalue store.
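From the component's side, opening three buckets is just three calls with different names; the mapping to real stores lives in the host. Here is a minimal host-side sketch, assuming a hypothetical `Host` that holds a name-to-backend table filled from platform configuration. The connection strings are made-up placeholders standing in for real connections.

```rust
use std::collections::HashMap;

/// Hypothetical host state: bucket names mapped to platform-configured backends.
struct Host {
    configured: HashMap<String, String>,
}

impl Host {
    /// Resolve a name passed to `open-bucket`. Unknown names are an error
    /// rather than an implicit attempt to open a new connection.
    fn open_bucket(&self, name: &str) -> Result<&str, String> {
        self.configured
            .get(name)
            .map(String::as_str)
            .ok_or_else(|| format!("no bucket configured for {name:?}"))
    }
}

fn main() {
    // Placeholder connection strings the operator configured outside the component.
    let configured = HashMap::from([
        ("users".to_string(), "redis://10.0.0.5/0".to_string()),
        ("orders".to_string(), "postgres://db/orders".to_string()),
        ("cache".to_string(), "memory://".to_string()),
    ]);
    let host = Host { configured };

    // The component only ever sees the three names.
    for name in ["users", "orders", "cache"] {
        println!("{name} -> {}", host.open_bucket(name).unwrap());
    }
    assert!(host.open_bucket("audit").is_err());
}
```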

> What if they want to transfer from one type of database to another

If the keyvalue store names are the same but the stores come from different sources, they could use more elaborate URLs like the first example I mentioned above. If the names are different, then names alone are sufficient, I think?

That said, after thinking a bit more about it, I realized it would be pretty difficult for hosts/platforms to implement this API with the same interpretation. They may come up with all sorts of interpretations of how URL schemes are represented, but at least this API gives them the flexibility to do so.

Do you think the answer to your question "how buckets are configured" should be part of the semantics of the interface?

@Mossaka
Collaborator

Mossaka commented Jan 10, 2024

Also, the WIT template proposal in the component model is related to this discussion, so I'm linking it here.

WebAssembly/component-model#172

@thomastaylor312
Collaborator Author

> That said, after thinking a bit more about it, I realized it would be pretty difficult for hosts/platforms to implement this API with the same interpretation. They may come up with all sorts of interpretations of how URL schemes are represented, but at least this API gives them the flexibility to do so.

Yeah, this is the part I'm most worried about, as there could be many cases where a platform doesn't allow an outbound connection and instead gives you one (think something like Cloudflare and its built-in blobstore). In that case the connection part would be a no-op. That seems less than ideal, but I honestly haven't come up with anything better. I'll do some more thinking and see what we can come up with.

@thomastaylor312
Collaborator Author

Resolved in #41


2 participants