-
Notifications
You must be signed in to change notification settings - Fork 24
[skip/helpers] Add join_one()
/join_many()
helpers.
#787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The default type parser from the `pg` NPM library converts these to `Date` objects which then isn't handled nicely in the Skip heap. This instead just keeps the raw string, which clients can handle as they choose. Also adds test coverage of a non-lowercase column, which came up in some user code.
The issue also happens in Wasm. close SkipLabs#633
Some basic package READMEs. which will then render in the browser on npmjs.com for our packages. close SkipLabs#712
This takes all of our Skip runtime packages to 0.0.10, increments all of our core `@skip-wasm/*` and `@skiplang/*` packages (some of which are already at 1.0.x, and none of which users will install, so I left out of the everything-to-0.0.10 edict), and stops versioning the test package.json.
Temporary disconnect hackernews lib check
- Internal packages are managed with symbolic links.
When calling `runWithGc()` without a `synchronizer`, the global lock was released after each iteration, and never re-acquired.
When calling `runWithGc()` without a `synchronizer`, the global lock was released after each iteration, and never re-acquired. The code path did not seem to be actively used, as it would systematically crash.
Increase buffer times to de-flake the expectation tests on our examples, leaving enough time for servers to spin up fully before clients start hitting them. This passed 20x in a row locally just now, but circleCI seems more inconsistent in general so testing up there as well. Marking as draft for now as I expect this may require a couple iterations.
This PR adds a git submodule for the docs_site repo and adjusts the docs workflow to update its contents, removing the need to manually sync the generated files between repos. The workflow for publishing the docs to the docs site is now (from www/README.md): ``` - $ make docs-publish ``` which will suggest to run: ``` Test locally: make docs-serve Push to live site: cd www/docs_site/; git add -A; git commit -m 'update to <commit>'; git push; cd - ``` Once the suggested command ending in `git push` is executed, the live docs site will update in a few seconds.
Before this PR the following reliably fails for me: ``` $ npm install && npm run build $ git clean -Xdf $ npm install && npm run build ``` In the final step, the build sees the symlinks created in the first step that were deleted in the second step as some sort of zombie, where `fs.exists` thinks they do not exist but `fs.symlink` fails because they do. Checking in a shell shows that `ls` is similarly confused. This PR changes the build from checking if the link exists and only if not calling symlink to creating a symlink with a temporary name and then renaming it to the desired name. The temporary names are randomly generated, and POSIX requires the rename to be atomic.
Turns out, the type constraint on |
Iterators are dangerous, they are used to traverse a (materialized or computed-on-the-fly) sequence. But it's Iterator-dependent whether they represent a point in the traversal or the traversal itself. In some cases, they can be cloned, reused, restarted. In others they can't. Most of the times, they can't and must be used linearly. E.g. `it.drop(1).first()` and `_ = it.drop(1); it.first()` may and may not give the same result, depending on the implementation. `next` and `sizeHint` are an exception though. This PR moves the materialization of values from `NonEmptyIterator` (Skip) to `ValuesImpl` (JS), then restricts how `NonEmptyIterator` is used.
…kipLabs#790) @jberdine , this should fix the excess querying you were seeing. Now that we've got framework-managed unique identifiers per-resource-instance, we can just use those as the pg_notify "channel" identifier.
The previous code relied on an assumption that `key`s were _primary_ or at least unique. But it may be useful to index on a foreign (or otherwise non-unique) key column for some data models, to skip the step of re-keying your skip collection. This PR drops that assumption and tweaks the unit test to include a table with non-unique key.
This got left out of some PR recently (looks like SkipLabs#786, my bad), just catching it up now.
Example usage for `join_one()`: ``` type User = { name: string; email: string; }; type Post = { title: string; body: string; author_id: number; }; type PostWithAuthor = { title: string; body: string; author: User; }; ... // The following turns `Post`s and `User`s into `PostWithAuthor`s. join_one(posts, users, { on: "author_id", name: "author", }) ``` Example usage for `join_many()`: ``` type Upvote = { post_id: number; user_id: number; }; type Post = { title: string; body: string; }; type PostWithUpvotes = { title: string; body: string; upvotes: { user_id: number; }[]; }; ... // The following turns `Post`s and `Upvote`s into `PostWithUpvote`s. join_many(posts, upvotes, { on: "post_id", name: "upvotes", }) ```
JoinOneMapper
/JoinManyMapper
helpers.join_one()
/join_many
helpers.
join_one()
/join_many
helpers.join_one()
/join_many()
helpers.
skipruntime-ts/helpers/src/index.ts
Outdated
@@ -14,3 +14,4 @@ export { | |||
export { SkipExternalService } from "./remote.js"; | |||
export { SkipServiceBroker, fetchJSON, type Entrypoint } from "./rest.js"; | |||
export { Count, Max, Min, Sum } from "./utils.js"; | |||
export { join_one, join_many } from "./join.js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the snake_case haters going to object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to rename for consistency.
skipruntime-ts/helpers/src/join.ts
Outdated
return values.toArray().map((v: VLeft) => { | ||
const { [this.on]: key_right, ...value_left } = v; | ||
const value_right = { | ||
[this.name]: this.right.getUnique(key_right), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it obvious that this should use getUnique vs mapEntry producing an association for each value of key_right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. The getUnique()
comes from the fact that this is the implementation of JoinOne
. Do you mean that it is not clear whether this should be a left/right/inner join? As it stands, what we have is semantically the equivalent of having a non-nullable foreign key in left
, rather than a general left join.
It could be helpful to provide a helper for the case of a nullable foreign key in left
(which would make the output type Iterable<[K, Omit<VLeft, IdProperty> & Record<JoinedProperty, VRight | null>]>
and replace the getUnique()
with a getArray()[0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
getUnique()
comes from the fact that this is the implementation ofJoinOne
.
:-) I'm going in the other direction: trying to understand what JoinOne
is meant to do from this code.
Do you mean that it is not clear whether this should be a left/right/inner join? As it stands, what we have is semantically the equivalent of having a non-nullable foreign key in
left
, rather than a general left join.
I suppose I'm thinking of this as just a multimap operation, and not in terms of sql, so my first expectation is that e.g.
h ↦ [{α, l:k}]
ₗ⊔ᵣ k ↦ [v₁, v₂,..., vₙ]
= h ↦ [{α, r:v₁}, {α, r:v₂},..., {α, r:vₙ}]
should hold. SQL joins have not yet become intuitive to me though, so maybe I'm just wrong.
I just don't like getUnique
.
It could be helpful to provide a helper for the case of a nullable foreign key in
left
(which would make the output typeIterable<[K, Omit<VLeft, IdProperty> & Record<JoinedProperty, VRight | null>]>
and replace thegetUnique()
with agetArray()[0]
.
Although getArray()[0]
will not be equivalent if there are multiple rather than zero values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
getUnique()
comes from the fact that this is the implementation ofJoinOne
.:-) I'm going in the other direction: trying to understand what
JoinOne
is meant to do from this code.Do you mean that it is not clear whether this should be a left/right/inner join? As it stands, what we have is semantically the equivalent of having a non-nullable foreign key in
left
, rather than a general left join.I suppose I'm thinking of this as just a multimap operation, and not in terms of sql, so my first expectation is that e.g.
h ↦ [{α, l:k}] ₗ⊔ᵣ k ↦ [v₁, v₂,..., vₙ] = h ↦ [{α, r:v₁}, {α, r:v₂},..., {α, r:vₙ}]
should hold. SQL joins have not yet become intuitive to me though, so maybe I'm just wrong.
Ok I understand now. I think there are two different notions:
- what I had in mind, which basically corresponds to the mundane use case of "I have two collections of objects, the first of which (collection
A
) contains objects with anb_id
field that corresponds to (exactly) one of the objects from the second one (collectionB
), and I want to build a collection of objects where theb_id
field (fromA
) is replaced with the actual corresponding object (fromB
).". It implicitly assumesB
is a simple map (i.e. only one value per key), and that eachb_id
field fromA
actually corresponds to an entry inB
. It is basically the semantics of an SQL join with a non-null foreign key. - what you mentioned, which makes more sense from an "operations on multimaps" perspective, is a generalization of what I had in mind, where the foreign key constraint is lifted (i.e. the
b_id
field ofA
can correspond to multiple objects ofB
). While I think it makes sense to offer that option to users, I expect most of the time people will want the former (i.e. having more than one value per key inB
would be the result of a bug, and thus thegetUnique()
injoin_one()
would rightfully warn them that something went wrong). Maybe the solution is to offer a generic way to enforce that a collection is indeed a simple map? In that case, we can use your semantic forjoin_one()
and keep offering the right guarantees.
I just don't like
getUnique
.It could be helpful to provide a helper for the case of a nullable foreign key in
left
(which would make the output typeIterable<[K, Omit<VLeft, IdProperty> & Record<JoinedProperty, VRight | null>]>
and replace thegetUnique()
with agetArray()[0]
.Although
getArray()[0]
will not be equivalent if there are multiple rather than zero values.
skipruntime-ts/helpers/src/join.ts
Outdated
>( | ||
left: EagerCollection<K, V1>, | ||
right: EagerCollection<V1[IdProperty], V2>, | ||
options: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why these 2 args are packed into a record. Just to try to have named arguments, or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's just to mimic named arguments. I'm not sure which is best, but the rationale was that two consecutive positional string arguments might be more error prone.
skipruntime-ts/helpers/src/join.ts
Outdated
right: EagerCollection<V1[IdProperty], V2>, | ||
options: { | ||
on: IdProperty; | ||
name: JoinedProperty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure of this name
name. Is there an established terminology? E.g. it was not clear to me that it had anything to do with the produced result, as opposed to selecting things from the inputs. I don't know, maybe "to"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, name
is not very helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think on
/name
as the Id/Joined property for both join_one
and join_many
is also potentially confusing...
In join_one
, on
is a foreign-key pointer from left
into right
, with the assumption that it's unique in right
.
In join_many
, on
is a foreign-key pointer from right
into left
, with no uniqueness assumption.
The actual semantics seem good/useful for dealing with SQL one-to-many (join_one, where left is the "many" and right is the "one" table) and many-to-many (join_many, where left is one of the two "many" tables and right is the linking table) relationships but the naming/terminology is super tricky/under-specified here. I wonder if we could borrow that SQL relationship language here instead of talking about left/right/join (with all the connotations those carry)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. maybe join_one(left,right,{on,name})
could instead be joinOneToMany({one, many, oneToManyFKey, asCol})
and join_many(left,right,{on,name})
could instead be joinManyToMany({many, linking, linkingFKey, asCol})
or something along those lines...
This line of thinking also suggests an optional third collection argument for joinMany
, corresponding to the other many
table -- e.g. if you wanted to have an array of user objects instead of just user IDs in the upvotes
field of the test case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as
is already much better than name
, yes.
This line of thinking also suggests an optional third collection argument for
joinMany
, corresponding to the othermany
table -- e.g. if you wanted to have an array of user objects instead of just user IDs in theupvotes
field of the test case here.
Yes, this could be an optional through
option, eg.
joinManyToMany(posts, users, {
through: upvotes,
left_id: "post_id", // Not sure about the `left_id`/`right_id` names.
right_id: "user_id",
as: "upvotes",
})
and keep the simple case without a "join table" as:
joinManyToMany(posts, upvotes, {
// `through` and `right_id` are implicitly null/undefined here.
left_id: "post_id",
as: "upvotes",
})
Example usage for
join_one()
:Example usage for
join_many()
: