Skip to content

[skip/helpers] Add join_one()/join_many() helpers. #787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4,258 commits into
base: main
Choose a base branch
from

Conversation

beauby
Copy link
Contributor

@beauby beauby commented Mar 2, 2025

Example usage for join_one():

type User = { name: string; email: string; };
type Post = { title: string; body: string; author_id: number; };
type PostWithAuthor = { title: string; body: string; author: User; };
...
// The following turns `Post`s and `User`s into `PostWithAuthor`s.
join_one(posts, users, {
  on: "author_id",
  name: "author",
})

Example usage for join_many():

type Upvote = { post_id: number; user_id: number; };
type Post = { title: string; body: string; };
type PostWithUpvotes = {
  title: string;
  body: string;
  upvotes: { user_id: number; }[];
};
...
// The following turns `Post`s and `Upvote`s into `PostWithUpvote`s.
join_many(posts, upvotes, {
  on: "post_id",
  name: "upvotes",
})

bennostein and others added 30 commits February 5, 2025 15:51
The default type parser from the `pg` NPM library converts these to
`Date` objects which then isn't handled nicely in the Skip heap. This
instead just keeps the raw string, which clients can handle as they
choose.

Also adds test coverage of a non-lowercase column, which came up in some
user code.
The issue also happens in Wasm.

close SkipLabs#633
Some basic package READMEs. which will then render in the browser on
npmjs.com for our packages.

close SkipLabs#712
This takes all of our Skip runtime packages to 0.0.10, increments all of
our core `@skip-wasm/*` and `@skiplang/*` packages (some of which are
already at 1.0.x, and none of which users will install, so I left out of
the everything-to-0.0.10 edict), and stops versioning the test
package.json.
Temporary disconnect hackernews lib check
- Internal packages are managed with symbolic links.
When calling `runWithGc()` without a `synchronizer`, the global lock
was released after each iteration, and never re-acquired.
When calling `runWithGc()` without a `synchronizer`, the global lock was
released after each iteration, and never re-acquired.

The code path did not seem to be actively used, as it would
systematically crash.
Increase buffer times to de-flake the expectation tests on our examples,
leaving enough time for servers to spin up fully before clients start
hitting them.

This passed 20x in a row locally just now, but circleCI seems more
inconsistent in general so testing up there as well. Marking as draft
for now as I expect this may require a couple iterations.
This PR adds a git submodule for the docs_site repo and adjusts the docs
workflow to update its contents, removing the need to manually sync the
generated files between repos. The workflow for publishing the docs to
the
docs site is now (from www/README.md):
```
- $ make docs-publish
```
which will suggest to run:
```
Test locally: make docs-serve
Push to live site: cd www/docs_site/; git add -A; git commit -m 'update to <commit>'; git push; cd -
```
Once the suggested command ending in `git push` is executed, the live
docs
site will update in a few seconds.
Before this PR the following reliably fails for me:
```
$ npm install && npm run build
$ git clean -Xdf
$ npm install && npm run build
```
In the final step, the build sees the symlinks created in the first step
that were deleted in the second step as some sort of zombie, where
`fs.exists` thinks they do not exist but `fs.symlink` fails because they
do. Checking in a shell shows that `ls` is similarly confused.

This PR changes the build from checking if the link exists and only if not
calling symlink to creating a symlink with a temporary name and then
renaming it to the desired name. The temporary names are randomly generated,
and POSIX requires the rename to be atomic.
@beauby
Copy link
Contributor Author

beauby commented Mar 7, 2025

Like I said, I'm wondering whether we could make the syntax more user-friendly.

Turns out, the type constraint on EagerCollection.map, forcing the mapper's constructor parameters to be ...params: Params with Params extends DepSafe[] prevents us from using an object literal to mimic named parameters (since the object literal won't be DepSafe). I'll merge it with the initial API (constructor(other_collection, join_key, resulting_property)), as it is in helpers and not part of the core API, and we can revisit later.

mbouaziz and others added 12 commits March 7, 2025 16:13
Iterators are dangerous, they are used to traverse a (materialized or
computed-on-the-fly) sequence. But it's Iterator-dependent whether they
represent a point in the traversal or the traversal itself.
In some cases, they can be cloned, reused, restarted. In others they
can't.
Most of the times, they can't and must be used linearly.
E.g. `it.drop(1).first()` and `_ = it.drop(1); it.first()` may and may
not give the same result, depending on the implementation.
`next` and `sizeHint` are an exception though.

This PR moves the materialization of values from `NonEmptyIterator`
(Skip) to `ValuesImpl` (JS), then restricts how `NonEmptyIterator` is
used.
…kipLabs#790)

@jberdine , this should fix the excess querying you were seeing.

Now that we've got framework-managed unique identifiers
per-resource-instance, we can just use those as the pg_notify "channel"
identifier.
The previous code relied on an assumption that `key`s were _primary_ or
at least unique. But it may be useful to index on a foreign (or
otherwise non-unique) key column for some data models, to skip the step
of re-keying your skip collection.

This PR drops that assumption and tweaks the unit test to include a
table with non-unique key.
This got left out of some PR recently (looks like SkipLabs#786, my bad), just
catching it up now.
Example usage for `join_one()`:
```
type User = { name: string; email: string; };
type Post = { title: string; body: string; author_id: number; };
type PostWithAuthor = { title: string; body: string; author: User; };
...
// The following turns `Post`s and `User`s into `PostWithAuthor`s.
join_one(posts, users, {
  on: "author_id",
  name: "author",
})
```

Example usage for `join_many()`:
```
type Upvote = { post_id: number; user_id: number; };
type Post = { title: string; body: string; };
type PostWithUpvotes = {
  title: string;
  body: string;
  upvotes: { user_id: number; }[];
};
...
// The following turns `Post`s and `Upvote`s into `PostWithUpvote`s.
join_many(posts, upvotes, {
  on: "post_id",
  name: "upvotes",
})
```
@beauby beauby changed the title [skip/helpers] Add JoinOneMapper/JoinManyMapper helpers. [skip/helpers] Add join_one()/join_many helpers. Mar 13, 2025
@beauby beauby changed the title [skip/helpers] Add join_one()/join_many helpers. [skip/helpers] Add join_one()/join_many() helpers. Mar 13, 2025
@@ -14,3 +14,4 @@ export {
export { SkipExternalService } from "./remote.js";
export { SkipServiceBroker, fetchJSON, type Entrypoint } from "./rest.js";
export { Count, Max, Min, Sum } from "./utils.js";
export { join_one, join_many } from "./join.js";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the snake_case haters going to object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to rename for consistency.

return values.toArray().map((v: VLeft) => {
const { [this.on]: key_right, ...value_left } = v;
const value_right = {
[this.name]: this.right.getUnique(key_right),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it obvious that this should use getUnique vs mapEntry producing an association for each value of key_right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. The getUnique() comes from the fact that this is the implementation of JoinOne. Do you mean that it is not clear whether this should be a left/right/inner join? As it stands, what we have is semantically the equivalent of having a non-nullable foreign key in left, rather than a general left join.
It could be helpful to provide a helper for the case of a nullable foreign key in left (which would make the output type Iterable<[K, Omit<VLeft, IdProperty> & Record<JoinedProperty, VRight | null>]> and replace the getUnique() with a getArray()[0].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getUnique() comes from the fact that this is the implementation of JoinOne.

:-) I'm going in the other direction: trying to understand what JoinOne is meant to do from this code.

Do you mean that it is not clear whether this should be a left/right/inner join? As it stands, what we have is semantically the equivalent of having a non-nullable foreign key in left, rather than a general left join.

I suppose I'm thinking of this as just a multimap operation, and not in terms of sql, so my first expectation is that e.g.

    h ↦ [{α, l:k}]
ₗ⊔ᵣ k ↦ [v₁, v₂,..., vₙ]
 =  h ↦ [{α, r:v₁}, {α, r:v₂},..., {α, r:vₙ}]

should hold. SQL joins have not yet become intuitive to me though, so maybe I'm just wrong.

I just don't like getUnique.

It could be helpful to provide a helper for the case of a nullable foreign key in left (which would make the output type Iterable<[K, Omit<VLeft, IdProperty> & Record<JoinedProperty, VRight | null>]> and replace the getUnique() with a getArray()[0].

Although getArray()[0] will not be equivalent if there are multiple rather than zero values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getUnique() comes from the fact that this is the implementation of JoinOne.

:-) I'm going in the other direction: trying to understand what JoinOne is meant to do from this code.

Do you mean that it is not clear whether this should be a left/right/inner join? As it stands, what we have is semantically the equivalent of having a non-nullable foreign key in left, rather than a general left join.

I suppose I'm thinking of this as just a multimap operation, and not in terms of sql, so my first expectation is that e.g.

    h ↦ [{α, l:k}]
ₗ⊔ᵣ k ↦ [v₁, v₂,..., vₙ]
 =  h ↦ [{α, r:v₁}, {α, r:v₂},..., {α, r:vₙ}]

should hold. SQL joins have not yet become intuitive to me though, so maybe I'm just wrong.

Ok I understand now. I think there are two different notions:

  • what I had in mind, which basically corresponds to the mundane use case of "I have two collections of objects, the first of which (collection A) contains objects with an b_id field that corresponds to (exactly) one of the objects from the second one (collection B), and I want to build a collection of objects where the b_id field (from A) is replaced with the actual corresponding object (from B).". It implicitly assumes B is a simple map (i.e. only one value per key), and that each b_id field from A actually corresponds to an entry in B. It is basically the semantics of an SQL join with a non-null foreign key.
  • what you mentioned, which makes more sense from an "operations on multimaps" perspective, is a generalization of what I had in mind, where the foreign key constraint is lifted (i.e. the b_id field of A can correspond to multiple objects of B). While I think it makes sense to offer that option to users, I expect most of the time people will want the former (i.e. having more than one value per key in B would be the result of a bug, and thus the getUnique() in join_one() would rightfully warn them that something went wrong). Maybe the solution is to offer a generic way to enforce that a collection is indeed a simple map? In that case, we can use your semantic for join_one() and keep offering the right guarantees.

I just don't like getUnique.

It could be helpful to provide a helper for the case of a nullable foreign key in left (which would make the output type Iterable<[K, Omit<VLeft, IdProperty> & Record<JoinedProperty, VRight | null>]> and replace the getUnique() with a getArray()[0].

Although getArray()[0] will not be equivalent if there are multiple rather than zero values.

>(
left: EagerCollection<K, V1>,
right: EagerCollection<V1[IdProperty], V2>,
options: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why these 2 args are packed into a record. Just to try to have named arguments, or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just to mimic named arguments. I'm not sure which is best, but the rationale was that two consecutive positional string arguments might be more error prone.

right: EagerCollection<V1[IdProperty], V2>,
options: {
on: IdProperty;
name: JoinedProperty;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure of this name name. Is there an established terminology? E.g. it was not clear to me that it had anything to do with the produced result, as opposed to selecting things from the inputs. I don't know, maybe "to"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, name is not very helpful.

Copy link
Contributor

@bennostein bennostein Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think on/name as the Id/Joined property for both join_one and join_many is also potentially confusing...

In join_one, on is a foreign-key pointer from left into right, with the assumption that it's unique in right.

In join_many, on is a foreign-key pointer from right into left, with no uniqueness assumption.

The actual semantics seem good/useful for dealing with SQL one-to-many (join_one, where left is the "many" and right is the "one" table) and many-to-many (join_many, where left is one of the two "many" tables and right is the linking table) relationships but the naming/terminology is super tricky/under-specified here. I wonder if we could borrow that SQL relationship language here instead of talking about left/right/join (with all the connotations those carry)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. maybe join_one(left,right,{on,name}) could instead be joinOneToMany({one, many, oneToManyFKey, asCol}) and join_many(left,right,{on,name}) could instead be joinManyToMany({many, linking, linkingFKey, asCol}) or something along those lines...

This line of thinking also suggests an optional third collection argument for joinMany, corresponding to the other many table -- e.g. if you wanted to have an array of user objects instead of just user IDs in the upvotes field of the test case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as is already much better than name, yes.

This line of thinking also suggests an optional third collection argument for joinMany, corresponding to the other many table -- e.g. if you wanted to have an array of user objects instead of just user IDs in the upvotes field of the test case here.

Yes, this could be an optional through option, eg.

joinManyToMany(posts, users, {
  through: upvotes,
  left_id: "post_id",  // Not sure about the `left_id`/`right_id` names.
  right_id: "user_id",
  as: "upvotes",
})

and keep the simple case without a "join table" as:

joinManyToMany(posts, upvotes, {
  // `through` and `right_id` are implicitly null/undefined here.
  left_id: "post_id",
  as: "upvotes",
})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants