-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: allow aggregates inside spread one-to-many and many-to-many embedded resources #4002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
breaking: * Spreading using `...` no longer hoists the aggregates to the top relationships. * The new operator `^` is now used to hoist the aggregate to the top, e.g. `/table?select=^rel(cost.sum())`
While this reads nice at first, I wonder: Is there any real use-case that benefits from it? Aka is there any scenario, where both "use spreading only" and "use spreading + hoisting" give sensible results? |
Mostly the case of to-many spreading/hoisting that I mentioned at the end. It's mostly to avoid confusion when using any of those operations.
In that example, if we want to hoist the curl 'localhost:3000/factories?^processes(^process_costs(cost.sum())) [{"sum": 1000}] If we want to keep the sum at the curl 'localhost:3000/factories?...processes(^process_costs(cost.sum())) [
{"sum": 200},
{"sum": 100},
"..."
] In this case the sensible results that we were implementing (e.g. hoisting the aggregate to the top on to-one spreads) won't happen anymore, they'd need to be explicitly stated by the user by using the specific operator (e.g. using I may be misunderstanding the question though, lmk. Edit: My example needed a hoist in |
I guess my question is, whether the second example, without hoisting, is "sensible". I.e. would anybody use this? I see the syntax from your example and the expected outcome - but is it something "real"? Do the returned values make sense, would somebody actually use this? Certainly not in the contrived example with only a single "sum" column, but multiple rows of it, because it lacks context. I'm skeptical that, in the absence of explicit GROUP BY anywhere, the resulting numbers can actually be made sense of. I'm expecting that, when you return those multiple aggregation results, you will always end up returning some additional columns as an identifier of each of the rows, so that you can tell what the aggregation means. But once you have that identifier... the result with hoisting will be the same, because the identifier will be part of the GROUP BY. So... I don't see it, yet. Imho this adds another way to shoot yourself in the foot with our query syntax, with no sensible output. |
It would be sensible, for to-many relationships at least. My example was wrong though, the spread was meant to be used in the
In the case of to-many relationships they wouldn't be the same. I think a better word for "hoisting" would be "flattening" since we're selecting the columns in the top resource before grouping them. For example, say we want to get the curl 'localhost:3000/factories?...processes(category_id,^process_costs(cost.sum())) [
{"category_id": [1, 2], "sum": [200, 100]}, // Factory with id: 1
{"category_id": [1, 2], "sum": [300, 200]}, // Factory with id: 2
...
] No flattening done here, we return the response in arrays for each factory (my original example doesn't show it in arrays because we return a single But once we hoist it all the way to the top curl 'localhost:3000/factories?^processes(category_id,^process_costs(cost.sum())) [
{"category_id": 1, "sum": 600},
{"category_id": 2, "sum": 400}
]
Yes, that's fair. The user would need to check the hoisting or flattening at every single level to see if the |
WIP: Completes #3041 and continues what was discussed in this comment chain #3640 (comment)
There I commented that:
After trying a tentative implementation and checking many cases, using
group_by
gets quite tedious to maintain in the code or create the queries for the users. Mostly when we have many nested relationships (pending examples).The proposal in this PR is to kinda mix both concepts, that is having an expected result even if they're "useless" but avoiding to use a separate
group_by
. To achieve this we would need to separate the concept of "spreading" and "hoisting". Doing a "spread" means we're removing the surrounding object (or array into-many
relationships) but we won't be hoisting the aggregate to the top. In other words, what we get before using the spread is the same as what we get after, but without the surrounding object/array.That's why we would need a separate operator to hoist the aggregates (and the fields to group by) to the top, something like
^
. For example, in the following request, thecost.sum()
won't be hoisted automatically, but it will remove the surroundingprocess_costs
:curl 'localhost:3000/processes?select=category_id,...process_costs(cost.sum())'
We would now need to use the hoist operator to move the aggregate to the top:
curl 'localhost:3000/processes?select=category_id,^process_costs(cost.sum())'
With this we would always expect that the
GROUP BY
will always be done inside the relationship that contains it (even if it useless), and the only way of changing this is to explicitly hoist it to the top. There is also potential in allowing to hoist the aggregates in to-many spreads.I still vote for this. The "single object" result is what made me doubt in using
...
to hoist or doing an automatic hoisting altogether. I think there is some confusion at simple glance if we did something like:curl 'localhost:3000/factories?...processes(...process_costs(cost.sum()))
Would we want the aggregate
cost.sum()
to go to the top (i.e. for all thefactories
?), or keep it at the level ofprocesses
and just remove theprocesses
array wrapper. Using the hoist operator would clarify what we'd want to get.