Owner

@sutugin sutugin commented Apr 15, 2021

Attempt to fix #5 using SQL min/max functions.

@roychen11232357 roychen11232357 left a comment

Hi bro,

I took a look at this commit, and it does improve the performance of offset fetching.

The key change seems to be this line:

val minMaxQuery = s"(select max($offsetColumn) as max_$offsetColumn, min($offsetColumn) as min_$offsetColumn from $dbTable) minMaxTable"
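
To make the idea concrete, here is a minimal Scala sketch of how that subquery string is assembled. The object and function names are hypothetical and not taken from the PR; only the string template comes from the line quoted above. The trailing alias lets the parenthesized select be used as a derived table in a FROM clause.

```scala
// Hypothetical wrapper: object and function names are illustrative, not from the PR.
object MinMaxQuerySketch {

  // Builds a single subquery that returns both offset bounds in one pass,
  // instead of issuing separate min and max lookups.
  // Assumption: offsetColumn and dbTable come from trusted configuration,
  // since they are interpolated into the SQL text without escaping.
  def buildMinMaxQuery(offsetColumn: String, dbTable: String): String =
    s"(select max($offsetColumn) as max_$offsetColumn, " +
      s"min($offsetColumn) as min_$offsetColumn from $dbTable) minMaxTable"
}
```

For example, calling `MinMaxQuerySketch.buildMinMaxQuery("number", "public.test3")` produces the subquery that shows up in the PostgreSQL statements I captured.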

Can this commit be merged back to master? Or are there other unresolved issues?


By the way,

I tested this commit against PostgreSQL and found that every trigger runs these two queries:

SELECT * FROM (select max(number) as max_number, min(number) as min_number from public.test3) minMaxTable WHERE 1=0
SELECT "max_number","min_number" FROM (select max(number) as max_number, min(number) as min_number from public.test3) minMaxTable

The first query (... WHERE 1=0) is presumably generated by Spark JDBC to resolve the schema, right?

If we want to avoid running the first query (WHERE 1=0) on every trigger, could getOffsetValues fetch the data directly through a plain JDBC library instead of going through Spark JDBC?

I don't think the schema needs to be fetched on every trigger. Is there another way to avoid this?
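
As a rough illustration of the plain JDBC idea, here is a minimal sketch using only `java.sql`. Everything in it is an assumption for discussion: the object name, the helper names, the `max_value`/`min_value` aliases, and the `Option[(String, String)]` return type are hypothetical and are not the PR's actual getOffsetValues. Because the statement goes straight to the database, only one query is sent and no separate schema probe is needed.

```scala
import java.sql.Connection

// Hypothetical sketch: names and return type are illustrative, not from the PR.
object PlainJdbcSketch {

  // Assumption: offsetColumn and dbTable come from trusted configuration,
  // since they are interpolated into the SQL text without escaping.
  def minMaxSql(offsetColumn: String, dbTable: String): String =
    s"select max($offsetColumn) as max_value, " +
      s"min($offsetColumn) as min_value from $dbTable"

  // Returns Some((min, max)) as strings, or None when the table is empty.
  def fetchMinMax(conn: Connection, offsetColumn: String, dbTable: String): Option[(String, String)] = {
    val stmt = conn.createStatement()
    try {
      val rs = stmt.executeQuery(minMaxSql(offsetColumn, dbTable))
      // An aggregate without GROUP BY yields one row; both values are NULL on an empty table.
      if (rs.next()) {
        val max = rs.getString("max_value")
        val min = rs.getString("min_value")
        if (max == null || min == null) None else Some((min, max))
      } else None
    } finally {
      stmt.close() // closing the statement also closes its result set
    }
  }
}
```

The caller would open the connection once (for example with `java.sql.DriverManager.getConnection`) and reuse it across triggers. The trade-off is that the source would then manage its own connection lifecycle instead of relying on Spark's JDBC options.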

arouel commented May 31, 2021

@sutugin @roychen11232357 I'm curious about the outcome. Any suggestions on how to move forward?

Successfully merging this pull request may close these issues.

there is no filter condition of getOffset query, which may cause performance issues
