Try fixed https://github.com/sutugin/spark-streaming-jdbc-source/issu… #6

Open
wants to merge 1 commit into base: master


sutugin
Owner

@sutugin sutugin commented Apr 15, 2021

Trying to fix #5 using SQL min/max functions.

@roychen11232357 roychen11232357 left a comment

hi bro,

I took a look at this commit, and it does improve the performance of offset fetching.

The key line of this commit appears to be this one:

val minMaxQuery = s"(select max($offsetColumn) as max_$offsetColumn, min($offsetColumn) as min_$offsetColumn from $dbTable) minMaxTable"
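
For anyone reading along, here is a minimal sketch of how that string comes together. `MinMaxQuery` and `build` are hypothetical names used only for illustration, not the project's actual code; the parameter names mirror the `offsetColumn` and `dbTable` variables in the line above.

```scala
// Hypothetical helper for illustration only, not the project's actual code.
// It builds an aliased subquery that can be passed to Spark's JDBC reader
// through the "dbtable" option, so the database computes min and max itself.
object MinMaxQuery {
  def build(offsetColumn: String, dbTable: String): String =
    s"(select max($offsetColumn) as max_$offsetColumn, " +
      s"min($offsetColumn) as min_$offsetColumn from $dbTable) minMaxTable"
}

// Usage sketch, assuming an existing SparkSession `spark` and a JDBC `url`:
//   spark.read.format("jdbc")
//     .option("url", url)
//     .option("dbtable", MinMaxQuery.build("number", "public.test3"))
//     .load()
```

The column and table names are interpolated directly into the SQL, so they are assumed to come from trusted configuration rather than user input.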

Can this commit be merged into master, or are there other unresolved issues?


By the way,

I tested this commit against PostgreSQL and found that every trigger runs these two queries:

SELECT * FROM (select max(number) as max_number, min(number) as min_number from public.test3) minMaxTable WHERE 1=0
SELECT "max_number","min_number" FROM (select max(number) as max_number, min(number) as min_number from public.test3) minMaxTable

The first query (the one ending in WHERE 1=0) looks like it is generated by Spark JDBC to fetch the schema, right?

If we want to avoid running that first query on every trigger, could getOffsetValues fetch the data with a plain JDBC library instead of going through Spark JDBC?

I don't think the schema needs to be fetched on every trigger. Are there other ways to avoid this?
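
To make the idea concrete, here is a rough sketch of reading the offsets through plain `java.sql`, which sends a single statement and no schema probe. `DirectOffsets`, `minMaxSql` and `fetch` are hypothetical names, not part of this project, and the caller is assumed to supply an open `Connection` with a suitable driver on the classpath.

```scala
import java.sql.Connection

// Hypothetical sketch, not the project's getOffsetValues implementation.
object DirectOffsets {
  // One aggregate statement; no wrapping subquery or alias is needed here.
  def minMaxSql(offsetColumn: String, dbTable: String): String =
    s"select min($offsetColumn), max($offsetColumn) from $dbTable"

  // Returns Some((min, max)) as strings, or None when the table is empty
  // (both aggregates come back as NULL in that case).
  def fetch(conn: Connection, offsetColumn: String, dbTable: String): Option[(String, String)] = {
    val stmt = conn.createStatement()
    try {
      val rs = stmt.executeQuery(minMaxSql(offsetColumn, dbTable))
      try {
        if (rs.next()) {
          val min = rs.getString(1)
          val max = rs.getString(2)
          if (min == null || max == null) None else Some((min, max))
        } else None
      } finally rs.close()
    } finally stmt.close()
  }
}
```

The trade-off is that connection handling and type conversion, which the Spark JDBC reader otherwise takes care of, would have to be managed by the source itself.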

@arouel

arouel commented May 31, 2021

@sutugin @roychen11232357 I'm curious about the outcome. Any suggestions on how to move forward?

Development

Successfully merging this pull request may close these issues.

there is no filter condition of getOffset query, which may cause performance issues
3 participants