Added test to validate recovery when StorageRead client fails #50

prodriguezdefino
Collaborator

The most common error when reading large streams occurs while iterating over the ServerStream. This new test demonstrates that the pipeline can recover and produce the expected results by restarting the read from the beginning of the split or from the last checkpointed offset.
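
For illustration, the recovery behavior described above could look roughly like the following. This is a minimal sketch and not the connector's code: `RecoveringReadSketch`, `readAll`, `openAt`, and `maxRetries` are hypothetical names, a plain `Iterator` stands in for the ServerStream, and an in-memory counter stands in for the checkpointed offset.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical sketch; none of these names come from the connector. */
final class RecoveringReadSketch {

    /**
     * Reads every element, reopening the stream at the last recorded offset
     * whenever iteration throws. openAt.apply(offset) stands in for opening a
     * read stream positioned at the given offset (assumption).
     */
    static <T> List<T> readAll(IntFunction<Iterator<T>> openAt, int maxRetries) {
        List<T> results = new ArrayList<>();
        int offset = 0;  // elements emitted so far; stands in for the checkpointed offset
        int retries = 0;
        while (true) {
            Iterator<T> stream = openAt.apply(offset);
            try {
                while (stream.hasNext()) {
                    results.add(stream.next());
                    offset++;  // advance only after the element was handed over
                }
                return results;
            } catch (RuntimeException e) {
                if (++retries > maxRetries) {
                    throw e;  // give up once the retry budget is exhausted
                }
                // Otherwise loop again and reopen the stream at the recorded offset.
            }
        }
    }
}
```

Because the offset advances only after an element has been added to the results, a restart neither skips nor repeats records in this sketch.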


private static Credentials createCredentialsFromFile(String file) {
try {
return GoogleCredentials.fromStream(new FileInputStream(file));

9% of developers fix this issue

PATH_TRAVERSAL_IN: This API (java/io/FileInputStream.<init>(Ljava/lang/String;)V) reads a file whose location might be specified by user input


*/
@AutoValue
@PublicEvolving
public abstract class BigQuerySource<OUT>

0% of developers fix this issue

AutoValueCouldNotWrite: Could not write generated class com.google.cloud.flink.bigquery.source.AutoValue_BigQuerySource: javax.annotation.processing.FilerException: Attempt to recreate a file for type com.google.cloud.flink.bigquery.source.AutoValue_BigQuerySource

❗❗ 3 similar findings have been found in this PR

File Path | Line Number
flink-connector-bigquery/src/main/java/com/google/cloud/flink/bigquery/common/config/CredentialsOptions.java | 34
flink-connector-bigquery/src/main/java/com/google/cloud/flink/bigquery/source/config/BigQueryReadOptions.java | 41
flink-connector-bigquery/src/main/java/com/google/cloud/flink/bigquery/common/config/BigQueryConnectOptions.java | 34

Visit the Lift Web Console to find more details in your report.


@sonatype-lift

sonatype-lift bot commented Jul 2, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your local project directory of your branch to review the suggestions before committing.[1]

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/GoogleCloudDataproc/flink-bigquery-connector/50.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/GoogleCloudDataproc/flink-bigquery-connector/50.diff | git apply

Once you're satisfied, commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

@jayehwhyehentee
Collaborator

/gcbrun

@jayehwhyehentee
Collaborator

/gcbrun


private final Iterator<T> realIterator;
private final Double errorPercentage;
private final Random random = new Random();
Collaborator

I'd prefer not to use Random. If you absolutely have to, please use a fixed seed so that the test is deterministic.

An alternative suggestion is to have it fail on every N-th invocation of next().
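
A minimal sketch of that alternative, assuming a wrapper around the real iterator. The class name `FailEveryNthIterator`, its constructor, and the exception type are hypothetical and not taken from this PR.

```java
import java.util.Iterator;

/** Hypothetical test helper: throws on every n-th call to next(). */
final class FailEveryNthIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final int n;
    private int calls = 0;

    FailEveryNthIterator(Iterator<T> delegate, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.delegate = delegate;
        this.n = n;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        calls++;
        if (calls % n == 0) {
            // The failing call does not consume an element from the delegate.
            throw new IllegalStateException("Simulated failure on call " + calls);
        }
        return delegate.next();
    }
}
```

With this shape the failure points depend only on the call count, so a test can state exactly where each failure will occur.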

Collaborator

Now using Random with a fixed seed.
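
A seeded variant might look like the sketch below. The field names mirror the snippet under review, but the class name `SeededFaultyIterator`, the constructor, the explicit `seed` parameter, and the reading of `errorPercentage` as a value between 0 and 100 are assumptions rather than the merged code.

```java
import java.util.Iterator;
import java.util.Random;

/** Hypothetical sketch of a faulty iterator driven by a seeded Random. */
final class SeededFaultyIterator<T> implements Iterator<T> {
    private final Iterator<T> realIterator;
    private final double errorPercentage;  // assumed to be in the range 0..100
    private final Random random;

    SeededFaultyIterator(Iterator<T> realIterator, double errorPercentage, long seed) {
        this.realIterator = realIterator;
        this.errorPercentage = errorPercentage;
        // A fixed seed makes the sequence of draws, and so the failures, repeatable.
        this.random = new Random(seed);
    }

    @Override
    public boolean hasNext() {
        return realIterator.hasNext();
    }

    @Override
    public T next() {
        // nextDouble() returns a value in [0.0, 1.0); scale it to a percentage.
        if (random.nextDouble() * 100 < errorPercentage) {
            throw new IllegalStateException("Simulated read failure");
        }
        return realIterator.next();
    }
}
```

Two instances built with the same seed and error percentage fail on the same calls, which keeps test runs reproducible.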

This pertains to identifying whether the table is BQ native or external, which is relevant if we want to support external tables in the future.
@jayehwhyehentee
Collaborator

/gcbrun

@jayehwhyehentee
Collaborator

/gcbrun

@vishalkarve15 vishalkarve15 merged commit 22e5a48 into GoogleCloudDataproc:main Nov 28, 2023
4 checks passed
@jayehwhyehentee jayehwhyehentee deleted the test_bqstoragereadapi_error branch December 12, 2023 05:12