Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PLUGIN-498 Add integration test for GCS Sink/Multisink content type #1083

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

ardian4
Copy link
Contributor

@ardian4 ardian4 commented Dec 30, 2020

Integration test for issue: https://cdap.atlassian.net/browse/PLUGIN-498

@google-cla
Copy link

google-cla bot commented Dec 30, 2020

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@Test
public void testMultiSinkContentType() throws Exception {
String bucketName = "cask-gcs-multisink-" + UUID.randomUUID().toString();
Bucket bucket = createBucket(bucketName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to delete the bucket at the end of the test.
As a best practice, all the resources created in the test should be deleted upon test assertion completion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already doing it, if you follow the method createBucket(bucketName) the bucket name is marked for delete after the test is done, also we run the the test and the bucket is deleted when the test is finished

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see.

Copy link
Contributor

@CuriousVini CuriousVini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix and squash

inputSourceConfig.put("format", "csv");
inputSourceConfig.put("referenceName", "source_" + UUID.randomUUID().toString());
inputSourceConfig.put("project", getProjectId());
inputSourceConfig.put("path", createPath(bucket, inputPath));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you make it a macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GCSMultiBatchSink if the format is macro (null) this always fails with null pointer exception.
Please check the line 75 in this file:
src/main/java/io/cdap/plugin/gcp/gcs/sink/GCSMultiBatchSink.java.

That is the reason why we can not implement it right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We should still include it as a macro and have that bug fixed.
Could you please file a corresponding jira?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Jira Ticket regarding null pointer exception in GCSMultiBatchSink if the format is macro:
https://cdap.atlassian.net/browse/PLUGIN-553

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -736,7 +741,7 @@ public void testGcsSourceFormats() throws Exception {
ETLBatchConfig.Builder pipelineConfig = ETLBatchConfig.builder().addStage(source);
for (String format : formats) {
String path = String.format("%s/%s/%s", createPath(bucket, OUTPUT_BLOB_NAME), format, suffix);
ETLStage sink = new ETLStage(format, createSinkPlugin(format, path, schema));
ETLStage sink = new ETLStage(format, createSinkPlugin(format, path, schema, DEFAULT_CONTENT_TYPE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to specify this? If nothing is specified, the sink would interpret default content type right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not specified, the integration test still passes, is that correct?

Copy link
Contributor Author

@ardian4 ardian4 Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct.

@google-cla
Copy link

google-cla bot commented Jan 8, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@ardian4 ardian4 force-pushed the feature/PLUGIN-498-content-type-it branch from 7c04d82 to 435b096 Compare January 13, 2021 10:14
@google-cla
Copy link

google-cla bot commented Jan 13, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

Copy link
Contributor

@CuriousVini CuriousVini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make all the fields as macro except GCSMultiSink format type?
For GCSMultiSink format type, add a TODO to make it macro with the jira number

return createSinkPlugin(format, path, schema, null);
}

private ETLPlugin createSinkPlugin(String format, String path, Schema schema, String contentType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If contentType can be nullable, annotate it

Schema.Field.of("id", Schema.of(Schema.Type.INT)),
Schema.Field.of("name", Schema.nullableOf(Schema.of(Schema.Type.STRING))),
Schema.Field.of("email", Schema.nullableOf(Schema.of(Schema.Type.STRING)))
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for separate line

String line3 = "3,Jack Ferguson,[email protected],DepartmentA";
String inputPath = "input";

bucket.create(inputPath, String.join("\n", Arrays.asList(line1, line2, line3)).getBytes(StandardCharsets.UTF_8));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not have existing function for this? how do we insert values for other gcs tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no existing function for this, also other test use same logic.

String line1 = "1,Marilyn Hawkins,[email protected]";
String line2 = "2,Terry Perez,[email protected]";
String line3 = "3,Jack Ferguson,[email protected]";
String inputPath = "input";
bucket.create(inputPath, String.join("\n", Arrays.asList(line1, line2, line3)).getBytes(StandardCharsets.UTF_8));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you refactor it into a single method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this missed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the updated PR. This part is now handled.

private ETLPlugin createMultiSinkPlugin(String path, Schema schema, String splitField) {
Map<String, String> map = new HashMap<>();
map.put("path", path);
map.put("format", "${sinkFormat}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If format is a macro, this will fail right?
https://cdap.atlassian.net/browse/PLUGIN-553

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we have comment this line, format will not be macro.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@google-cla
Copy link

google-cla bot commented Jan 20, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@ardian4 ardian4 force-pushed the feature/PLUGIN-498-content-type-it branch from 9adb339 to d9ce2a7 Compare January 20, 2021 12:34
@google-cla
Copy link

google-cla bot commented Jan 20, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

1 similar comment
@google-cla
Copy link

google-cla bot commented Jan 21, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@ardian4 ardian4 force-pushed the feature/PLUGIN-498-content-type-it branch from 5067e9e to 88f5ed7 Compare January 22, 2021 12:43
@google-cla
Copy link

google-cla bot commented Jan 22, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants