-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PLUGIN-498 Add integration test for GCS Sink/Multisink content type #1083
base: develop
Are you sure you want to change the base?
PLUGIN-498 Add integration test for GCS Sink/Multisink content type #1083
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
@Test | ||
public void testMultiSinkContentType() throws Exception { | ||
String bucketName = "cask-gcs-multisink-" + UUID.randomUUID().toString(); | ||
Bucket bucket = createBucket(bucketName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to delete the bucket at the end of the test.
As a best practice, all the resources created in the test should be deleted upon test assertion completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already doing it, if you follow the method createBucket(bucketName)
the bucket name is marked for delete after the test is done, also we run the the test and the bucket is deleted when the test is finished
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix and squash
inputSourceConfig.put("format", "csv"); | ||
inputSourceConfig.put("referenceName", "source_" + UUID.randomUUID().toString()); | ||
inputSourceConfig.put("project", getProjectId()); | ||
inputSourceConfig.put("path", createPath(bucket, inputPath)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you make it a macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In GCSMultiBatchSink if the format is macro (null) this always fails with null pointer exception.
Please check the line 75 in this file:
src/main/java/io/cdap/plugin/gcp/gcs/sink/GCSMultiBatchSink.java.
That is the reason why we can not implement it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We should still include it as a macro and have that bug fixed.
Could you please file a corresponding jira?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Jira Ticket regarding null pointer exception in GCSMultiBatchSink if the format is macro:
https://cdap.atlassian.net/browse/PLUGIN-553
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
@@ -736,7 +741,7 @@ public void testGcsSourceFormats() throws Exception { | |||
ETLBatchConfig.Builder pipelineConfig = ETLBatchConfig.builder().addStage(source); | |||
for (String format : formats) { | |||
String path = String.format("%s/%s/%s", createPath(bucket, OUTPUT_BLOB_NAME), format, suffix); | |||
ETLStage sink = new ETLStage(format, createSinkPlugin(format, path, schema)); | |||
ETLStage sink = new ETLStage(format, createSinkPlugin(format, path, schema, DEFAULT_CONTENT_TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to specify this? If nothing is specified, the sink would interpret default content type right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not specified, the integration test still passes, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
7c04d82
to
435b096
Compare
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make all the fields as macro except GCSMultiSink format type?
For GCSMultiSink format type, add a TODO to make it macro with the jira number
return createSinkPlugin(format, path, schema, null); | ||
} | ||
|
||
private ETLPlugin createSinkPlugin(String format, String path, Schema schema, String contentType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If contentType can be nullable, annotate it
Schema.Field.of("id", Schema.of(Schema.Type.INT)), | ||
Schema.Field.of("name", Schema.nullableOf(Schema.of(Schema.Type.STRING))), | ||
Schema.Field.of("email", Schema.nullableOf(Schema.of(Schema.Type.STRING))) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for separate line
String line3 = "3,Jack Ferguson,[email protected],DepartmentA"; | ||
String inputPath = "input"; | ||
|
||
bucket.create(inputPath, String.join("\n", Arrays.asList(line1, line2, line3)).getBytes(StandardCharsets.UTF_8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we not have existing function for this? how do we insert values for other gcs tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no existing function for this, also other test use same logic.
cdap-integration-tests/integration-test-remote/src/test/java/io/cdap/cdap/app/etl/gcp/GCSTest.java
Lines 712 to 716 in d9ce2a7
String line1 = "1,Marilyn Hawkins,[email protected]"; | |
String line2 = "2,Terry Perez,[email protected]"; | |
String line3 = "3,Jack Ferguson,[email protected]"; | |
String inputPath = "input"; | |
bucket.create(inputPath, String.join("\n", Arrays.asList(line1, line2, line3)).getBytes(StandardCharsets.UTF_8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you refactor it into a single method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the updated PR. This part is now handled.
private ETLPlugin createMultiSinkPlugin(String path, Schema schema, String splitField) { | ||
Map<String, String> map = new HashMap<>(); | ||
map.put("path", path); | ||
map.put("format", "${sinkFormat}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If format is a macro, this will fail right?
https://cdap.atlassian.net/browse/PLUGIN-553
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we have comment this line, format will not be macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not comment the line. Lets remove macro and add it as a regular argument along with TODO in similar format
https://github.com/cdapio/cdap/blob/22718583995305aa9898eb54b5c7c2571ff1e3bb/cdap-app-fabric/src/main/java/io/cdap/cdap/internal/app/runtime/plugin/CollectMacroEvaluator.java#L33
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
9adb339
to
d9ce2a7
Compare
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
1 similar comment
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
5067e9e
to
88f5ed7
Compare
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
Integration test for issue: https://cdap.atlassian.net/browse/PLUGIN-498