
Adds e2e tests #65

Merged: 36 commits merged into GoogleCloudDataproc:main on Dec 28, 2023
Conversation

prashastia (Collaborator)

A simple e2e test to check reading a small BigQuery table.

This module is similar to the BigQueryExample, with a few changes to count the number of records read and log them.
This test reads a simpleTable.
Shell script and python script to check the number of records read.
@prashastia (Collaborator Author)

/gcbrun

Comments out CODECOV_TOKEN usage.
@prashastia (Collaborator Author)

/gcbrun

Comment on lines 12 to 13
# Install Python and Basic Python Tools (Assuming VM does not have them)
RUN apt-get -y install python3 && apt clean
Collaborator

Let's remove the "Assuming" postscript. If the VM doesn't have them, then it's fine to install like this. But if the VM does, then we must remove this superfluous installation.

# We won't run this async as we can wait for a bounded job to succeed or fail.
gcloud dataproc jobs submit flink --id "$JOB_ID" --jar="$GCS_JAR_LOCATION" --cluster="$CLUSTER_NAME" --region="$REGION" -- --gcp-project "$PROJECT_NAME" --bq-dataset "$DATASET_NAME" --bq-table "$TABLE_NAME" --agg-prop "$AGG_PROP_NAME" --query "$QUERY"
# Wait for the logs to be saved.
sleep 20
Collaborator

Aren't logs available as soon as the job ends?
If not, then how was 20 seconds decided?
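
If the wait exists because driver output lands in Cloud Storage asynchronously, polling for the log blob is more robust than a fixed sleep. A minimal sketch, assuming the logs are written under a known GCS bucket and prefix (both names below are placeholders):

```python
# Minimal polling sketch, assuming driver output lands under a known GCS
# prefix. Bucket and prefix names are placeholders, not the real values.
import time

from google.cloud import storage


def wait_for_driver_output(bucket_name, prefix, timeout_s=120, poll_s=5):
    client = storage.Client()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        blobs = client.list_blobs(bucket_name, prefix=prefix, max_results=1)
        if next(iter(blobs), None) is not None:  # a log blob exists
            return True
        time.sleep(poll_s)
    return False
```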

Comment on lines 36 to 48
# Now check the success of the job

#Check if query has been set or not.
if [ -z "$QUERY" ];
then
echo "Run without Query"
python3 cloudbuild/python-scripts/parse_logs.py -- --job_id="$JOB_ID" --project_id="$PROJECT_ID" --cluster_name="$CLUSTER_NAME" --region="$REGION" --project_name="$PROJECT_NAME" --dataset_name="$DATASET_NAME" --table_name="$TABLE_NAME"
ret=$?
else
echo "Run Query First"
python3 cloudbuild/python-scripts/parse_logs.py -- --job_id="$JOB_ID" --project_id="$PROJECT_ID" --cluster_name="$CLUSTER_NAME" --region="$REGION" --project_name="$PROJECT_NAME" --dataset_name="$DATASET_NAME" --table_name="$TABLE_NAME" --query="$QUERY"
ret=$?
fi
@jayehwhyehentee (Collaborator) on Dec 20, 2023

No need to explicitly check whether the job runs with or without a query. Since the same parse_logs module is checking the logs, you can send the query arg every time and check inside the module whether the query has a legitimate value, and act accordingly.

In this script, simply put:

# Now check the success of the job
python3 cloudbuild/python-scripts/parse_logs.py -- --job_id="$JOB_ID" --project_id="$PROJECT_ID" --cluster_name="$CLUSTER_NAME" --region="$REGION" --project_name="$PROJECT_NAME" --dataset_name="$DATASET_NAME" --table_name="$TABLE_NAME" --query="$QUERY"
ret=$?
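
On the Python side, the in-module check can treat an empty --query as "no query". A sketch, assuming parse_logs uses absl flags (its UsageError handling suggests absl; the flag wiring below is illustrative):

```python
# Sketch: accept --query unconditionally and branch inside the module.
# An empty string means the job ran without a query. Illustrative only;
# the real parse_logs flag definitions may differ.
from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_string('query', '', 'Optional query the Flink job executed.')


def main(argv):
    if FLAGS.query:
        print('Query provided: validating the query run first.')
    print('Validating the number of records read.')


if __name__ == '__main__':
    app.run(main)
```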

case $STEP in
# Download maven and all the dependencies
init)
$MVN spotless:apply
Collaborator

Not needed. We'll use these scripts on the main branch, which can only have reviewed and merged code.


# Run the small e2e tests
e2e_test_small)
# 1. Run the simple table test.
Collaborator

Please mention bounded as well.

Collaborator

Please revert all changes made in this file. This does not affect our nightly pipeline.

Collaborator Author

Fixed.


def get_bq_query_rows(client_project_name):
client = bigquery.Client(project=client_project_name)
query = 'SELECT count(*) as count FROM `testproject-398714.testing_dataset.largeTable` where EXTRACT(HOUR from ts) = 17 and EXTRACT(DAY from ts) = 17;'
Collaborator

  1. Please don't hard-code the table details.
  2. This line is too long. The maximum line length for readability is 80 characters.
  3. I'd suggest copying this file into cider-v and applying the editor's recommendations for best practices.

Collaborator

Look up f-strings to break a long string with variables into multiple lines.
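
For instance, the hard-coded query above can be assembled from variables and split across adjacent f-string literals, which Python concatenates (a sketch; the variables stand in for values parse_logs already receives as arguments):

```python
# Sketch: split the long query across adjacent f-string literals so each
# line stays under 80 characters. Values stand in for parse_logs arguments.
project_name = 'testproject-398714'
dataset_name = 'testing_dataset'
table_name = 'largeTable'

query = (
    f'SELECT count(*) as count '
    f'FROM `{project_name}.{dataset_name}.{table_name}` '
    f'WHERE EXTRACT(HOUR from ts) = 17 AND EXTRACT(DAY from ts) = 17;'
)
```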

import re


def get_bq_query_rows(client_project_name):
Collaborator

Should be renamed to get_bq_query_result_row_count

Collaborator Author

Done.

query_result = query_job.result()
records_read = 0
for result in query_result:
records_read = result[0]
Collaborator

You're only returning the last result in query_results.
What does each result in query_results look like?
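
For context: query_job.result() returns a RowIterator of bigquery.Row objects, and the COUNT(*) query above yields exactly one row whose first field is the count. A more direct extraction (sketch):

```python
# Sketch: the COUNT(*) query returns exactly one row, so read it directly
# instead of looping and keeping only the last iteration's value.
query_result = query_job.result()  # RowIterator of bigquery.Row objects
row = next(iter(query_result))     # the single result row
records_read = row[0]              # first field: the count
```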

Collaborator Author

Removed this usage.

records_read = result[0]
return records_read

# Remember these are the ones from args.
Collaborator

Remove this comment. Not needed.

Collaborator Author

Done.

return records_read

# Remember these are the ones from args.
def get_bq_table_rows(client_project_name, project_name, dataset_name, table_name, query):
@jayehwhyehentee (Collaborator) on Dec 20, 2023

Rename to get_bq_table_row_count

total_metric_sum_in_blob = 0
# Keep on finding the metric value as there can be
# 1 or more outputs in a log file.
metric_pattern = r'{}\s*(.*?)\s*{}'.format(re.escape(metric_string), re.escape(delimiter))
Collaborator

What does the metric string look like?
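
For illustration, the pattern captures whatever sits between metric_string and the delimiter. The log line below is hypothetical, not the connector's confirmed output format:

```python
# Hypothetical illustration of the pattern above. The exact log line is
# assumed for the example, not taken from real connector output.
import re

metric_string = 'Number of records read: '
delimiter = ';'
metric_pattern = r'{}\s*(.*?)\s*{}'.format(
    re.escape(metric_string), re.escape(delimiter))

log_blob = 'INFO Number of records read: 40000 ; end of task'
matches = re.findall(metric_pattern, log_blob)  # ['40000']
total_metric_sum_in_blob = sum(int(m) for m in matches)
```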

return total_metric_sum_in_blob

def check_query_correctness(logs_as_string):
query_records_pattern = r'\[\s(.*),\s(.*)\s\]'
Collaborator

Please share an example of the actual string we expect to match here
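
As an illustration only (the real log format should be confirmed here), the pattern matches bracketed pairs such as:

```python
# Hypothetical illustration; the '[ hour, day ]' shape is inferred from the
# regex itself, not confirmed against the connector's actual logs.
import re

query_records_pattern = r'\[\s(.*),\s(.*)\s\]'
logs_as_string = 'record read: [ 17, 17 ]'
matches = re.findall(query_records_pattern, logs_as_string)
# matches == [('17', '17')]  -> (hour, day) captured as strings
```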

for match in matches:
hour = match[0].strip()
day = match[1].strip()
if hour !='17' or day !='17':
Collaborator

Please mention that these checks are hard-coded to match the filter query above.
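
A commented version might read as follows (a sketch; the literals mirror the EXTRACT(HOUR from ts) = 17 and EXTRACT(DAY from ts) = 17 filter in the test query, and the failure handling is illustrative):

```python
# Sketch of the suggested documentation. The '17' literals are hard-coded
# to match the test query's filter on EXTRACT(HOUR ...) and EXTRACT(DAY ...).
matches = [(' 17 ', ' 17 ')]  # example output of query_records_pattern
for match in matches:
    hour = match[0].strip()
    day = match[1].strip()
    # Any record outside the hard-coded filter means the query ran wrong.
    if hour != '17' or day != '17':
        raise AssertionError('Record outside the hard-coded test filter')
```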

pom.xml Outdated
@@ -69,6 +69,7 @@ under the License.
<module>flink-connector-bigquery</module>
<module>flink-sql-connector-bigquery</module>
<module>flink-connector-bigquery-examples</module>
<module>flink-connector-bigquery-integration-test</module>
Collaborator

Indentation is off

pom.xml Outdated
@@ -440,6 +441,7 @@ under the License.
<exclude>**/com/google/cloud/flink/bigquery/source/config/*</exclude>
<exclude>**/com/google/cloud/flink/bigquery/table/config/BigQueryConnectorOptions.*</exclude>
<exclude>**/com/google/cloud/flink/bigquery/examples/**</exclude>
<exclude>**/com/google/cloud/flink/bigquery/integration/**</exclude>
Collaborator

Indentation is off

Comment on lines 67 to 80
* The following cases are tested:
* <ol>
* <li>Reading a Simple Table: This test reads a simple table of 40,000 rows having size 900
* KBs.
* <li>Reading a Table with Complex Schema: This test reads a table with 15 levels (maximum
* number of levels allowed by BigQuery). The table contains 100,000 rows and has a size
* of 2.96 MB.
* <li>Reading a Large Table: This test reads a large table. The table contains __ rows and
* has a size of about 200 GBs.
* <li>Reading a Table with Large Row: This test reads a table with a large row. The table
* contains 100 rows each of size 45 MB and has a size of about 450 GB.
* <li>Testing a BigQuery Query Run: This tests a BigQuery Query run. The query filters
* certain rows based on a condition, groups the records and finds the AVG of value of a
* column.
Collaborator

This file is not aware of the tests being executed using its main application. Please remove this part of the description. The nature of the tests should be inferred from nightly.yaml and nightly.sh.

Collaborator Author

Done.

Comment on lines 454 to 461
if len(argv) > len(acceptable_arguments) + 1:
raise app.UsageError(
'[Log: parse_logs ERROR] Too many command-line arguments.'
)
elif len(argv) < len(required_arguments) + 1:
raise app.UsageError(
'[Log: parse_logs ERROR] Too few command-line arguments.'
)
Collaborator

Given that you have a separate validate arguments method, this is not needed. Please remove.

@jayehwhyehentee jayehwhyehentee self-requested a review December 27, 2023 12:45
@jayehwhyehentee (Collaborator)

/gcbrun

- Replaces print with log
- Adds return in case a log file is not found.
- Removed redundant descriptions in log and error messages.
@vishalkarve15 (Collaborator)

/gcbrun

@jayehwhyehentee jayehwhyehentee merged commit 57d6e7a into GoogleCloudDataproc:main Dec 28, 2023
5 checks passed