Skip to content

Commit

Permalink
Spark 3.5 connector
Browse files Browse the repository at this point in the history
  • Loading branch information
vishalkarve15 committed Nov 3, 2023
1 parent 2bd1df6 commit 1acc7e8
Show file tree
Hide file tree
Showing 23 changed files with 599 additions and 15 deletions.
1 change: 1 addition & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
## Next

* PR #1117: Make read session caching duration configurable
* PR #1115: Added new connector, `spark-3.5-bigquery` aimed to be used in Spark 3.5. This connector implements new APIs and capabilities provided by the Spark Data Source V2 API.

## 0.34.0 - 2023-10-31

Expand Down
23 changes: 13 additions & 10 deletions README-template.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ The latest version of the connector is publicly available in the following links

| version | Link |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-${next-release-tag}.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-${next-release-tag}.jar)) |
| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-${next-release-tag}.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-${next-release-tag}.jar)) |
| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-${next-release-tag}.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-${next-release-tag}.jar)) |
| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-${next-release-tag}.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-${next-release-tag}.jar)) |
Expand All @@ -79,16 +80,17 @@ The final two connectors are Scala based connectors, please use the jar relevant
below.

### Connector to Spark Compatibility Matrix
| Connector \ Spark | 2.3 | 2.4 | 3.0 | 3.1 | 3.2 | 3.3 |3.4 |
|---------------------------------------|---------|---------|---------|---------|---------|---------|---------|
| spark-3.4-bigquery | | | | | | | ✓ |
| spark-3.3-bigquery | | | | | | ✓ | ✓ |
| spark-3.2-bigquery | | | | | ✓ | ✓ | ✓ |
| spark-3.1-bigquery | | | | ✓ | ✓ | ✓ | ✓ |
| spark-2.4-bigquery | | ✓ | | | | | |
| spark-bigquery-with-dependencies_2.13 | | | | | ✓ | ✓ | ✓ |
| spark-bigquery-with-dependencies_2.12 | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| spark-bigquery-with-dependencies_2.11 | ✓ | ✓ | | | | | |
| Connector \ Spark | 2.3 | 2.4 | 3.0 | 3.1 | 3.2 | 3.3 |3.4 | 3.5 |
|---------------------------------------|---------|---------|---------|---------|---------|---------|---------|---------|
| spark-3.5-bigquery | | | | | | | | ✓ |
| spark-3.4-bigquery | | | | | | | ✓ | ✓ |
| spark-3.3-bigquery | | | | | | ✓ | ✓ | ✓ |
| spark-3.2-bigquery | | | | | ✓ | ✓ | ✓ | ✓ |
| spark-3.1-bigquery | | | | ✓ | ✓ | ✓ | ✓ | ✓ |
| spark-2.4-bigquery | | ✓ | | | | | | |
| spark-bigquery-with-dependencies_2.13 | | | | | ✓ | ✓ | ✓ | ✓ |
| spark-bigquery-with-dependencies_2.12 | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| spark-bigquery-with-dependencies_2.11 | ✓ | ✓ | | | | | | |

### Connector to Dataproc Image Compatibility Matrix
| Connector \ Dataproc Image | 1.3 | 1.4 | 1.5 | 2.0 | 2.1 | Serverless<br>Image 1.0 | Serverless<br>Image 2.0 | Serverless<br>Image 2.1 |
Expand All @@ -110,6 +112,7 @@ repository. It can be used using the `--packages` option or the

| version | Connector Artifact |
|------------|------------------------------------------------------------------------------------|
| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:${next-release-tag}` |
| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:${next-release-tag}` |
| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:${next-release-tag}` |
| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:${next-release-tag}` |
Expand Down
14 changes: 13 additions & 1 deletion cloudbuild/cloudbuild.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -106,10 +106,22 @@ steps:
- 'BIGLAKE_CONNECTION_ID=${_BIGLAKE_CONNECTION_ID}'
- 'BIGQUERY_KMS_KEY_NAME=${_BIGQUERY_KMS_KEY_NAME}'

# 4h. Run integration tests concurrently with unit tests (DSv2, Spark 3.5)
- name: 'gcr.io/$PROJECT_ID/dataproc-spark-bigquery-connector-presubmit'
id: 'integration-tests-3.5'
waitFor: ['integration-tests-3.4']
entrypoint: 'bash'
args: ['/workspace/cloudbuild/presubmit.sh', 'integrationtest-3.5']
env:
- 'GOOGLE_CLOUD_PROJECT=${_GOOGLE_CLOUD_PROJECT}'
- 'TEMPORARY_GCS_BUCKET=${_TEMPORARY_GCS_BUCKET}'
- 'BIGLAKE_CONNECTION_ID=${_BIGLAKE_CONNECTION_ID}'
- 'BIGQUERY_KMS_KEY_NAME=${_BIGQUERY_KMS_KEY_NAME}'

# 5. Upload coverage to CodeCov
- name: 'gcr.io/$PROJECT_ID/dataproc-spark-bigquery-connector-presubmit'
id: 'upload-it-to-codecov'
waitFor: ['integration-tests-2.12','integration-tests-2.13','integration-tests-2.4','integration-tests-3.1','integration-tests-3.2','integration-tests-3.3', 'integration-tests-3.4']
waitFor: ['integration-tests-2.12','integration-tests-2.13','integration-tests-2.4','integration-tests-3.1','integration-tests-3.2','integration-tests-3.3', 'integration-tests-3.4', 'integration-tests-3.5']
entrypoint: 'bash'
args: ['/workspace/cloudbuild/presubmit.sh', 'upload-it-to-codecov']
env:
Expand Down
7 changes: 5 additions & 2 deletions cloudbuild/nightly.sh
Original file line number Diff line number Diff line change
Expand Up @@ -42,9 +42,9 @@ case $STEP in
#coverage report
$MVN test jacoco:report jacoco:report-aggregate -Pcoverage,dsv1_2.12,dsv1_2.13,dsv2
# Run integration tests
$MVN failsafe:integration-test failsafe:verify jacoco:report jacoco:report-aggregate -Pcoverage,integration,dsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4
$MVN failsafe:integration-test failsafe:verify jacoco:report jacoco:report-aggregate -Pcoverage,integration,dsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4,dsv2_3.5
# Run acceptance tests
$MVN failsafe:integration-test failsafe:verify jacoco:report jacoco:report-aggregate -Pcoverage,acceptance,dsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4
$MVN failsafe:integration-test failsafe:verify jacoco:report jacoco:report-aggregate -Pcoverage,acceptance,dsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4,dsv2_3.5
# Upload test coverage report to Codecov
bash <(curl -s https://codecov.io/bash) -K -F "nightly"

Expand Down Expand Up @@ -79,6 +79,9 @@ case $STEP in
gsutil cp "${M2REPO}/com/google/cloud/spark/spark-3.4-bigquery/${BUILD_REVISION}/spark-3.4-bigquery-${BUILD_REVISION}.jar" "gs://${BUCKET}"
gsutil cp "gs://${BUCKET}/spark-3.4-bigquery-${BUILD_REVISION}.jar" "gs://${BUCKET}/spark-3.4-bigquery-nightly-snapshot.jar"

gsutil cp "${M2REPO}/com/google/cloud/spark/spark-3.5-bigquery/${BUILD_REVISION}/spark-3.5-bigquery-${BUILD_REVISION}.jar" "gs://${BUCKET}"
gsutil cp "gs://${BUCKET}/spark-3.5-bigquery-${BUILD_REVISION}.jar" "gs://${BUCKET}/spark-3.5-bigquery-nightly-snapshot.jar"

gsutil cp "${M2REPO}/com/google/cloud/spark/spark-bigquery-metrics/${BUILD_REVISION}/spark-bigquery-metrics-${BUILD_REVISION}.jar" "gs://${BUCKET}"
gsutil cp "gs://${BUCKET}/spark-bigquery-metrics-${BUILD_REVISION}.jar" "gs://${BUCKET}/spark-bigquery-metrics-nightly-snapshot.jar"

Expand Down
8 changes: 6 additions & 2 deletions cloudbuild/presubmit.sh
Original file line number Diff line number Diff line change
Expand Up @@ -33,13 +33,13 @@ case $STEP in
# Download maven and all the dependencies
init)
checkenv
$MVN install -DskipTests -Pdsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4
$MVN install -DskipTests -Pdsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4,dsv2_3.5
exit
;;

# Run unit tests
unittest)
$MVN test jacoco:report jacoco:report-aggregate -Pcoverage,dsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4
$MVN test jacoco:report jacoco:report-aggregate -Pcoverage,dsv1_2.12,dsv1_2.13,dsv2_2.4,dsv2_3.1,dsv2_3.2,dsv2_3.3,dsv2_3.4,dsv2_3.5
# Upload test coverage report to Codecov
bash <(curl -s https://codecov.io/bash) -K -F "${STEP}"
;;
Expand Down Expand Up @@ -79,6 +79,10 @@ case $STEP in
$MVN failsafe:integration-test failsafe:verify jacoco:report jacoco:report-aggregate -Pcoverage,integration,dsv2_3.4
;;

integrationtest-3.5)
$MVN failsafe:integration-test failsafe:verify jacoco:report jacoco:report-aggregate -Pcoverage,integration,dsv2_3.5
;;

upload-it-to-codecov)
checkenv
# Upload test coverage report to Codecov
Expand Down
22 changes: 22 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -254,6 +254,28 @@
<module>spark-bigquery-pushdown/spark-3.3-bigquery-pushdown_2.13</module>
</modules>
</profile>
<profile>
<id>dsv2_3.5</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<modules>
<module>spark-bigquery-dsv2/spark-bigquery-dsv2-common</module>
<module>spark-bigquery-dsv2/spark-bigquery-dsv2-parent</module>
<module>spark-bigquery-dsv2/spark-3.1-bigquery-lib</module>
<module>spark-bigquery-dsv2/spark-bigquery-metrics</module>
<module>spark-bigquery-dsv2/spark-3.2-bigquery-lib</module>
<module>spark-bigquery-dsv2/spark-3.3-bigquery-lib</module>
<module>spark-bigquery-dsv2/spark-3.4-bigquery-lib</module>
<module>spark-bigquery-dsv2/spark-3.5-bigquery-lib</module>
<module>spark-bigquery-dsv2/spark-3.5-bigquery</module>
<module>spark-bigquery-pushdown/spark-bigquery-pushdown-parent</module>
<module>spark-bigquery-pushdown/spark-bigquery-pushdown-common_2.12</module>
<module>spark-bigquery-pushdown/spark-bigquery-pushdown-common_2.13</module>
<module>spark-bigquery-pushdown/spark-3.3-bigquery-pushdown_2.12</module>
<module>spark-bigquery-pushdown/spark-3.3-bigquery-pushdown_2.13</module>
</modules>
</profile>
<profile>
<id>coverage</id>
<activation>
Expand Down
2 changes: 2 additions & 0 deletions spark-bigquery-dsv2/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -25,5 +25,7 @@
<module>spark-3.3-bigquery</module>
<module>spark-3.4-bigquery-lib</module>
<module>spark-3.4-bigquery</module>
<module>spark-3.5-bigquery-lib</module>
<module>spark-3.5-bigquery</module>
</modules>
</project>
40 changes: 40 additions & 0 deletions spark-bigquery-dsv2/spark-3.5-bigquery-lib/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-dsv2-parent</artifactId>
<version>${revision}</version>
<relativePath>../spark-bigquery-dsv2-parent</relativePath>
</parent>

<artifactId>spark-3.5-bigquery-lib</artifactId>
<version>${revision}</version>
<name>Connector code for BigQuery DataSource v2 for Spark 3.5</name>
<properties>
<spark.version>3.5.0</spark.version>
<scala.binary.version>2.13</scala.binary.version>
<shade.skip>true</shade.skip>
</properties>
<licenses>
<license>
<name>Apache License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>
<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-3.4-bigquery-lib</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.13</artifactId>
<version>${spark.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/*
* Copyright 2023 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.spark.bigquery.v2;

import java.util.Map;
import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;

public class Spark35BigQueryTableProvider extends Spark34BigQueryTableProvider {
// empty
@Override
public Table getTable(
StructType schema, Transform[] partitioning, Map<String, String> properties) {
return Spark3Util.createBigQueryTableInstance(Spark33BigQueryTable::new, schema, properties);
}

@Override
protected Table getBigQueryTableInternal(Map<String, String> properties) {
return Spark3Util.createBigQueryTableInstance(Spark33BigQueryTable::new, null, properties);
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
com.google.cloud.spark.bigquery.v2.TimestampNTZTypeConverter
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider
51 changes: 51 additions & 0 deletions spark-bigquery-dsv2/spark-3.5-bigquery/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-dsv2-parent</artifactId>
<version>${revision}</version>
<relativePath>../spark-bigquery-dsv2-parent</relativePath>
</parent>

<artifactId>spark-3.5-bigquery</artifactId>
<version>${revision}</version>
<name>BigQuery DataSource v2 for Spark 3.5</name>
<properties>
<spark.version>3.5.0</spark.version>
<shade.skip>false</shade.skip>
</properties>
<licenses>
<license>
<name>Apache License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>
<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-3.5-bigquery-lib</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>${spark.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-3.3-bigquery-pushdown_2.12
</artifactId>
<version>${project.parent.version}</version>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-3.3-bigquery-pushdown_2.13
</artifactId>
<version>${project.parent.version}</version>
</dependency>
</dependencies>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
connector.version=${project.version}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/*
* Copyright 2023 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.spark.bigquery.acceptance;

import org.junit.Ignore;

@Ignore // Waiting for the serverless dataproc.sparkBqConnector.uri property
public class Spark35BigNumericDataprocServerlessAcceptanceTest
extends BigNumericDataprocServerlessAcceptanceTestBase {

public Spark35BigNumericDataprocServerlessAcceptanceTest() {
super("spark-3.5-bigquery", "2.1");
}

// tests from superclass

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/*
* Copyright 2023 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.spark.bigquery.acceptance;

import org.junit.Ignore;

@Ignore // Waiting for the serverless dataproc.sparkBqConnector.uri property
public class Spark35ReadSheakspeareDataprocServerlessAcceptanceTest
extends ReadSheakspeareDataprocServerlessAcceptanceTestBase {

public Spark35ReadSheakspeareDataprocServerlessAcceptanceTest() {
super("spark-3.5-bigquery", "2.1");
}

// tests from superclass

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/*
* Copyright 2023 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.spark.bigquery.acceptance;

import org.junit.Ignore;

@Ignore // spark-3.5-bigquery does not support streaming yet
public class Spark35WriteStreamDataprocServerlessAcceptanceTest
extends WriteStreamDataprocServerlessAcceptanceTestBase {

public Spark35WriteStreamDataprocServerlessAcceptanceTest() {
super("spark-3.5-bigquery", "2.1");
}

// tests from superclass

}
Loading

0 comments on commit 1acc7e8

Please sign in to comment.