
Commit 816ebac

LuciferYang authored and sunchao committed
[SPARK-42452][BUILD] Remove hadoop-2 profile from Apache Spark 3.5.0
### What changes were proposed in this pull request? This pr aims to remove `hadoop-2` profile from Apache Spark 3.5.0. ### Why are the changes needed? Spark 3.4.0 no longer releases Hadoop2 binary distribtuion(SPARK-42447) and Hadoop 2 GitHub Action job already removed after SPARK-42447, we can remove `hadoop-2` profile from Apache Spark 3.5.0. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions Closes apache#40788 from LuciferYang/SPARK-42452. Authored-by: yangjie01 <[email protected]> Signed-off-by: Chao Sun <[email protected]>
1 parent 5cb1c63 commit 816ebac

File tree

14 files changed (+3, −408 lines)


assembly/README

+1-1
```diff
@@ -9,4 +9,4 @@ This module is off by default. To activate it specify the profile in the command
 
 If you need to build an assembly for a different version of Hadoop the
 hadoop-version system property needs to be set as in this example:
-    -Dhadoop.version=2.7.4
+    -Dhadoop.version=3.3.5
```

dev/deps/spark-deps-hadoop-2-hive-2.3

-273
This file was deleted.

dev/run-tests-jenkins.py

-3
```diff
@@ -178,9 +178,6 @@ def main():
     # Switch to a Maven-based build if the PR title contains "test-maven":
     if "test-maven" in ghprb_pull_title:
         os.environ["SPARK_JENKINS_BUILD_TOOL"] = "maven"
-    # Switch the Hadoop profile based on the PR title:
-    if "test-hadoop2" in ghprb_pull_title:
-        os.environ["SPARK_JENKINS_BUILD_PROFILE"] = "hadoop2"
     if "test-hadoop3" in ghprb_pull_title:
         os.environ["SPARK_JENKINS_BUILD_PROFILE"] = "hadoop3"
     # Switch the Scala profile based on the PR title:
```
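After this change, the Jenkins PR-title switching reduces to the Maven and hadoop3 triggers. A minimal sketch of the resulting behavior, assuming a hypothetical helper name (the real script mutates `os.environ` inline inside `main()`):

```python
def apply_pr_title_switches(title, env):
    """Hypothetical sketch of dev/run-tests-jenkins.py's PR-title switches
    after this change: "test-hadoop2" no longer has any effect, and only
    "test-hadoop3" remains as a Hadoop profile trigger."""
    if "test-maven" in title:
        env["SPARK_JENKINS_BUILD_TOOL"] = "maven"
    if "test-hadoop3" in title:
        env["SPARK_JENKINS_BUILD_PROFILE"] = "hadoop3"
    return env

# A title mentioning "test-hadoop2" now only triggers the Maven switch.
env = apply_pr_title_switches("test-maven test-hadoop2", {})
```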

dev/run-tests.py

-1
```diff
@@ -217,7 +217,6 @@ def get_hadoop_profiles(hadoop_version):
     """
 
     sbt_maven_hadoop_profiles = {
-        "hadoop2": ["-Phadoop-2"],
         "hadoop3": ["-Phadoop-3"],
     }
```
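With the `hadoop2` entry gone, `get_hadoop_profiles` only resolves `hadoop3`. A simplified sketch of the resulting lookup (hedged: the real function in `dev/run-tests.py` prints an error and exits on unknown versions rather than raising):

```python
def get_hadoop_profiles(hadoop_version):
    """Simplified sketch of the post-change lookup in dev/run-tests.py:
    the "hadoop2" key was removed, so only "hadoop3" maps to sbt/Maven flags."""
    sbt_maven_hadoop_profiles = {
        "hadoop3": ["-Phadoop-3"],
    }
    if hadoop_version in sbt_maven_hadoop_profiles:
        return sbt_maven_hadoop_profiles[hadoop_version]
    # Sketch only: the real script logs an error and calls sys.exit here.
    raise ValueError("Could not find Hadoop profiles for %s" % hadoop_version)
```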

dev/test-dependencies.sh

-3
```diff
@@ -34,7 +34,6 @@ HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pmesos -Pkubernetes -Pyarn -Phive \
   -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud"
 MVN="build/mvn"
 HADOOP_HIVE_PROFILES=(
-    hadoop-2-hive-2.3
     hadoop-3-hive-2.3
 )
@@ -85,8 +84,6 @@ $MVN -q versions:set -DnewVersion=$TEMP_VERSION -DgenerateBackupPoms=false > /de
 for HADOOP_HIVE_PROFILE in "${HADOOP_HIVE_PROFILES[@]}"; do
   if [[ $HADOOP_HIVE_PROFILE == **hadoop-3-hive-2.3** ]]; then
     HADOOP_PROFILE=hadoop-3
-  else
-    HADOOP_PROFILE=hadoop-2
   fi
   echo "Performing Maven install for $HADOOP_HIVE_PROFILE"
   $MVN $HADOOP_MODULE_PROFILES -P$HADOOP_PROFILE jar:jar jar:test-jar install:install clean -q
```
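The remaining shell branch maps only `hadoop-3-hive-2.3` to a Hadoop profile; the `hadoop-2` fallback is gone. A hypothetical Python rendering of that mapping, for illustration only:

```python
def hadoop_profile_for(hadoop_hive_profile):
    """Hypothetical rendering of the remaining branch in
    dev/test-dependencies.sh: only hadoop-3-hive-2.3 is mapped; there is no
    longer an else-branch falling back to hadoop-2."""
    if "hadoop-3-hive-2.3" in hadoop_hive_profile:
        return "hadoop-3"
    raise ValueError("unsupported HADOOP_HIVE_PROFILE: %s" % hadoop_hive_profile)
```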

docs/building-spark.md

-4
```diff
@@ -79,10 +79,6 @@ Example:
 
     ./build/mvn -Pyarn -Dhadoop.version=3.3.0 -DskipTests clean package
 
-If you want to build with Hadoop 2.x, enable `hadoop-2` profile:
-
-    ./build/mvn -Phadoop-2 -Pyarn -Dhadoop.version=2.8.5 -DskipTests clean package
-
 ## Building With Hive and JDBC Support
 
 To enable Hive integration for Spark SQL along with its JDBC server and CLI,
```

hadoop-cloud/pom.xml

-7
```diff
@@ -208,13 +208,6 @@
   </dependencies>
 
   <profiles>
-    <!--
-      hadoop-3 profile is activated by default so hadoop-2 profile
-      also needs to be declared here for building with -Phadoop-2.
-    -->
-    <profile>
-      <id>hadoop-2</id>
-    </profile>
     <!--
      Hadoop 3 simplifies the classpath, and adds a new committer base class which
      enables store-specific committers.
```

pom.xml

-19
```diff
@@ -3504,25 +3504,6 @@
       http://hadoop.apache.org/docs/ra.b.c/hadoop-project-dist/hadoop-common/dependency-analysis.html
     -->
 
-    <profile>
-      <id>hadoop-2</id>
-      <properties>
-        <!-- make sure to update IsolatedClientLoader whenever this version is changed -->
-        <hadoop.version>2.7.4</hadoop.version>
-        <curator.version>2.7.1</curator.version>
-        <commons-io.version>2.4</commons-io.version>
-        <!--
-          the declaration site above of these variables explains why we need to re-assign them here
-        -->
-        <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
-        <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
-        <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
-        <gcs-connector.version>hadoop2-2.2.11</gcs-connector.version>
-        <!-- SPARK-36547: Please don't upgrade the version below, otherwise there will be an error on building Hadoop 2.7 package -->
-        <scala-maven-plugin.version>4.3.0</scala-maven-plugin.version>
-      </properties>
-    </profile>
-
     <profile>
       <id>hadoop-3</id>
       <!-- Default hadoop profile. Uses global properties. -->
```

python/pyspark/install.py

+1-1
```diff
@@ -26,7 +26,7 @@
 
 DEFAULT_HADOOP = "hadoop3"
 DEFAULT_HIVE = "hive2.3"
-SUPPORTED_HADOOP_VERSIONS = ["hadoop2", "hadoop3", "without-hadoop"]
+SUPPORTED_HADOOP_VERSIONS = ["hadoop3", "without-hadoop"]
 SUPPORTED_HIVE_VERSIONS = ["hive2.3"]
 UNSUPPORTED_COMBINATIONS = []  # type: ignore
```
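After this edit, PySpark's installer accepts only `hadoop3` or `without-hadoop` as bundled Hadoop choices. A hedged sketch of the validation this list enables (hypothetical helper name; the real check lives in `python/pyspark/install.py`'s install logic):

```python
DEFAULT_HADOOP = "hadoop3"
SUPPORTED_HADOOP_VERSIONS = ["hadoop3", "without-hadoop"]

def checked_hadoop_version(hadoop_version=None):
    """Hypothetical sketch: requesting a Hadoop 2 bundle is now rejected,
    since "hadoop2" was dropped from SUPPORTED_HADOOP_VERSIONS."""
    version = hadoop_version or DEFAULT_HADOOP
    if version not in SUPPORTED_HADOOP_VERSIONS:
        raise RuntimeError(
            "Spark distribution with %s is not supported; choose one of %s"
            % (version, SUPPORTED_HADOOP_VERSIONS))
    return version
```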
