PIG-5246: Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2 (liyunzhang)

git-svn-id: https://svn.apache.org/repos/asf/pig/trunk@1802880 13f79535-47bb-0310-9956-ffa450edef68
Liyun Zhang committed Jul 25, 2017
1 parent c5c5dd2 commit c61a195
Showing 2 changed files with 29 additions and 10 deletions.
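
In short, bin/pig now detects the Spark major version under SPARK_HOME and only insists on SPARK_JAR (the HDFS copy of spark-assembly*.jar) for Spark 1; with Spark 2, every jar under $SPARK_HOME/jars is put on the classpath instead. A minimal sketch of the environment each case needs before launching Pig in spark mode (the install and HDFS paths below are hypothetical examples, not part of the commit):

# Spark 2: SPARK_HOME alone is enough; bin/pig adds $SPARK_HOME/jars/*.jar to the classpath
export SPARK_HOME=/opt/spark-2.1.0            # hypothetical install path
pig -x spark myscript.pig

# Spark 1: SPARK_JAR must also point at the HDFS copy of spark-assembly*.jar so YARN can cache it
export SPARK_HOME=/opt/spark-1.6.3            # hypothetical install path
export SPARK_JAR=hdfs:///user/spark/spark-assembly-1.6.3-hadoop2.6.0.jar   # hypothetical HDFS path
pig -x spark myscript.pig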
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -38,6 +38,8 @@ OPTIMIZATIONS

BUG FIXES

PIG-5246: Modify bin/pig about SPARK_HOME, SPARK_ASSEMBLY_JAR after upgrading spark to 2 (liyunzhang)

PIG-3655: BinStorage and InterStorage approach to record markers is broken (szita)

PIG-5274: TestEvalPipelineLocal#testSetLocationCalledInFE is failing in spark mode after PIG-5157 (nkollar via szita)
37 changes: 27 additions & 10 deletions bin/pig
@@ -60,6 +60,7 @@ additionalJars="";
prevArgExecType=false;
isSparkMode=false;
isSparkLocalMode=false;
sparkversion=2;

#verify whether the execType is SPARK or SPARK_LOCAL
function processExecType(){
@@ -402,18 +403,34 @@ if [ "$isSparkMode" == "true" ]; then
echo "Error: SPARK_HOME is not set!"
exit 1
fi

# Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
if [ -z "$SPARK_JAR" ]; then
echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
exit 1
# spark-tags*.jar only appears in spark2; spark1 does not include this jar, so we use it to judge whether the current spark is spark1 or spark2.
SPARK_TAG_JAR=`find $SPARK_HOME -name 'spark-tags*.jar'|wc -l`
if [ "$SPARK_TAG_JAR" -eq 0 ];then
sparkversion="1"
fi

if [ -n "$SPARK_HOME" ]; then
echo "Using Spark Home: " ${SPARK_HOME}
SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
if [ "$sparkversion" == "1" ]; then
# Please specify SPARK_JAR which is the hdfs path of spark-assembly*.jar to allow YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs.
if [ -z "$SPARK_JAR" ]; then
echo "Error: SPARK_JAR is not set, SPARK_JAR stands for the hdfs location of spark-assembly*.jar. This
allows YARN to cache spark-assembly*.jar on nodes so that it doesn't need to be distributed each time an application runs."
exit 1
fi

if [ -n "$SPARK_HOME" ]; then
echo "Using Spark Home: " ${SPARK_HOME}
SPARK_ASSEMBLY_JAR=`ls ${SPARK_HOME}/lib/spark-assembly*`
CLASSPATH=${CLASSPATH}:$SPARK_ASSEMBLY_JAR
fi
fi

if [ "$sparkversion" == "2" ]; then
if [ -n "$SPARK_HOME" ]; then
echo "Using Spark Home: " ${SPARK_HOME}
for f in $SPARK_HOME/jars/*.jar; do
CLASSPATH=${CLASSPATH}:$f
done
fi
fi
fi

#spark-assembly.jar contains jcl-over-slf4j which would create a LogFactory implementation that is incompatible
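The spark-tags*.jar probe that bin/pig now runs can be executed by hand to see which branch a given machine will take; this is just a sketch wrapping the find command from the diff, assuming SPARK_HOME is already exported:

if [ "$(find "$SPARK_HOME" -name 'spark-tags*.jar' | wc -l)" -eq 0 ]; then
    echo "Spark 1 detected: spark-assembly*.jar under \$SPARK_HOME/lib and SPARK_JAR are expected"
else
    echo "Spark 2 detected: jars under \$SPARK_HOME/jars will be added to the classpath"
fi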
