Compile Spark and Build Spark Package

29 Nov 2016

Step 1: Download Spark source code from git

Download link for Apache Spark

Note: Starting with version 2.0, Spark is built with Scala 2.11 and SBT 0.13.11 by default.
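If you need a Scala 2.10 build instead, the Spark 2.x build documentation provides a version-switch script; a minimal sketch (run from the source tree after cloning it below, assuming dev/change-scala-version.sh is present on your branch):

# switch the build to Scala 2.10, then pass the matching profile flag
./dev/change-scala-version.sh 2.10
./build/mvn -Pyarn -Phadoop-2.6 -Dscala-2.10 -DskipTests clean package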

# Master development branch
git clone git://github.com/apache/spark.git

# 2.1 maintenance branch with stability fixes on top of Spark 2.1.0
git clone git://github.com/apache/spark.git -b branch-2.1
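If you want to build an exact released version rather than the tip of a branch, you can check out its release tag after cloning; for example (v2.0.2 here is only an illustration, substitute whichever tag you need):

cd spark
# list the available release tags, then check one out
git tag -l 'v*'
git checkout v2.0.2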

Step 2: Configure Maven and Compile

Refer to Building Apache Spark for more details.

(Optional) If Maven is not installed in your environment, export the path to the Maven bundled with the Spark source code.

export MAVEN_HOME=/app/compiled/spark/build/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin
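A quick way to confirm the bundled Maven is now picked up (assuming the path above matches where your source tree actually unpacked Maven):

# should report Maven 3.3.9 from the exported MAVEN_HOME
mvn -version
which mvn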

Give Maven enough memory by setting MAVEN_OPTS; if you use build/mvn with no MAVEN_OPTS set, the script will set this for you automatically.

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

If you want to read from HDFS, build against your Hadoop version:

./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
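If you also want Hive and JDBC/Thrift server support in the same build, the Hive profiles can be added on top of the Hadoop ones; a hedged example for Spark 2.x:

# same Hadoop settings as above, plus Hive and the Thrift server
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package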

Alternatively, compile with SBT for day-to-day development, since it provides much faster iterative compilation:

./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
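For iterative work it is usually faster to keep an SBT session open rather than re-running the full command each time; a sketch of the interactive flow (same profiles as above):

./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
# then, at the sbt prompt:
> compile      # incremental compile
> ~compile     # recompile automatically whenever a source file changes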

Fixing a build error

The following error went away after rebooting the machine:

[error] Required file not found: sbt-interface.jar
[error] See zinc -help for information about locating necessary files
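If a reboot is not convenient, this error appears to come from stale zinc/compiler state, so shutting down the zinc compile server and removing the tooling that build/mvn downloaded under build/ (it will be fetched again on the next run) may also clear it; this is an assumption, not a verified fix:

# assumption: clear cached build tooling and retry
./build/zinc-*/bin/zinc -shutdown
rm -rf build/zinc-* build/scala-*
./build/mvn -DskipTests clean package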

Step 3: Build a Runnable Distribution

./dev/make-distribution.sh --name custom-spark --tgz -Phadoop-2.6
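Once make-distribution.sh finishes, it should leave a tarball in the top-level directory named spark-<version>-bin-custom-spark.tgz; a quick smoke test might look like the following (2.1.0-SNAPSHOT is only an illustration, use whatever version you built):

tar -xzf spark-2.1.0-SNAPSHOT-bin-custom-spark.tgz
cd spark-2.1.0-SNAPSHOT-bin-custom-spark
./bin/spark-submit --version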
