Compile Spark and Build a Spark Package
Step 1: Download the Spark Source Code from Git
Download link: Apache Spark
Note: Starting with version 2.0, Spark is built with Scala 2.11 and SBT 0.13.11 by default.
# Master development branch
git clone https://github.com/apache/spark.git
# 2.1 maintenance branch with stability fixes on top of Spark 2.1.0
git clone https://github.com/apache/spark.git -b branch-2.1
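To build a specific release rather than the tip of a branch, you can check out its tag after cloning; the v2.1.0 tag name below is an assumption based on Spark's usual vX.Y.Z release tagging.

# Check out the tag of a released version (tag name assumed, e.g. v2.1.0)
cd spark
git checkout v2.1.0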
Step 2: Configure Maven and Compile
Refer to Building Apache Spark for more details.
(Optional) If Maven is not already installed in your environment, export the path to the mvn bundled with the Spark source code.
export MAVEN_HOME=/app/compiled/spark/build/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin
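To confirm that the exported Maven is the one actually being picked up, a quick sanity check:

# Verify which mvn is on the PATH and print its version
which mvn
mvn -version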
Set MAVEN_OPTS to give Maven enough memory for the build; if you use build/mvn with MAVEN_OPTS unset, the script sets this for you automatically.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
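Note that -XX:MaxPermSize only matters on Java 7; Java 8 removed the permanent generation, so the flag is ignored there (with a warning). A minimal sketch of the same setting for a Java 8 build:

# Java 8: PermGen is gone, so MaxPermSize can be dropped
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"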
If you want to read from HDFS, build against the matching Hadoop version:
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
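If you also need Hive integration and the JDBC/Thrift server, the corresponding profiles from the standard Spark build can be added on top of the same command:

# Same build with Hive and Thrift server support enabled
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package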
Alternatively, compile with SBT for day-to-day development, since it provides much faster iterative compilation:
./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
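The iterative speed-up comes from keeping the SBT shell (and its JVM) alive between compiles instead of relaunching it for every build. A sketch of that workflow:

# Launch the interactive SBT shell once with the same profiles
./build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
# Then, at the sbt prompt, recompile automatically on every source change
> ~compile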
Fix a build error
The following error went away after rebooting the machine:
[error] Required file not found: sbt-interface.jar
[error] See zinc -help for information about locating necessary files
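If rebooting is not an option, a lighter-weight fix that may help is shutting down the zinc incremental-compile server that build/mvn starts, so the next build relaunches it cleanly (the exact zinc directory name depends on the version the build script downloaded):

# Stop the zinc server started by build/mvn; it restarts on the next build
./build/zinc-*/bin/zinc -shutdown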
Step 3: Build a Runnable Distribution
./dev/make-distribution.sh --name custom-spark --tgz -Phadoop-2.6
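The script produces a tarball named spark-<version>-bin-custom-spark.tgz in the root of the source tree (the exact version depends on the branch you built). A quick smoke test of the result:

# Unpack the distribution and start a local spark-shell (file name depends on the built version)
tar -xzf spark-*-bin-custom-spark.tgz
cd spark-*-bin-custom-spark
./bin/spark-shell --master local[2]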