Enable History Server with HDFS

25 Dec 2016


Install HDFS with Hadoop 2.6.x

Download the Hadoop package from http://hadoop.apache.org/releases.html and decompress it:

wget http://apache.fayea.com/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
tar -zxvf hadoop-2.6.5.tar.gz
chown -R hadoop:hadoop hadoop-2.6.5

Create directories

cd hadoop-2.6.5/
mkdir tmp    # used by hadoop.tmp.dir in core-site.xml
mkdir name   # used by dfs.namenode.name.dir in hdfs-site.xml
mkdir data   # used by dfs.datanode.data.dir in hdfs-site.xml

Configure Hadoop and YARN

Refer to http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/ClusterSetup.html

Edit the Hadoop configuration files under ./hadoop-2.6.5/etc/hadoop

Configure /etc/profile

export HADOOP_HOME=/myhome/hadoop/hadoop-2.6.5
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
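
After editing, reload the profile and confirm the Hadoop binaries are on the PATH; hadoop version should print the 2.6.5 release, assuming the paths above match your install:

source /etc/profile
hadoop version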

Configure JAVA_HOME in hadoop-env.sh and yarn-env.sh

vim /myhome/hadoop/hadoop-2.6.5/etc/hadoop/hadoop-env.sh
vim /myhome/hadoop/hadoop-2.6.5/etc/hadoop/yarn-env.sh
# add or update this line in both files
export JAVA_HOME=/pcc/app/Linux_jdk1.7.0_x86_64

Configure ./hadoop-2.6.5/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://9.111.159.156:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/myhome/hadoop/hadoop-2.6.5/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>
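
To sanity-check the setting, hdfs getconf can print the key as Hadoop resolves it from the files under etc/hadoop (run from the Hadoop home directory):

./bin/hdfs getconf -confKey fs.defaultFS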

Configure ./hadoop-2.6.5/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>9.111.159.156:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/myhome/hadoop/hadoop-2.6.5/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/myhome/hadoop/hadoop-2.6.5/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
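
Since dfs.webhdfs.enabled is true, the NameNode will also expose the WebHDFS REST API once HDFS is up; a quick smoke test (after the Start HDFS step below) could be:

curl -s "http://9.111.159.156:50070/webhdfs/v1/?op=LISTSTATUS"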

Configure ./hadoop-2.6.5/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>9.111.159.156:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>9.111.159.156:19888</value>
  </property>
</configuration>
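
Note that the two jobhistory addresses only answer once the MapReduce JobHistory Server daemon is running; start-dfs.sh does not start it. If you need it, start it separately from the Hadoop home directory:

./sbin/mr-jobhistory-daemon.sh start historyserver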

Configure ./hadoop-2.6.5/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>9.111.159.156:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>9.111.159.156:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>9.111.159.156:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>9.111.159.156:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>9.111.159.156:8088</value>
  </property>
</configuration>
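
The ResourceManager and NodeManager daemons configured here are started by start-yarn.sh rather than start-dfs.sh, so start YARN as well if you plan to submit MapReduce or Spark-on-YARN jobs:

cd /myhome/hadoop/hadoop-2.6.5/sbin
./start-yarn.sh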

Configure ./hadoop-2.6.5/etc/hadoop/slaves

9.111.159.156
9.111.159.163
9.111.159.164
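
start-dfs.sh launches the DataNodes over SSH, so passwordless SSH from the master to every node in this list (including itself) should already be in place, for example:

ssh-keygen -t rsa
ssh-copy-id hadoop@9.111.159.156
ssh-copy-id hadoop@9.111.159.163
ssh-copy-id hadoop@9.111.159.164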

Distribute the Hadoop directory to the other nodes

cd /myhome/hadoop
scp -r hadoop-2.6.5/ hadoop@9.111.159.163:/myhome/hadoop/
scp -r hadoop-2.6.5/ hadoop@9.111.159.164:/myhome/hadoop/

Start HDFS

Format and start the NameNode

cd /myhome/hadoop/hadoop-2.6.5
./bin/hdfs namenode -format
cd /myhome/hadoop/hadoop-2.6.5/sbin
./start-dfs.sh

Verify the HDFS daemons

[hadoop@bjqilitst1 sbin]$ jps
532 NameNode
838 SecondaryNameNode
963 Jps
656 DataNode
[hadoop@bjqilitst2 hadoop]$ jps
1854 Jps
1782 DataNode
[hadoop@bjqilitst3 hadoop]$ jps
1169 DataNode
1248 Jps

Verify health via the CLI

[hadoop@bjqilitst1 hadoop-2.6.5]$  ./bin/hdfs dfsadmin -report
Configured Capacity: 91005493248 (84.76 GB)
Present Capacity: 81165398016 (75.59 GB)
DFS Remaining: 81165385728 (75.59 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 9.111.159.164:50010 (bjqilitst3)
Hostname: bjqilitst3
Decommission Status : Normal
Configured Capacity: 30335164416 (28.25 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3317608448 (3.09 GB)
DFS Remaining: 27017551872 (25.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.06%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 29 06:34:44 EDT 2016


Name: 9.111.159.156:50010 (bjqilitst1)
Hostname: bjqilitst1
Decommission Status : Normal
Configured Capacity: 30335164416 (28.25 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3205844992 (2.99 GB)
DFS Remaining: 27129315328 (25.27 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 29 06:34:44 EDT 2016


Name: 9.111.159.163:50010 (bjqilitst2)
Hostname: bjqilitst2
Decommission Status : Normal
Configured Capacity: 30335164416 (28.25 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3316641792 (3.09 GB)
DFS Remaining: 27018518528 (25.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 29 06:34:44 EDT 2016

Verify health via the Hadoop web GUI

http://9.111.159.156:50070/dfshealth.html#tab-overview

Once the Hadoop cluster is up and running, check the web UIs of the components as described below:

Daemon                         Web Interface            Notes
NameNode                       http://nn_host:port/     Default HTTP port is 50070.
ResourceManager                http://rm_host:port/     Default HTTP port is 8088.
MapReduce JobHistory Server    http://jhs_host:port/    Default HTTP port is 19888.

Hadoop fs commands

http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/FileSystemShell.html

./bin/hadoop fs -help
hadoop fs -ls /
hadoop fs -ls -R /
hadoop fs -mkdir /dir
hadoop fs -put <local file path> <hdfs file path>
hadoop fs -get <hdfs file path> <local file path>
hadoop fs -text <HDFS file>
hadoop fs -rm <HDFS file>
hadoop fs -rm -r <HDFS directory>
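
For example, a quick round trip to confirm the shell works against the cluster (the file and directory names here are just examples):

echo "hello hdfs" > /tmp/hello.txt
hadoop fs -mkdir -p /test
hadoop fs -put /tmp/hello.txt /test/
hadoop fs -text /test/hello.txt
hadoop fs -rm -r /test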

Enable Spark History Server

Refer to the doc at http://spark.apache.org/docs/latest/monitoring.html

Enable the Spark History Server on HDFS by editing /myhome/hadoop/spark-1.6.2-bin-hadoop2.6/conf/spark-defaults.conf

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://9.111.159.156:9000/hadoop/logdir
spark.history.fs.logDirectory    hdfs://9.111.159.156:9000/hadoop/logdir
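
The event log directory must already exist in HDFS before Spark applications start writing to it, so create it first:

hadoop fs -mkdir -p /hadoop/logdir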

Start the Spark history server

./sbin/start-history-server.sh
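
If it starts cleanly, jps on this host should show a HistoryServer process:

jps | grep HistoryServer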

Open the History Server web GUI

http://9.111.159.156:18080

