Install HDFS with Hadoop 2.6.x
Download the Hadoop package from http://hadoop.apache.org/releases.html and decompress it:
wget http://apache.fayea.com/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
tar -zxvf hadoop-2.6.5.tar.gz
chown -R hadoop:hadoop hadoop-2.6.5
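The chown above assumes a hadoop user and group already exist on each node; if they do not, a minimal sketch (run as root on every node):

useradd -m hadoop    # on most distros this also creates a matching 'hadoop' group
passwd hadoop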
Create directories
cd hadoop-2.6.5/
mkdir tmp     # will back hadoop.tmp.dir
mkdir name    # will back dfs.namenode.name.dir
mkdir data    # will back dfs.datanode.data.dir
Refer to http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/ClusterSetup.html
Edit Hadoop's configuration files under ./hadoop-2.6.5/etc/hadoop.
Add HADOOP_HOME to the hadoop user's environment (for example in ~/.bashrc):
export HADOOP_HOME=/myhome/hadoop/hadoop-2.6.5
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
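Reload the profile and confirm the Hadoop binaries are on the PATH (assuming the exports went into ~/.bashrc):

source ~/.bashrc
hadoop version    # should report Hadoop 2.6.5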
Set JAVA_HOME in both hadoop-env.sh and yarn-env.sh:
vim /myhome/hadoop/hadoop-2.6.5/etc/hadoop/hadoop-env.sh
vim /myhome/hadoop/hadoop-2.6.5/etc/hadoop/yarn-env.sh

# add this line to both files:
export JAVA_HOME=/pcc/app/Linux_jdk1.7.0_x86_64
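A quick check that the JDK path is valid on this node (the same path is assumed on every node):

/pcc/app/Linux_jdk1.7.0_x86_64/bin/java -version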
core-site.xml — the NameNode URI and common settings:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://9.111.159.156:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/myhome/hadoop/hadoop-2.6.5/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
</configuration>
hdfs-site.xml — storage paths, replication factor, and WebHDFS:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>9.111.159.156:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/myhome/hadoop/hadoop-2.6.5/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/myhome/hadoop/hadoop-2.6.5/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
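To sanity-check that HDFS picks up an edited value, hdfs getconf prints the effective configuration (run from the Hadoop home directory):

./bin/hdfs getconf -confKey dfs.replication    # expect: 2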
mapred-site.xml — run MapReduce on YARN and point at the JobHistory server:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>9.111.159.156:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>9.111.159.156:19888</value>
  </property>
</configuration>
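Note: the stock 2.6.5 binary tarball ships only a template for this file; if mapred-site.xml does not exist yet, copy the template first:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml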
yarn-site.xml — the shuffle service and ResourceManager addresses:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>9.111.159.156:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>9.111.159.156:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>9.111.159.156:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>9.111.159.156:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>9.111.159.156:8088</value>
  </property>
</configuration>
List every DataNode host in etc/hadoop/slaves, one per line:
9.111.159.156
9.111.159.163
9.111.159.164
Distribute the Hadoop directory to the other nodes
cd /myhome/hadoop
scp -r hadoop-2.6.5/ hadoop@9.111.159.163:/myhome/hadoop/
scp -r hadoop-2.6.5/ hadoop@9.111.159.164:/myhome/hadoop/
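start-dfs.sh launches the remote daemons over SSH, so set up passwordless SSH from the master to every node (itself included) before starting. A minimal sketch, run as the hadoop user on the master:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id hadoop@9.111.159.156
ssh-copy-id hadoop@9.111.159.163
ssh-copy-id hadoop@9.111.159.164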
Start HDFS
cd /myhome/hadoop/hadoop-2.6.5
./bin/hdfs namenode -format    # first run only: formatting wipes any existing HDFS metadata
cd /myhome/hadoop/hadoop-2.6.5/sbin
./start-dfs.sh                 # starts the NameNode, SecondaryNameNode, and all DataNodes
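This guide proceeds with HDFS only; if you also want the YARN and JobHistory daemons configured in mapred-site.xml and yarn-site.xml, their scripts live in the same sbin directory:

./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver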
Verify the HDFS daemons
[hadoop@bjqilitst1 sbin]$ jps
532 NameNode
838 SecondaryNameNode
963 Jps
656 DataNode

[hadoop@bjqilitst2 hadoop]$ jps
1854 Jps
1782 DataNode

[hadoop@bjqilitst3 hadoop]$ jps
1169 DataNode
1248 Jps
Verify health from the CLI
[hadoop@bjqilitst1 hadoop-2.6.5]$ ./bin/hdfs dfsadmin -report
Configured Capacity: 91005493248 (84.76 GB)
Present Capacity: 81165398016 (75.59 GB)
DFS Remaining: 81165385728 (75.59 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (3):

Name: 9.111.159.164:50010 (bjqilitst3)
Hostname: bjqilitst3
Decommission Status : Normal
Configured Capacity: 30335164416 (28.25 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3317608448 (3.09 GB)
DFS Remaining: 27017551872 (25.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.06%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 29 06:34:44 EDT 2016

Name: 9.111.159.156:50010 (bjqilitst1)
Hostname: bjqilitst1
Decommission Status : Normal
Configured Capacity: 30335164416 (28.25 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3205844992 (2.99 GB)
DFS Remaining: 27129315328 (25.27 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.43%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 29 06:34:44 EDT 2016

Name: 9.111.159.163:50010 (bjqilitst2)
Hostname: bjqilitst2
Decommission Status : Normal
Configured Capacity: 30335164416 (28.25 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3316641792 (3.09 GB)
DFS Remaining: 27018518528 (25.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Oct 29 06:34:44 EDT 2016
Verify health via the Hadoop web GUI
http://9.111.159.156:50070/dfshealth.html#tab-overview
Once the Hadoop cluster is up and running, check the web UIs of the components as described below:
Daemon                         Web Interface            Notes
NameNode                       http://nn_host:port/     Default HTTP port is 50070.
ResourceManager                http://rm_host:port/     Default HTTP port is 8088.
MapReduce JobHistory Server    http://jhs_host:port/    Default HTTP port is 19888.
Hadoop fs commands
http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/FileSystemShell.html
hadoop fs -help                                       # list all fs subcommands
hadoop fs -ls /                                       # list the HDFS root
hadoop fs -ls -R /                                    # list recursively
hadoop fs -mkdir /dir                                 # create a directory
hadoop fs -put <local file path> <hdfs file path>     # upload a local file
hadoop fs -get <hdfs file path> <local file path>     # download to local disk
hadoop fs -text <HDFS file>                           # print a file's contents
hadoop fs -rm <HDFS file>                             # delete a file
hadoop fs -rm -r <HDFS directory>                     # delete a directory (-rmr is deprecated in 2.x)
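A quick end-to-end smoke test (the file and directory names here are arbitrary examples):

echo "hello hdfs" > /tmp/hello.txt
hadoop fs -mkdir -p /user/hadoop
hadoop fs -put /tmp/hello.txt /user/hadoop/
hadoop fs -text /user/hadoop/hello.txt    # should print: hello hdfs
hadoop fs -rm /user/hadoop/hello.txt      # clean up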
Enable Spark History Server
Refer to http://spark.apache.org/docs/latest/monitoring.html.
Enable the Spark History Server on HDFS by editing /myhome/hadoop/spark-1.6.2-bin-hadoop2.6/conf/spark-defaults.conf:
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://9.111.159.156:9000/hadoop/logdir
spark.history.fs.logDirectory    hdfs://9.111.159.156:9000/hadoop/logdir
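Spark does not create the event-log directory for you; it must exist in HDFS before a job writes to it or the history server reads from it:

hadoop fs -mkdir -p /hadoop/logdir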
Start the Spark History Server from the Spark home directory:
./sbin/start-history-server.sh
Open the History Server web GUI:
http://9.111.159.156:18080
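If the page does not load, check the server log (written under $SPARK_HOME/logs by default); a quick reachability check from the shell:

curl -s -o /dev/null -w "%{http_code}\n" http://9.111.159.156:18080    # expect: 200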