Hadoop Distributed Cluster Setup
Overall approach: do all configuration on the NameNode host, then copy it to the other hosts.
1. The main daemons are distributed across the hosts as follows
1.1 hadoop05 192.168.99.5 NameNode (configured in core-site.xml and hdfs-site.xml)
1.2 hadoop06 192.168.99.6 SecondaryNameNode (configured in masters and hdfs-site.xml)
1.3 hadoop07 192.168.99.7 ResourceManager (configured in yarn-site.xml)
1.4 hadoop08 192.168.99.8 JobHistoryServer (configured in mapred-site.xml)
1.5 hadoop09 192.168.99.9 DataNode NodeManager (configured in slaves)
1.6 hadoop10 192.168.99.10 DataNode NodeManager (configured in slaves)
2. Configure /etc/hosts (the IP address comes first, then the hostname)
#NameNode
192.168.99.5 hadoop05
#SecondaryNameNode
192.168.99.6 hadoop06
#ResourceManager
192.168.99.7 hadoop07
#JobHistoryServer
192.168.99.8 hadoop08
#DataNode NodeManager
192.168.99.9 hadoop09
#DataNode NodeManager
192.168.99.10 hadoop10
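The same hosts file must be present on every node before hostname-based SSH (step 4) will work. A minimal sketch for pushing it out from hadoop05, assuming root logins (password prompts are expected until step 4 is complete):
for h in hadoop06 hadoop07 hadoop08 hadoop09 hadoop10; do
    scp /etc/hosts root@$h:/etc/hosts
done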
3. Configure passwordless SSH login
3.1 Generate a key pair: ssh-keygen -t rsa, pressing Enter through every prompt until the command finishes
3.2 The file id_rsa.pub is generated under /root/.ssh/; copy it to authorized_keys: cp id_rsa.pub authorized_keys
3.3 Verify: ssh localhost no longer asks for a password
4. Add SSH authorization so the hosts can SSH into each other without passwords
4.1 On each newly added host, run ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop05 to grant that host passwordless SSH login to hadoop05
4.2 This only needs to be run on the non-NameNode hosts; afterwards, copy the authorized_keys on the NameNode back out to every other host, as sketched below
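A minimal sketch of the whole key exchange, assuming the root account and default key locations on every host:
# On hadoop06 through hadoop10: append the local public key to hadoop05's authorized_keys
ssh-copy-id -i /root/.ssh/id_rsa.pub root@hadoop05
# Then on hadoop05: push the aggregated authorized_keys back out to all hosts
for h in hadoop06 hadoop07 hadoop08 hadoop09 hadoop10; do
    scp /root/.ssh/authorized_keys root@$h:/root/.ssh/authorized_keys
done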
5. Configure environment variables
5.1 Edit /etc/profile and append the following at the end
#-----------------------------hadoop ENV---------------------------------
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JRE_HOME/lib
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#-----------------------------hadoop ENV---------------------------------
5.2 After editing /etc/profile, run source /etc/profile to load the new environment variables
5.3 Verify: echo $JAVA_HOME $HADOOP_HOME prints the configured paths
6. Configure HDFS
6.1 Configure core-site.xml
fs.defaultFS specifies the NameNode filesystem URI. fs.trash.interval controls the trash feature; the unit is minutes, and 0 disables the trash.
<!-- fs.default.name for MRv1, fs.defaultFS for MRv2 (YARN) -->
<property>
<name>fs.defaultFS</name>
<!-- same as dfs.federation.nameservices in hdfs-site.xml -->
<value>hdfs://hadoop05:9000</value>
</property>
<property>
<!-- 0 disables the trash; the unit is minutes -->
<name>fs.trash.interval</name>
<value>0</value>
</property>
<property>
<!-- The unit is minutes-->
<name>fs.trash.checkpoint.interval</name>
<value>10080</value>
</property>
<!-- If the web UI reports a permission error when viewing job details by application id or browsing an HDFS path, -->
<!-- change the value below to the user that actually owns the Hadoop processes -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
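After saving core-site.xml, the effective value can be sanity-checked from the command line (assuming the PATH from step 5 is loaded); this is a quick way to catch typos in the XML:
hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://hadoop05:9000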
6.2 Configure hdfs-site.xml
dfs.namenode.name.dir specifies where the NameNode stores metadata and the edit log,
dfs.datanode.data.dir specifies where DataNodes store blocks,
dfs.namenode.http-address specifies the NameNode web address,
dfs.namenode.secondary.http-address specifies the Secondary NameNode web address (in a distributed deployment the SecondaryNameNode is usually placed on a host of its own),
dfs.webhdfs.enabled enables WebHDFS.
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop05:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop06:50090</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
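The storage directories referenced above should exist with the right ownership before the daemons start. A sketch, assuming Hadoop runs as root as implied by hadoop.http.staticuser.user:
mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode    # on hadoop05 only
mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode    # on hadoop09 and hadoop10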
7. Configure MapReduce
7.1 Configure mapred-site.xml
Use YARN as the execution framework and set the JobHistoryServer addresses.
<property>
<name>mapreduce.shuffle.port</name>
<value>8017</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop08:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop08:19888</value>
</property>
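Note that the JobHistoryServer is not started by the start-dfs.sh/start-yarn.sh scripts; it has to be launched separately on hadoop08 (assuming $HADOOP_HOME/sbin is on PATH per step 5):
mr-jobhistory-daemon.sh start historyserver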
7.2 Configure yarn-site.xml
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop07:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop07:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop07:8030</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop07:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop07:8088</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$YARN_HOME/share/hadoop/yarn/*,
$YARN_HOME/share/hadoop/yarn/lib/*,
$YARN_HOME/share/hadoop/mapreduce/*,
$YARN_HOME/share/hadoop/mapreduce/lib/*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/usr/local/hadoop/yarn_data/hdfs/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/usr/local/hadoop/yarn_data/hdfs/logs</value>
</property>
<property>
<description>Where to aggregate logs</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/usr/local/hadoop/yarn_data/hdfs/rm-logs</value>
</property>
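The two local directories above must exist and be writable on every NodeManager host (hadoop09 and hadoop10). Note also that yarn.nodemanager.remote-app-log-dir is resolved against the default filesystem (HDFS), not the local disk. A sketch:
# On hadoop09 and hadoop10
mkdir -p /usr/local/hadoop/yarn_data/hdfs/local /usr/local/hadoop/yarn_data/hdfs/logs
# The aggregation target is an HDFS path; create it once HDFS is running
hdfs dfs -mkdir -p /usr/local/hadoop/yarn_data/hdfs/rm-logs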
8. Set the JAVA_HOME environment variable for hadoop and yarn
8.1 In hadoop-env.sh and yarn-env.sh, set JAVA_HOME as follows, adjusting the path to the actual installation
export JAVA_HOME=/usr/local/jdk
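Rather than hunting for the commented-out line in each file, the setting can simply be appended, since a later assignment wins in shell; a sketch:
echo 'export JAVA_HOME=/usr/local/jdk' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo 'export JAVA_HOME=/usr/local/jdk' >> $HADOOP_HOME/etc/hadoop/yarn-env.sh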
9. Configure the cluster layout
9.1 Edit masters with the hostname of the host that runs the SecondaryNameNode
hadoop06
9.2 Edit slaves with the hostnames of the hosts that run DataNode/NodeManager
hadoop09
hadoop10
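With everything configured on hadoop05, the remaining work follows the overall approach stated at the top: copy the configuration out, format the NameNode once, and bring the daemons up. A minimal sketch, assuming the same install paths on every host:
# On hadoop05: distribute the configuration and the profile
for h in hadoop06 hadoop07 hadoop08 hadoop09 hadoop10; do
    scp -r /usr/local/hadoop/etc/hadoop root@$h:/usr/local/hadoop/etc/
    scp /etc/profile root@$h:/etc/profile
done
hdfs namenode -format    # run once, on hadoop05 only
start-dfs.sh             # on hadoop05: starts NameNode, SecondaryNameNode, DataNodes
# on hadoop07, where the ResourceManager lives:
start-yarn.sh
# on each host, jps should show the daemons planned in section 1
jps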