Hadoop Pseudo-Distributed Cluster Installation
1. Change the hostname and time zone
[root@hadoop-master bin]# hostname
jd-fztyr012yt6
[root@hadoop-master bin]# hostnamectl set-hostname hadoop-master
[root@hadoop-master bin]# tzselect
2. Configure passwordless SSH login
#Install the SSH server
yum install openssh-server
#Start the SSH service
systemctl start sshd
#Generate a key pair (accept the defaults; the files land in ~/.ssh)
ssh-keygen
#Append the public key to the authorized_keys authorization file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#Restrict the permissions of the authorized_keys file
chmod 600 ~/.ssh/authorized_keys
#Verify that passwordless SSH login works
ssh hadoop-master
#If you see a message like the following, passwordless login succeeded:
[root@hadoop-master bin]# ssh hadoop-master
Last login: Thu Apr 20 08:25:05 2023 from 192.168.2.73
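The `chmod 600` step above matters because sshd refuses to honor an `authorized_keys` file that is group- or world-accessible. A minimal sketch of that permission check, run against a throwaway temporary directory rather than the real `~/.ssh` (the paths here are hypothetical):

```shell
#!/bin/sh
# Demonstrate the permission mode sshd expects on authorized_keys,
# using a scratch directory instead of the real ~/.ssh.
tmp=$(mktemp -d)
touch "$tmp/authorized_keys"
chmod 600 "$tmp/authorized_keys"          # owner read/write only
perm=$(stat -c '%a' "$tmp/authorized_keys")  # GNU stat prints the octal mode
echo "authorized_keys mode: $perm"
rm -rf "$tmp"
```

If `stat` reports anything looser than 600 on the real file, key-based login will silently fall back to password prompts.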
3. Extract the Hadoop installation package
Extract and install the downloaded hadoop-2.7.3.tar.gz package:
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local
cd /usr/local
mv hadoop-2.7.3 hadoop
A brief look at the hadoop directory structure.
Main configuration files (under etc/hadoop):
| 配置文件 | 功能描述 |
|---|---|
| hadoop-env.sh | Environment variables required to run Hadoop |
| yarn-env.sh | Environment variables required to run YARN |
| core-site.xml | Core Hadoop-wide configuration file; can be referenced from the other configuration files |
| hdfs-site.xml | HDFS configuration file; inherits from core-site.xml |
| mapred-site.xml | MapReduce configuration file; inherits from core-site.xml |
| yarn-site.xml | YARN configuration file; inherits from core-site.xml |
| slaves | List of all worker nodes (DataNode and NodeManager) in the cluster |
Add the Hadoop environment variables:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
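The effect of the two `export` lines can be sanity-checked without Hadoop installed; this sketch just simulates what a fresh shell sees after `source /etc/profile` and confirms the Hadoop `bin` directory is on `PATH` (it does not verify the directory exists):

```shell
#!/bin/sh
# Reproduce the exports from /etc/profile and check PATH ordering.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# Wrap PATH in colons so the match works at either end of the string.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) on_path=yes ;;
  *)                      on_path=no  ;;
esac
echo "hadoop bin on PATH: $on_path"
```

Because the Hadoop entries are prepended, `hdfs` and `start-dfs.sh` resolve to this installation even if another copy exists elsewhere on the system.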
Edit the configuration files:
#####################################################
#hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
#####################################################
#core-site.xml
<configuration>
<!-- Address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<!-- localhost here stands for the Linux hostname; replace it with your own hostname if needed -->
<value>hdfs://localhost:9000</value>
</property>
<!-- Storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
######################################################
#hdfs-site.xml
<configuration>
<property>
<name>dfs.checkpoint.period</name>
<value>3000</value>
</property>
<!-- Number of HDFS replicas (a multi-node cluster can keep several copies; there is only one machine here) -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
########################################################
#yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Address of the YARN master (ResourceManager) -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-master</value>
</property>
<!-- Reducers fetch intermediate data via shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-master:8088</value>
</property>
</configuration>
###################################################
#mapred-site.xml (in Hadoop 2.7.3 this file ships as mapred-site.xml.template; copy it to mapred-site.xml first)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
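The XML fragments above are easy to typo; a quick sanity check that a config file carries the value you intended can be done with plain `grep`/`sed`, no Hadoop required. The sketch below writes a throwaway copy of the core-site.xml fragment to a temporary file (the path is hypothetical) and extracts `fs.defaultFS` from it:

```shell
#!/bin/sh
# Extract a property value from a Hadoop-style XML config fragment.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# grep the <name> line plus the line after it, then pull out the <value> text.
fs=$(grep -A1 '<name>fs.defaultFS</name>' "$conf" \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p')
echo "fs.defaultFS = $fs"
rm -f "$conf"
```

Pointing the same two commands at /usr/local/hadoop/etc/hadoop/core-site.xml verifies the real file before you format the NameNode.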
Start Hadoop's HDFS.
[root@hadoop-master hadoop]# cd /usr/local/hadoop/bin
#Format the NameNode
[root@hadoop-master bin]# hdfs namenode -format
[root@hadoop-master hadoop]# cd sbin
[root@hadoop-master sbin]# ./start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop-master.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop-master.out
Check whether HDFS started successfully.
[root@hadoop-master sbin]# jps
19872 SecondaryNameNode
19638 DataNode
19438 NameNode
21150 Jps
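The `jps` check can be scripted so the three HDFS daemons are verified in one pass. This sketch runs against the sample output copied from the session above; in practice you would substitute `jps_out=$(jps)`:

```shell
#!/bin/sh
# Confirm NameNode, DataNode, and SecondaryNameNode appear in jps output.
# Sample output taken from the session above; replace with $(jps) on a live node.
jps_out="19872 SecondaryNameNode
19638 DataNode
19438 NameNode
21150 Jps"
running=""
for d in NameNode DataNode SecondaryNameNode; do
  # Anchor on " <name>$" so NameNode does not also match SecondaryNameNode.
  if echo "$jps_out" | grep -q " ${d}\$"; then
    running="$running $d"
    echo "$d is running"
  fi
done
```

If any of the three names is missing, check the corresponding log file under /usr/local/hadoop/logs before retrying.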