MapReduce is Hadoop's distributed computing framework. A job is split into two phases, Map and Reduce: the Map phase organizes and emits the data as key/value pairs, and the Reduce phase aggregates those pairs into final results.
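The two phases can be illustrated with a small local sketch in plain Python (no Hadoop required; the function names here are illustrative, not Hadoop APIs). The map step emits a `(word, 1)` pair for every word, and the reduce step sums the counts per word, which is exactly what the wordcount example below does at cluster scale.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group the pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The same four lines used as input later in this section
lines = ["I love spark", "I like flink", "I love hadoop", "I like hadoop"]
print(reduce_phase(map_phase(lines)))
# → {'I': 4, 'love': 2, 'spark': 1, 'like': 2, 'flink': 1, 'hadoop': 2}
```

In a real Hadoop job the grouping between the two phases (the shuffle) happens across machines; here it is simulated by a single in-memory dictionary.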
Continuing the example from the previous section, start the cluster (HDFS first, then YARN):
[root@master sbin]# start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-slave1.out
master: starting datanode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [slave2]
slave2: starting secondarynamenode, logging to /usr/local/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-slave2.out
[root@master sbin]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.9.2/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /usr/local/hadoop-2.9.2/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /usr/local/hadoop-2.9.2/logs/yarn-root-nodemanager-slave1.out
master: starting nodemanager, logging to /usr/local/hadoop-2.9.2/logs/yarn-root-nodemanager-master.out
Prepare a sample file /usr/local/input.txt on the local filesystem with the following content:
I love spark
I like flink
I love hadoop
I like hadoop
[root@master sbin]# hadoop fs -mkdir -p /hello/input
[root@master sbin]# hadoop fs -put /usr/local/input.txt /hello/input/
`hadoop jar` runs a MapReduce job packaged in a jar file; the argument that follows is the path to the bundled examples jar (the exact filename depends on the Hadoop version).
# Hadoop 2.9.2
[root@master sbin]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /hello/input /hello/output
# Hadoop 3.1.3
[root@master sbin]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /hello/input /hello/output
If the job fails at runtime with the following error:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
stop Hadoop with stop-all.sh, then edit mapred-site.xml (vim mapred-site.xml) and add the following properties. Here /usr/local/hadoop3.1 is this machine's Hadoop installation directory; replace it with your own:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop3.1</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop3.1</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/usr/local/hadoop3.1</value>
</property>
Restart Hadoop. If the same error still appears after this change, resolve it as follows: 1. Run hadoop classpath on the command line to obtain the Hadoop classpath. 2. Edit yarn-site.xml and add the following property, pasting in the classpath you copied:
<property>
  <name>yarn.application.classpath</name>
  <value>(paste the output of hadoop classpath here)</value>
</property>
Restart Hadoop once more and rerun the word count job; this time it should succeed.
After the job finishes, an output directory is created containing two files: _SUCCESS and part-r-00000. _SUCCESS is an empty marker file that signals the job completed successfully; part-r-00000 holds the actual results. List the directory and display the file's contents:
[root@master sbin]# hadoop fs -ls /hello/output
Found 2 items
-rw-r--r-- 3 root supergroup 0 2022-06-01 21:28 /hello/output/_SUCCESS
-rw-r--r-- 3 root supergroup 43 2022-06-01 21:28 /hello/output/part-r-00000
[root@master sbin]# hadoop fs -cat /hello/output/part-r-00000
I 4
flink 1
hadoop 2
like 2
love 2
spark 1
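As a sanity check, the same counts can be reproduced locally, for example with Python's collections.Counter over the four input lines (a local illustration only, not part of the Hadoop workflow):

```python
from collections import Counter

# The contents of input.txt from earlier in this section
text = "I love spark\nI like flink\nI love hadoop\nI like hadoop"
counts = Counter(text.split())

# part-r-00000 lists keys in sorted order; uppercase 'I' sorts
# before the lowercase words in ASCII order
for word in sorted(counts):
    print(word, counts[word])
```

The printed lines match the part-r-00000 output above: I 4, flink 1, hadoop 2, like 2, love 2, spark 1.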
MapReduce refuses to write to an existing output directory, so delete it before rerunning the job:
[root@master sbin]# hadoop fs -rm -r /hello/output