hadoop, technical notes

Deploying Hadoop in single-node mode

1. Set the environment variables:

Edit etc/hadoop/hadoop-env.sh, and in your profile add:

export JAVA_HOME=/opt/jdk1.8.0_25
export HADOOP_PREFIX=/home/zjy/hadoop
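A quick way to confirm both variables took effect after sourcing the profile (the defaults below are only the illustrative paths used in this note; substitute your own):

```shell
# Sanity check after `source /etc/profile` or re-login.
# The fallback values are the example paths from this note, not requirements.
: "${JAVA_HOME:=/opt/jdk1.8.0_25}"
: "${HADOOP_PREFIX:=/home/zjy/hadoop}"
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_PREFIX=$HADOOP_PREFIX"
```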

2. Edit the configuration files:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
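With both files in the minimal one-tag-per-line layout shown above, a property value can be pulled out with plain text tools. `get_prop` is an illustrative helper, not part of Hadoop, and a real XML parser is safer for anything more complex:

```shell
# Illustrative helper: read one property value from a Hadoop *-site.xml.
# Relies on <name> and <value> sitting on consecutive lines, as above.
get_prop() {
  # $1 = config file, $2 = property name
  grep -A1 "<name>$2</name>" "$1" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, `get_prop etc/hadoop/core-site.xml fs.defaultFS` prints hdfs://localhost:9000 with the file above.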
3. Set up passwordless SSH login:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
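Re-running those two commands appends the key again each time; here is a sketch of an idempotent variant (`add_key` is a made-up helper for illustration, not a standard tool):

```shell
# Append a public key to authorized_keys only if it is not already there,
# so repeated setup runs do not accumulate duplicate lines.
add_key() {
  pub=$1 auth=$2
  touch "$auth" && chmod 600 "$auth"
  # -x matches the whole line, -F takes the key text literally
  grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}
```

Usage: `add_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys`.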
4. Format the filesystem:

$ bin/hdfs namenode -format

5. Start HDFS:

$ sbin/start-dfs.sh

6. Check:

Logs: $HADOOP_HOME/logs
NameNode web UI: http://localhost:50070/

7. Create the HDFS directories:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

8. Run the bundled example:

Copy the input files into the distributed filesystem:

$ bin/hdfs dfs -put etc/hadoop input            // load etc/hadoop into HDFS as "input"
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'

$ bin/hdfs dfs -get output output      // fetch the output directory from HDFS
$ cat output/*                         // view the fetched files
$ bin/hdfs dfs -cat output/*           // or view them directly in HDFS
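The example job scans the uploaded files for tokens matching the regex 'dfs[a-z.]+' and counts the matches; the filter itself can be approximated locally (this is only the regex step on sample lines, not the MapReduce job):

```shell
# Rough local equivalent of the example's filter: extract tokens matching
# dfs[a-z.]+ from some sample lines and count duplicates.
printf '%s\n' 'dfs.replication' 'fs.defaultFS' 'dfsadmin notes' |
  grep -oE 'dfs[a-z.]+' |
  sort | uniq -c
```

Only `dfs.replication` and `dfsadmin` survive the filter; `fs.defaultFS` contains no lowercase `dfs` token.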
9. Stop:

$ sbin/stop-dfs.sh                     // stop hadoop

YARN (new in Hadoop 2.x), single-node configuration:

Edit the configuration files:

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
-------
etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
---------
Start YARN:

$ sbin/start-yarn.sh

Stop YARN:

$ sbin/stop-yarn.sh
------------------------------------------------
Common problems:

1. /tmp/hadoop-zjy-secondarynamenode.pid: Permission denied
Fix: chmod -R 777 tmp/  (a blunt fix; giving the hadoop user ownership of the directory is cleaner)

2。Java HotSpot(TM) Client VM warning: You have loaded library /home/zjy/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/11/09 23:09:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Fix:

vi ~/.profile
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
source ~/.profile

3. When putting a file into HDFS:
put: File /input/file1.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
Cause: repeated formatting left the DFS versions inconsistent.
Fix: delete the filesystem data and reformat: hdfs namenode -format
Alternatively, change the filesystem version to match the one in $HADOOP_PREFIX/TMP/dfs/version.

4. Name node is in safe mode
Fix: bin/hadoop dfsadmin -safemode leave
Safe mode is controlled with dfsadmin -safemode <value>, where <value> is one of:
enter - enter safe mode
leave - force the NameNode to leave safe mode
get   - report whether safe mode is on
wait  - block until safe mode ends
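For scripting, the `get` form can be polled until safe mode ends. A sketch, with the hdfs launcher passed as a parameter purely so the loop can be exercised without a live cluster:

```shell
# Poll `dfsadmin -safemode get` (which prints e.g. "Safe mode is OFF")
# until safe mode is off. $1 overrides the hdfs launcher; defaults to hdfs.
wait_safemode_off() {
  hdfs_cmd=${1:-hdfs}
  until "$hdfs_cmd" dfsadmin -safemode get | grep -q 'OFF'; do
    sleep 2
  done
}
```

In practice `bin/hadoop dfsadmin -safemode wait` achieves the same thing without hand-rolled polling.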
-------------
5. org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost/user/zjy/input
The input directory was never uploaded.
Fix: hadoop fs -put conf input

6。2014-11-10 01:01:53,107 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: localhost/127.0.0.1:9000
Cause: the "127.0.0.1 localhost" mapping in /etc/hosts.
Fix: vi /etc/hosts and comment out the line:
# 127.0.0.1 localhost
Then run stop-all.sh
and reformat: hdfs namenode -format

7。There are no datanodes in the cluster.
----------
Run the commands below (as the hadoop user) to bring the DataNode back up:

Format the cluster:
hadoop namenode -format -clusterid clustername
Start HDFS:
start-dfs.sh
Start YARN (the resource-management service):
start-yarn.sh
Start the httpfs service:
httpfs.sh start
-------------------------------------
8。Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to YFCS-S6-APP/10.200.25.154:9000. Exiting
Open the datanode and namenode directories configured in hdfs-site.xml and compare the VERSION files in their current/ folders: the clusterID values differ, exactly as the log says. Edit the datanode's VERSION so its clusterID matches the namenode's, restart dfs (start-dfs.sh), and jps should then show the datanode running normally.
Cause: after the first format, Hadoop was started and used, and a later re-format (hdfs namenode -format) regenerated the namenode's clusterID while the datanode's clusterID stayed unchanged.
If matching the IDs does not fix it, delete the datanode's files (rm -rf /home/zjy/hadoop/tmp/dfs/data/current/*) and restart.
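The comparison above can be scripted; `cluster_ids_match` is an illustrative helper (the VERSION paths depend on the dfs name/data directories configured in hdfs-site.xml):

```shell
# Return success when the NameNode and DataNode VERSION files agree on
# clusterID; each VERSION file contains a line like clusterID=CID-....
cluster_ids_match() {
  nn_id=$(sed -n 's/^clusterID=//p' "$1")
  dn_id=$(sed -n 's/^clusterID=//p' "$2")
  [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}
```

Usage: `cluster_ids_match tmp/dfs/name/current/VERSION tmp/dfs/data/current/VERSION || echo "clusterID mismatch"`.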

9. Running custom Hadoop code remotely from Eclipse throws: Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="/":zjy:supergroup:drwxr-xr-x
Cause: the local Windows development environment has no (or a misconfigured) ssh, or the user lacks write permission on the relevant HDFS directories.
Fix:
1. Install and configure ssh locally; create the relevant local user, ideally in the Administrators group.
2. On the HDFS server: hdfs dfs -chmod -R 777 /tmp ; hdfs dfs -chmod -R 777 /user/

A successful run of the example:

zjy@zjy:/home/zjy/hadoop$bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep /input /output 'dfs[a-z.]+'
Java HotSpot(TM) Client VM warning: You have loaded library /home/zjy/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/11/10 17:52:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/10 17:52:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/11/10 17:52:34 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/11/10 17:52:34 INFO input.FileInputFormat: Total input paths to process : 1
14/11/10 17:52:35 INFO mapreduce.JobSubmitter: number of splits:1
14/11/10 17:52:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415670144299_0004
14/11/10 17:52:35 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/11/10 17:52:35 INFO impl.YarnClientImpl: Submitted application application_1415670144299_0004
14/11/10 17:52:35 INFO mapreduce.Job: The url to track the job: http://zjy:8088/proxy/application_1415670144299_0004/
14/11/10 17:52:35 INFO mapreduce.Job: Running job: job_1415670144299_0004
14/11/10 17:52:41 INFO mapreduce.Job: Job job_1415670144299_0004 running in uber mode : false
14/11/10 17:52:41 INFO mapreduce.Job:  map 0% reduce 0%
14/11/10 17:52:47 INFO mapreduce.Job:  map 100% reduce 0%
14/11/10 17:52:54 INFO mapreduce.Job:  map 100% reduce 100%
14/11/10 17:52:54 INFO mapreduce.Job: Job job_1415670144299_0004 completed successfully
14/11/10 17:52:54 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=6
                FILE: Number of bytes written=194175
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=115
                HDFS: Number of bytes written=86
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3082
                Total time spent by all reduces in occupied slots (ms)=3570
                Total time spent by all map tasks (ms)=3082
                Total time spent by all reduce tasks (ms)=3570
                Total vcore-seconds taken by all map tasks=3082
                Total vcore-seconds taken by all reduce tasks=3570
                Total megabyte-seconds taken by all map tasks=3155968
                Total megabyte-seconds taken by all reduce tasks=3655680
        Map-Reduce Framework
                Map input records=1
                Map output records=0
                Map output bytes=0
                Map output materialized bytes=6
                Input split bytes=104
                Combine input records=0
                Combine output records=0
                Reduce input groups=0
                Reduce shuffle bytes=6
                Reduce input records=0
                Reduce output records=0
                Spilled Records=0
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=162
                CPU time spent (ms)=1160
                Physical memory (bytes) snapshot=221036544
                Virtual memory (bytes) snapshot=629686272
                Total committed heap usage (bytes)=137498624
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=11
        File Output Format Counters 
                Bytes Written=86
14/11/10 17:52:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/11/10 17:52:54 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/11/10 17:52:54 INFO input.FileInputFormat: Total input paths to process : 1
14/11/10 17:52:54 INFO mapreduce.JobSubmitter: number of splits:1
14/11/10 17:52:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415670144299_0005
14/11/10 17:52:54 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/11/10 17:52:54 INFO impl.YarnClientImpl: Submitted application application_1415670144299_0005
14/11/10 17:52:54 INFO mapreduce.Job: The url to track the job: http://zjy:8088/proxy/application_1415670144299_0005/
14/11/10 17:52:54 INFO mapreduce.Job: Running job: job_1415670144299_0005
14/11/10 17:53:06 INFO mapreduce.Job: Job job_1415670144299_0005 running in uber mode : false
14/11/10 17:53:06 INFO mapreduce.Job:  map 0% reduce 0%
14/11/10 17:53:12 INFO mapreduce.Job:  map 100% reduce 0%
14/11/10 17:53:17 INFO mapreduce.Job:  map 100% reduce 100%
14/11/10 17:53:18 INFO mapreduce.Job: Job job_1415670144299_0005 completed successfully
14/11/10 17:53:18 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=6
                FILE: Number of bytes written=193153
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=220
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=7
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3259
                Total time spent by all reduces in occupied slots (ms)=2955
                Total time spent by all map tasks (ms)=3259
                Total time spent by all reduce tasks (ms)=2955
                Total vcore-seconds taken by all map tasks=3259
                Total vcore-seconds taken by all reduce tasks=2955
                Total megabyte-seconds taken by all map tasks=3337216
                Total megabyte-seconds taken by all reduce tasks=3025920
        Map-Reduce Framework
                Map input records=0
                Map output records=0
                Map output bytes=0
                Map output materialized bytes=6
                Input split bytes=134
                Combine input records=0
                Combine output records=0
                Reduce input groups=0
                Reduce shuffle bytes=6
                Reduce input records=0
                Reduce output records=0
                Spilled Records=0
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=157
                CPU time spent (ms)=1200
                Physical memory (bytes) snapshot=219996160
                Virtual memory (bytes) snapshot=628498432
                Total committed heap usage (bytes)=137498624
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=86
        File Output Format Counters 
                Bytes Written=0
zjy@zjy:/home/zjy/hadoop$       
