Hadoop 2.6.0 Distributed Deployment Reference Manual
Contents

1. Environment
1.1 Installation environment
1.2 Hadoop cluster layout
2. Base environment installation and configuration
2.1 Adding the hadoop user
2.2 Installing JDK 1.7
2.3 Passwordless SSH login
2.4 Editing the hosts mapping file
3. Hadoop installation and configuration
3.1 Common installation and configuration
3.2 Per-node configuration
4. Formatting and starting the cluster
4.1 Formatting the cluster HDFS filesystem
4.2 Starting the Hadoop cluster
Appendix 1: Key configuration excerpts (1. core-site.xml; 2. hdfs-site.xml; 3. mapred-site.xml; 4. yarn-site.xml; 5. hadoop-env.sh; 6. slaves)
Appendix 2: Full configuration excerpts (1. core-site.xml; 2. hdfs-site.xml; 3. mapred-site.xml; 4. yarn-site.xml; 5. hadoop-env.sh; 6. slaves)
Appendix 3: Configuration parameter reference (conf/core-site.xml; conf/hdfs-site.xml: NameNode, DataNode; conf/yarn-site.xml: ResourceManager and NodeManager, ResourceManager, NodeManager, History Server; conf/mapred-site.xml: MapReduce applications, MapReduce JobHistory Server)

1. Environment

1.1 Installation environment

In this example the operating system is CentOS 7.0, the JDK is Oracle HotSpot 1.7, the Hadoop version is Apache Hadoop 2.6.0, and the operating user is hadoop.

1.2 Hadoop cluster layout

The cluster nodes are as follows (the IP address column of the original table is not preserved in this copy):

Hostname | Role
ResourceManager | ResourceManager & MR JobHistory Server
NameNode | NameNode
SecondaryNameNode | SecondaryNameNode
DataNode01 | DataNode & NodeManager
DataNode02 | DataNode & NodeManager
DataNode03 | DataNode & NodeManager
DataNode04 | DataNode & NodeManager
DataNode05 | DataNode & NodeManager

Note: in the table above, "&" joins multiple roles; for example, the host named "ResourceManager" carries two roles, ResourceManager and MR JobHistory Server.

2. Base environment installation and configuration

2.1 Adding the hadoop user

useradd hadoop

The user "hadoop" is the installation and operating user for the Hadoop cluster.

2.2 Installing JDK 1.7

CentOS 7 ships with OpenJDK 1.7. In this example it is replaced with Oracle HotSpot 1.7, installed by unpacking the binary package into /opt/.

Check the current JDK rpm packages:

rpm -qa | grep jdk
java-1.7.0-openjdk-1-.el7.x86_64
java-1.7.0-openjdk-headless-1-.el7.x86_64

Remove the bundled JDK:

rpm -e --nodeps java-1.7.0-openjdk-1-.el7.x86_64
rpm -e --nodeps java-1.7.0-openjdk-headless-1-.el7.x86_64

Install the chosen JDK: change into the directory containing the package and unpack it into /opt/jdk1.7.

Configure environment variables: edit ~/.bashrc or /etc/profile and append the following:

#JAVA
export JAVA_HOME=/opt/jdk1.7
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib
export CLASSPATH=$CLASSPATH:$JAVA_HOME/jre/lib

2.3 Passwordless SSH login

Passwordless SSH must be set up among all 8 hosts listed in the table above.

In the hadoop user's home directory, generate a key pair:

ssh-keygen -t rsa

Create the public-key authentication file authorized_keys and append the contents of the generated ~/.ssh/id_rsa.pub to it:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
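The key-collection step above can be sketched end to end. This is a minimal local simulation, not a live cluster run: the public-key files below are stand-ins for each node's ~/.ssh/id_rsa.pub, assumed to have been gathered (e.g. via scp) into one directory before being merged into authorized_keys.

```shell
# Sketch: assemble authorized_keys from several nodes' collected public keys.
# Key contents and the staging directory are hypothetical stand-ins.
workdir=$(mktemp -d)
mkdir -p "$workdir/.ssh"
chmod 700 "$workdir/.ssh"

# Stand-in public keys "collected" from three nodes.
printf 'ssh-rsa AAAA...key1 hadoop@NameNode\n'   > "$workdir/id_rsa.pub.NameNode"
printf 'ssh-rsa AAAA...key2 hadoop@DataNode01\n' > "$workdir/id_rsa.pub.DataNode01"
printf 'ssh-rsa AAAA...key3 hadoop@DataNode02\n' > "$workdir/id_rsa.pub.DataNode02"

# Append every collected key, then lock the file down as the manual requires.
cat "$workdir"/id_rsa.pub.* >> "$workdir/.ssh/authorized_keys"
chmod 600 "$workdir/.ssh/authorized_keys"

wc -l < "$workdir/.ssh/authorized_keys"
```

On a real node the staging directory would simply be ~/.ssh, and the same cat/chmod pair runs once per collected key file.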

Set the permissions of the ~/.ssh directory and the authorized_keys file:

chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys

Repeat the steps above on every node, and copy each node's ~/.ssh/id_rsa.pub public key to all the other hosts.

All of the above can also be done in a single line:

rm -rf ~/.ssh; ssh-keygen -t rsa; chmod 700 ~/.ssh; cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys

Note: on CentOS 6 a DSA key (ssh-keygen -t dsa) also works for passwordless login; on CentOS 7 only RSA works. With DSA you can only ssh into the local host without a password, not into the other hosts.

2.4 Editing the hosts mapping file

Edit /etc/hosts on every node and add one "<IP address> <hostname>" line for each of the following hosts (the IP addresses themselves are not preserved in this copy):

ResourceManager
NameNode
SecondaryNameNode
DataNode01
DataNode02
DataNode03
DataNode04
DataNode05
NodeManager01
NodeManager02
NodeManager03
NodeManager04
NodeManager05
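Since every node needs an identical hosts block, it can be generated once and pushed out. A minimal sketch follows; the 192.168.1.x addresses are placeholders (the source manual does not preserve the real cluster IPs), and the output goes to a temp file rather than /etc/hosts.

```shell
# Sketch: generate /etc/hosts entries for the cluster hosts of section 2.4.
# IP addresses are hypothetical placeholders; substitute your real subnet.
hostsfile=$(mktemp)
i=10
for h in ResourceManager NameNode SecondaryNameNode \
         DataNode01 DataNode02 DataNode03 DataNode04 DataNode05; do
    printf '192.168.1.%d  %s\n' "$i" "$h" >> "$hostsfile"
    i=$((i + 1))
done
cat "$hostsfile"
```

The generated file can then be appended to /etc/hosts on each node (e.g. with scp plus cat), which avoids typo drift between nodes.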

3. Hadoop installation and configuration

3.1 Common installation and configuration

The following steps are the same on every node; repeat them on each one.

Copy the Hadoop package (hadoop-2.6.0.tar) to /opt and unpack it:

tar -xvf hadoop-2.6.0.tar

The unpacked hadoop-2.6.0 directory (/opt/hadoop-2.6.0) is the Hadoop installation root.

Change the owner of the installation directory to the hadoop user:

chown -R hadoop.hadoop /opt/hadoop-2.6.0

Add environment variables:

#hadoop
export HADOOP_HOME=/opt/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

3.2 Per-node configuration

Unpack the configuration files (the source attaches them as "Hadoop配置文件参考.zip") and distribute them into each node's $HADOOP_HOME/etc/hadoop directory, confirming when prompted to overwrite existing files.

Note: for the configuration parameters of each node, see Appendix 1 or Appendix 2.
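The environment additions from sections 2.2 and 3.1 can be collected into one profile fragment and checked before copying it to every node. This is a sketch assuming the exact paths used in this manual (/opt/jdk1.7 and /opt/hadoop-2.6.0); it writes to a temp file rather than ~/.bashrc.

```shell
# Sketch: the combined JAVA/Hadoop environment fragment, then a PATH check.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/opt/jdk1.7
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
EOF

# Source it and confirm both bin and sbin landed on PATH.
. "$profile"
echo "$PATH"
```

On a real node the same fragment would be appended to ~/.bashrc (or /etc/profile) and verified with `which hadoop` once the tarball is in place.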

4. Formatting and starting the cluster

4.1 Formatting the cluster HDFS filesystem

After installation, log in to the NameNode node (or any DataNode node) and format the cluster HDFS filesystem:

hdfs namenode -format

Note: if this is not the first time the HDFS filesystem is formatted, then before formatting you must empty the NameNode's dfs.namenode.name.dir directory and every DataNode's dfs.datanode.data.dir directory (in this example, /home/hadoop/hadoopdata).

4.2 Starting the Hadoop cluster

Log in to the following hosts and run the corresponding commands:

Log in to ResourceManager and run start-yarn.sh to start the YARN cluster resource management system.
Log in to NameNode and run start-dfs.sh to start the cluster HDFS filesystem.
Then log in to the SecondaryNameNode and DataNode01 through DataNode05 nodes in turn and run jps to check that the expected Java processes are running on each node:

The ResourceManager node runs: ResourceManager
The NameNode node runs: NameNode
The SecondaryNameNode node runs: SecondaryNameNode
Each DataNode node runs: DataNode & NodeManager
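The per-role checklist above can be encoded as a small helper so the expectation is explicit rather than remembered. This is a sketch: on a live cluster you would compare its output against `jps` on each host; here only the role-to-process mapping itself is shown.

```shell
# Sketch: expected jps process names per role, per section 4.2 of this manual.
expected_procs() {
    case "$1" in
        ResourceManager)   echo "ResourceManager" ;;
        NameNode)          echo "NameNode" ;;
        SecondaryNameNode) echo "SecondaryNameNode" ;;
        DataNode*)         echo "DataNode NodeManager" ;;
        *)                 echo "" ;;
    esac
}

expected_procs DataNode03   # prints: DataNode NodeManager
```

A verification loop would then be roughly `for p in $(expected_procs "$(hostname)"); do jps | grep -q "$p" || echo "missing $p"; done` on each node.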

If all of the above succeeds, the Hadoop cluster has started correctly.

Appendix 1: Key configuration excerpts

1. core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode:9000</value>
  </property>
</configuration>

The property fs.defaultFS is the NameNode URI, of the form "hdfs://<hostname or IP>:<port>".

2. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>SecondaryNameNode:50090</value>
  </property>
</configuration>

The property dfs.namenode.name.dir is the local filesystem directory where the NameNode persistently stores namespace and edit-log metadata; its default is /tmp/hadoop-username/dfs/name. The property dfs.datanode.data.dir is the local filesystem directory, given as "file:/<local path>", where a DataNode stores HDFS blocks; its default is /tmp/hadoop-username/dfs/data. The property dfs.namenode.secondary.http-address gives the SecondaryNameNode host and port (this entry can be omitted if no separate SecondaryNameNode role is required).

3. mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

The property mapreduce.framework.name selects the execution framework for MapReduce jobs; the default is local and it must be changed to yarn.

4. yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ResourceManager</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

The property yarn.resourcemanager.hostname specifies the ResourceManager host address; yarn.nodemanager.aux-services names the shuffle service that must be set for MapReduce applications.

5. hadoop-env.sh

JAVA_HOME must point at the current Java installation directory:

export JAVA_HOME=/opt/jdk1.7

6. slaves

The master nodes of the cluster (NameNode and ResourceManager) must each list the slave nodes they own.

The NameNode's slaves file contains:

DataNode01
DataNode02
DataNode03
DataNode04
DataNode05

The ResourceManager's slaves file contains:

NodeManager01
NodeManager02
NodeManager03
NodeManager04
NodeManager05
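When stamping out these configs for many nodes, the XML above can be emitted from a template instead of hand-edited per host. A minimal sketch, parameterised only on the NameNode host and writing to a temp file rather than $HADOOP_HOME/etc/hadoop/core-site.xml:

```shell
# Sketch: generate the minimal core-site.xml of Appendix 1 from a heredoc.
namenode_host=NameNode
conf=$(mktemp)
cat > "$conf" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${namenode_host}:9000</value>
  </property>
</configuration>
EOF

grep fs.defaultFS "$conf"
```

The same pattern extends to hdfs-site.xml and yarn-site.xml by adding more variables (data directories, ResourceManager host) to the heredoc.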

Appendix 2: Full configuration excerpts

Note: in the source document the parameters shown in red are the ones that must be configured; all other parameters are at their default values.

1. core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://NameNode:9000</value>
    <description>NameNode URI</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Size of read/write buffer used in SequenceFiles. The default value is 131072.</description>
  </property>
</configuration>

The property fs.defaultFS is the NameNode address, of the form "hdfs://<hostname or IP>:<port>".

2. hdfs-site.xml

<configuration>
  <!-- Configurations for NameNode: -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>SecondaryNameNode:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <!-- Configurations for DataNode: -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

The property dfs.namenode.name.dir is the local filesystem directory where the NameNode stores namespace and edit-log metadata (default /tmp/hadoop-username/dfs/name); dfs.datanode.data.dir is the local filesystem directory, given as "file:/<local path>", where a DataNode stores HDFS blocks (default /tmp/hadoop-username/dfs/data); dfs.namenode.secondary.http-address gives the SecondaryNameNode host and port (it can be omitted if no separate SecondaryNameNode role is required).

3. mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for maps.</description>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
    <description>Larger heap size for child JVMs of maps.</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
    <description>Larger resource limit for reduces.</description>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>10</value>
    <description>More streams merged at once while sorting files.</description>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>5</value>
    <description>Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.</description>
  </property>
  <!-- Configurations for MapReduce JobHistory Server: -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ResourceManager:10020</value>
    <description>MapReduce JobHistory Server host:port. The default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ResourceManager:19888</value>
    <description>MapReduce JobHistory Server web UI host:port. The default port is 19888.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
    <description>Directory where history files are written by MapReduce jobs. The default is /mr-history/tmp.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
    <description>Directory where history files are managed by the MR JobHistory Server. The default value is /mr-history/done.</description>
  </property>
</configuration>

The property mapreduce.framework.name selects the execution framework for MapReduce jobs; the default is local and it must be changed to yarn.
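The memory settings above come in pairs: each JVM heap (-Xmx in the *.java.opts values) must fit inside its YARN container size (the *.memory.mb values). A small sketch makes that check explicit; note that with this manual's values the reduce heap (2560M) is larger than mapreduce.reduce.memory.mb (1024), which such a check would flag.

```shell
# Sketch: verify that JVM heap sizes fit inside their YARN container limits.
# Values mirror the mapred-site.xml excerpt in this manual.
check_fits() {  # args: heap_mb container_mb
    if [ "$1" -le "$2" ]; then
        echo "fits"
    else
        echo "heap $1 MB exceeds container $2 MB"
    fi
}

check_fits 1024 1024   # map:    -Xmx1024M vs mapreduce.map.memory.mb=1024
check_fits 2560 1024   # reduce: -Xmx2560M vs mapreduce.reduce.memory.mb=1024
```

A common rule of thumb is to leave the container roughly 20 percent larger than the heap for JVM overhead, so the values in any real deployment deserve a second look.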

4. yarn-site.xml

<configuration>
  <!-- Configurations for ResourceManager and NodeManager: -->
  <property>
    <name>yarn.acl.enable</name>
    <value>false</value>
    <description>Enable ACLs? Defaults to false. The value is true or false.</description>
  </property>
  <property>
    <name>yarn.admin.acl</name>
    <value>*</value>
    <description>ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just space means no one has access.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>false</value>
    <description>Configuration to enable or disable log aggregation.</description>
  </property>
  <!-- Configurations for ResourceManager: -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ResourceManager:8032</value>
    <description>ResourceManager host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ResourceManager:8030</value>
    <description>ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ResourceManager:8031</value>
    <description>ResourceManager host:port for NodeManagers. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ResourceManager:8033</value>
    <description>ResourceManager host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ResourceManager:8088</value>
    <description>ResourceManager web UI host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ResourceManager</value>
    <description>ResourceManager host.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>ResourceManager Scheduler class: CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. The default value is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum limit of memory, in MB, to allocate to each container request at the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
    <description>Maximum limit of memory, in MB, to allocate to each container request at the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
    <description>How long to keep aggregated logs before deleting them; -1 disables. Be careful: setting this too small will spam the name node.</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
    <description>Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: setting this too small will spam the name node.</description>
  </property>
  <!-- Configurations for NodeManager: -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <description>Available physical memory, in MB, for the given NodeManager. The default value is 8192. Defines the total resources on the NodeManager made available to running containers.</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
    <description>Maximum ratio by which virtual memory usage of tasks may exceed physical memory. The default value is 2.1. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nm-local-dir</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written. Multiple paths help spread disk I/O. This is the default value.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written. Multiple paths help spread disk I/O. This is the default value.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time, in seconds, to retain log files on the NodeManager. Only applicable if log aggregation is disabled. The default value is 10800.</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
    <description>HDFS directory to which application logs are moved on application completion; appropriate permissions must be set. Only applicable if log aggregation is enabled. The default value is /logs or /tmp/logs.</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
    <description>Suffix appended to the remote log dir. Logs are aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for MapReduce applications.</description>
  </property>
</configuration>

The property yarn.resourcemanager.hostname specifies the ResourceManager host address; yarn.nodemanager.aux-services names the shuffle service used by MapReduce applications.

5. hadoop-env.sh

JAVA_HOME must point at the current Java installation directory:

export JAVA_HOME=/opt/jdk1.7

6. slaves

The master nodes of the cluster (NameNode and ResourceManager) must each list the slave nodes they own.

The NameNode's slaves file contains:

DataNode01
DataNode02
DataNode03
DataNode04
DataNode05

The ResourceManager's slaves file contains:

NodeManager01
NodeManager02
NodeManager03
NodeManager04
NodeManager05

Appendix 3: Configuration parameter reference
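Both slaves files follow a simple numbered pattern, so they can be generated rather than typed. A minimal sketch using this manual's 01..05 hostnames, written to a temp directory instead of $HADOOP_HOME/etc/hadoop:

```shell
# Sketch: generate the NameNode and ResourceManager slaves files of Appendix 2.
outdir=$(mktemp -d)
for n in 1 2 3 4 5; do
    printf 'DataNode%02d\n'    "$n" >> "$outdir/slaves.namenode"
    printf 'NodeManager%02d\n' "$n" >> "$outdir/slaves.resourcemanager"
done

cat "$outdir/slaves.namenode"
```

Generating the files keeps the zero-padded numbering consistent with /etc/hosts, which matters because start-dfs.sh and start-yarn.sh ssh to these names verbatim.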

Configuring the Hadoop Daemons in Non-Secure Mode. This section deals with the important parameters to be specified in the given configuration files:

* conf/core-site.xml

Parameter | Value | Notes
fs.defaultFS | NameNode URI | hdfs://host:port/
io.file.buffer.size | 131072 | Size of read/write buffer used in SequenceFiles.

* conf/hdfs-site.xml

o Configurations for NameNode:

Parameter | Value | Notes
dfs.namenode.name.dir | Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently. | If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.
dfs.namenode.hosts / dfs.namenode.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of allowed DataNodes.
dfs.blocksize | 268435456 | HDFS block size of 256 MB for large filesystems.
dfs.namenode.handler.count | 100 | More NameNode server threads to handle RPCs from a large number of DataNodes.

o Configurations for DataNode:

Parameter | Value | Notes
dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. | If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.

* conf/yarn-site.xml

o Configurations for ResourceManager and NodeManager:

Parameter | Value | Notes
yarn.acl.enable | true / false | Enable ACLs? Defaults to false.
yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just space means no one has access.
yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation.

o Configurations for ResourceManager:

Parameter | Value | Notes
yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.webapp.address | ResourceManager web UI host:port. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.hostname | ResourceManager host. | host. A single hostname that can be set in place of setting all yarn.resourcemanager*address resources; results in default ports for the ResourceManager components.
yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler.
yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the ResourceManager. | In MBs.
yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the ResourceManager. | In MBs.
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowed NodeManagers.

o Configurations for NodeManager:

Parameter | Value | Notes
yarn.nodemanager.resource.memory-mb | Available physical memory, in MB, for the given NodeManager. | Defines the total available resources on the NodeManager to be made available to running containers.
yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory. | The virtual memory usage of each task may exceed its physical memory limit by this ratio; the total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk I/O.
yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk I/O.
yarn.nodemanager.log.retain-seconds | 10800 | Default time, in seconds, to retain log files on the NodeManager. Only applicable if log aggregation is disabled.
yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory to which application logs are moved on application completion. Appropriate permissions need to be set. Only applicable if log aggregation is enabled.
yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs are aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled.
