Big Data Technology: Flume

I. Flume Overview

Flume provides a distributed, reliable service for efficiently collecting, aggregating, and moving large volumes of log data. Flume runs only in Unix-like environments. It is based on a streaming architecture and is fault-tolerant, flexible, and simple. In a typical stack, Flume and Kafka handle real-time data collection, Spark and Storm handle real-time processing, and Impala handles real-time queries.

II. Flume Components

2.1 Source
Collects data. The Source is where the data flow originates; it pushes the data it produces into the Channel, somewhat like the Channel in Java I/O.

2.2 Channel
Bridges Sources and Sinks, similar to a queue.

2.3 Sink
Collects data from the Channel and writes it to the destination (which can be the next hop's Source, or HDFS/HBase).

2.4 Event
The transfer unit, the basic unit of Flume data transfer; data travels from source to destination in the form of events.

III. Flume Transfer Process

The source monitors a file or data stream. When the data source produces new data, the source wraps it in an Event, puts it into the channel, and commits the transaction. The channel is a first-in, first-out queue; the sink pulls data from the channel queue and then writes it out, for example to HDFS.

IV. Flume Deployment and Usage

Configuration

Check JAVA_HOME:
$ echo $JAVA_HOME
/opt/module/jdk1.8.0_144

Install Flume:
[itstar@bigdata11 software]$ tar -zxvf apache-flume-1.7.0-bin.tar.gz -C /opt/module/

Rename the environment file:
[itstar@bigdata11 conf]$ mv flume-env.sh.template flume-env.sh

Change in flume-env.sh:
export JAVA_HOME=/opt/module/jdk1.8.0_144

Case 1: Monitoring port data

Goal: Flume monitors one console; another console sends messages, and the monitored end displays them in real time.

Steps:

1) Install the telnet tools.
Offline (local RPM packages):
$ sudo rpm -ivh xinetd-2.3.14-40.el6.x86_64.rpm
$ sudo rpm -ivh telnet-0.17-48.el6.x86_64.rpm
$ sudo rpm -ivh telnet-server-0.17-48.el6.x86_64.rpm
Or, with network access (NAT):
[itstar@bigdata11 module]$ sudo yum -y install telnet telnet-server xinetd
(one yum command is enough; yum resolves the dependencies automatically)

2) Create the Flume agent configuration file flume-telnet.conf:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
# Avoid localhost; use the hostname or IP instead
a1.sources.r1.bind = bigdata11
a1.sources.r1.port = 44445

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3) Check whether port 44445 is already in use:
$ netstat -tunlp | grep 44445

4) Create a job folder under conf, then start the Flume agent listening on the port:
$ bin/flume-ng agent \
--conf conf/ \
--name a1 \
--conf-file conf/job/flume-telnet.conf \
-Dflume.root.logger=INFO,console

5) Use telnet to send content to port 44445 on this host:
$ telnet bigdata11 44445

Case 2: Streaming a local file to HDFS in real time

Goal: Monitor the Hive log in real time and upload its new content to HDFS.

1) Create the flume-hdfs.conf file:

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/apache-flume-1.7.0-bin/jobs/FileToHDFS
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
# %Y%m%d/%H can be appended to partition the data by time
a2.sinks.k2.hdfs.path = hdfs://bigdata11:9000/flume/
# Prefix of the uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Whether to roll folders by time
a2.sinks.k2.hdfs.round = true
# Number of time units before creating a new folder
a2.sinks.k2.hdfs.roundValue = 1
# The time unit itself
a2.sinks.k2.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# File type; compression is supported
a2.sinks.k2.hdfs.fileType = DataStream
# How long before rolling to a new file (seconds)
a2.sinks.k2.hdfs.rollInterval = 600
# Roll the file once it reaches this size (about 128 MB)
a2.sinks.k2.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0
# Minimum number of replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

2) Run the agent with this configuration:
$ bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/job/flume-hdfs.conf
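Before moving on, it can help to confirm that Case 2 actually delivers data. The following is a minimal check, assuming the file and HDFS paths configured above (the exact file names under /flume depend on the roll settings):

# Append a test line to the file tailed by the exec source
$ echo "hello flume" >> /opt/module/apache-flume-1.7.0-bin/jobs/FileToHDFS

# Once the sink has flushed a batch or rolled a file, the data appears under /flume
$ hdfs dfs -ls /flume
$ hdfs dfs -cat /flume/logs-*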
Case 3: Streaming files from a directory to HDFS in real time

Goal: Use Flume to monitor the files of an entire directory.

Steps:

1) Create the configuration file flume-dir.conf:

a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/module/apache-flume-1.7.0-bin/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore (do not upload) all files ending in .tmp
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://bigdata11:9000/flume/upload/%Y%m%d/%H
# Prefix of the uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
# Whether to roll folders by time
a3.sinks.k3.hdfs.round = true
# Number of time units before creating a new folder
a3.sinks.k3.hdfs.roundValue = 1
# The time unit itself
a3.sinks.k3.hdfs.roundUnit = hour
# Whether to use the local timestamp
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# File type; compression is supported
a3.sinks.k3.hdfs.fileType = DataStream
# How long before rolling to a new file (seconds)
a3.sinks.k3.hdfs.rollInterval = 600
# Roll the file once it reaches this size (about 128 MB)
a3.sinks.k3.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a3.sinks.k3.hdfs.rollCount = 0
# Minimum number of replicas
a3.sinks.k3.hdfs.minBlockReplicas = 1

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

2) Run the test. After starting the agent below, try adding files to the upload folder (see the sketch after the tips):
$ bin/flume-ng agent --conf conf/ --name a3 --conf-file jobs/flume-dir.conf

Tips for the Spooling Directory Source:
1) Do not create and then keep modifying files inside the monitored directory.
2) Files that have been fully uploaded are renamed with the .COMPLETED suffix.
3) The monitored folder is scanned for changes every 500 milliseconds.
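The following is a minimal sketch of exercising this case; it assumes the spooled directory configured above (create it first if needed) and that the agent is running:

# Create the spooled directory if it does not exist yet
$ mkdir -p /opt/module/apache-flume-1.7.0-bin/upload

# Drop a file in; do not modify it afterwards
$ cp /etc/hosts /opt/module/apache-flume-1.7.0-bin/upload/hosts.log

# After the source consumes the file, ls should show hosts.log.COMPLETED,
# and the content appears under the time-partitioned HDFS path
$ ls /opt/module/apache-flume-1.7.0-bin/upload
$ hdfs dfs -ls /flume/upload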
Case 4: Flume-to-Flume data transfer: one Flume, multiple channels and sinks

Goal: flume-1 monitors file changes and forwards the new content to flume-2, which stores it in HDFS. At the same time, flume-1 forwards the content to flume-3, which writes it to the local file system.

Steps:

1) Create flume-1.conf, which monitors the source file and has two channels and two sinks feeding flume-2 and flume-3 respectively:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Replicate the data flow to multiple channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/apache-flume-1.7.0-bin/jobs/FileToHDFS
a1.sources.r1.shell = /bin/bash -c

# Describe the sinks
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = bigdata11
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = bigdata11
a1.sinks.k2.port = 4142

# Describe the channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sinks to the channels
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

2) Create flume-2.conf, which receives events from flume-1 and has one channel and one sink that delivers the data to HDFS:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = bigdata11
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://bigdata11:9000/flume2/
# Prefix of the uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to roll folders by time
a2.sinks.k1.hdfs.round = true
# Number of time units before creating a new folder
a2.sinks.k1.hdfs.roundValue = 1
# The time unit itself
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type; compression is supported
a2.sinks.k1.hdfs.fileType = DataStream
# How long before rolling to a new file (seconds)
a2.sinks.k1.hdfs.rollInterval = 600
# Roll the file once it reaches this size (about 128 MB)
a2.sinks.k1.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a2.sinks.k1.hdfs.rollCount = 0
# Minimum number of replicas
a2.sinks.k1.hdfs.minBlockReplicas = 1

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3) Create flume-3.conf, which receives events from flume-1 and has one channel and one sink that writes the data to a local directory:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = bigdata11
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/apache-flume-1.7.0-bin/jobs/flume3

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

Tip: the local output directory must already exist; Flume will not create it if it is missing.

4) Run the test: start the corresponding Flume jobs (flume-1, flume-2, flume-3 in turn), then generate file changes and observe the results (a verification sketch follows):
$ bin/flume-ng agent --conf conf/ --name a1 --conf-file jobs/flume-1.conf
$ bin/flume-ng agent --conf conf/ --name a2 --conf-file jobs/flume-2.conf
$ bin/flume-ng agent --conf conf/ --name a3 --conf-file jobs/flume-3.conf
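To see the replicating fan-out working on both branches, a quick check (a sketch, assuming the paths configured in this case and that all three agents are running) is:

# Generate a change on the monitored file
$ echo "replicated event" >> /opt/module/apache-flume-1.7.0-bin/jobs/FileToHDFS

# Branch 1: flume-2 writes the line under /flume2 in HDFS once a batch is flushed
$ hdfs dfs -cat /flume2/flume2-*

# Branch 2: flume-3 rolls files into the local directory
$ ls -l /opt/module/apache-flume-1.7.0-bin/jobs/flume3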
Case 5: Flume-to-Flume data transfer: multiple Flumes aggregating data into one Flume

Goal: flume-11 monitors a local file and flume-22 monitors the data stream on a port; both send their data to flume-33, and flume-33 writes the merged data to HDFS.

Steps:

1) Create flume-11.conf, which monitors the FileToHDFS file and sinks the data to flume-33:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/apache-flume-1.7.0-bin/jobs/FileToHDFS
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = bigdata11
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2) Create flume-22.conf, which monitors the data stream on port 44444 and sinks the data to flume-33:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = bigdata11
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = bigdata11
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3) Create flume-33.conf, which receives the data streams sent by flume-11 and flume-22, merges them, and sinks the result to HDFS:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = bigdata11
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = hdfs
a3.sinks.k1.hdfs.path = hdfs://bigdata11:9000/flume3/
# Prefix of the uploaded files
a3.sinks.k1.hdfs.filePrefix = flume3-
# Whether to roll folders by time
a3.sinks.k1.hdfs.round = true
# Number of time units before creating a new folder
a3.sinks.k1.hdfs.roundValue = 1
# The time unit itself
a3.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a3.sinks.k1.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a3.sinks.k1.hdfs.batchSize = 100
# File type; compression is supported
a3.sinks.k1.hdfs.fileType = DataStream
# How long before rolling to a new file (seconds)
a3.sinks.k1.hdfs.rollInterval = 600
# Roll the file once it reaches this size (about 128 MB)
a3.sinks.k1.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a3.sinks.k1.hdfs.rollCount = 0
# Minimum number of replicas
a3.sinks.k1.hdfs.minBlockReplicas = 1

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

4) Run the test: start the corresponding Flume jobs (flume-33, then flume-22, then flume-11), generate file changes, and observe the results:
$ bin/flume-ng agent --conf conf/ --name a3 --conf-file jobs/flume-33.conf
$ bin/flume-ng agent --conf conf/ --name a2 --conf-file jobs/flume-22.conf
$ bin/flume-ng agent --conf conf/ --name a1 --conf-file jobs/flume-11.conf

Tip: during the test, remember to start Hive to generate some log output, and use telnet to send content to port 44444, for example:
$ bin/hive
$ telnet bigdata11 44444

V. Monitoring Flume with Ganglia

Installing and deploying Ganglia

1) Install the httpd service and PHP:
yum -y install httpd php

2) Install the other dependencies:
yum -y install rrdtool perl-rrdtool rrdtool-devel
yum -y install apr-devel

3) Install Ganglia:
rpm -Uvh /pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum -y install ganglia-gmetad
yum -y install ganglia-web
yum install -y ganglia-gmond

4) Edit the configuration files.

File ganglia.conf:

<Location /ganglia>
  Order deny,allow
  Deny from all
  Allow from all
  Allow from 127.0.0.1
  Allow from ::1
  Allow from .example.com
</Location>

File gmetad.conf (vi /etc/ganglia/gmetad.conf), change to:

data_source "hadoop" <IP address of the gmond node>

File gmond.conf (vi /etc/ganglia/gmond.conf), change to:

cluster {
  name = "hadoop"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  # bind_hostname = yes  # Highly recommended, soon to be default.
  # This option tells gmond to use a source address that resolves to the
  # machine's hostname. Without this, the metrics may appear to come from
  # any interface and the DNS names associated with those IPs will be used
  # to create the RRDs.
  # mcast_join = <multicast group>
  host = <IP address of the gmetad node>
  port = 8649
  ttl = 1
}
udp_recv_channel {
  # mcast_join = <multicast group>
  port = 8649
  bind = <IP address of this node>
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  buffer = 10485760
}

File /etc/selinux/config (vi /etc/selinux/config), change to:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#   enforcing - SELinux security policy is enforced.
#   permissive - SELinux prints warnings instead of enforcing.
#   disabled - No SELinux policy is loaded.
SELINUX=disabled
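With the configuration in place, the usual next step (sketched below; the service commands assume a CentOS 6 style init system, and the Ganglia host and port must match the gmond settings above) is to start the services and launch a Flume agent with Ganglia reporting enabled:

# Start the web server and the Ganglia daemons
sudo service httpd start
sudo service gmetad start
sudo service gmond start

# Start any of the agents above with Ganglia monitoring turned on;
# metrics are then visible through the Ganglia web page served by httpd
bin/flume-ng agent \
--conf conf/ \
--name a1 \
--conf-file conf/job/flume-telnet.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=ganglia \
-Dflume.monitoring.hosts=bigdata11:8649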
