Data Mining: Foreign-Language Translation Reference

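(The document contains parallel Chinese and English text, i.e., the English original and a Chinese translation.)

Original text:

What is Data Mining?

Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer, which carries both “data” and “mining”, became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps:

· data cleaning: to remove noise or irrelevant data,
· data integration: where multiple data sources may be combined,
· data selection: where data relevant to the analysis task are retrieved from the database,
· data transformation: where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations,
· data mining: an essential process where intelligent methods are applied in order to extract data patterns,
· pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and
· knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.

The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.

We agree that data mining is a step in the knowledge discovery process. However, in industry, in media, and in the database research milieu, the term “data mining” is becoming more popular than the longer term “knowledge discovery in databases”. Therefore, in this book, we choose to use the term “data mining”. We adopt a broad view of data mining functionality: data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.

Based on this view, the architecture of a typical data mining system may have the following major components:

1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.
3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, and evolution and deviation analysis.
5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.
6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-style analytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.

While there may be many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.

Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases. By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore, data mining is considered as one of the most important frontiers in database systems and one of the most promising new database applications in the information industry.

A classification of data mining systems

Data mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high performance computing. Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.

Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.

1) Classification according to the kinds of databases mined. A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly. For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time-series, text, or multimedia data mining system, or a World-Wide Web mining system. Other system types include heterogeneous data mining systems and legacy data mining systems.

2) Classification according to the kinds of knowledge mined. Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc. A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities. Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, including generalized knowledge (at a high level of abstraction), primitive-level knowledge (at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.

3) Classification according to the kinds of techniques utilized. Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.

Translation:

What is data mining?

Simply put, data mining is extracting or “mining” knowledge from large amounts of data. The term is actually something of a misnomer. Note that mining gold from ore or sand is called gold mining, not ore mining. Thus, data mining would be more accurately named “mining knowledge from data”, which is unfortunately rather long. “Knowledge mining” is a shorter term, but it may not convey the sense of mining from large amounts of data. After all, mining is a vivid term that captures the character of the process of finding a few gold nuggets in a large quantity of raw material. Thus this misnomer, carrying both “data” and “mining”, became the popular choice. There are also terms with meanings similar to but slightly different from data mining, such as knowledge mining in databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people regard data mining as a synonym for another commonly used term, knowledge discovery in databases, or KDD. Others regard data mining merely as an essential step in the process of knowledge discovery in databases. The knowledge discovery process consists of the following steps:

1) data cleaning: removing noise or inconsistent data,
2) data integration: multiple kinds of data can be combined,
3) data selection: retrieving from the database the data relevant to the analysis task,
4) data transformation: transforming or consolidating data into forms suitable for mining, for example through summary or aggregation operations,
5) data mining: the essential step, in which intelligent methods are used to extract data patterns,
6) pattern evaluation: identifying the truly interesting patterns that represent knowledge, according to some interestingness measure,
7) knowledge presentation: using visualization and knowledge representation techniques to present the mined knowledge to the user.

The data mining step can interact with the user or a knowledge base. Interesting patterns are presented to the user or stored in the knowledge base as new knowledge. Note that according to this view, data mining is only one step in the whole process, albeit the most important one, because it uncovers hidden patterns.

We agree that data mining is a step in the knowledge discovery process. However, in industry, the media, and the database research community, “data mining” is more popular than the longer term “knowledge discovery in databases”. Therefore, the term chosen in this book is data mining. We adopt a broad view of data mining: data mining is the process of mining interesting knowledge from large amounts of data stored in databases or other information repositories.

Based on this view, a typical data mining system has the following major components:

Database, data warehouse, or other information repository: This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and integration can be performed on the data.
Database or data warehouse server: The database or data warehouse server is responsible for fetching the relevant data according to the user’s data mining request.
Knowledge base: This is the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. Such knowledge may include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge about user beliefs may also be included; such knowledge can be used to assess a pattern’s interestingness according to its unexpectedness. Other examples of domain knowledge are interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
Data mining engine: This is the essential part of the data mining system and consists of a set of functional modules for characterization, association, classification, cluster analysis, and evolution and deviation analysis.
Pattern evaluation module: Typically, this component uses interestingness measures and interacts with the data mining modules so as to focus the search on interesting patterns. It may use interestingness thresholds to filter the discovered patterns. The pattern evaluation module may also be integrated with the mining module, depending on the implementation of the data mining method used. For effective data mining, it is recommended to push pattern evaluation as deep as possible into the mining process, so as to confine the search to the interesting patterns.
Graphical user interface: This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, by incorporating more advanced techniques for data understanding, data mining goes further than the summarization-style analytical processing of data warehouses.

Although there are already many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that cannot handle large amounts of data can at most be called a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values or answering deductive queries in large databases, should be classified as a database system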
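
The knowledge discovery steps described in the text (cleaning, integration, selection, transformation, mining, pattern evaluation, presentation) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration and does not come from the source: the toy market-basket records, the function names, and the choice of co-occurring item pairs with a minimum support threshold as the "pattern" and "interestingness measure".

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two "sources" of market-basket records.
SOURCE_A = [
    {"id": 1, "items": ["bread", "milk"]},
    {"id": 2, "items": ["bread", "butter", "milk"]},
    {"id": 3, "items": None},  # noisy record with no items
]
SOURCE_B = [
    {"id": 4, "items": ["Bread", "Milk", "eggs"]},
    {"id": 5, "items": ["butter", "eggs"]},
]


def clean(records):
    """Data cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r.get("items")]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records):
    """Data selection: keep only the field relevant to the analysis task."""
    return [r["items"] for r in records]


def transform(baskets):
    """Data transformation: normalize case and turn each basket into a set."""
    return [frozenset(item.lower() for item in basket) for basket in baskets]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: format patterns, most frequent first."""
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in ranked]


def run_pipeline(min_support=0.5):
    """Run the seven steps in sequence on the toy sources."""
    baskets = transform(select(integrate(clean(SOURCE_A), clean(SOURCE_B))))
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In this sketch the evaluation step runs after mining for readability. The text's recommendation to push interestingness evaluation deep into the mining process would correspond to discarding low-support candidates while counting, which matters only at scales far beyond this toy example.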
