Big Data Foreign-Language Translation and Reference Literature Review (the document contains a Chinese-English parallel text, i.e., the English original and the Chinese translation)

Original text: Data Mining and Data Publishing

Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. The initial idea of privacy-preserving data mining (PPDM) was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing. PPDP studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data mining tasks.

Privacy preservation for both data mining (PPDM) and data publishing (PPDP) has become increasingly popular because it allows privacy-sensitive data to be shared for analysis. One well-studied approach is the k-anonymity model [1], which in turn led to other models such as confidence bounding, l-diversity, t-closeness, and (α,k)-anonymity. In particular, all known mechanisms try to minimize information loss, and this very attempt provides a loophole for attacks. The aim of this paper is to present a survey of the most common attack techniques against anonymization-based PPDM and PPDP and to explain their effects on data privacy.

Although data mining is potentially useful, many data holders are reluctant to provide their data for data mining for fear of violating individual privacy. In recent years, research has aimed to ensure that the sensitive information of individuals cannot be easily identified. Anonymity models, and k-anonymization techniques in particular, have been the focus of intense research in the last few years. To anonymize data while minimizing the information loss caused by data modification, several extended models have been proposed; they are discussed below.

1. k-Anonymity

k-anonymity is one of the most classic models. It prevents joining (linking) attacks by generalizing and/or suppressing portions of the released microdata so that no individual can be uniquely distinguished from a group of size k. A data set is k-anonymous (k ≥ 1) if each record in the data set is indistinguishable from at least k − 1 other records within the same data set. The larger the value of k, the better the privacy protection. k-anonymity ensures that individuals cannot be uniquely identified by linking attacks.

2. Extending Models

k-anonymity does not provide sufficient protection against attribute disclosure. The notion of l-diversity attempts to solve this problem by requiring that each equivalence class have at least l well-represented values for each sensitive attribute. l-diversity has some advantages over k-anonymity, because a k-anonymous data set permits strong attacks when the sensitive attributes lack diversity. In this model, an equivalence class is said to have l-diversity if there are at least l well-represented values for the sensitive attribute. Because there are semantic relationships among attribute values, and different values have very different levels of sensitivity, further models bound how often a sensitive value may appear: under (α,k)-anonymity, after anonymization the frequency (as a fraction) of a sensitive value in any equivalence class is no more than α.

3. Related Research Areas

Several polls show that the public has an increased sense of privacy loss. Since data mining is often a key component of information systems, homeland security systems, and monitoring and surveillance systems, it gives the wrong impression that data mining is a technique for privacy intrusion. This lack of trust has become an obstacle to realizing the benefits of the technology. For example, the potentially beneficial data mining research project Terrorism Information Awareness (TIA) was terminated by the US Congress because of its controversial procedures for collecting, sharing, and analyzing the trails left by individuals.

Motivated by privacy concerns about data mining tools, a research area called privacy-preserving data mining (PPDM) emerged in 2000. The initial idea of PPDM was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, privacy-preserving data publishing (PPDP) is not necessarily tied to a specific data mining task, and the data mining task is sometimes unknown at the time of data publishing. Furthermore, some PPDP solutions emphasize preserving data truthfulness at the record level, whereas PPDM solutions often do not preserve this property. PPDP differs from PPDM in several major ways:

1) PPDP focuses on techniques for publishing data, not techniques for data mining. In fact, standard data mining techniques are expected to be applied to the published data. In contrast, the data holder in PPDM needs to randomize the data in such a way that data mining results can be recovered from the randomized data. To do so, the data holder must understand the data mining tasks and algorithms involved. This level of involvement is not expected of the data holder in PPDP, who usually is not an expert in data mining.
2) Neither randomization nor encryption preserves the truthfulness of values at the record level; therefore, the released data are basically meaningless to the recipients. In such a case, the data holder in PPDM may consider releasing the data mining results rather than the scrambled data.
3) PPDP primarily "anonymizes" the data by hiding the identity of record owners, whereas PPDM seeks to hide the sensitive data directly.

Excellent surveys and books on randomization and cryptographic techniques for PPDM can be found in the existing literature. A family of research work called privacy-preserving distributed data mining (PPDDM) aims at performing some data mining task on a set of private databases owned by different parties. It follows the principle of Secure Multiparty Computation (SMC) and prohibits any data sharing other than the final data mining result. Clifton et al. present a suite of SMC operations, such as secure sum, secure set union, secure size of set intersection, and scalar product, that are useful for many data mining tasks. In contrast, PPDP does not perform the actual data mining task but is concerned with how to publish the data so that the anonymous data are useful for data mining. We can say that PPDP protects privacy at the data level, while PPDDM protects privacy at the process level. They address different privacy models and data mining scenarios.

In the field of statistical disclosure control (SDC), research focuses on privacy-preserving publishing methods for statistical tables. SDC focuses on three types of disclosure, namely identity disclosure, attribute disclosure, and inferential disclosure. Identity disclosure occurs if an adversary can identify a respondent from the published data. Revealing that an individual is a respondent of a data collection may or may not violate confidentiality requirements. Attribute disclosure occurs when confidential information about a respondent is revealed and can be attributed to the respondent; it is the primary concern of most statistical agencies in deciding whether to publish tabular data. Inferential disclosure occurs when individual information can be inferred with high confidence from statistical information in the published data.

Other SDC work studies the non-interactive query model, in which data recipients can submit one query to the system. This model may not fully address the information needs of data recipients because, in some cases, it is very difficult for a data recipient to construct a query for a data mining task accurately in one shot. Consequently, there is a series of studies on the interactive query model, in which the data recipients, including adversaries, can submit a sequence of queries based on previously received query results. The database server is responsible for keeping track of all queries of each user and determining whether the currently received query violates the privacy requirement with respect to all previous queries. One limitation of any interactive privacy-preserving query system is that it can answer only a sublinear number of queries in total; otherwise, an adversary (or a group of corrupted data recipients) will be able to reconstruct all but 1 − o(1) fraction of the original data, which is a very strong violation of privacy. When the maximum number of queries is reached, the query service must be closed to avoid a privacy leak. In the non-interactive query model, the adversary can issue only one query; therefore, the non-interactive query model cannot achieve the same degree of privacy as defined by the interactive model. One may consider privacy-preserving data publishing a special case of the non-interactive query model.
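Item 1) above says the PPDM data holder randomizes the data so that mining results can still be recovered, but the text names no specific scheme. As one classic illustration, swapped in here and not taken from the survey, the minimal sketch below uses Warner-style randomized response: each private bit is reported truthfully with probability `p` and flipped otherwise, and the population proportion is then estimated from the noisy reports. The function names, `p`, and the sample data are assumptions for illustration.

```python
import random


def randomize(bit: int, p: float, rng: random.Random) -> int:
    """Report the true bit with probability p, otherwise report its complement."""
    return bit if rng.random() < p else 1 - bit


def estimate_proportion(reports: list[int], p: float) -> float:
    """Estimate the true fraction of 1s from randomized reports.

    A report is 1 with probability p*pi + (1 - p)*(1 - pi), where pi is the
    true fraction, so pi = (observed - (1 - p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes reports independent of the true bits")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(42)  # fixed seed so the demo is repeatable
true_bits = [1] * 3000 + [0] * 7000  # hypothetical data: 30% hold the sensitive trait
reports = [randomize(b, 0.8, rng) for b in true_bits]
print(round(estimate_proportion(reports, 0.8), 3))  # close to 0.3, not exact
```

Any single report may be false, which gives each respondent plausible deniability, yet the aggregate stays recoverable. It also illustrates item 2): the randomized records themselves are not truthful at the record level.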
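Secure sum, the first of the SMC operations attributed above to Clifton et al., is commonly described as a ring protocol. The first site masks its value with a random number. Each following site adds its own value modulo a public modulus and forwards the result, and the first site removes the mask at the end. The minimal sketch below simulates that flow in a single process. Its assumptions are ours, not the text's: semi-honest, non-colluding sites, non-negative integer inputs, and a public `modulus` known to exceed any possible total. The function name is hypothetical.

```python
import secrets


def secure_sum(local_values: list[int], modulus: int) -> tuple[int, list[int]]:
    """Simulate the ring-based secure sum protocol in one process.

    Returns the total and the messages passed around the ring. Because each
    message is offset by a uniformly random mask, it looks uniformly random
    to the site that receives it. The total is correct only if the true sum
    is smaller than modulus.
    """
    if len(local_values) < 3:
        # With two sites, the revealed total itself discloses the other
        # site's value, so the protocol is run with three or more sites.
        raise ValueError("secure sum needs at least three sites")
    if any(v < 0 for v in local_values):
        raise ValueError("inputs must be non-negative")
    mask = secrets.randbelow(modulus)  # known only to site 1
    message = (mask + local_values[0]) % modulus  # site 1 -> site 2
    transcript = [message]
    for value in local_values[1:]:
        message = (message + value) % modulus  # each site adds its value and forwards
        transcript.append(message)
    total = (message - mask) % modulus  # back at site 1: remove the mask
    return total, transcript


total, messages = secure_sum([120, 45, 300], modulus=10**6)
print(total)  # 465
```

Only the final result leaves the protocol, which matches the SMC principle above of sharing nothing but the final data mining result.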
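The interactive query model described above needs a server that logs every user's queries and closes the service once a maximum number of queries is reached. The minimal sketch below covers only that bookkeeping for COUNT queries. The class name, the single global budget, and the predicate-based interface are hypothetical. The text's other requirement, checking each new query against all previous ones, is marked in a comment but not implemented.

```python
from collections.abc import Callable, Sequence

Record = dict[str, object]


class BudgetedQueryServer:
    """Hypothetical interactive query server with a global query budget."""

    def __init__(self, records: Sequence[Record], max_queries: int) -> None:
        self._records = list(records)
        self._max_queries = max_queries
        self._answered = 0
        self.log: dict[str, list[str]] = {}  # user -> queries answered for that user

    @property
    def closed(self) -> bool:
        """True once the maximum number of queries has been answered."""
        return self._answered >= self._max_queries

    def count(self, user: str, description: str,
              predicate: Callable[[Record], bool]) -> int:
        """Answer a COUNT query and record it in the per-user log."""
        if self.closed:
            raise PermissionError("query budget exhausted; service closed")
        # A real auditor would compare this query with self.log here
        # and refuse it if it violated the privacy requirement.
        self.log.setdefault(user, []).append(description)
        self._answered += 1
        return sum(1 for r in self._records if predicate(r))


server = BudgetedQueryServer([{"age": 30}, {"age": 40}], max_queries=2)
print(server.count("alice", "age > 35", lambda r: r["age"] > 35))  # 1
```

A global budget is the bluntest way to honor the sublinear-query limit. Per-user budgets would not stop a group of colluding recipients, which the text explicitly considers.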
This paper presents a survey of the most common attack techniques against anonymization-based PPDM and PPDP and explains their effects on data privacy. k-anonymity is used to protect respondents' identities and reduces linking attacks. However, a simple k-anonymity model fails against the homogeneity attack (if every record in an equivalence class has the same sensitive value, knowing that a person falls in that class reveals the value), so a concept that prevents this attack is needed; the solution is l-diversity. Under l-diversity, the tuples are arranged so that sensitive values are well represented, and an adversary is diverted among l places, that is, among l sensitive attribute values. l-diversity is limited against the background knowledge attack, because no one can predict the knowledge level of an adversary. It is observed that, when generalization and suppression are used, these techniques are also applied to attributes that do not need this extent of privacy, which reduces the precision of the published table. e-NSTAM (extended Sensitive Tuples Anonymity Method) is applied to sensitive tuples only and reduces information loss, but this method also fails in the case of multiple sensitive tuples. Generalization with suppression is also a cause of data loss, because suppression withholds values that do not satisfy the k requirement. Future work on this front can include defining a new privacy measure, along with l-diversity, for multiple sensitive attributes. We will also focus on generalizing attributes without suppression, using other techniques for achieving k-anonymity, because suppression reduces the precision of the published table.
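The k-anonymity, l-diversity, and (α,k)-anonymity conditions defined earlier, and the homogeneity attack in the concluding paragraph, can be checked mechanically. The minimal sketch below assumes, as is standard though not spelled out in the text, that records are compared on a chosen set of quasi-identifier attributes. It reads "well-represented" in the simplest way (distinct l-diversity: each class needs at least l distinct sensitive values) and applies the α bound to one chosen sensitive value. The function names, attribute names, and sample table are hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable, Mapping, Sequence

Row = Mapping[str, Hashable]


def equivalence_classes(rows: Sequence[Row], qi: Sequence[str]) -> list[list[Row]]:
    """Group rows that agree on every quasi-identifier attribute in qi."""
    groups: dict[tuple, list[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in qi)].append(row)
    return list(groups.values())


def is_k_anonymous(rows: Sequence[Row], qi: Sequence[str], k: int) -> bool:
    """Each record is indistinguishable from at least k - 1 others on qi."""
    return all(len(c) >= k for c in equivalence_classes(rows, qi))


def is_distinct_l_diverse(rows: Sequence[Row], qi: Sequence[str],
                          sensitive: str, ell: int) -> bool:
    """Each class holds at least ell distinct sensitive values (ell is the l of l-diversity)."""
    return all(len({r[sensitive] for r in c}) >= ell
               for c in equivalence_classes(rows, qi))


def is_alpha_k_anonymous(rows: Sequence[Row], qi: Sequence[str], sensitive: str,
                         value: Hashable, alpha: float, k: int) -> bool:
    """Each class has at least k rows, and value makes up at most a fraction alpha of it."""
    for c in equivalence_classes(rows, qi):
        share = sum(1 for r in c if r[sensitive] == value) / len(c)
        if len(c) < k or share > alpha:
            return False
    return True


table = [  # hypothetical, already generalized microdata
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
]
QI = ("age", "zip")
print(is_k_anonymous(table, QI, k=3))                                # True
print(is_distinct_l_diverse(table, QI, "disease", 2))                # False
print(is_alpha_k_anonymous(table, QI, "disease", "cancer", 0.5, 3))  # False
```

The second equivalence class is 3-anonymous, yet every record in it has the same disease. That is exactly the homogeneity attack: anyone known to be aged 40-49 with a 148** zip code and present in the table is revealed to have cancer. Both the l-diversity check and the α bound reject it.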
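The conclusion's point that generalization with suppression costs precision and loses data can be made concrete. The minimal sketch below generalizes two quasi-identifiers, age into fixed-width bands and zip code into a prefix with masked digits. It then suppresses every record whose generalized group is still smaller than k and reports how many were withheld. The attribute names, band width, prefix length, and sample data are hypothetical. Practical anonymization algorithms typically search over generalization levels rather than fixing one as done here.

```python
from collections import Counter


def generalize(row: dict, age_width: int, zip_digits: int) -> dict:
    """Replace exact age by a band and keep only the first zip_digits of the zip code."""
    low = (row["age"] // age_width) * age_width
    out = dict(row)  # sensitive attributes are left unchanged
    out["age"] = f"{low}-{low + age_width - 1}"
    out["zip"] = row["zip"][:zip_digits] + "*" * (len(row["zip"]) - zip_digits)
    return out


def qi_key(row: dict) -> tuple:
    """Quasi-identifier values that define an equivalence class."""
    return (row["age"], row["zip"])


def anonymize(rows: list[dict], k: int, age_width: int = 10,
              zip_digits: int = 3) -> tuple[list[dict], int]:
    """Generalize every row, then suppress rows whose group is smaller than k.

    Returns the published rows and the number of suppressed rows.
    """
    generalized = [generalize(r, age_width, zip_digits) for r in rows]
    sizes = Counter(qi_key(r) for r in generalized)
    published = [r for r in generalized if sizes[qi_key(r)] >= k]
    return published, len(generalized) - len(published)


raw = [  # hypothetical microdata
    {"age": 23, "zip": "13053", "disease": "flu"},
    {"age": 27, "zip": "13068", "disease": "cancer"},
    {"age": 29, "zip": "13021", "disease": "flu"},
    {"age": 45, "zip": "14850", "disease": "hepatitis"},
    {"age": 41, "zip": "14853", "disease": "flu"},
    {"age": 62, "zip": "13053", "disease": "cancer"},
]
published, suppressed = anonymize(raw, k=2)
print(len(published), suppressed)  # 5 1
```

The 62-year-old is alone in the 60-69 / 130** group and is suppressed, and every remaining age and zip code is coarsened. Even in a six-row table, both kinds of precision loss described in the conclusion show up.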

