Chapter5RelevanceFeedba._第1頁
Chapter5RelevanceFeedba._第2頁
Chapter5RelevanceFeedba._第3頁
Chapter5RelevanceFeedba._第4頁
Chapter5RelevanceFeedba._第5頁
已閱讀5頁,還剩24頁未讀 繼續免費閱讀

下載本文檔

版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領

文檔簡介

1、Chapter 5Relevance Feedback andQuery Exp ansi onoutlineIntroductionA Framework for Feedback MethodsExp licit Releva nee FeedbackExp licit Feedback Through ClicksImplicit Feedback Through Local AnalysisImplicit Feedback Through Global Analysis難點與重點 1. query reformulation(SW) 2. User Releva nee Feedba

2、ck 3. Local Analysis 4. Global Analysis query reformulation 査詢重構 query expansion 査詢擴展 term reweighting語詞重新加權 User Relevance Feedback用戶相關反 饋聚族、聚類(cluster) 123 局部上下文分析(localcontext analysis) 129Automatic Global Analysis 自動全局 分析相似性敘詞表(similaritythesaurus) 1315 J Introduction Most users find it difficul

3、t to formulate queries that are well designed for retrieval puposes儆 most users often need to reformulate their queries to obtain the results of their interestfl Thus, the first query formulation should be treated as an initial attempt to retrieve relevant information Docume nts in itially retrieved

4、 could be analyzed for re leva nee and used to improve initial query厶 The p rocess of query modification is comm only referred as relevance feedback, when the user provides information on relevant documents to a query, or query expansion, when information related to the query is used to expand it4 W

5、e refer to both of them as feedback methods Two basic approaches of feedback methods: exp licit feedback, in which the informatio n for query reformulation is provided directly by lhe users, and imp licit feedback, in which the in formati on for query reformulation is implicitly derived by the syste

6、m5.2 A Framework for feedback methods Con side r a set of documents 口 that are known to be releva nt to the current query q In relevance feedback, the documents in Dr are used to transform q into a modified query 血 However, obtai ning in formation on documents relevant to a query requires the direct

7、 interference of the user Most users are unwilling to provide this information, particularly in the Webifl Because of this high cost, the idea of releva nee feedback has been relaxed over the years In stead of asking the users for the releva nt documents we could; Look at documents they have clicked

8、 on; or Look at terms belonging to the top documents in the result set In both cases, it is expect that the feedback cycle will preduce results of higher quality A feedback cycle is composed of two basic steps; Determine feedback information that is either related or expectec to be related to the or

9、iginal query (/ and Determine how to tansform query q to take this information effectively into account The first step can be accomplished in two distinct ways Obtain the feedback information explicitly from the users Obtain the feedback information implicitly from the query results or from external

10、 sources such as a thesaurus In an exp licit re leva nee feedback cycle, the feedback information is provided directly by the users However, collect!ng feedback information is expensive and time consuming In the Web, user clicks on search results constitute a new source of feedback information A cli

11、ck indicate a document that is of interest to the user in the con text of the curre nt query Notice that a click does not necessarily indicate a document that is releva nt to the queryExplicit FeedbacklEerqjer/qreeultcleer query qn&Lite(a) rolovanco foodback(b) click feedbackrolovantfMUltelicKed

12、 onrosuttmodified rjmt quary gmodiHod ucat tuhv f In an imp licit relevance feedback cycle, the feedback in formation is derived imp licitly by the system There are two basic app roaches for com piling implicit feedback information: local analysis, which derives the feedback information from the top

13、 ranked documents in the result set globa analysis, which derives the feedback information from external sources such as a thesaurusImp licit Feedbackuserqufifyquser 397 qthMainB«l i|resgresultsLocal enst03: tOpPOJtB -ctisben ng of results modified user quay sQiotxx enoiyois:4 - wteind thdcamje

14、-kira-JKUTiflri dmllariUefi; modified user tjuery 5dxumort r*' coHectoi(a) local analysis(b) global analysis5.3 Explicit Relevance Feedback In a classic releva nee feedback cycle, the user is presented with a list of the retrieved documents Then, the user examines them and marks those that are r

15、elevant In practice, only the top 10 (or 20) ranked documents need to be examined The main idea consists of selecting important terms from the documents that have been identified as relevant, and enhancing the importanee of these terms in a new query formulation Expected effect: the new query will b

16、e moved towards the relevant docs and away from the non-relevant ones Early experiments have shown good improvements in precision for small test collect!ons Releva nee feedback p resents the following characteristics; it shields the user from the details of the query reformulation p rocess (all the

17、user has to p rovide is a releva nee judgeme nt) it breaks down the whole searchi ng task into a seque nee of small steps which are easier to grasp相關性反饋-相關性反饋=用戶在最初獲得的檢索結果上進行 反饋-上要思想是改變查詢使得更為靠近相關文檔的向 量空間-用戶會標定返回的文檔相關或不相關-系統根據川戶的標5通過計算獲得用戶信息需求的一個更好的表達,改變用戶的初始査詢-相關性反饋可以一次或多次迭代進行例子:Initial Querynayana-

18、/1 >d%® Edt yjew 00 Bookmarks Tools 站 ndow belpIhtgnwan日.ecs.ucsb.eHujfia | Horre I % Browsing and .Shop ping related 607,000 images are indexed and classified in the databaseOnly One keyword is allowed!IbikelSearch |Designed by Baris Sumengen and Shawn NewsamPowered by JLAMP2Q00

19、 (Java Linux, Apache,Perl, Windorf s200Q)Browse I Search I Pre* Next I RandomResults for Initial Query(144473,1(54 JS)000.00.0(1047,33314030.00.00.9二* Vf" * *<l444d3, 2646U)DJ)0.00J>*fl44n,W38)os0.000(H44 力 J62$«3)O.Q0.00.00 00.00.0(H«57,a33l34) 000.000Q 機 483Z6SUO0.00.00.0(“440

20、0 265153)0.00.00.0(lUJlfl,337T52)0.30.00.00.0000.00.00.00.0<1<44$6, 250M4>0.00.00.0Releva nee FeedbackBrowse Search Prev Next RandomCH44i7,3JJM0)600.000t0.00.00.0(H4483.XS】S3)0.00.0ox3a444 力”CH44JT,2J31J4)Q0600.0(1444%, 263網OX)0,00.0;X)C)O 'wOQ 00 0 0(1444為,5打4)0 00 00 0(14418, anna:)0.

21、00.0Q000QQa444%,M 處 H)S00.00.0(H44%aOZ)CO0 0Q QBrowse I Search Prsv Neyt RandomResults after Relevance Feedback(114538,325491)0.23 IM40 啊 76CM4J3$,323S33)0.563192X0 26730403於紗0450l 0 351395(14435$, i2553S)0 5842790 23088103033980 6502對0 4U?450 2SSS3O44J58.J237»)0 6«709l9f0 3580330?gOS9(H44

22、73, 16349)0.672103P菊220.37917S(144456, 2伽乳)0 6?50ie640390J111I8(KM5(5, 355693)0 6769010.47G4JOJ904J1(H<475,0 7003390 3090030 39】3J7(144493,3(552154)0 T01707960 339X8<144478,513410) 0 70297 0.4(191110.2»g595.4 Explicit Feedback Through Clicks Web search engine users not only inspect the an

23、swers to their queries, they also click on them The clicks reflect preferences for particular answers in the context of a given query They can be collected in large numbers without interfering with the user actions立接的 I The immediate question is whether they also reflect relevance judgements on the

24、answers Under certain restrictions, the answer is affirmative as we now discuss5.4.1 Eye Tracking and Releva neeJudgements Clickthrough data provides limited information on the user behavior能買到的 One approach to complement information on user behavior is to use eye tracking devices Suchbe used todete

25、rmine the area of the screen the user is focussed in The approach allows correctly detecting the area of the screen of interest to the user in 60-90% of the cases0 Further, the cases for which the method does not work can be determined凝視、掃視、瞳孔擴大、掃視路徑 Eye movements can be classified in four types: fi

26、xations, saccades, pupil dilation, and scan paths Fixations are a gaze at a particular area of the screen lasting for 200-300 milliseconds This time interval is large enough to allow effective brain capture and interpretation of the image displayed Fixations are the ocular activity normally associat

27、edwith visual informatiacquisition and processing That is, fixatio ns are k to inter preting user behavior視覺Relevance Judgements lb evaluate the quality of the results, eye track!ng is not app ropriate This evaluation requires selecting a set of test queries and determini ng releva nee judgeme nts f

28、or them This is also the case if we intend to evaluate the quality of the signal produced by clicks5.4.2 User Behavior Eye tracking experiments have shown that users sean the query results from top to bottom The users inspect the first and second results right away, within the second or third fixati

29、on Further, they tend to scan the top 5 or top 6 answers thoroughly, before scrolling down to see other answers Percentage of times each one of the top results was viewed and clicked on by a user, for 10 test tasks and 29 subjects (Thorsten Joachims et al)rank用戶凝視及點擊找中一個排名在詢的結來的比例圖表說明 In 60-70%of th

30、e tasks the users have a fixation in either the first or the second result. The relative frequency of fixation in the fourth result drops to half. For the results at the bottom of the page, that can only be inspected after a scroll down, the relative frequency of fixation tends to be below 10%.點擊第一條

31、三倍以上 We notice that the users inspect the top 2 answers almost equally, but they click three times more in the first This might be indicative of a user bias towards the search engine That is, that the users tend to trust the search engine in recommendi ng a top result that is re leva nt This can be

32、better understood by presenting test subjects with two distinct result sets; the normal ranking returned by the search engine and交換 a modified ranking in which the top 2 results have their positions swapped Analysis suggest that the user displays a trust bias in the search engine that favors the top

33、 resultfl That is, the position of the result has great influence on the user's decision to click on it543 Clicks as a Metric of Preferences Thus, It is clear that interp reting clicks as a direct indicative of relevance is not the best approach More premising is to interpret clicks as a metric

34、of user preferences For in stance, a user can look at a result and decide to skip it to click on a result that appears lower In this case, we say that the user prefers the result clicked on to the result shown upper in theanking This type of preference relation takes into account: the results clicke

35、d on by the user the results that were inspected and not clicked onClicks within a Same Query Empirical results indicate that user clicks are in agreement with judgements on the relevance of results in roughly 80% of the casesI Thus, the clicks of the users can be used as a strong indicative of pers

36、onal preferencesI Further, they also can be used as a strong indicative of the relative relevance of the results for a given queryClicks within a Query Chain The discussion above was restricted to the context of a single query However, in practice, users issue more than one query in their search for

37、 answers to a same task The set of queries associated with a same task can be identified in live query streams This set constitute what is referred to as a query chain The purpose of analysing query chains is to produce new p refers nee relations We notice that the second strategy produces twice mor

38、e preference relations than the first These p reference relations must be comp a red with the relevance judgements of the human assessors The following conclusions were derived: Both strategies produce preferenee relations in agreement with the relevance judgements in roughly 80% of the cases Simila

39、r agreements are observed even if we swap the first and second results Similar agreements are observed even if we reverse the order of the results連續迭代 These results suggest: The users provide negatiyfe feedback on whole result sets (by not clicking on them) The users learn wittythe process and refor

40、mulate better queries on the subsequent iterations5.5 implicit feedback through Local AnalysisLocal analysis Local analysis consists in deriving feedback information from the documents retrieved for a given query q This is simila r to a releva nee feedback cycle but done without assistance from the

41、user Two local strategies are discussed here: local clustering and local con text an alysis5.5.1 Implicit Feedback through Local ClusteringLocal Clustering Adoption of clustering techniques for query expansion has been a basic approach in information retnevsil The standard pracedure is to quantify t

42、erm correlations and then use the correlated terms for query exp ansi on Term correlations can be quantified by using global structures, such as association matrices However, global structures might not adapt well to the local context defined by the current query To deal with this problem, local clu

43、stering can be used, as we now discuss Each element Cn,v e Cg expresses a correlation between terms and 仏 This relationship between the terms is based on their joint co-occurrences inside documents of the collectior Higher the number of documents in which the two term co-occur, stronger is this corr

44、elation Correlation strengths can be used to define local clusters of neighbor terms Terms in a same cluster can then be used for query exp ansion We consider three types of clusters here: association clusters, metric clusters, and scalar clusters.-關聯聚類、距離聚類和標量聚類聚族基本思想利用局部文檔對term進行聚類,即將相關的term聚在一 起,

45、聚類的結果稱為一個個簇(cluster),于是利用簇中 的相關term對査詢q進行擴展。關鍵:定義termZ間的相似度,不同的相似度定義得 到不同的簇.三種簇定義:關聯験Association clusters度 屋簇Metric clusters標量鎂Scalar clusters.a:association clusters關聯聚類(族) An association cluster is computed from a local correlation matrix For that, we re-define the correlation factors c.v between a

46、ny pair of terms 斤“ and A® as follows: In this case the correlation matrix is referred to as a local association matrix The motivation is that terms that co-occur frequently inside documents have a synonymity association關聯聚類(族)-關聯聚集是基于文檔內詞干同時發生的情 況。這種思想的依抑:-是:如果一些詞條常 常同時出現在文檔中,則這些同時出現的 詞條具有同義性關

47、系。 Keyword-based local clustehng and stem local clusteri ng.此聚類包括關鍵詞聚類和詞干聚類。b: metric clusters (度量聚類) Association clusters do not take into account where the terms occur in a document However, two terms that occur in a same sente nee tend to be more correlated A metric cluster re-defines the correla

48、tion factors 血衛 as a function of theidistances in documentsb: metric clusters (度量聚類)-關聯聚集是基于文檔中詞條同時出現的頻率,而沒有考慮 詞條在文檔屮出現的位置。度量聚集的思想就是在同一個 句子中出現的兩個詞條耍比文檔中相隔很遠的詞條關聯密 切。-距離較近的同時出現在一份文檔中的共現詞被認為關系密 切,可以作為相關詞進行擴展。c: scalar clusters標量聚類 The correlation between two local terms can also be defined by com pari

49、ng the n eighborhoods of the two terms The idea is that two terms with similar neighborhoods have some synonymity relationship In this case we say that the relatio nship is in direct or induced by the neighborhood We can qua ntify this relationship com paring the neighborhoods of the terms through a

50、 scalar measure For instance, the cosine of the angle belween the two vectors is a popular scalar similarity measure此聚類認為兩個擁有相似臨近詞的詞干有 著間接或潛在的誘導關系,量化這種相鄰 關系的方法是將兩個詞干與他們的臨近詞 的關聯值進行排列,然后通過余弦比較兩 組向量的相妝度或相關度Query expansion with neighbor termsNeighbor Terms Terms that belong to clusters associated to the

51、 query terms can be used to exp and the origi nal query Such terms are called neighbors of the query terms and are characterized as follows A term that belongs to a cluster C/n), associated with another term is said to be a neighbor of k* Often, neighbor terms represent distinct keywords that arft c

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯系上傳者。文件的所有權益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經權益所有人同意不得將文件中的內容挪作商業或盈利用途。
  • 5. 人人文庫網僅提供信息存儲空間,僅對用戶上傳內容的表現方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
  • 6. 下載文件中如有侵權或不適當內容,請與我們聯系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論