Introduction to Cross-Media Retrieval: Lecture Slides

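What is cross-media?
From the application-platform perspective: television, computer, mobile phone, newspaper, iPad.

What is cross-media?
From the retrieval-research perspective: searching text with text, images with images, images with text, and video with text.

What is cross-media?
The "2020 Vision" paper published in Nature in January 2010 pointed out that text, images, speech, video and their interaction attributes will be tightly mixed together, which is what "cross-media" means.
In February 2011, Science published the "Dealing with Data" special issue: the organization and use of data reflect cross-media computing.
Trend: research is moving from "multimedia" to "cross-media".

What is cross-media?
The cross-media property means that content crossing and semantic association exist among multimedia data, and between user interaction and multimedia data.
Wu Fei, Zhuang Yueting. 互聯網跨媒體分析與檢索:理論與算法 (Cross-media analysis and retrieval on the Internet: theory and algorithms). 計算機輔助設計與圖形學學報 (Journal of Computer-Aided Design & Computer Graphics), Vol. 22, No. 1, pp. 1-9, 2010.

Main research areas of cross-media
Cross-media retrieval: the user submits a multimedia object of one type as the query example, and the system automatically finds semantically similar multimedia objects of other types.
Cross-media reasoning: moving from multimedia data of one type, through problem solving, to multimedia data of another type (OCR, for example).
Cross-media storage: existing retrieval technology for massive data mainly targets text, as in search engines such as Google and Baidu. Cross-media storage studies efficient compression, indexing and sharding methods, as well as techniques such as personalized indexing of user behavior.

Challenges of cross-media analysis (from Fei Wu)
[Figure labels: "Raging waves?", Audio, Video, Webpage, Correlated multi-modal data, Shared space, Japan Earthquake]
How to bridge both the semantic gap and the heterogeneity gap?

The content gap in cross-media
[Figure labels: visual feature space, auditory feature space, high-level semantic space (explosion, ocean, sky, bird), semantic gap, content gap]

Subspace mapping algorithms based on linear transformations
[Figure labels: visual feature space, auditory feature space, projected subspace]

Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval
Xiaohua Zhai, Yuxin Peng and Jianguo Xiao
Institute of Computer Science & Technology, Peking University
AAAI 2013

Motivation
Existing metric learning methods were designed primarily for single-media data and cannot be directly applied to cross-media data.
The goal is to make full use of the structure information of the whole heterogeneous spaces.

Heterogeneous Metric Learning
Given two sets of heterogeneous pairwise constraints, S is the set of similarity constraints and D is the set of dissimilarity constraints. Each pairwise constraint (x_i, y_j) indicates whether two heterogeneous media objects x_i and y_j are relevant or irrelevant, as inferred from the category label.

Joint Graph Regularized Heterogeneous Metric
The authors propose to learn linear transformation matrices U and V that map the heterogeneous media data to a common output space.
The distance measure is defined as: [equation not preserved]

Objective function
The general regularization framework for heterogeneous distance metric learning is defined as: [equation not preserved]
f(U, V) is the loss function defined on the sets of similarity and dissimilarity constraints S and D; g(U, V) and r(U, V) are regularizers defined on the target parameter matrices U and V.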
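The distance formula itself did not survive in these slides. As a minimal sketch, assuming the common form d(x, y) = ||U^T x - V^T y||_2 for two linear maps into a shared space, the distance between heterogeneous objects could be computed as below. The function names `hetero_distance` and `cross_distance_matrix` and the toy matrices are illustrative assumptions, not the authors' code.

```python
import numpy as np


def hetero_distance(U, V, x, y):
    """Distance between x (first media type) and y (second media type).

    Assumed form: ||U^T x - V^T y||_2, with U of shape (d_x, c) and V of shape (d_y, c).
    """
    return float(np.linalg.norm(U.T @ x - V.T @ y))


def cross_distance_matrix(U, V, X, Y):
    """All pairwise distances; X is (d_x, m), Y is (d_y, n), result is (m, n)."""
    A = U.T @ X                              # (c, m) projected objects of type 1
    B = V.T @ Y                              # (c, n) projected objects of type 2
    diff = A[:, :, None] - B[:, None, :]     # (c, m, n) coordinate differences
    return np.sqrt((diff ** 2).sum(axis=0))


# Toy example with hypothetical 2-D and 3-D features mapped to a 2-D space.
U = np.eye(2)                                        # (2, 2)
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # (3, 2)
x = np.array([3.0, 4.0])
y = np.array([0.0, 0.0, 5.0])
d = hetero_distance(U, V, x, y)

X = np.array([[3.0, 0.0], [4.0, 0.0]])   # columns are the objects [3, 4] and [0, 0]
Y = y.reshape(3, 1)
dist = cross_distance_matrix(U, V, X, Y)
```

In this toy setup V discards the third coordinate of `y`, so `y` projects to the origin and `d` is the length of `[3, 4]`, that is 5.0.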

The two coefficients in this objective are the balancing parameters.

Loss function
Minimizing the loss function minimizes the distances between media objects under similarity constraints and maximizes the distances between media objects under dissimilarity constraints.
The elements of Z are normalized column by column so that each column sums to zero, which balances the influence of the similarity and dissimilarity constraints.

Scale regularization
r(U, V) controls the scale of the parameter matrices and reduces overfitting.

Joint graph regularization
Define a joint undirected graph G = (V, W) on the dataset (here V is the vertex set, not the transformation matrix). Each element w_ij of the similarity matrix W = {w_ij}, of size (m+n)×(m+n), is the similarity between the i-th and the j-th media object. Label information is used to construct the symmetric similarity matrix: [equation not preserved]
Set w_ii = 0 for 1 ≤ i ≤ m+n to avoid self-reinforcement. The normalized graph Laplacian L is defined as L = I - D^{-1/2} W D^{-1/2}, where I is the (m+n)×(m+n) identity matrix and D is the (m+n)×(m+n) diagonal matrix with d_ii = \sum_j w_ij (not to be confused with the dissimilarity set D). L is symmetric and positive semidefinite, with eigenvalues in the interval [0, 2].
The formulation of g(U, V): [equation not preserved], where O represents all of the media objects in the learned metric space and L denotes the normalized graph Laplacian.
Minimizing g(U, V) encourages the smoothness of the mapping over the joint data graph, which is constructed from the initial label information.

Iterative optimization
Orthogonal transformation matrices U and V are first obtained by minimizing the following objective function: [equation not preserved], where X and Y represent two sets of coupled media objects from different media with the same labels. U and V define two orthogonal transformation spaces in which the media objects in X and Y are projected as close to each other as possible.
Maximizing tr(X^T U V^T Y) minimizes this function, and the solution follows from a singular value decomposition: [equation not preserved]

Fix V and update U
Differentiating Q(U, V) with respect to U and V and setting the derivatives to zero, respectively: [equation not preserved]
This gives the analytical solutions for U and V: [equation not preserved]
The updates to U and V are alternated for several iterations to find a locally optimal solution.
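Two building blocks of the method can be sketched in NumPy: the normalized Laplacian of the joint label graph, and an orthogonal starting point for U and V obtained from a singular value decomposition. The sketch rests on assumptions that the slides do not confirm: w_ij is taken to be 1 when two objects share a category label and 0 otherwise, X and Y are assumed to hold paired objects as columns, and the names `joint_graph_laplacian` and `orthogonal_init` are hypothetical.

```python
import numpy as np


def joint_graph_laplacian(labels):
    """Normalized Laplacian L = I - D^{-1/2} W D^{-1/2} of a label graph.

    labels: category labels of all m + n media objects (both media types stacked).
    Assumption: w_ij = 1 when objects i and j share a label, else 0.
    """
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)                      # w_ii = 0 avoids self-reinforcement
    degree = W.sum(axis=1)                        # d_ii = sum_j w_ij
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0                        # isolated nodes keep L_ii = 1
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    L = np.eye(len(labels)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    return W, L


def orthogonal_init(X, Y, c):
    """Orthonormal U, V that maximize tr(X^T U V^T Y).

    X is (d_x, n), Y is (d_y, n) with paired columns, and c <= min(d_x, d_y).
    The maximizers are the leading singular vectors of X Y^T.
    """
    P, s, Qt = np.linalg.svd(X @ Y.T, full_matrices=False)
    return P[:, :c], Qt.T[:, :c], s[:c]


# Five objects: two in category 0 and three in category 1.
labels = [0, 0, 1, 1, 1]
W, L = joint_graph_laplacian(labels)
eigenvalues = np.linalg.eigvalsh(L)

# Random paired features, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
Y = rng.normal(size=(3, 6))
U, V, top_singular_values = orthogonal_init(X, Y, c=2)
trace_value = np.trace(X.T @ U @ V.T @ Y)
```

With these choices the eigenvalues of `L` stay within [0, 2], and `trace_value` equals the sum of the two largest singular values of X Y^T, which is the largest value the trace can take for matrices with orthonormal columns.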

The iteration continues until the cross-validation performance on the training set decreases. In practice, only a few rounds are needed.

Datasets
Wikipedia: 2866 image-text pairs labeled with 10 semantic categories. The dataset is randomly split into a training set of 2173 documents and a test set of 693 documents.
XMedia: 5000 texts, 5000 images, 1000 audio clips, 500 videos and 500 3D models. The dataset is randomly split into a training set of 9600 media objects and a test set of 2400 media objects.

Experiments: features
Images: bag-of-words model; each image is represented as a histogram over a 128-codeword SIFT codebook.
Texts: each text is represented by a 10-topic latent Dirichlet allocation (LDA) model.
Audio: each audio clip is represented by 29-dimensional MFCC features.
Videos: each video clip is segmented into shots, and a 128-dimensional BoW histogram is extracted for each video keyframe. The final similarity for a video is obtained by averaging the similarities of all its keyframes.
3D models: each 3D model is first represented as the concatenated 4700-dimensional vector of a set of Light-Field descriptors, as described in [reference not preserved].

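The concatenated vector is then reduced to a 128-dimensional vector by principal component analysis (PCA).

Experiments: baseline methods and evaluation metrics
CCA (canonical correlation analysis): learns the subspace that maximizes the correlation between two sets of heterogeneous data.
CFA (cross-modal factor analysis): adopts the criterion of minimizing the Frobenius norm between pairwise data in the transformed domain.
CCA+SMN: the current state of the art, since it considers not only correlation analysis but also semantic abstraction for different modalities.

Experiments: MAP scores
[Table not preserved]

Experiments: Precision-Recall curves
[Figure not preserved]

Unified representation of multimedia data
The representation of multimedia data refers to the data structure used to represent a multimedia sample. For example, an image in a web page can be represented by a four-tuple, or low-level visual features can be extracted from an image to form a multi-dimensional vector that represents it in the database.
Cross-media retrieval belongs to content-based multimedia retrieval. The difference is that the retrieval objects are extended from a single type of multimedia data to several different types, with flexible crossing between them.
The performance of cross-media retrieval depends heavily on the similarity matching algorithm, and similarity matching in turn relies on the representations adopted for the different types of multimedia data. The design of the data representation model is therefore fundamental and important.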

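Unified representation of multimedia data: mapping algorithm
Suppose a set of image and audio data that has not yet been annotated is used as the training set and is known to cover Z semantic categories. The mapping algorithm is as follows.

Step 1: clustering
1) For each semantic category Z_i, extract the low-level content features of the images and audio clips it contains and build the corresponding feature matrices S_I and S_A;
2) For each semantic category Z_i, randomly select m image examples I_i and annotate them semantically;
3) Compute the cluster centroid ICr_i of I_i in the low-level feature space;
4) Starting from ICr_i, run k-means clustering over all image data in the database;
5) Images that fall into the same cluster are given the same semantic label as I_i;
6) Repeat steps 1-4 for the audio dataset.

Correlation-preserving mapping
1) Analyze the canonical correlation between images and audio in their low-level content features, i.e. compute the subspace basis vectors W_x and W_y corresponding to S_I and S_A;
2) Compute the representations of the visual and auditory feature vectors mapped into the subspace: [equation not preserved]

Cross-media correlation reasoning in the web environment
A concrete application environment such as the web often carries specific data features that convey more direct semantic information than the content features of the multimedia data themselves. These features can assist the content features in cross-media retrieval and improve retrieval efficiency. Web links, for example, can serve as such an auxiliary feature.

Cross-media correlation graph
The graph model is a common way to express data relationships. It can represent the images in a web environment together with their various associated features. This representation not only describes the relationships among the data clearly but also helps discover complementary information among them. Multimedia data of different types are linked by complex relationships, which fall into two kinds: intra-media correlation (within a modality) and cross-media correlation (between modalities).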
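The correlation-preserving mapping above relies on canonical correlation analysis to obtain the bases W_x and W_y. A minimal NumPy sketch of CCA is given here under stated assumptions: the feature matrices hold one sample per row, image and audio rows are already paired, and a small ridge term `reg` is added for numerical stability. The function names and the synthetic data are illustrative and do not come from the slides.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(C)
    return (v / np.sqrt(w)) @ v.T


def cca(SI, SA, k, reg=1e-6):
    """Canonical correlation analysis for row-paired feature matrices.

    SI is (n, p) image features and SA is (n, q) audio features.
    Returns Wx (p, k), Wy (q, k) and the top-k canonical correlations.
    """
    n = SI.shape[0]
    Xc = SI - SI.mean(axis=0)
    Yc = SA - SA.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(SI.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(SA.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    P, s, Qt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return Kx @ P[:, :k], Ky @ Qt.T[:, :k], s[:k]


# Synthetic features that share one latent "semantic" variable z.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
SI = z @ np.array([[1.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))
SA = z @ np.array([[2.0, 1.0]]) + 0.1 * rng.normal(size=(200, 2))

Wx, Wy, corr = cca(SI, SA, k=1)
image_proj = (SI - SI.mean(axis=0)) @ Wx   # images in the shared subspace
audio_proj = (SA - SA.mean(axis=0)) @ Wy   # audio in the shared subspace
```

Because both synthetic feature sets are noisy views of the same variable, the first canonical correlation is close to 1, and the two projections can be compared directly in the shared subspace.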

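Link relationship analysis
Let V, I and A denote the video, image and audio datasets, with m, n and k samples respectively, and let x^V_i, x^I_i and x^A_i denote the feature vectors of the i-th video, the i-th image and the i-th audio sample in the database. The following two heuristic rules use the links between the web pages that host multimedia data to measure the correlation (cross-media distance) between multimedia data of different types:
Rule 1: if two media objects a and b belong to the same web page, a and b are semantically similar;
Rule 2: if web page A links to pages B and C, the multimedia objects contained in B and those contained in C are semantically similar.

Based on these heuristic rules, the cross-media correlation matrices L_VI, L_IA and L_AV are built for video-image, image-audio and audio-video. Taking L_IA as an example, its element r_ij is the correlation value between two multimedia objects [symbols not preserved] and is computed as follows:
Input: image and audio data obtained from web pages
Output: cross-media correlation matrix L_IA
Steps 1-4: [not preserved]
Step 5: Construct a symmetric matrix L_IA whose cell l_ij is the normalized value of r_ij.

Global correlation reasoning based on the graph model
[Figure labels: images I_a, I_b, I_c, I_d; audio A_a, A_b, A_c]

Recent research hotspots (from Fei Wu)
Cross-media Retrieval
Cross-media Ranking
Cross-media Hashing
Cross-collection Topic Modeling
Mission: learn one appropriate metric for ranking multi-modal data to preserve the orders of relevance. For [text truncated]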
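Steps 1-4 of the link-analysis algorithm are not preserved, so the following is only a sketch of how the two heuristic rules could be turned into an image-audio correlation matrix. The page dictionary layout, the weights `w_same` and `w_sibling`, the max-normalization and the function name are all assumptions made for illustration. The sketch returns a rectangular image-by-audio matrix.

```python
from itertools import combinations


def cross_media_correlation(pages, images, audios, w_same=1.0, w_sibling=0.5):
    """Image-audio correlation values r_ij from page co-occurrence and links.

    pages maps a page id to {"images": [...], "audios": [...], "links": [...]}.
    Rule 1 adds w_same for objects on the same page; Rule 2 adds w_sibling
    for objects on two pages that are linked from a common page.
    """
    img_index = {name: i for i, name in enumerate(images)}
    aud_index = {name: j for j, name in enumerate(audios)}
    r = [[0.0] * len(audios) for _ in images]

    def add(image_page, audio_page, weight):
        for img in pages[image_page]["images"]:
            for aud in pages[audio_page]["audios"]:
                r[img_index[img]][aud_index[aud]] += weight

    for page in pages:                                   # Rule 1: same page
        add(page, page, w_same)
    for info in pages.values():                          # Rule 2: sibling pages
        targets = sorted({t for t in info["links"] if t in pages})
        for b, c in combinations(targets, 2):
            add(b, c, w_sibling)
            add(c, b, w_sibling)

    # Normalize by the largest value so every entry lies in [0, 1].
    peak = max((v for row in r for v in row), default=0.0)
    if peak > 0:
        r = [[v / peak for v in row] for row in r]
    return r


# Hypothetical site: page A links to pages B and C, each holding one image and one audio clip.
pages = {
    "A": {"images": [], "audios": [], "links": ["B", "C"]},
    "B": {"images": ["img1"], "audios": ["aud1"], "links": []},
    "C": {"images": ["img2"], "audios": ["aud2"], "links": []},
}
L_IA = cross_media_correlation(pages, ["img1", "img2"], ["aud1", "aud2"])
```

In this example objects on the same page get the value 1.0 and objects on sibling pages get 0.5, so `L_IA` is `[[1.0, 0.5], [0.5, 1.0]]`.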
