Multi-View Dimensionality Reduction via Canonical Correlation Analysis
We analyze the multi-view regression problem, in which we have two views (X1, X2) of the input data and a real-valued target variable Y of interest. In a semi-supervised learning setting, we consider two separate assumptions (one based on redundancy and the other on (de)correlation) and show how, under either assumption alone, dimensionality reduction based on canonical correlation analysis (CCA) can reduce the labeled sample complexity. The basic semi-supervised algorithm is as follows: with the unlabeled data, perform CCA; with the labeled data, project the inputs onto a certain CCA subspace (i.e., perform dimensionality reduction) and then do least squares regression in this lower-dimensional space. We show how, under either assumption, the number of labeled samples can be significantly reduced in comparison to the single-view setting; in particular, this dimensionality reduction introduces only a small bias but can drastically reduce the variance. The two assumptions we consider are a redundancy assumption and an uncorrelatedness assumption. Under the redundancy assumption, the best predictor from each view alone is roughly as good as the best predictor using both views. Under the uncorrelatedness assumption, the views X1 and X2 are uncorrelated conditioned on Y. We show that, under either of these assumptions, CCA is an appropriate dimensionality reduction technique. We are also in the process of running large-scale experiments on word disambiguation (using Wikipedia, with the disambiguation pages helping to provide labels). This work presents extensions of ideas in Ando and Zhang [2007] and Kakade and Foster [2007].
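
The two-step procedure above (CCA on unlabeled data, then least squares in the CCA subspace on labeled data) is concrete enough to sketch. Below is a minimal, hypothetical Python illustration using scikit-learn's CCA and ordinary least squares; the synthetic two-view data, the latent dimension k, and all parameter choices are assumptions made for illustration and are not taken from this abstract.

import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
d1, d2, k = 20, 20, 3                      # view dimensions and latent dimension (assumed)
A1 = rng.normal(size=(k, d1))              # fixed linear view maps shared by all samples
A2 = rng.normal(size=(k, d2))

def make_views(n):
    # Two noisy views of a shared k-dimensional latent signal Z.
    Z = rng.normal(size=(n, k))
    X1 = Z @ A1 + 0.1 * rng.normal(size=(n, d1))
    X2 = Z @ A2 + 0.1 * rng.normal(size=(n, d2))
    return Z, X1, X2

# Step 1: with plentiful unlabeled two-view data, estimate the CCA subspace.
_, X1_u, X2_u = make_views(5000)
cca = CCA(n_components=k).fit(X1_u, X2_u)

# Step 2: with a small labeled set, project inputs onto the CCA subspace
# (dimensionality reduction) and run least squares regression there.
Z_l, X1_l, _ = make_views(50)
w = rng.normal(size=k)
y = Z_l @ w + 0.05 * rng.normal(size=50)   # real-valued target Y
X1_proj = cca.transform(X1_l)              # project view 1 onto k CCA directions
model = LinearRegression().fit(X1_proj, y)
print("in-sample R^2 in the k-dimensional CCA subspace:", model.score(X1_proj, y))

The point of the sketch is the variance reduction argument from the abstract: the regression is fit on only k coordinates rather than d1, so the few labeled samples are spent estimating far fewer parameters.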
Date Produced: December 20, 2008