On a Theory of Similarity Functions for Learning and Clustering
Kernel methods have become powerful tools in machine learning. They perform well in many applications, and there is also a well-developed theory of what makes a given kernel useful for a given learning problem. However, this theory requires viewing kernels as implicit (and often difficult-to-characterize) maps into high-dimensional spaces. In this talk I will describe work on developing a theory that instead views a kernel simply as a measure of similarity between data objects, and that describes the usefulness of a given kernel (or more general similarity function) in terms of fairly intuitive, direct properties of how the similarity function relates to the task at hand, without any need to refer to implicit spaces. I will also talk about an extension of this framework to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) for it to be usable for clustering well: that is, to learn well without any label information at all. We find that if we are willing to relax the objective a bit (for example, allowing the algorithm to produce a hierarchical clustering that we call successful if some pruning of it is close to the desired clustering), then this question leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. This work can be viewed as defining a kind of PAC model for clustering. (This talk is based on joint work with Maria-Florina Balcan, Santosh Vempala, and Nati Srebro.)
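To give a concrete flavor of the framework, the following is a minimal sketch (not the paper's exact algorithm) of one way a similarity function can be used directly for learning: map each example to its similarities with a small random sample of "landmark" points, so that a good similarity function yields an explicit feature space in which a linear separator can be learned. The Gaussian similarity, toy two-blob data, and simple perceptron here are all illustrative assumptions.

    import numpy as np

    def similarity(x, y, gamma=1.0):
        # An example similarity function (Gaussian); any bounded
        # similarity K(x, y) fits this kind of framework.
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def landmark_features(X, landmarks, sim=similarity):
        # Map each example x to (K(x, l_1), ..., K(x, l_d)) over the
        # sampled landmarks: an explicit feature space, with no
        # reference to any implicit high-dimensional map.
        return np.array([[sim(x, l) for l in landmarks] for x in X])

    def train_perceptron(F, y, epochs=50):
        # Learn a linear separator in landmark space.
        w, b = np.zeros(F.shape[1]), 0.0
        for _ in range(epochs):
            for f, label in zip(F, y):
                if label * (f @ w + b) <= 0:
                    w += label * f
                    b += label
        return w, b

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy data: two Gaussian blobs labeled +1 / -1.
        X = np.vstack([rng.normal(0, 0.5, (50, 2)),
                       rng.normal(2, 0.5, (50, 2))])
        y = np.array([1] * 50 + [-1] * 50)
        # Sample landmarks from the (unlabeled) data.
        landmarks = X[rng.choice(len(X), size=20, replace=False)]
        F = landmark_features(X, landmarks)
        w, b = train_perceptron(F, y)
        print("training accuracy:", np.mean(np.sign(F @ w + b) == y))

If the similarity function is "good" for the task in the intuitive sense the talk describes, the classes become approximately linearly separable in this landmark space, which is what makes such direct, implicit-space-free guarantees possible.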
Date Found: October 13, 2010
Date Produced: July 30, 2009