Home Research Publications Awards Resources

Multi-Label Dimensionality Reduction

Multi-Label Dimensionality Reduction via Hypergraph

Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. One fundamental challenge in multi-label learning is how to effectively identify and incorporate correlations among class labels into the classification model. We have proposed Hypergraph Spectral Learning (HSL) [1] to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. A hypergraph is a generalization of the traditional graph in which the edges, called hyperedges, are arbitrary non-empty subsets of the vertex set. To capture the correlation among labels, we create a hyperedge for each label and include all data points relevant to this label into the hyperedge. A hypergraph example, including its table representation, graph representation, as well as the corresponding clique expansion and star expansion, is demonstrated in the figure below. Based on the Laplacian of the constructed hypergraph, we propose the hypergraph spectral learning framework for learning a low-dimensional embedding through a linear transformation by solving an optimization problem preserving the inherent relationship among data points captured by the Laplacian. Intuitively, data points that share many common labels tend to be close to each other in the embedded space. HSL has been applied successfully for Drosophila gene expression pattern annotation by integrating different types of features from the images [2].

Hypergraph Example

 

Selected Publications:

[1] Liang Sun, Shuiwang Ji, and Jieping Ye. Hypergraph Spectral Learning for Multi-label Classification. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008).
[2] Shuiwang Ji, Liang Sun, Rong Jin, Sudhir Kumar, and Jieping Ye. Automated Annotation of Drosophila Gene Expression Patterns using a Controlled Vocabulary. Bioinformatics, 24(17):1881-1888, 2008.

Canonical Correlation Analysis (CCA)

Canonical Correlation Analysis (CCA) is a classical tool in statistics and machine learning. CCA is commonly applied for multi-label learning in which the two sets of variables are derived from the data and the class labels, respectively. We have proposed a sparse CCA formulation by formulating CCA into an equivalent least squares problem [1] so that the 1-norm regularization can be incorporated into the least squares formulation. Compared with other sparse CCA algorithms which employ iterative greedy schemes to compute the projection vectors sequentially, our algorithm can compute all projection vectors simultaneously and results in larger correlation coefficients.

The use of regularization in CCA has natural statistical interpretations. In practice, regularization is commonly enforced for both sets of multidimensional variables in CCA, as it is generally believed that the CCA solution is dependent on the regularization on both variables. Our research on CCA shows that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights on the effect of regularization on CCA [2,3].

Selected Publications:

[1] Liang Sun, Shuiwang Ji, and Jieping Ye. A Least Squares Formulation for Canonical Correlation Analysis. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008).
[2] Liang Sun, Shuiwang Ji, Shipeng Yu, and Jieping Ye. On the Equivalence Between Canonical Correlation Analysis and Orthonormalized Partial Least Squares. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009).
[3] Liang Sun, Shuiwang Ji, and Jieping Ye. Canonical Correlation Analysis for Multilabel Classification: A Least-Squares Formulation, Extensions, and Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 194-200, 2011.

Efficient Implementations via Least Squares

Many popular dimensionality reduction algorithms can be formulated as a generalized eigenvalue problem (GEP), such as CCA and HSL. Since large-scale generalized eigenvalue problem is more challenging to solve, we propose two efficient implementations for a class of dimensionality reduction algorithms based on least squares: 1) the direct least squares approach and 2) the two-stage approach for the following dimensionality reduction algorithms:

Direct Least Squares Approach

In this approach, we proposed a direct least squares formulation to solve these dimensionality reduction algorithms [1, 2]. We rigorously prove that under a mild condition these dimensionality reduction algorithms can be equivalently reformulated as a least squares problem with a specific target matrix. As a result, efficient algorithms for least squares, such as the iterative conjugate gradient algorithm LSQR can be readily applied. Another benefit of this equivalence relationship is that these dimensionality reduction algorithms can be extended by incorporating the regularization (2-norm or 1-norm) into the corresponding least squares formulations.

Selected Publications:

[1] Liang Sun, Shuiwang Ji, and Jieping Ye. A Least Squares Formulation for a Class of Generalized Eigenvalue Problems in Machine Learning. In Proceedings of the 26th International Conference on Machine Learning (ICML 2009).
[2] Liang Sun, Betul Ceran, and Jieping Ye. Least Squares Dimensionality Reduction. ACM Transactions on Knowledge Discovery from Data. Invited Paper.  

Two-Stage Approach

One limitation of the direct least squares approach is that the equivalence relationship only holds when the data points are linearly independent, which may not be the case for low-dimensional data. Furthermore, the equivalence relationship between the least squares formulation and the original generalized eigenvalue problem fails when the regularization is considered.

We have proposed a novel two-stage approach, and showed that the two-stage approach is equivalent to solving the generalized eigenvalue problem directly without any assumption, in both unregularized and regularized settings [1,2]. Specifically, in the first stage, we solve a least squares problem by regressing the data on some target matrix using the iterative conjugate gradient algorithm; in the second stage, we solve a generalized eigenvalue problem by replacing the original data matrix with the projected data using the solution of the first stage as the projection matrix. Since the data dimensionality is reduced dramatically after the projection, the resulting generalized eigenvalue problem in the second stage can be solved efficiently. This work [1] won the best research paper award honorable mention at the prestigious ACM SIGKDD 2010 conference.

More details on the two-stage approach can be found in [1, 2] and here.

Selected Publications:

[1] Liang Sun, Betul Ceran, Jieping Ye. A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010).
[2] Liang Sun, Betul Ceran, and Jieping Ye. Least Squares Dimensionality Reduction. ACM Transactions on Knowledge Discovery from Data. Invited Paper.  

Sparse Learning, Compressed Sensing and their Applications

Theory and Algorithms in Compressed Sensing and Sparse Learning

One interesting problem in Compressed Sensing (CS) is how the additional structure in sparse signals can lead to improved recovery performance. In my doctoral research, the recovery conditions for block-sparse signals in both the Mutual Incoherence Property (MIP) framework and the Restricted Isometry Property (RIP) framework are investigated [1]. Block-sparse signals arise in various applications, including multi-band signal processing, equalization of sparse communication channels, DNA microarrays, etc. In particular, the Multiple Measurement Vector (MMV) model, in which the signal, represented as a matrix consisting of a set of jointly sparse vectors, can be considered as a special case of the block-sparse model. To facilitate the processing of the block structures in CS, a general block p-norm for block matrices is proposed. In particular, we establish an upper bound for the recovery error in the noisy case under the same recovery condition as the noiseless case.

In terms of the recovery algorithm, we have proposed an efficient algorithm [2] to recover jointly sparse vectors in the MMV model by solving the dual problem, which can be reformulated as a min-max problem and can be solved efficiently via the prox-method with a nearly dimension-independent convergence rate. Compared with existing algorithms which solve it using second-order cone programming (SOCP) or semidefinite programming (SDP), our algorithm can scale to larger problems while achieving high accuracy.

Selected Publications:

[1] Liang Sun and Jieping Ye. On the Exact Recovery Condition for Block-Sparse Signals. Submitted to IEEE Transactions on Signal Processing, 2010.
[2] Liang Sun, Jun Liu, Jianhui Chen, and Jieping Ye. Efficient Recovery of Jointly Sparse Vectors. In Advances in Neural Information Processing Systems 22 (NIPS 2009). 

Sparse Inverse Covariance Estimation and its Application in Alzheimer's Disease (AD)

We have developed a novel algorithm for Sparse Inverse Covariance Estimation (SICE) [1]. SICE originates from the broader problem of covariance matrix estimation from data, and it has a clear interpretation that the off-diagonal elements correspond to partial correlations. It has been shown to be useful in various applications, including evaluating patterns of association among variables, exploration of genetic networks, senator voting records analysis, hyperspectral image classification, and speech recognition.Unlike most existing algorithms, we estimate the inverse covariance matrix directly. One appealing feature of the proposed algorithm is that it allows the user feedback (e.g., prior domain knowledge) to be incorporated into the estimation process, while the connectivity patterns can be discovered automatically. The algorithm has been applied to analyze the connectivity patterns among different brain regions in the study of Alzheimer's disease (AD) [1,2,3], which holds great promise for identifying image-based markers used to distinguish among AD, MCI (Mild Cognitive Impairment), and Normal Control (NC). Our studies on the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set reveal several interesting connectivity patterns consistent with literature findings, and also some new patterns that can help the knowledge discovery of AD.

Selected Publications:

[1] Liang Sun, Rinkal Patel, Jun Liu, Kewei Chen, Teresa Wu, Jing Li, Eric Reiman, and Jieping Ye. Mining Brain Region Connectivity for Alzheimer's Disease Study via Sparse Inverse Covariance Estimation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2009).
[2] Shui Huang, Jing Li, Liang Sun, Jun Liu, Teresa Wu, Kewei Chen, Adam Fleisher, Eric Reiman, and Jieping Ye. Learning Brain Connectivity of Alzheimer's Disease from Neuroimaging Data. In Advances in Neural Information Processing Systems 22 (NIPS 2009).
[3] Shuai Huang, Jing Li, Liang Sun, Jieping Ye, Adam Fleisher, Teresa Wu, Kewei Chen, and Eric Reiman. Learning Brain Connectivity of Alzheimer's Disease by Sparse Inverse Covariance Estimation. NeuroImage, 50, pp. 935-949, 2010.