Beyond Feature Selection and Extraction - An Integrated Framework for
High-Dimensional Data of Small Labeled Samples
Description
High-dimensional data is ubiquitous in real-world applications, from text categorization
to image processing to Web search. The shortage of labeled data,
resulting from high labeling costs, makes it necessary to explore machine
learning approaches beyond classic classification and clustering paradigms.
Semi-supervised learning is one such approach that demonstrates its potential
in handling data with small labeled samples and reducing the need for expensive
labeled data. However, high-dimensional data with small labeled samples permits
too large a hypothesis space with too few constraints (labeled instances).
The combination of these two data characteristics poses a new research
challenge. Employing computational and statistical learning theory, we analyze
specific challenges presented by such data, show preliminary studies, delineate
the need to integrate feature selection and extraction in a novel framework to
reduce hypothesis space, propose to design efficient and novel algorithms, and
conduct theoretical and empirical studies to understand complex relationships
between high-dimensional data and classification performance.
Publications
- Journal Articles
- Z. Zhao, L. Wang, H. Liu and J. Ye. "On Similarity Preserving Feature Selection",
IEEE Transactions on Knowledge and Data Engineering (TKDE), forthcoming.
- L. Yuan, Y. Wang, P. Thompson, V. Narayan and J. Ye, Multi-source Feature Learning for Joint Analysis of Incomplete Multiple Heterogeneous Neuroimaging Data, NeuroImage Volume 61, Issue 3, 2 July 2012, Pages 622-632
- Z. Zhao and H. Liu. "Multi-Source Feature Selection
via Geometry-Dependent Covariance Analysis", JMLR Workshop and
Conference Proceedings Volume 4: New challenges for feature selection in
data mining and knowledge discovery, 4:36-47, 2008
- Z. Zhao and H. Liu. "Searching for Interacting
Features in Subset Selection", Intelligent Data Analysis - An
International Journal, 13:207-228, 2009.
- M. Berens, H. Liu, L. Parsons, L. Yu, and Z. Zhao.
“Fostering Biological Relevance in Feature Selection for Microarray
Data”, Trends and Controversies, IEEE Intelligent Systems,
pp. 71-73, November/December 2005. [PDF]
- H. Liu and L. Yu. "Toward Integrating Feature
Selection Algorithms for Classification and Clustering", IEEE
Trans. on Knowledge and Data Engineering,
17(4), 491-502, 2005. [PDF]
- Jieping Ye, Jianhui Chen, Ravi Janardan, and Sudhir
Kumar. Developmental Stage Annotation of Drosophila Gene Expression
Pattern Images via an Entire Solution Path for LDA. ACM Transactions on Knowledge Discovery
from Data, Special Issue on Bioinformatics, Vol. 2, No. 1,
pp. 1-21, 2008. [PDF]
- Conferences and Workshops
- J. Tang and H. Liu. Unsupervised Feature Selection for Linked Social Media Data, The ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2012). [PDF]
- J. Tang and H. Liu. Feature Selection with Linked Data in Social Media, SIAM International Conference on Data Mining (SDM2012) [PDF]
- L. Yuan, Y. Wang, P. Thompson, V. Narayan and J. Ye, Multi-Source Learning for Joint Analysis of Incomplete Multi-Modality Neuroimaging Data, The ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2012)
- L. Yuan, J. Liu and J. Ye, Efficient Methods for Overlapping Group Lasso, Twenty-Fifth Annual Conference on Neural Information Processing Systems (NIPS 2011)
- Z. Zhao, L. Wang, and H. Liu. Efficient Spectral Feature Selection with Minimum Redundancy. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2010. [PDF, Supplementary]
- Z. Zhao, J. Wang, S. Sharma, N. Agarwal, H. Liu, and Y. Chang. An Integrative Approach to Identifying Biologically Relevant Genes. In Proceedings of SIAM International Conference on Data Mining (SDM), 2010. [PDF]
- Z. Zhao, J. Wang, H. Liu, and Y. Chang. Biological relevance detection via network dynamic analysis. In Proceedings of 2nd International Conference on Bioinformatics and Computational Biology (BICoB), 2010. BEST PAPER AWARD [PDF]
- J. Liu, L. Yuan, and J. Ye. An Efficient Algorithm for a Class of Fused Lasso Problems. The Sixteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2010). [PDF]
- L. Sun, B. Ceran, and J. Ye. A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques. The Sixteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2010).
- J. Chen, J. Liu, and J. Ye. Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks. The Sixteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2010).
- H. Liu, H. Motoda, R. Setiono, and Z. Zhao. Feature Selection: An Ever Evolving Frontier in Data Mining, Journal of Machine Learning Research, Workshop and Conference Proceedings Volume 10, 10:4-13, 2010. [PDF]
- L. Sun, J. Liu, J. Chen, and J. Ye. Efficient Recovery of Jointly Sparse Vectors. The Twenty-Third Annual Conference on Neural Information Processing Systems (NIPS 2009). [PDF]
- J. Liu, S. Ji, and J. Ye. Multi-task Feature Learning via Efficient L2,1-Norm Minimization. The Twenty-fifth Conference on Uncertainty in Artificial Intelligence (UAI 2009). [PDF]
- J. Liu, J. Chen, and J. Ye. Large-Scale Sparse Logistic Regression. The Fifteenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining (SIGKDD 2009), pp. 547-556.
- L. Sun, S. Ji, and J. Ye. A Least Squares Formulation for a Class of Generalized Eigenvalue Problems in Machine Learning. The Twenty-Sixth International Conference on Machine Learning (ICML 2009). [PDF]
- S. Ji and J. Ye. Linear Dimensionality Reduction for Multi-label Classification. The Twenty-first International Joint Conference on Artificial Intelligence (IJCAI 2009). [PDF]
- Z. Zhao, J. Wang, S. Sharma, N. Agarwal, H. Liu and Y. Chang. "A Knowledge-Oriented Framework for Gene Selection", Poster, RECOMB'09, Tucson, Arizona, May 18-21.
- Z. Zhao, L. Sun, S. Yu, H. Liu, J. Ye. "Multiclass Probabilistic Kernel Discriminant Analysis", IJCAI'09 [PDF]
- Z. Zhao, J. Wang, H. Liu, J. Ye, and Y. Chang.
"Identifying Biologically Relevant Genes via Multiple Heterogeneous
Data Sources", KDD'08: 839 - 847. [PDF]
- Z. Zhao and H. Liu. "Spectral Feature Selection for
Supervised and Unsupervised Learning", International Conference on Machine
Learning (ICML-07), June 20-24, 2007, Corvallis, Oregon. [PDF]
- Z. Zhao and H. Liu. "Semi-supervised Feature Selection
via Spectral Analysis", SIAM International Conference on Data Mining
(SDM-07), April 26-28, 2007, Minneapolis, Minnesota. [PDF]
- Z. Zhao and H. Liu. "Searching for Interacting
Features", The 20th International Joint Conference on AI (IJCAI-07), January 6-12, Hyderabad,
India. [PDF] Software available.
- Jieping Ye. Least Squares Linear Discriminant
Analysis. The Twenty-Fourth International Conference on Machine
Learning (ICML 2007), pp. 1087-1093. Technical Report TR-06-003,
Department of Computer Science and Engineering, Arizona State University,
March 2006. [PDF]
- Books or Chapters
- Z. Zhao and H. Liu. "Spectral Feature Selection for Data Mining",
December 2011, ISBN 978-1439862094, by
Chapman and Hall/CRC
- Huan Liu and Hiroshi Motoda, "Feature Selection for
Knowledge Discovery and Data Mining", July 1998, ISBN 0-7923-8198-X,
by Kluwer Academic Publishers
- Huan Liu and Hiroshi Motoda (editors), “Computational Methods of
Feature Selection”, 2008, Chapman and Hall/CRC Press.
- H. Liu and Z. Zhao. "Manipulating Data and
Dimensionality Reduction Methods: Feature Selection", in
Encyclopedia of Complexity and Systems Science, Robert Meyers (Ed.),
Springer. 2009.
- H. Liu. "Feature Selection: An Overview", in
Encyclopedia of Machine Learning, Claude Sammut (Ed.), Springer.
Forthcoming.
- Z. Zhao and H. Liu. "On Interacting Features in
Subset Selection", in Encyclopedia of Data Warehousing and Mining,
2nd Edition, Idea Group, Inc., pp. 1079-1084, September 2008.
- Technical Reports
- Z. Zhao and H. Liu. "Semi-supervised Feature Selection
via Spectral Analysis", Technical Report, TR-06-022,
Department of Computer Science and Engineering, Arizona State University,
Tempe, AZ 85287, 2006.
- Y. Ye, L. Yu, and H. Liu. "Sparse Linear
Discriminant Analysis", Technical Report, TR-06-010, Department of
Computer Science and Engineering, Arizona State University, Tempe, AZ
85287, 2006.
- Thesis
- Z. Zhao. Spectral Feature Selection for Mining Ultrahigh Dimensional Data [PDF]
Acknowledgments
This project is sponsored by NSF (#0812551), 9/2008 - 8/2012.
Created on Oct 26, 2008.
Contact: Huan Liu via Email: huan.liuATasu.edu.
Webmaster: Jiliang Tang, Email: Jiliang.TangATasu.edu
Last Updated: Tuesday, May 22, 2012