Beyond Feature Selection and Extraction
- An Integrated Framework for High-Dimensional Data of Small Labeled Samples
Description
High-dimensional data is ubiquitous in real-world applications, from text
categorization to image processing to Web search. The shortage of labeled data,
a result of high labeling costs, makes it necessary to explore machine
learning approaches beyond the classic classification and clustering paradigms.
Semi-supervised learning is one such approach: it has shown potential
in handling data with small labeled samples and in reducing the need for
expensive labeled data. However, high-dimensional data with small labeled
samples permits too large a hypothesis space while supplying too few
constraints (labeled instances). The combination of these two data
characteristics poses a new research challenge. Drawing on computational and
statistical learning theory, we analyze the specific challenges presented by
such data, present preliminary studies, delineate the need to integrate feature
selection and extraction in a novel framework that reduces the hypothesis
space, propose to design efficient and novel algorithms, and conduct
theoretical and empirical studies to understand the complex relationships
between high-dimensional data and classification performance.
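The point about too large a hypothesis space with too few labeled constraints can be illustrated with a small sketch. It is not taken from the project: the plain perceptron, the function names, the sample sizes, and the synthetic Gaussian data are all assumptions chosen for illustration. With far more features than labeled instances, a linear classifier can fit labels that are pure noise, so a perfect training fit says nothing about which hypothesis is right.

```python
import random


def perceptron_fit(X, y, max_epochs=1000):
    """Train a bias-free perceptron; return (weights, converged)."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            if yi * score <= 0:  # misclassified, or exactly on the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training set is fit
            return w, True
    return w, False


def training_accuracy(w, X, y):
    """Fraction of training instances on the correct side of the hyperplane."""
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        if yi * score > 0:
            correct += 1
    return correct / len(y)


# Hypothetical setting: 10 labeled instances, 200 features (assumed values).
rng = random.Random(0)
n_labeled, n_features = 10, 200
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_labeled)]

# Labels are drawn at random, so there is no real signal to learn.
y_random = [rng.choice([-1, 1]) for _ in range(n_labeled)]

w, converged = perceptron_fit(X, y_random)
print(converged, training_accuracy(w, X, y_random))
```

Because the 10 instances are almost surely linearly independent in 200 dimensions, any assignment of labels is linearly separable, and the perceptron finds a separating hyperplane even though the labels are meaningless. Reducing the dimensionality through feature selection or extraction shrinks the set of hypotheses that can fit the data in this way.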
Publications
- Journal Articles
- Z. Zhao and H. Liu. "Multi-Source Feature Selection
via Geometry-Dependent Covariance Analysis", JMLR Workshop and
Conference Proceedings Volume 4: New challenges for feature selection in
data mining and knowledge discovery, 4:36-47, 2008
- Z. Zhao and H. Liu. "Searching for Interacting
Features in Subset Selection", Intelligent Data Analysis - An
International Journal, 13:207-228, 2009.
- M. Berens, H. Liu, L. Parsons, L. Yu, and Z. Zhao.
"Fostering Biological Relevance in Feature Selection for Microarray
Data", Trends and Controversies, IEEE Intelligent Systems,
pp. 71-73, November/December 2005.
- H. Liu and L. Yu. "Toward Integrating Feature
Selection Algorithms for Classification and Clustering", IEEE
Trans. on Knowledge and Data Engineering,
17(4):491-502, 2005.
- Jieping Ye, Jianhui Chen, Ravi Janardan, and Sudhir
Kumar. "Developmental Stage Annotation of Drosophila Gene Expression
Pattern Images via an Entire Solution Path for LDA", ACM Transactions on Knowledge Discovery
from Data, Special Issue on Bioinformatics, Vol. 2, No. 1,
pp. 1-21, 2008.
- Conferences
- Z. Zhao, J. Wang, S. Sharma, N. Agarwal, H. Liu, and Y. Chang. "A Knowledge-Oriented Framework for Gene Selection", Poster, RECOMB'09, Tucson, Arizona, May 18-21.
- Z. Zhao, L. Sun, S. Yu, H. Liu, J. Ye. "Multiclass Probabilistic Kernel Discriminant Analysis", IJCAI'09
- Z. Zhao, J. Wang, H. Liu, J. Ye, and Y. Chang.
"Identifying Biologically Relevant Genes via Multiple Heterogeneous
Data Sources", KDD'08, pp. 839-847.
- Z. Zhao and H. Liu. "Spectral Feature Selection for
Supervised and Unsupervised Learning", International Conference on Machine
Learning (ICML-07), June 20-24, 2007, Corvallis, Oregon.
- Z. Zhao and H. Liu. "Semi-supervised Feature Selection
via Spectral Analysis", SIAM International Conference on Data Mining
(SDM-07), April 26-28, 2007, Minneapolis, Minnesota.
- Z. Zhao and H. Liu. "Searching for Interacting
Features", The 20th International Joint Conference on AI (IJCAI-07), January 6-12, Hyderabad,
India. Software available.
- Jieping Ye. "Least Squares Linear Discriminant
Analysis", The Twenty-Fourth International Conference on Machine
Learning (ICML 2007), pp. 1087-1093. Technical Report TR-06-003,
Department of Computer Science and Engineering, Arizona State University,
March 2006.
- Books or Chapters
- Huan Liu and Hiroshi Motoda, "Feature Selection for
Knowledge Discovery and Data Mining", Kluwer Academic Publishers,
July 1998, ISBN 0-7923-8198-X.
- Huan Liu and Hiroshi Motoda (Eds.), "Computational Methods of
Feature Selection", Chapman and Hall/CRC Press, 2008.
- H. Liu and Z. Zhao. "Manipulating Data and
Dimensionality Reduction Methods: Feature Selection", in
Encyclopedia of Complexity and Systems Science, Robert Meyers (Ed.),
Springer. Forthcoming.
- H. Liu. "Feature Selection: An Overview", in
Encyclopedia of Machine Learning, Claude Sammut (Ed.), Springer.
Forthcoming.
- Z. Zhao and H. Liu. "On Interacting Features in
Subset Selection", in Encyclopedia of Data Warehousing and Mining,
2nd Edition, Idea Group, Inc., pp. 1079-1084, September 2008.
- Technical Reports
- Z. Zhao and H. Liu. "Semi-supervised Feature Selection
via Spectral Analysis", Technical Report TR-06-022,
Department of Computer Science and Engineering, Arizona State University,
Tempe, AZ 85287, 2006.
- Y. Ye, L. Yu, and H. Liu. "Sparse Linear
Discriminant Analysis", Technical Report TR-06-010, Department of
Computer Science and Engineering, Arizona State University, Tempe, AZ
85287, 2006.
Related Activities
Project Members
Acknowledgments
This project is sponsored by NSF (#0812551), 9/2008 - 8/2011.
Created on Oct 26, 2008.
Contact: Huan Liu via Email:
huan.liuATasu.edu.
Webmaster: Zheng Zhao, Email: zhaozhengATasu.edu
Last Updated: Wednesday, June 3, 2009