III: Small: Transforming Feature Selection to Harness the Power of Social Media

Description

Social media data is growing rapidly in size and variety as more people use services such as Facebook, Twitter, and LinkedIn. It is a massive "treasure trove" of interest to researchers and practitioners in many disciplines, and a great source for data mining. However, although both are large-scale, social media data differs from the attribute-value data of classic data mining. Social media data is noisy, incomplete, drawn from multiple sources, and embedded in multi-mode and multi-dimensional networks. Furthermore, its data points are not independent and identically distributed (i.i.d.); they are inherently linked. These properties present unprecedented challenges for mining social media data. Existing feature selection algorithms that have proven effective for classic data mining are not equipped for social media mining.

We propose a new kind of feature selection to facilitate the computational understanding of social media, investigating the associated fundamental research issues and developing new, effective algorithms. We define the problem of feature selection with linked data and present a preliminary study that demonstrates how link information can be integrated into supervised feature selection for social media data. A prominent characteristic of social media is that its data comes from multiple sources. Because the data from each source can be noisy, partial, or redundant, selecting relevant sources and using them together can make linked feature selection more effective. We define types of sources and propose to study unsupervised feature selection using source information. Social media is further complicated by its multi-mode and multi-dimensional networks. As a user's network expands and their relationships grow more complex, these complications can confuse feature selection algorithms, so the need to discern relevant features becomes more pressing. We propose to develop new algorithms that can exploit multi-mode and multi-dimensional social media networks.
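As a rough illustration of how link information might enter feature scoring, the sketch below computes the Laplacian score, a standard unsupervised criterion, over a link graph: a feature scores well (low) when linked instances tend to share similar values. This is not one of the algorithms developed in this project. The function name `link_laplacian_scores`, the toy link matrix `W`, and the toy data `X` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def link_laplacian_scores(X, W, eps=1e-12):
    """Laplacian score of each feature with respect to a link graph.

    X: (n_instances, n_features) attribute-value matrix.
    W: (n_instances, n_instances) symmetric, non-negative link matrix.
    Lower scores mean the feature varies less across linked instances.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                             # degree of each instance
    L = np.diag(d) - W                            # graph Laplacian
    Xc = X - (d @ X) / d.sum()                    # remove degree-weighted mean
    num = np.sum(Xc * (L @ Xc), axis=0)           # f^T L f for each feature
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)  # f^T D f for each feature
    return num / np.maximum(den, eps)

# Toy link graph (assumed): two fully linked groups of three instances.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0

# Feature 0 is constant within each linked group; feature 1 is not.
X = np.array([[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]], dtype=float)

scores = link_laplacian_scores(X, W)
ranking = np.argsort(scores)  # feature indices, best (lowest score) first
print(scores, ranking)
```

On this toy input, feature 0 ranks ahead of feature 1 because it never changes value across a link. A supervised variant would additionally use class labels, which this sketch deliberately leaves out.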

The project lies at the confluence of feature selection, social media analysis, and data mining. This makes it well suited for teaching data mining concepts and social media analysis, and for giving students a new context in which various computational components fit together. Students are adept users of social media and bring varied intuitions to developing computational tools that can harness its power. Hence, the proposed project can be used effectively in undergraduate and graduate courses as well as in student research projects. Students from under-represented groups will be involved in teaching and research activities. The impact of this work will also extend to understanding collective behavior in social media, employing social media for crisis response and disaster relief, and studying social and political movements with novel data mining methods.

Publications

Books or Book Chapters

Suhang Wang, Jiliang Tang, and Huan Liu. "Feature Selection", Encyclopedia of Machine Learning and Data Mining, Forthcoming.
Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu. "Social Media Mining: An Introduction", Cambridge University Press, ISBN: 9781107018853.
Salem Alelyani, Jiliang Tang, and Huan Liu. "Feature Selection for Clustering: A Review", in Data Clustering: Algorithms and Applications, Editors: Charu Aggarwal and Chandan Reddy, CRC Press, 2013.
Jiliang Tang, Salem Alelyani, and Huan Liu. "Feature Selection for Classification: A Review", in Data Classification: Algorithms and Applications, Editor: Charu Aggarwal, CRC Press, 2013.

Tutorials

Jiliang Tang, Jie Tang, and Huan Liu. "Recommendation in Social Media - Recent Advances and New Frontier", 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2014)
Jiliang Tang and Huan Liu. "Trust in Social Computing", The 23rd International World Wide Web Conference, April 7-11, 2014
Xia Hu and Huan Liu. "Mining Spammers in Social Media: Techniques and Applications", The 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 13-16, 2014
Reza Zafarani, Mohammad Ali Abbasi, Huan Liu. "Social Media Mining: Fundamental Issues and Challenges", IEEE International Conference on Data Mining 2013, December 7-10, 2013. Dallas, TX.

Technical Report

Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and Huan Liu. "Feature Selection: A Data Perspective", CoRR abs/1601.07996.

Journal Papers

Jundong Li and Huan Liu. "Challenges of Feature Selection for Big Data Analytics", Special Issue on Big Data, IEEE Intelligent Systems, Forthcoming.
Jiliang Tang, Yi Chang, and Huan Liu. "Mining Social Media with Social Theories: A Survey", SIGKDD Explorations.
Jiliang Tang and Huan Liu. "An Unsupervised Feature Selection Framework for Social Media Data", IEEE Transactions on Knowledge and Data Engineering, Forthcoming.
Jiliang Tang and Huan Liu. "Feature Selection for Social Media Data", ACM Transactions on Knowledge Discovery from Data, Forthcoming.
Xinwang Liu, Lei Wang, Jian Zhang, Jianping Yin, and Huan Liu. "Global and Local Structure Preservation for Feature Selection", IEEE Transactions on Neural Networks and Learning Systems, Forthcoming.

Conferences

Suhang Wang, Jiliang Tang, Fred Morstatter, and Huan Liu. "Paired Restricted Boltzmann Machine for Linked Data", the 25th ACM International Conference on Information and Knowledge Management, 2016.
Suhang Wang, Jiliang Tang, Charu Aggarwal, and Huan Liu. "Linked Document Embedding for Classification", the 25th ACM International Conference on Information and Knowledge Management, 2016.
Kewei Cheng, Jundong Li, and Huan Liu. "FeatureMiner: A Tool for Interactive Feature Selection", demo paper, the 25th ACM International Conference on Information and Knowledge Management, 2016.
Ling Jian, Jundong Li, Kai Shu, and Huan Liu. "Multi-Label Informed Feature Selection", the 25th International Joint Conference on Artificial Intelligence, 2016.
Jundong Li, Xia Hu, Liang Wu, and Huan Liu. "Robust Unsupervised Feature Selection on Networked Data", SIAM International Conference on Data Mining, 2016.
Jundong Li, Jiliang Tang, Xia Hu, and Huan Liu. "Unsupervised Streaming Feature Selection in Social Media", the 24th ACM International Conference on Information and Knowledge Management, 2015.
Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu Aggarwal, and Thomas Huang. "Heterogeneous Network Embedding via Deep Architectures", the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2015.
Suhang Wang, Jiliang Tang, Yilin Wang, and Huan Liu. "Exploring Implicit Hierarchical Structures for Recommender Systems", the 24th International Joint Conference on Artificial Intelligence, 2015.
Yilin Wang, Suhang Wang, Jiliang Tang, Huan Liu, and Baoxin Li. "Unsupervised Sentiment Analysis for Social Media Image", the 24th International Joint Conference on Artificial Intelligence, 2015.
Jiliang Tang, Chikashi Nobata, Anlei Dong, Yi Chang, and Huan Liu. "Propagation-based Sentiment Analysis for Microblogging Data", the 15th SIAM International Conference on Data Mining, 2015.
Suhang Wang, Jiliang Tang, and Huan Liu. "Embedded Unsupervised Feature Selection", the AAAI Conference on Artificial Intelligence, 2015.
Jiliang Tang, Xia Hu, Huiji Gao, and Huan Liu. "Discriminant Analysis for Unsupervised Feature Selection", the 14th SIAM International Conference on Data Mining, 2014.
Jiliang Tang, Xia Hu, Huiji Gao, and Huan Liu. "Exploiting Local and Global Social Context for Recommendation", the 23rd International Joint Conference on Artificial Intelligence, 2013.
Xia Hu, Jiliang Tang, Yanchao Zhang, and Huan Liu. "Social Spammer Detection in Microblogging", the 23rd International Joint Conference on Artificial Intelligence, 2013.
Jiliang Tang and Huan Liu. "CoSelect: Feature Selection with Instance Selection for Social Media Data", SIAM International Conference on Data Mining, 2013.
Jiliang Tang, Xia Hu, Huiji Gao, and Huan Liu. "Unsupervised Feature Selection for Multi-view Data in Social Media", SIAM International Conference on Data Mining, 2013.
Salem Alelyani and Huan Liu. "Supervised Low Rank Matrix Approximation for Stable Feature Selection", the 11th International Conference on Machine Learning and Applications, 2012.
Jiliang Tang and Huan Liu. "Unsupervised Feature Selection for Linked Social Media Data", the Eighteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.
Jiliang Tang and Huan Liu. "Feature Selection with Linked Data in Social Media", SIAM International Conference on Data Mining, 2012.

Theses

Jiliang Tang. Computing Distrust in Social Media.
Salem Alelyani. On Feature Selection Stability - A Data Perspective.

Resources

Related Code

Feature Selection Repository

Related Datasets

Related Activities

Project Members

Huan Liu (Professor)
Suhang Wang (PhD Student)
Jundong Li (PhD Student)
Kewei Cheng (Master's Student)
Jiliang Tang (Graduated with PhD in 2015 and joined Yahoo Labs, Sunnyvale, CA, USA as a Research Scientist)
Ling Jian (Visiting Professor, China University of Petroleum, Qingdao, Shandong, September 2015 - August 2016)
Salem Alelyani (Graduated with PhD in 2012 and joined King Khalid University, Abha, Asir, Saudi Arabia as an Assistant Professor)

REU Funded Students

Grant Marshll (Undergraduate Student)
Daniel Howe (Undergraduate Student)

Undergraduate Students

Andrew Dudley
Daniel Baird (Research Preparation for FURI)
Kian Fakhri (Honors Thesis)

Acknowledgments

This project is supported by the National Science Foundation under Grant No. IIS-1217466. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Created by Huan Liu who can be reached at huan.liu at asu.edu.
Webmaster: Jiliang Tang, Email: Jiliang.TangATasu.edu

Last Updated: September 6, 2016