III: Small: Transforming Feature Selection to Harness the Power of Social Media


Social media data is growing rapidly in size and variety as more people use platforms such as Facebook, Twitter, and LinkedIn. It is a massive "treasure trove" of interest to researchers and practitioners across disciplines, and a rich source for data mining. However, although both are large-scale, social media data differs from the attribute-value data of classic data mining. Social media data is noisy, incomplete, drawn from multiple sources, and embedded in multi-mode and multi-dimensional networks. Furthermore, its data points are inherently linked rather than independent and identically distributed (i.i.d.). These unique properties present unprecedented challenges for mining social media data. Existing feature selection algorithms that have proven effective for classic data mining are not equipped for social media mining.

We propose a new kind of feature selection to facilitate the computational understanding of social media, investigating the associated fundamental research issues and developing new, effective algorithms. We define the problem of feature selection with linked data and present a preliminary study demonstrating how link information can be integrated into supervised feature selection for social media data. A prominent characteristic of social media is that its data comes from multiple sources. Because data from each source can be noisy, partial, or redundant, selecting relevant sources and using them together can make linked feature selection more effective. We define types of sources and propose to study unsupervised feature selection that uses source information. Social media also brings unique complications in the form of multi-mode and multi-dimensional networks. As a user's network expands and their relationships become increasingly complex, it becomes necessary to discern among these relationships, because the added complexity can confuse feature selection algorithms. We propose to develop new algorithms that can exploit multi-mode and multi-dimensional social media networks.

The project lies at the confluence of feature selection, social media analysis, and data mining. This makes it well suited to teaching data mining concepts and social media analysis, and to giving students a new context in which various computational components fit together. Students are adept users of social media and bring varied intuitions to developing computational tools that can harness its power. Hence, the proposed project can be used effectively in undergraduate and graduate courses as well as in student research projects. Students from under-represented groups will be involved in teaching and research activities. The impact of this work will also extend to understanding collective behavior in social media, employing social media for crisis response and disaster relief, and studying social and political movements with novel data mining methods.


  • Books or Book Chapters
  • Tutorials
  • Technical Report
  • Journal Papers
  • Conferences
  • Thesis
  • Related Links


    Related Activities

    Project Members

    REU Funded Students

    Undergraduate Students


    This project is supported by the National Science Foundation under Grant No. IIS-1217466. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

    Created by Huan Liu, who can be reached at huan.liu at asu.edu.
    Webmaster: Jiliang Tang, Email: Jiliang.TangATasu.edu

    Last Updated: September 6, 2016