Blogs have many fast-growing communities on the Internet. Discovering such communities in the blogosphere is important for sustaining and encouraging new blogger participation. We focus on extracting communities based on two key insights – (a) communities form due to individual blogger actions that are mutually observable; (b) the semantics of the hyperlink structure differ from those in traditional web analysis problems. Our approach involves developing computational models for mutual awareness that incorporate the specific action type, frequency and time of occurrence. We use the mutual awareness feature with a ranking-based community extraction algorithm to discover communities. To validate our approach, we apply four performance measures to the WWW2006 Blog Workshop dataset and the NEC focused blog dataset, with excellent quantitative results. The extracted communities are also shown to be semantically cohesive with respect to their topics of interest.
Spam blogs (splogs) have become a major problem in the increasingly popular blogosphere. Splogs are detrimental in that they corrupt the quality of information retrieved and they waste tremendous network and storage resources. We study several research issues in splog detection. First, in comparison to web spam and email spam, we identify some unique characteristics of splogs. Second, we propose a new online task that captures the unique characteristics of splogs, in addition to tasks based on the traditional IR evaluation framework. The new task introduces a novel time-sensitive detection evaluation to indicate how quickly a detector can identify splogs. Third, we propose a splog detection algorithm that combines traditional content features with temporal and link regularity features that are unique to blogs. Finally, we develop an annotation tool to generate ground truth on a sampled subset of the TREC-Blog dataset. We conducted experiments on both the offline (traditional splog detection) task and our proposed online splog detection task. Experiments based on the annotated ground truth set show excellent results on both offline and online splog detection tasks.
This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms, and splogs corrupt blog search results as well as waste network resources. In our approach we exploit unique blog temporal dynamics to detect splogs. The key idea is that splogs exhibit high temporal regularity in content and post time, as well as consistent linking patterns. Temporal content regularity is detected using a novel autocorrelation of post content. Temporal structural regularity is determined using the entropy of the post time difference distribution, while the link regularity is computed using a HITS-based hub score measure. Experiments based on the annotated ground truth on a real-world dataset show excellent results on splog detection tasks, with 90% accuracy.
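The entropy of the post time difference distribution mentioned above can be illustrated with a minimal sketch. The function name, the one-hour bin width, and the sample timestamps are assumptions made for illustration; the abstract does not say how the distribution is estimated. A low value suggests machine-like regularity in posting times.

```python
import math
from collections import Counter


def post_interval_entropy(post_times, bin_seconds=3600):
    """Shannon entropy (bits) of the binned gaps between consecutive posts.

    post_times: post timestamps in seconds (any order).
    bin_seconds: assumed quantization width; not taken from the paper.
    """
    times = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return 0.0
    # Quantize each gap so that near-identical intervals fall in the same bin.
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical data: one blog posts exactly every 6 hours, the other irregularly.
regular = [i * 6 * 3600 for i in range(10)]
irregular = [0, 1800, 9000, 40000, 41000, 90000, 200000, 205000, 400000, 900000]
print(post_interval_entropy(regular))
print(post_interval_entropy(irregular))
```

The perfectly periodic sequence puts every gap in a single bin, so its entropy is zero, while the irregular sequence spreads its gaps over many bins.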
In this paper, we present a framework to analyze and summarize the temporal dynamics within personal blogs. Blog temporal dynamics are difficult to capture using a few class descriptors. Our approach comprises (1) a representation of blog dynamics using self-similarity matrices, (2) theme extraction using non-negative self-similarity matrix factorization, and (3) a visualization representing blog theme evolution. Summaries based on large real-world blog datasets reveal interesting temporal characteristics for four blog types: personal blogs, cooperative blogs, power blogs and spam blogs.
This paper focuses on spam blog (splog) detection. Blogs are
highly popular, new media social communication mechanisms and splogs
degrade blog search results as well as waste network resources. In our
approach we exploit unique blog temporal dynamics to detect splogs.
There are three key ideas in our splog detection framework. We first
represent the blog temporal dynamics using self-similarity matrices
defined on the histogram intersection similarity measure of the time,
content, and link attributes of posts. Second, we show via a novel
visualization that the blog temporal characteristics reveal attribute
correlation, depending on the type of blog (normal blogs and splogs).
Based on these observations, we propose the use of temporal structural
properties computed from self-similarity matrices across different
attributes. In a splog detector, these novel features are combined with
content based features. We extract a content based feature vector from
different parts of the blog – URLs, post content,
etc. The dimensionality of the feature vector is reduced by Fisher
linear discriminant analysis. We have tested an SVM-based splog
detector using the proposed features on real-world datasets, with
excellent results (90% accuracy).
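A self-similarity matrix built on histogram intersection can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the per-post histograms, the normalization step, and all function names are assumptions.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: the mass they share, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def normalize(hist):
    """Scale a histogram so its bins sum to 1 (left unchanged if it is empty)."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)


def self_similarity_matrix(post_histograms):
    """S[i][j] = histogram intersection between post i and post j."""
    hists = [normalize(h) for h in post_histograms]
    return [[histogram_intersection(hi, hj) for hj in hists] for hi in hists]


# Hypothetical term-count histograms for four posts over a 3-term vocabulary.
posts = [[4, 0, 0], [2, 2, 0], [4, 0, 0], [0, 0, 5]]
S = self_similarity_matrix(posts)
for row in S:
    print([round(v, 2) for v in row])
```

Posts in temporal order index both the rows and columns, so repeated content shows up as high values away from the main diagonal. The same construction could be applied to time or link histograms.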
This paper addresses the problem of spam blog (splog)
detection using temporal and structural regularity of content, post
time and links. Splogs are undesirable blogs meant to attract search
engine traffic, used solely for promoting affiliate sites. Blogs
represent popular online media, and splogs not only degrade the quality
of search engine results, but also waste network resources. The splog
detection problem is made difficult due to lack of stable content
descriptors.
We have developed a new technique for detecting splogs, based on the
observation that a blog is a dynamic, growing sequence of entries (or
posts) rather than a collection of individual pages. In our approach,
splogs are recognized by their temporal characteristics and content.
There are three key ideas in our splog detection framework. (a) We
represent the blog temporal dynamics using self-similarity matrices
defined on the histogram intersection similarity measure of the time,
content, and link attributes of posts. The self-similarity matrices
function as a generalized spectral analysis tool that allows
investigation of the temporal changes within the post
sequence. (b) We study the blog temporal characteristics
based on a visual transformation derived from the self-similarity
measures. We show that the blog temporal characteristics reveal
correlation between attributes, depending on the type of blog (normal
blogs and splogs). (c) We propose two types of novel temporal features
to capture the splog temporal characteristics – regularity
features computed along the off-diagonals and the coherent blocks of
the self-similarity matrices on a single attribute, and joint features
computed from self-similarity matrices across different attributes. In
our splog detector, these novel features are combined with content
based features. We extract a content based feature vector from
different parts of the blog – URLs, post content, etc. The
dimensionality of the feature vector is reduced by Fisher linear
discriminant analysis. We have tested an SVM-based splog detector using
the proposed features on real-world datasets, with appreciable results
(90% accuracy).
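One way to picture regularity features computed along the off-diagonals of a self-similarity matrix is sketched below. The choice of mean and standard deviation as the statistics, and the function names, are assumptions; the abstract does not specify which statistics are used.

```python
import statistics


def off_diagonal(matrix, k):
    """Entries S[i][i+k]: similarity between posts that are k positions apart."""
    return [matrix[i][i + k] for i in range(len(matrix) - k)]


def off_diagonal_features(matrix, max_lag):
    """(mean, population std) of each off-diagonal for lags 1..max_lag."""
    feats = []
    for k in range(1, max_lag + 1):
        diag = off_diagonal(matrix, k)
        feats.append((statistics.fmean(diag), statistics.pstdev(diag)))
    return feats


# Hypothetical 4x4 self-similarity matrix in which every second post repeats.
S = [
    [1.0, 0.1, 1.0, 0.1],
    [0.1, 1.0, 0.1, 1.0],
    [1.0, 0.1, 1.0, 0.1],
    [0.1, 1.0, 0.1, 1.0],
]
features = off_diagonal_features(S, max_lag=2)
print(features)
```

A high mean with low spread at some lag indicates that posts that far apart are consistently similar, which is the kind of periodic pattern the abstract associates with splogs.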
There are information needs involving costly decisions that cannot be efficiently satisfied through conventional web search engines. Alternatively, community-centric search can provide multiple viewpoints to facilitate decision making. We propose to discover and model the temporal dynamics of thematic communities based on mutual awareness, where the awareness arises due to observable blogger actions and the expansion of mutual awareness leads to community formation. Given a query, we construct a directed action graph that is time-dependent, and weighted with respect to the query. We model the process of mutual awareness expansion using a random walk process and extract communities based on the model. We propose an interaction space based representation to quantify community dynamics. Each community is represented as a vector in the interaction space and its evolution is determined by a novel interaction correlation method. We have conducted experiments with a real-world blog dataset and obtained promising results for detection as well as insightful results for community evolution.
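The abstract says only that awareness expansion is modeled as a random walk on a weighted, directed action graph. As a rough illustration, the sketch below uses a random walk with restart from a seed blogger. The restart probability, the handling of nodes without outgoing edges, the function name, and the toy graph are all assumptions, not details taken from the paper.

```python
def random_walk_with_restart(edges, seed, restart=0.15, steps=100):
    """Approximate the visiting distribution of a walk that restarts at `seed`.

    edges: {source: {target: weight}} describing a directed, weighted graph.
    Returns {node: probability}; the probabilities sum to 1.
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    prob = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(steps):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in prob.items():
            # With probability `restart` the walker jumps back to the seed.
            nxt[seed] += restart * mass
            targets = edges.get(node, {})
            total = sum(targets.values())
            if total == 0:
                # Node without outgoing actions: return the rest to the seed.
                nxt[seed] += (1 - restart) * mass
                continue
            # Otherwise follow an outgoing edge in proportion to its weight.
            for target, weight in targets.items():
                nxt[target] += (1 - restart) * mass * weight / total
        prob = nxt
    return prob


# Hypothetical action graph: weights could stand for query-weighted comment counts.
edges = {"a": {"b": 3, "c": 1}, "b": {"a": 2}, "c": {"a": 1, "d": 1}, "d": {}}
scores = random_walk_with_restart(edges, seed="a")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Bloggers with high scores relative to a seed are those the walk reaches often, which gives one plausible basis for ranking candidate community members.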
We discover communities from social network data, and analyze the community evolution. These communities are inherent characteristics of human interaction in online social networks, as well as paper citation networks. Also, communities may evolve over time, due to changes to individuals’ roles and social status in the network as well as changes to individuals’ research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyze community evolutions. In the traditional approach, communities are first detected for each time slice, and then compared to determine correspondences. We argue that this approach is inappropriate in applications with noisy data. In this paper, we propose FacetNet for analyzing communities and their evolutions through a robust unified process. In this novel framework, communities not only generate evolutions, they also are regularized by the temporal smoothness of evolutions. As a result, this framework will discover communities that jointly maximize the fit to the observed data and the temporal evolution. Our approach relies on formulating the problem in terms of non-negative matrix factorization, where communities and their evolutions are factorized in a unified way. Then we develop an iterative algorithm, with proven low time complexity, which is guaranteed to converge to an optimal solution. We perform extensive experimental studies, on both synthetic datasets and real datasets, to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods.
We present a framework for automatically summarizing social group activity over time. The problem is important in understanding large-scale online social networks, which have diverse social interactions and exhibit temporal dynamics. In this work we construct summaries by extracting activity themes. We propose a novel unified temporal multi-graph framework for extracting activity themes over time. We use a non-negative matrix factorization (NMF) approach to derive two interrelated latent spaces for users and concepts. Activity themes are extracted from the derived latent spaces to construct a group activity summary. Experiments on real-world Flickr datasets demonstrate that our technique outperforms baseline algorithms such as LSI, and is additionally able to extract temporally representative activities to construct a meaningful group activity summary.
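For readers unfamiliar with NMF, the sketch below factorizes a single non-negative matrix using the standard Lee-Seung multiplicative updates for squared error. It is a simplification offered as background: the abstract does not state the update rule, and the framework it describes works on a temporal multi-graph with two interrelated latent spaces. The toy user-by-concept matrix and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical user-by-concept counts with two groups of users.
V = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 3))
```

Rows of `W` can be read as each user's loading on a latent theme and rows of `H` as each theme's weights over concepts. Because all entries stay non-negative, themes combine additively.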
This paper presents a novel social media summarization framework. Summarizing media created and shared in large-scale online social networks poses challenging research problems. The networks exhibit heterogeneous social interactions and temporal dynamics. Our proposed framework relies on the co-presence of multiple important facets: who (users), what (concepts and media), how (actions) and when (time). First, we impose a syntactic structure of the social activity (relating users, media and concepts via specific actions) in our temporal multi-graph mining algorithm. Second, important activities along each facet are extracted as activity themes over time. Experiments on Flickr datasets demonstrate that our technique captures nontrivial evolution of media use in social networks.
We discover communities from social network data, and analyze the community evolution. These communities are inherent characteristics of human interaction in online social networks, as well as paper citation networks. Also, communities may evolve over time, due to changes to individuals' roles and social status in the network as well as changes to individuals' research interests. We present an innovative algorithm that deviates from the traditional two-step approach to analyze community evolutions. In the traditional approach, communities are first detected for each time slice, and then compared to determine correspondences. We argue that this approach is inappropriate in applications with noisy data. In this paper, we propose FacetNet for analyzing communities and their evolutions through a robust unified process. This novel framework will discover communities and capture their evolution with temporal smoothness given by historic community structures. Our approach relies on formulating the problem in terms of maximum a posteriori (MAP) estimation, where the community structure is estimated both by the observed networked data and by the prior distribution given by historic community structures. Then we develop an iterative algorithm, with proven low time complexity, which is guaranteed to converge to an optimal solution. We perform extensive experimental studies, on both synthetic datasets and real datasets, to demonstrate that our method discovers meaningful communities and provides additional insights not directly obtainable from traditional methods.
Social media websites promote diverse user interaction on media objects as well as user actions with respect to other users. The goal of this work is to discover community structure in rich media social networks, and observe how it evolves over time, through analysis of multi-relational data. The problem is important in the enterprise domain, where extracting emergent community structure on enterprise social media can help in forming new collaborative teams, aid in expertise discovery, and guide long-term enterprise reorganization. Our approach consists of three main parts: (1) a relational hypergraph model for modeling various social contexts and interactions; (2) a novel hypergraph factorization method for community extraction on multi-relational social data; (3) an on-line method to handle temporal evolution through incremental hypergraph factorization. Extensive experiments on real-world enterprise data suggest that our technique is scalable and can extract meaningful communities. To evaluate the quality of our mining results, we use our method to predict users’ future interests. Our prediction outperforms baseline methods (frequency counts, pLSA) by 36-250% on average, indicating the utility of leveraging multi-relational social context by using our method.
This paper presents JAM (Joint Action Matrix Factorization), a novel framework to summarize social activity from rich media social networks. Summarizing social network activities requires an understanding of the relationships among concepts, users, and the context in which the concepts are used. Our work has three contributions: First, we propose a novel summarization method which extracts the co-evolution on multiple facets of social activity – who (users), what (concepts), how (actions) and when (time), and constructs a context-rich summary called an "activity theme". Second, we provide an efficient algorithm for mining activity themes over time. The algorithm extracts representative elements in each facet based on their co-occurrences with other facets through specific actions. Third, we propose new metrics for evaluating the summarization results based on the temporal and topological relationship among activity themes. Extensive experiments on real-world Flickr datasets demonstrate that our technique significantly outperforms several baseline algorithms. The results reveal nontrivial evolution in Flickr photo-sharing communities.
In this paper we develop a recommendation framework to connect image content with communities in online social media. The problem is important because users are looking for useful feedback on their uploaded content, but finding the right community for feedback is challenging for the end user. Social media are characterized by both content and community. Hence, in our approach, we characterize images through three types of features: visual features, user-generated text tags, and social interaction (user communication history in the form of comments). A recommendation framework based on learning a latent space representation of the groups is developed to recommend the most likely groups for a given image. The model was tested on a large corpus of 15,689 Flickr images. Our method outperforms the baseline method, with a mean precision of 0.62 and a mean recall of 0.69. Importantly, we show that fusing image content and text tags with social interaction features outperforms the case of only using image content or tags.
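The mean precision and mean recall figures can be read with the usual set-based definitions, averaged over images. The sketch below shows that arithmetic under the assumption of per-image averaging; the abstract does not state the exact evaluation protocol, and the group identifiers are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Set-based precision and recall for one image's recommended groups."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(cases):
    """Average precision and recall over (recommended, relevant) pairs."""
    pairs = [precision_recall(rec, rel) for rec, rel in cases]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical evaluation: two images, each with recommended and true groups.
cases = [
    (["g1", "g2"], ["g1"]),
    (["g3", "g4"], ["g3", "g4", "g5", "g6"]),
]
print(mean_precision_recall(cases))
```

Precision measures how many recommended groups were correct, while recall measures how many of the image's actual groups were recovered.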
Online social networking sites such as Flickr and Facebook provide a diverse range of functionalities that foster online communities to create and share media content. In particular, Flickr groups are increasingly used to aggregate and share photos about a wide array of topics or themes. Unlike photo repositories where images are typically organized with respect to static topics, the photo sharing process in Flickr often results in complex time-evolving social and visual patterns. Characterizing such time-evolving patterns can enrich the media exploration experience in a social media repository. In this paper, we propose a novel framework that characterizes distinct time-evolving patterns of group photo streams. We use a non-negative joint matrix factorization approach to incorporate image content features and contextual information, including associated tags, photo owners and post times. In our framework, we consider a group as a mixture of themes – each theme exhibits similar patterns of image content and context. The theme extraction aims to best explain the observed image content features and associations with tags, users and times. Extensive experiments on a Flickr dataset suggest that our approach is able to extract meaningful evolutionary patterns from group photo streams. We evaluate our method through a tag prediction task. Our prediction results outperform baseline methods, indicating the utility of our theme-based joint analysis.
This paper aims at discovering community structure in rich media social networks, through analysis of time-varying, multi-relational data. Discovering latent community structure that represents the social context of user actions is important in social media information tasks such as media search and recommendation. There are several challenges: relational learning adaptable to different social media contexts, evolutionary characterization of communities in time-varying social networks, and analysis of multi-dimensional data. In this paper we propose MetaFac (MetaGraph Factorization), a framework that extracts community structures from various social contexts and interactions. Our work has three key contributions: (1) metagraph, a novel relational hypergraph representation for modeling multi-relational and multi-dimensional social data; (2) an efficient factorization method for community extraction on a given metagraph; (3) an on-line method to handle time-varying relations through incremental metagraph factorization. Extensive experiments on real-world social data collected from the Digg social network suggest that our technique is scalable and is able to extract meaningful communities based on the social media contexts. We illustrate the usefulness of our framework through prediction tasks. Our prediction significantly outperforms baseline methods (including aspect model, tensor analysis), indicating the utility of metagraphs for handling time-varying social relational contexts.
Transdisciplinary collaborations call for dynamic, responsive slide-ware presentations beyond the linear structure afforded by traditional tools. The NextSlidePlease application addresses this through a novel authoring and presentation interface. The application also features an innovative algorithm to enhance presentation time management. The cross-platform Java application is currently being evaluated in a variety of real-world presentation contexts.
In this video presentation, we introduce NextSlidePlease, a novel slide authoring and presentation application. The video begins with a dramatization illustrating the shortcomings of existing slide-ware tools identified through our prior research. We then describe our theoretical framework for addressing these identified problems and present a dramatization of the process by which our NextSlidePlease application can be used to overcome such issues in a business context. In addition, we illustrate the novel functional aspects of our application algorithm that enable effective time management and flexible presentations. Finally, we present promising results from two user studies.