Mining time evolving networks
Time evolving networks arise in multiple domains. They can characterize traffic variations in transportation networks, information flow in communication networks, variation of trade rates in a network of trading agents or phases of pathway switching in gene interaction networks. Regardless of the rich representative power of time-evolving network in modeling processes in all the above domains, they have received limited attention from in the data mining and database communities.
One model of a time-varying network is based on fixed network structure (edges and nodes) and a time-varying characteristic of the edges. This scenario is applicable to modeling traffic in transportation, communication and information networks. We propose a method for mining a time-evolving network for heavy-edge sub-networks. In the transportation network scenario such sub-networks correspond to congested regions, while in a communication network they can correspond to links that are overloaded with information traffic.
Collaboration with: Misael Mogiovi
Discovering influential groups of agents in composite networks
We consider the problem of discovering effective collaboration cliques of agents in complex organizations in which agents are related based on one or multiple homophily principles, based on geographical, demographic or social commonalities. Independent of the aforementioned relations, agents are grouped in teams to perform a specific task. While there exists a pairwise method of assessing team performance as a whole for a given task, we assume there is no way of directly evaluating the performance of individuals as their success or ineffectiveness is dependent on their teammates. At task completion teams regroup and new tasks are assigned.
The above scenario arises in many real-world situations. For example, it can represent the dynamics of employees in a company working on team projects. The management forms teams for specific projects and evaluate each team's performance at project completion.
We propose a model of individual and group effectiveness, based on historical data and agent social and demographic ties. We take into account the performance of dynamic large groups for specific tasks and extrapolate individual performance, conditioned on group success. We discover core subgroups that drive team success and relate these to history of collaboration and proximity in the social graph.
Collaboration with: Benjamin Baumer (CUNY) and Prithwish Basu (BBN)
Interaction networks from collaborative content creation in Wikipedia
Collaborative systems such as Wikipedia produce high-quality content based on democratic principles of contribution. The high quality is due to a well-balanced self-regulated community of contributors. We propose a method to characterize and study the community structure of contributors of Wiki-based systems based on raw-content generation analysis. We leverage topic modelling in order to capture agreement and opposition of contributors and analyze these multi-modal relations to map communities in the contributor base.
The key steps of our approach include (i) modeling of pairwise variable-strength contributor interactions that can be both positive and negative, (ii) synthesis of a global network incorporating all pairwise interactions, and (iii) detection and analysis of community structure encoded in such networks. Analysis of the discovered community structure reveals coalitions of common-interest editors who back each other in promoting certain topics and collectively oppose other coalitions or single authors.
A different trust within this project addresses the problem of quality of content and its dependence on a timely convergence of opposing viewpoints to an unbiased state. One problem that hinders this process is unnoticed controversial content due to multiple succeeding contributions that move the focus away from it.
We develop a predictive model of editor behavior based on the content dynamics throughout the revision history of an article. A central component of our approach is the detection of the level of local agreement between a pair of contributors based on consecutive revisions. We employ these pairwise interactions (i) to estimate the level of controversy of a new contribution and (ii) to rank contributors based on their sensitivity to a specific new edit. Our ranking is based on the underlying content and the social structure of the editors encoded in a global contributor interaction network.
Collaboration with: Nicholas Larusso
Publications:
Petko Bogdanov, Nicholas D Larusso, Ambuj Singh, "Towards Community Discovery in Signed Collaborative Interaction Networks", Social Interaction Analysis and Service Providers (SIASP) at IEEE ICDM 2010
Petko Bogdanov, Nicholas D Larusso, Ambuj Singh, "Identifying Communities with Coherent and Opposing Views", Graduate Student Workshop in Computing (GSWC) UCSB, 2010
Scalable nearest neighbors in large and composite networks
Pairwise relations among entities and agents in networks arise from different sources and can be based on multiple features. For example, people may have accounts in several online social networks, targeted towards different interaction types. A single individual can be a user of Facebook to keep up with her friends, LinkedIn to organize and maintain her business contacts, and Last.fm to discuss musical trends with fellow listeners. This diverse social ambiance is complemented by an information layer in the form of email communication, generation and discussion of blog entries or shared photographs. Supporting exploratory and analysis tasks in such composite systems has to be flexible and scalable in order to reflect frequent network changes and user-centric prioritization of the different components.
In the context of composite networks, we address the problem of k-Nearest Neighbor search, according to a random walk proximity measure called Effective Importance. Our approach retrieves the exact top neighbors at query time without relying on off-line indexing or summaries of the entire network. This makes it suitable for very large dynamic networks, as well as for composite network overlays mixed at query time.
Publications:
Petko Bogdanov, Ambuj Singh, "Fast nearest neighbors in large and composite networks", Army Science Conference 2010
Gene function prediction (past)
A gene network models the interactions between genes in the cell. Not all genes in such networks are labeled with the function they perform. In this project, a two-phase approach was developed for the problem of function prediction - feature extraction from a partially labeled network coupled with classification. The approach is robust to the network structure and dominates existing approaches.
For details, please referto to my TCBB 2009 paper: available here.
A previous shorter verion appeared in BioKDD 2008: available here.
Graph indexing (past)
This project dealt with indexing of a collection of relatively small graphs. For details, see here
Motifs in gene networks (past)
This project aimed at mining for motifs in a single large graph. For details, see here