Current Projects
-
My current research focuses on developing novel methods for querying and mining spatial and spatio-temporal data,
particularly for problems in which data uncertainty is prevalent.
Towards Community Discovery in Signed Collaborative Interaction Networks
-
Collaboration is central to the online world of today. The popularity of diverse
collaboration-centered applications has led to the formation of unique and productive
ecosystems, composed of individuals who collectively annotate maps, organize photos,
accumulate and author encyclopedic knowledge, and even build software systems. Understanding
the community structure of such systems is a necessary step for reducing administrative
overhead and redirecting contributor efforts to new content generation, as well as for
assessing the quality and objectivity of the final product. Analyzing how collaborators
interact can also yield insights into the formation and architecture of ad-hoc online
societies that lack any explicit hierarchical structure and are held together by content
generation as their ultimate goal.
We propose a framework for discovering collaborative community structure in Wiki-based
knowledge repositories based on analysis of raw content generation. We leverage topic modelling
to capture agreement and opposition among contributors and analyze these multi-modal
relations to map communities in the contributor base. The key steps of our approach include
(i) modeling of pairwise variable-strength contributor interactions that can be both positive
and negative, (ii) synthesis of a global network incorporating all pairwise interactions, and
(iii) detection and analysis of community structure encoded in such networks.
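The three steps above can be illustrated with a minimal sketch. It is not the community discovery algorithm proposed in this project: the interaction scores are made-up numbers, the function names are hypothetical, and step (iii) is replaced by a deliberately simple stand-in that groups editors linked by net-positive edges and then measures how many edge weights disagree with the grouping.

```python
from collections import defaultdict


def build_signed_network(interactions):
    """Steps (i)-(ii): sum signed pairwise scores into one weight per editor pair."""
    weights = defaultdict(float)
    for a, b, score in interactions:
        weights[tuple(sorted((a, b)))] += score
    return dict(weights)


def positive_components(weights):
    """Step (iii), simplified: group editors connected by net-positive edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        ra, rb = find(a), find(b)
        if w > 0 and ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


def frustration(weights, communities):
    """Total weight of edges that contradict the grouping:
    negative edges inside a community or positive edges across communities."""
    label = {n: i for i, c in enumerate(communities) for n in c}
    bad = 0.0
    for (a, b), w in weights.items():
        same = label[a] == label[b]
        if (same and w < 0) or (not same and w > 0):
            bad += abs(w)
    return bad


# Hypothetical scores: positive = agreement, negative = opposition.
interactions = [
    ("ann", "bob", 1.0), ("bob", "ann", 0.5), ("bob", "cat", -2.0),
    ("cat", "dan", 1.5), ("ann", "dan", -0.5),
]
network = build_signed_network(interactions)
communities = positive_components(network)
print(communities, frustration(network, communities))
```

In this toy input the two groups back each other internally and oppose each other across the boundary, so no edge contradicts the grouping. Real edit histories are rarely this clean, which is why a dedicated signed-network method is needed.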
The global community discovery algorithm we propose outperforms existing alternatives in
identifying coherent clusters according to objective optimality criteria. Analysis of the
discovered community structure reveals coalitions of common-interest editors who back
each other in promoting some topics and collectively oppose other coalitions or single
authors. We couple contributor interactions with content evolution and reveal the global
picture of opposing themes within the self-regulated community base for both controversial
and featured articles in Wikipedia.
Summarizing Probabilistic Data
-
Many real-world applications produce data with uncertainties. It is
therefore important to provide scalable methods capable of properly
managing this uncertain data. In this paper, we address the
problem of building a space constrained synopsis over a probabilistic
dataset where tuples are defined over a continuous domain. The
primary goal of this work is to aid in exploratory tasks by providing
quick approximate query results and statistical analysis with
error bounds. Our approach differs from other summarization techniques
in that we retain the shape of the probability distribution for
each tuple. This provides us with a great deal of versatility in that
we can approximately answer queries over uncertain datasets using
our synopsis. In fact, given the proper query execution engine,
our synopsis can be used to answer any query, the limiting factor
being the error bounds we are capable of providing. We use
minimax polynomials, polynomials which minimize the maximum (L∞) error,
to approximate the probability distribution of each tuple and introduce
efficient methods to provide further space reduction while still
bounding the error of our synopsis.
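As a rough illustration of storing a tuple's distribution as a handful of polynomial coefficients, the sketch below approximates a density with NumPy's Chebyshev interpolation. This is an assumption made for brevity: Chebyshev interpolation is only near-minimax, not the true minimax fit, and the Gaussian density, the interval [-4, 4], the degree, and the function names are all illustrative choices rather than details of the paper. The maximum error is estimated on a dense grid, so it approximates the true worst-case error.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def gaussian_pdf(x, mean=0.0, std=1.0):
    """Example density for one uncertain tuple (illustrative choice)."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))


def summarize_pdf(pdf, lo, hi, degree):
    """Fit a degree-`degree` polynomial to `pdf` on [lo, hi].

    Returns the polynomial and its maximum absolute error on a dense grid.
    """
    approx = Chebyshev.interpolate(pdf, degree, domain=[lo, hi])
    grid = np.linspace(lo, hi, 2001)
    max_err = float(np.max(np.abs(pdf(grid) - approx(grid))))
    return approx, max_err


def range_probability(approx, a, b):
    """Approximate P(a <= X <= b) by integrating the polynomial."""
    cdf = approx.integ()
    return float(cdf(b) - cdf(a))


approx, max_err = summarize_pdf(gaussian_pdf, -4.0, 4.0, degree=16)
estimate = range_probability(approx, -1.0, 1.0)
print(len(approx.coef), max_err, estimate)
```

If the pointwise error is at most e everywhere on [a, b], the error of the range probability is at most (b - a) * e, which is how a bound on the polynomial fit turns into a bound on a query answer.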
Opsin protein homology
-
This is a project I worked on to determine whether type-I and type-II opsins
are homologous (that is, share a common evolutionary origin). Here is the
software we wrote to compare the proteins.
3-D 4x4x4 Tic-Tac-Toe
-
A minimax implementation for a 3D version of tic-tac-toe on a 4x4x4 board.
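For readers unfamiliar with the technique, here is a minimal sketch of depth-limited minimax on a 4x4x4 board. It is not the project's source code: the board encoding (a flat list of 64 cells holding 0, +1 or -1), the open-line heuristic, and all function names are assumptions made for illustration. A full-depth search is impractical on this board, so the sketch cuts off at a fixed depth.

```python
from itertools import product

N = 4


def winning_lines():
    """Enumerate every straight line of N cells as lists of flat indices."""
    # Keep one direction out of each opposite pair so no line is counted twice.
    dirs = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    lines = []
    for start in product(range(N), repeat=3):
        for d in dirs:
            cells = [tuple(s + k * dk for s, dk in zip(start, d)) for k in range(N)]
            if all(0 <= c < N for cell in cells for c in cell):
                lines.append([x + N * y + N * N * z for x, y, z in cells])
    return lines


LINES = winning_lines()


def winner(board):
    """Return +1 or -1 if that player owns a full line, else 0."""
    for line in LINES:
        total = sum(board[i] for i in line)
        if abs(total) == N:
            return total // N
    return 0


def heuristic(board):
    """Lines held only by +1, minus lines held only by -1."""
    score = 0
    for line in LINES:
        marks = {board[i] for i in line}
        if marks == {0, 1} or marks == {1}:
            score += 1
        elif marks == {0, -1} or marks == {-1}:
            score -= 1
    return score


def minimax(board, player, depth):
    """Return (value, move) with values seen from player +1's side."""
    won = winner(board)
    if won:
        return won * 1000, None
    moves = [i for i, v in enumerate(board) if v == 0]
    if depth == 0 or not moves:
        return heuristic(board), None
    best_value, best_move = None, None
    for m in moves:
        board[m] = player
        value, _ = minimax(board, -player, depth - 1)
        board[m] = 0  # undo the move
        better = (best_value is None
                  or (player == 1 and value > best_value)
                  or (player == -1 and value < best_value))
        if better:
            best_value, best_move = value, m
    return best_value, best_move


board = [0] * (N ** 3)
for i in (0, 1, 2):
    board[i] = 1        # +1 threatens to complete cells 0..3
for i in (20, 21, 40):
    board[i] = -1
print(len(LINES), minimax(board, 1, 1), minimax(board, -1, 2))
```

The board has 48 axis-aligned lines, 24 face diagonals and 4 space diagonals, for 76 winning lines in total. Adding alpha-beta pruning would be the natural next step for searching deeper.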
Spades AI
-
A (somewhat) intelligent Spades agent we made for cs265.
It uses a fairly simple rule base to decide which card to play.
Feel free to give it a whirl, or to download the source.