The Data Mining and Bioinformatics Lab (DBL)

NSF grant IIS-0612327

Award Information
    Award Number: IIS-0612327
    Duration: 7/15/06 -- 6/30/2010
    Title: Scalable Querying and Mining of Graphs

Project Summary

A number of scientific endeavors are generating data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, database schemas and ontologies. Mining and analysis of these annotated and probabilistic graphs is crucial for advancing the state of scientific research, accurate modeling and analysis of existing systems, and engineering of new systems. The goal of this research project is to develop a set of scalable querying and mining tools for graph databases by integrating techniques from the fields of databases, bioinformatics, machine learning, and algorithms. New algorithms are being developed, and these are being examined for their quality and running time on real datasets. The first set of algorithms addresses subgraph and similarity querying in graph databases. The second set considers the mining of significant subgraphs or motifs. A novel significance model which transforms graphs into histograms of primitive components and examines the significance of motifs in the transformed domain is being developed. The third set of algorithms targets the discovery of well-connected clusters in large probabilistic graphs. The project integrates research and education by introducing the results of the research into undergraduate and graduate courses.

PI: Ambuj K Singh

Department of Computer Science
University of California
Santa Barbara, CA 93106
Phone: (805) 893-3236
Fax : (805) 893-8553
Email: ambuj@cs.ucsb.edu
URL: http://www.cs.ucsb.edu/~ambuj

Recently Supported Students
    Petko Bogdanov
    Kyle Chipman
    Kathy Macropol
    Sayan Ranu

Recent Publications
 

Sayan Ranu, and Ambuj Singh, Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification, Journal of Chemical Information and Modeling, 2009.

Kathy Macropol, Tolga Can, and Ambuj Singh, RRW: repeated random walks on genome-scale protein networks for local cluster discovery, BMC Bioinformatics, 10:283, 2009.

Kyle Chipman and Ambuj Singh, Predicting genetic interactions with random walks on biological networks, BMC Bioinformatics, 10:17, 2009.

Sayan Ranu and Ambuj Singh, GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases in 25th International Conference on Data Engineering (ICDE), 2009.

Petko Bogdanov and Ambuj K. Singh, "Function Prediction Using Neighborhood Patterns," in BioKDD, August, 2008.

Huahai He and Ambuj K. Singh, "Graphs-at-a-time: Query Language and Access Methods for Graph Databases," in SIGMOD, June, 2008.

Vebjorn Ljosa and Ambuj K. Singh, "Top-k Spatial Joins of Probabilistic Objects," in Proceedings of the 24th International Conference on Data Engineering (ICDE), April, 2008.


Undergraduate Students Supported through REU Supplement
    Carl Jantzen
    Evan Senter