The Data Mining and Bioinformatics Lab (DBL)

NSF grant IIS-0917149

Award Information
   Award Number: IIS-0917149
   Duration: 10/09 -- 9/12
   Title: Techniques for Integrated Analysis of Graphs with Applications to Cheminformatics and Bioinformatics

Project Summary

A number of scientific endeavors generate data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, database schemas, and ontologies. Access to and analysis of the resulting annotated and probabilistic graphs are crucial for advancing the state of scientific research, accurate modeling and analysis of existing systems, and engineering of new systems. This project aims to develop a set of scalable querying and mining tools for graph databases by integrating techniques from databases and data mining. The proposed research is theoretical as well as empirical: new theoretical ideas and algorithms are being developed and applied to the domains of Cheminformatics and Bioinformatics.

The first research thrust examines primitives for graph data management and graph mining. A declarative query language for graphs is being investigated. This language is based on a formal language for graphs and a graph algebra, and separates the concerns of specification and implementation. The scalability of techniques for similarity search on graphs and for mining significant patterns is also being investigated as part of this thrust.

The second research thrust applies the developed techniques to the domain of Cheminformatics. The specific tasks being examined are search for similar compounds, mining for significant motifs, diversity analysis, and analysis of macromolecular complexes.

The final research thrust applies the developed methods to the domain of Bioinformatics. There has been an explosion of widely diverse biological data, arising from genome-wide characterization of transcriptional profiles, protein-protein interactions, genomic structure, genetic phenotype, gene interactions, gene expression, proteomics, and other techniques. The techniques being developed can efficiently integrate and analyze data from multiple sources and models, while accelerating interaction and function prediction and pathway discovery.

PI: Ambuj K. Singh

Department of Computer Science
University of California
Santa Barbara, CA 93106
Phone: (805) 893-3236
Fax: (805) 893-8553
Email: ambuj@cs.ucsb.edu
URL: http://www.cs.ucsb.edu/~ambuj

Recently Supported Students
    Petko Bogdanov
    Kyle Chipman
    Kathy Macropol
    Sayan Ranu

Recent Publications
 

Sayan Ranu and Ambuj K. Singh, "Answering top-k queries over a mixture of attractive and repulsive dimensions" in PVLDB, 2012 (to appear).

S. Joshua Swamidass, Sayan Ranu, Bradley T. Calhoun, Ambuj K. Singh, "Probabilistic Substructure Mining from Small-Molecule Screens" in the 242nd American Chemical Society National Meeting. American Chemical Society, 2011.

Sayan Ranu and Ambuj Singh, tutorial titled "Topological Indexing and Mining of Chemical Compounds", ACM BCB, 2011.

Sayan Ranu, Bradley T. Calhoun, Ambuj K. Singh, S. Joshua Swamidass, "Probabilistic Substructure Mining from Small-Molecule Screens" in Molecular Informatics, 2011.

Sayan Ranu and Ambuj Singh, "Novel Method for Pharmacophore Analysis by Examining the Joint Pharmacophore Space" in Journal of Chemical Information and Modeling, 2011.

Kyle Chipman and Ambuj K. Singh, "Using Stochastic Causal Trees to Augment Bayesian Networks for Modeling eQTL Datasets" in BMC Bioinformatics, 2011.

Sayan Ranu and Ambuj Singh, "Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification" in Journal of Chemical Information and Modeling, 2009.