Award Information
Award Number: IIS-0917149
Duration: 10/09 -- 9/12
Title: Techniques for Integrated Analysis of Graphs with Applications to Cheminformatics and Bioinformatics
Project Summary
A number of scientific endeavors generate data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, database schemas, and ontologies. Access to and analysis of the resulting annotated and probabilistic graphs are crucial for advancing the state of scientific research, for accurate modeling and analysis of existing systems, and for engineering new systems. This project aims to develop a set of scalable querying and mining tools for graph databases by integrating techniques from databases and data mining. The research is theoretical as well as empirical: new theoretical ideas and algorithms are being developed and applied to the domains of Cheminformatics and Bioinformatics. The first research thrust examines primitives for graph data management and graph mining. A declarative query language for graphs is being investigated. This language is based on a formal language for graphs and a graph algebra, and separates the concerns of specification and implementation. The scalability of techniques for similarity search on graphs and for mining significant patterns is also being investigated as part of this thrust. The second research thrust applies the developed techniques to the domain of Cheminformatics. Specific tasks being examined are search for similar compounds, mining for significant motifs, diversity analysis, and analysis of macromolecular complexes. The final research thrust applies the developed methods to the domain of Bioinformatics. There has been an explosion of widely diverse biological data, arising from genome-wide characterization of transcriptional profiles, protein-protein interactions, genomic structure, genetic phenotype, gene interactions, gene expression, proteomics, and other techniques.
The techniques being developed can efficiently integrate and analyze data from multiple sources and models, while accelerating interaction and function prediction and pathway discovery.
PI: Ambuj K Singh
Department of Computer Science
University of California
Santa Barbara, CA 93106
Phone: (805) 893-3236
Fax: (805) 893-8553
Email: ambuj@cs.ucsb.edu
URL: http://www.cs.ucsb.edu/~ambuj
Recently Supported Students
Petko Bogdanov
Kyle Chipman
Kathy Macropol
Sayan Ranu
Recent Publications
Sayan Ranu and Ambuj Singh, Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification, Journal of Chemical Information and Modeling, 2009.