The Data Mining and Bioinformatics Lab (DBL)

About the lab

People

The laboratory of distributed systems, databases, and bioinformatics is headed by Prof. Ambuj Singh. There are currently six graduate students in the research group. For more information on the group members, please visit our People page.

Our research focuses on image informatics and scalable querying and mining of graphs. A brief summary follows. Please refer to the Projects page for the details.

Graph querying and mining

A number of scientific endeavors are generating data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, and database schemas and ontologies. Mining and analysis of these annotated and probabilistic graphs is crucial for advancing scientific research, for accurately modeling and analyzing existing systems, and for engineering new systems. The goal of this research project is to develop a set of scalable querying and mining tools for graph databases by integrating techniques from the fields of databases, bioinformatics, machine learning, and algorithms.

Image informatics

Increasing amounts of imaging data are being generated in multiple scientific areas such as biology, environmental monitoring, and manufacturing. Such data contains invaluable spatio-temporal information, the extraction of which requires content-based search using sensitive distance measures. The scientific challenges lie in developing techniques that ensure high-quality results and index structures that can scale to millions of images. Closely coupled with simple querying is the task of mining the images for frequent and significant spatio-temporal patterns. Finally, much of the information extracted from images is rife with uncertainty: variability in measurements, spatial distributions, aggregate measurements, and missing and predicted values. We are developing methods and systems for managing, querying, and mining such uncertain information.

Protein Network Synthesis and Analysis

Genome-level protein interaction networks can be constructed by integrating high-throughput sources (microarrays, RNAi, bioimages) with genomics and literature data. These networks are inherently probabilistic, and understanding them can provide new insights into biology. A systems-level understanding of the signaling pathways and networks in a disease model allows one to evaluate the consequences of modulating the activity, expression level, or post-translational modification of a potential drug discovery target. Understanding protein interactions within a pathway and interactions between pathways permits selection of a target which, when modulated, addresses the disease condition with minimum impact on other physiological processes. Systems-level information about protein-protein interactions provides novel opportunities for drug discovery by expanding knowledge of protein function while generating a large new class of potential targets.