Ambuj K Singh
Department of Biomolecular Science and Engineering
Email: ambuj@cs.ucsb.edu
Office phone: (805)-893-3236
Lab phone: (805)-893-4276
|
|
Teaching |
Indexing in Multimedia Databases (Graduate)
Bioinformatics (Graduate-Undergraduate)
Programming Languages (CS 162)
|
Research |
My research interests are broadly in the areas of bioimage
informatics, graph querying and mining, sensor networks, and searching high-dimensional
data (recent papers).
|
Education |
|
Bio-image Informatics |
Information technology
research has played a significant role in the genomics revolution over the past
decade, from aiding with large-scale sequence assembly to automating gene
identification to efficiently searching databases by sequence similarity.
The tremendous amount of information gathered from genomics will be dwarfed in
the next decade by the knowledge to be gained from comprehensive, systematic
studies of the properties and behaviors of all proteins and other biomolecules. High resolution imaging of molecules and
cells will be critical for understanding complex systems such as the nervous
system, whether it be for the localization of specific neuron types within a
region of the central nervous system, the branching pattern of dendritic trees, or the localization of molecules at the subcellular level. Further, knowing how these distribution
patterns and subcellular locations change as a
function of time is critical to understanding how cells respond to stress,
injury, aging and disease.
The Center for Bio-Image Informatics brings together biologists, computer scientists, and engineers from UCSB, Berkeley, and CMU for interdisciplinary research that will not only advance the state of the art in imaging, pattern recognition, and data mining, but will also lead to a better understanding of complex biological processes at the cellular and subcellular level. During the five-year duration of this project, we expect to develop, test, and deploy a fully operational, distributed digital library of bio-molecular image data accessible to researchers around the world. Such searchable databases will make it easier to understand and interpret the data, leading to a more complete and integrated understanding of cellular structure, function, and regulation.
|
Scalable Querying and Mining of Graphs |
A number of scientific endeavors are generating data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, database schemas and ontologies. Mining and analysis of these annotated and probabilistic graphs is crucial for advancing the state of scientific research, accurate modeling and analysis of existing systems, and engineering of new systems. The goal of this research project is to develop a set of scalable querying and mining tools for graph databases by integrating techniques from the fields of databases, bioinformatics, machine learning, and algorithms. New algorithms are being developed, and these are being examined for their quality and running time on real datasets. The first set of algorithms addresses subgraph and similarity querying in graph databases. The second set considers the mining of significant subgraphs or motifs. A novel significance model that transforms graphs into histograms of primitive components and examines the significance of motifs in the transformed domain is being developed. The third set of algorithms targets the discovery of well-connected clusters in large probabilistic graphs.
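As a rough illustration of the histogram idea mentioned above, the sketch below maps a labeled graph to a histogram of primitive components. It assumes, purely for illustration, that a primitive component is an edge typed by the labels of its two endpoints; the actual significance model may use different primitives. The function names `edge_type_histogram` and `contains_histogram` are hypothetical.

```python
from collections import Counter


def edge_type_histogram(node_labels, edges):
    """Count undirected edges by the (sorted) pair of endpoint labels.

    Assumption: an edge type is the primitive component of interest.
    """
    hist = Counter()
    for u, v in edges:
        key = tuple(sorted((node_labels[u], node_labels[v])))
        hist[key] += 1
    return hist


def contains_histogram(big, small):
    """True if every component count in `small` is covered by `big`.

    This is only a necessary condition for subgraph containment, so it can
    prune candidates but cannot confirm a match.
    """
    return all(big[key] >= count for key, count in small.items())


# A small chain C-O-C-N and a one-edge motif C-O.
labels = {0: "C", 1: "O", 2: "C", 3: "N"}
graph_hist = edge_type_histogram(labels, [(0, 1), (1, 2), (2, 3)])
motif_hist = edge_type_histogram({0: "C", 1: "O"}, [(0, 1)])
print(contains_histogram(graph_hist, motif_hist))
```

Working in the histogram domain replaces structural comparison with simple count comparison, which is one plausible reason such a transformation helps queries and significance tests scale.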
|
Sensor Networks |
Large-scale sensor networks are being deployed
for applications such as habitat monitoring, seismic monitoring, and location
tracking systems. Although sensors in these networks provide an unprecedented
scale of access for monitoring phenomena, their limited resources (in terms of
processing speed, communication power, and memory) render conventional data
management tools and techniques ineffective. We are developing novel
distributed techniques for mining and summarizing spatio-temporal
data in resource-constrained sensor networks. In the DIST project, we designed
an index structure to track moving objects at various spatio-temporal
resolutions. The DIST system can be used to answer range queries and is
scalable with respect to update, storage and query costs. The next project, ELink, was designed to capture spatio-temporal
correlations in sensor data. Data is compressed in the temporal dimension
locally at the sensor node using AR models, and then sensors with similar
models are spatially clustered using in-network algorithms. In the third
project, we transform raw data into symbolic models, and hierarchically
aggregate two or more constituent models into a single composite model.
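The ELink idea of summarizing each sensor's readings with an AR model and then grouping sensors with similar models can be sketched as follows. This is a minimal, centralized illustration under stated assumptions, not the in-network algorithm itself: it fits a first-order model x[t] = a * x[t-1] + noise by least squares, and it groups sensors by coefficient alone, ignoring spatial adjacency. The names `fit_ar1` and `group_by_coefficient` and the tolerance value are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of a in x[t] = a * x[t-1] + noise."""
    num = sum(x1 * x0 for x0, x1 in zip(series, series[1:]))
    den = sum(x0 * x0 for x0 in series[:-1])
    return num / den if den else 0.0


def group_by_coefficient(coeffs, tol):
    """Greedily group sensor ids by AR coefficient.

    A sensor joins the current group if its coefficient is within `tol`
    of the first coefficient placed in that group; otherwise it starts
    a new group. Spatial proximity is deliberately ignored here.
    """
    groups = []
    for sensor_id, a in sorted(coeffs.items(), key=lambda kv: kv[1]):
        if groups and abs(a - groups[-1][0]) <= tol:
            groups[-1][1].append(sensor_id)
        else:
            groups.append((a, [sensor_id]))
    return [members for _, members in groups]


# Each sensor keeps one coefficient instead of its full time series.
readings = {
    "s1": [1.0, 0.5, 0.25, 0.125],
    "s2": [2.0, 1.0, 0.5, 0.25],
    "s3": [1.0, 0.9, 0.81, 0.729],
}
coeffs = {sid: fit_ar1(xs) for sid, xs in readings.items()}
print(group_by_coefficient(coeffs, tol=0.05))
```

The point of the sketch is the compression step: a node transmits a small model rather than raw readings, and similarity between models drives the clustering.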
|
Mining and Searching in High Dimensional Spaces |
Large-scale data-intensive systems require new
methods for accessing and processing large data volumes. We are developing
index structures for efficiently accessing data embedded in high-dimensional
spaces such as time-series data, images and videos, and string data. We are
also investigating the problem of statistics and aggregate maintenance over
data streams that are useful in telecommunications network monitoring,
trend-related analysis, web-click streams, stock tickers, and other
time-variant data.
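To make the notion of aggregate maintenance over a stream concrete, here is a minimal sketch that keeps a running mean and variance in constant memory using Welford's online update. This is a standard textbook technique chosen for illustration, not a description of the methods developed in this project; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """One-pass mean and population variance over a stream of numbers."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the aggregates."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far (0.0 if empty)."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
print(stats.mean, stats.variance)
```

Each update touches only three numbers, so the aggregate can keep pace with a stream without storing past values.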
|
Students |
|
Other Information |