Graph Information System


PI
Xifeng Yan,  University of California at Santa Barbara
 

CAREER: Graph Information System: Deciphering Complex Networks, funded by NSF CAREER award IIS-0954125.

Graduate Students: Nan Li (oDesk, now Apple), Arijit Khan (PostDoc, ETH)

Undergraduate Students: Bruce Liu (Pasadena Community College/UCI)

Project Summary

Graphs and networks are ubiquitous, encoding complex relationships ranging from chemical bonds to social interactions. Hidden in these networks are the answers to many important questions in biology, business, and sociology. Yet analyzing complex networks currently requires users to master sophisticated computing and programming skills, which is a pain point for many scientists and engineers.

This project aims to advance the state of the art by developing a general graph information system that addresses the needs of searching and mining complex networks. Real-life networks are complex: they have topological structure and also contain heterogeneous contents and attributes associated with nodes and edges. This mixture of structure and content raises two challenges that require new solutions for smarter and faster graph analysis. First, new types of graph search and mining operations, such as graph aggregation, graph association, and graph pattern mining, are emerging. Second, as graphs become large and complex, most existing graph mining algorithms do not scale well. This project addresses these challenges through a comprehensive study of a general graph information system. The proposed system includes three major components: complex graph search, graph pattern mining, and graph indexing. It covers emerging structure queries in social, biological, and information networks; new graph mining operators such as graph summarization and association; and innovative indexing methodologies, e.g., differential graph index.

This research is tightly integrated with education through student mentoring and curriculum development. Publications, software, and course materials resulting from this project are disseminated on this website.

Publications

  1. SLQ: A User-friendly Graph Querying System,
    by S. Yang, Y. Xie, Y. Wu, T. Wu, H. Sun, J. Wu, X. Yan,
    SIGMOD'14 (Proc. 2014 Int. Conf. on Management of Data) (demo paper), 2014. [pdf] [demo]
  2. Schemaless and Structureless Graph Querying,
    by S. Yang, Y. Wu, H. Sun, X. Yan,
    VLDB'14 (Proc. of the 40th Int. Conf. on Very Large Databases), 2014. [pdf]
  3. A Probabilistic Approach to Uncovering Attributed Graph Anomalies,
    by N. Li, H. Sun, K. Chipman, J. George, X. Yan,
    SDM'14 (Proc. 2014 SIAM Int. Conf. on Data Mining), 2014. [pdf]
  4. Cloud Service Placement via Subgraph Matching,
    by B. Zong, R. Raghavendra, M. Srivatsa, X. Yan, A. Singh, and K.-W. Lee,
    ICDE'14 (Proc. 2014 Int. Conf. on Data Engineering), 2014. [pdf]
  5. Summarizing Answer Graphs Induced by Keyword Queries,
    by Y. Wu, S. Yang, M. Srivatsa, A. Iyengar, X. Yan,
    VLDB'14 (Proc. of the 40th Int. Conf. on Very Large Databases), 2014. [pdf]
  6. Noise-Resistant Bicluster Recognition,
    by H. Sun, G. Miao, X. Yan,
    ICDM'13 (Proc. 2013 IEEE Int. Conf. on Data Mining), Dec 2013. [pdf] [software release]
  7. Mining Evidences for Named Entity Disambiguation,
    by Y. Li, C. Wang, F. Han, J. Han, D. Roth, and X. Yan,
    KDD'13 (Proc. of the 19th Int. Conf. on Knowledge Discovery and Data Mining), Aug 2013. [pdf]
  8. Memory Efficient Minimum Substring Partitioning,
    by Y. Li, P. Kamousi, F. Han, S. Yang, X. Yan, S. Suri,
    VLDB'13 (Proc. of the 39th Int. Conf. on Very Large Databases), Aug 2013. [pdf] [software release]
  9. NeMa: Fast Graph Search with Label Similarity,
    by A. Khan, Y. Wu, C. Aggarwal, X. Yan,
    VLDB'13 (Proc. of the 39th Int. Conf. on Very Large Databases), Aug 2013. [pdf] [software release]
  10. Ontology-based Subgraph Querying,
    by Y. Wu, S. Yang, X. Yan,
    ICDE'13 (Proc. 2013 Int. Conf. on Data Engineering), Apr 2013. [pdf] [poster] (Best Poster Award)
  11. Neighborhood Based Fast Graph Search in Large Networks,
    by A. Khan, N. Li, Z. Guan, X. Yan, S. Chakraborty, and S. Tao,
    SIGMOD'11 (Proc. 2011 Int. Conf. on Management of Data), June 2011. [pdf]
  12. Content-Aware Resolution Sequence Mining for Ticket Routing,
    by P. Sun, S. Tao, X. Yan, N. Anerousis, Y. Chen,
    BPM'10 (The 8th Int. Conf. on Business Process Management), Sep 2010. [pdf]

Dissertations

2013 Nan Li, Ph.D., "Uncovering Anomalous Patterns in Large Attributed Graphs."
2013 Arijit Khan, Ph.D., "Towards Querying and Mining of Large-Scale Networks."