Huahai He

Ph.D.

Department of Computer Science
University of California, Santa Barbara
Goleta, CA 93106-5110

Email: huahai at cs.ucsb.edu


I received my PhD in Computer Science from the University of California, Santa Barbara in 2007. My advisor was Ambuj K. Singh. I joined Google in Oct 2007.

DBL lab


Education


Research

My research interests are in Graph Databases, Graph Mining, and Bioinformatics.

Graphs have become popular for modeling complex data: chemical compounds, protein structures, protein interaction networks, database schemas, social networks, Internet, multimedia, etc. As a result, graph queries are becoming common and graph indexing plays an essential role in query processing. We have developed C-Tree,  a novel index structure for graph queries. C-Tree organizes graphs hierarchically where each node summarizes its descendants by a graph closure, a structural bounding box of the constituent graphs. C-Tree can efficiently support subgraph queries and similarity queries. Subgraph queries find graphs that contain a specific structural pattern, whereas similarity queries find graphs similar to a given graph. For subgraph queries, we develop an approximation algorithm called pseudo subgraph isomorphism for subgraph isomorphism tests. For similarity queries, we measure graph similarity through graph edit distance using heuristic graph mapping methods. [pdf]

Mining of recurring subgraphs is useful for understanding the intrinsic characteristics of a graph dataset. Existing graph mining techniques aim to find subgraphs above a frequency threshold. However, not all frequent subgraphs are statistically significant, and there is no way to rank the frequent subgraphs. We propose GraphRank, a technique that evaluates and ranks frequent subgraphs by their statistical significance (p-values). GraphRank models the probability distribution of frequency of a subgraph in the feature space so that efficient computational methods can be derived. As graphs are represented as feature vectors, we also address the problem of feature vector mining, and develop an algorithm that finds significant sub-vectors. GraphRank has discovered practically significant subgraphs in the real data. [pdf]


Courses


Teaching Assistant


Publications

  1. Huahai He and Ambuj K. Singh, "Closure-Tree: An Index Structure for Graph Queries",
    In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), Atlanta, 2006. [pdf, slides]
  2. Huahai He and Ambuj K. Singh, "GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space",
    In Proceedings of the 2006 International Conference on Data Mining (ICDM'06), Hong Kong, 2006. [short, long, slides]
  3. Huahai He and Ambuj K. Singh, "Efficient Algorithms for Mining Significant Substructures in Graphs with Quality Guarantees",
    In Proceedings of the 2007 International Conference on Data Mining (ICDM'07), Omaha NE, 2007. [pdf, slides]
  4. Huahai He and Ambuj K. Singh, "Graphs-at-a-time: Query Language and Access Methods for Graph Databases",
    In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'08), Vancouver, Canada, 2008. [pdf, slides]

Skills

Java, C/C++, C#, Perl, SQL, JSP, JavaCC, XML