CluE: Towards Scalable Primitives for Graph Operations
Many of today's data-intensive application domains, including searches on social networks and module searches in biological pathways, require complex queries on large graph datasets. Because these graphs are highly connected, graph operations tend to "crawl" across many links, resulting in very large memory footprints that strain the resources of today's commodity servers. In addition, current application platforms lack abstractions for graph operations, leaving developers to implement a variety of complex standalone graph query components. The PIs of the UCSB Massive Graphs In Clusters (MAGIC) project are developing a scalable infrastructure that provides high-level abstractions for graph primitives, simplifying the design of complex queries while addressing the difficult challenges of maximizing data parallelism and adaptively partitioning graphs across clusters.
Advances from this project will include data partitioning techniques, novel primitives for graph operations, and a software infrastructure that together expand the applicability of scalable cluster infrastructures such as MapReduce and Dryad to graph-oriented queries in social networks and biological graphs.