Report ID
2011-10
Report Authors
Adam Lugowski, David Alber, Aydin Buluc, John Gilbert, Steve Reinhardt, Yun Teng, Andrew Waranis
Report Date
Abstract

The Knowledge Discovery Toolbox (KDT) enables domain experts to perform complex analyses of huge datasets on supercomputers using a high-level language without grappling with the difficulties of writing parallel code, calling parallel libraries, or becoming a graph expert. KDT provides a flexible Python interface to a small set of high-level graph operations; composing a few of these operations is often sufficient for a specific analysis. Scalability and performance are delivered by linking to a state-of-the-art back-end compute engine that scales from laptops to large HPC clusters. KDT delivers very competitive performance from a general-purpose, reusable library for graphs on the order of 10 billion edges and greater. We demonstrate speedup of 1 and 2 orders of magnitude over PBGL and Pegasus, respectively, on some tasks. Examples from simple use cases and key graph-analytic benchmarks illustrate the productivity and performance realized by KDT users. Semantic graph abstractions provide both flexibility and high performance for real-world use cases. Graph-algorithm researchers benefit from the ability to develop algorithms quickly using KDT's graph and underlying matrix abstractions for distributed memory. KDT is available as open-source code to foster experimentation.

Document
2011-10.pdf529.16 KB