Sharing Graphs using Differentially Private Graph Models

Alessandra Sala
Xiaohan Zhao
Christo Wilson
Haitao Zheng
Ben Y. Zhao

Proceedings of The 11th ACM SIGCOMM Internet Measurement Conference (IMC 2011)

[Full Text in PDF Format, 461KB]

Paper Abstract

Continuing success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user data to the public. Unfortunately, current techniques to improve privacy on graphs only target specific attacks, and have been proven to be vulnerable against powerful de-anonymization attacks.

Our work seeks a solution to share meaningful graph datasets while preserving privacy. We observe a clear tension between strength of privacy protection and maintaining structural similarity to the original graph. To navigate the tradeoff, we develop a differentially-private graph model we call Pygmalion. Given a graph G and a desired level of e-differential privacy guarantee, Pygmalion extracts a graph's detailed structure into degree correlation statistics, introduces noise into the resulting dataset, and generates a synthetic graph G'. G' maintains as much structural similarity to G as possible, while introducing enough differences to provide the desired privacy guarantee. We show that simply applying differential privacy to graphs results in the addition of significant noise that may disrupt graph structure, making it unsuitable for experimental study. Instead, we introduce a partitioning approach that provides identical privacy guarantees using much less noise. Applied to real graphs, this technique requires an order of magnitude less noise for the same privacy guarantees. Finally, we apply our graph model to Internet, web, and Facebook social graphs, and show that it produces synthetic graphs that closely match the originals in both graph structure metrics and behavior in application-level tests.