Sharing Graphs using Differentially Private Graph Models
Alessandra Sala
Xiaohan Zhao
Christo Wilson
Haitao Zheng
Ben Y. Zhao
Proceedings of The 11th ACM SIGCOMM Internet Measurement Conference (IMC 2011)
Paper Abstract
The continued success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user data to the public. Unfortunately, current techniques to improve privacy on graphs target only specific attacks, and have proven vulnerable to powerful de-anonymization attacks.
Our work seeks a solution to share meaningful graph datasets while
preserving privacy. We observe a clear tension between strength of
privacy protection and maintaining structural similarity to the
original graph. To navigate the tradeoff, we develop a
differentially-private graph model we call Pygmalion. Given a
graph G and a desired level of ε-differential privacy
guarantee, Pygmalion extracts a graph's detailed structure into
degree correlation statistics, introduces noise into the resulting
dataset, and generates a synthetic graph G'. G' maintains as
much structural similarity to G as possible, while introducing
enough differences to provide the desired privacy guarantee. We
show that simply applying differential privacy to graphs results in
the addition of significant noise that may disrupt graph structure,
making it unsuitable for experimental study. Instead, we
introduce a partitioning approach that provides identical privacy
guarantees using much less noise. Applied to real graphs, this
technique requires an order of magnitude less noise for the same
privacy guarantees. Finally, we apply our graph model to Internet,
web, and Facebook social graphs, and show that it produces synthetic
graphs that closely match the originals in both graph structure
metrics and behavior in application-level tests.
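To make the core mechanism concrete: differential privacy on graph statistics is typically achieved by adding Laplace noise scaled to the query's sensitivity divided by ε. The sketch below is not Pygmalion's actual dK-2 based algorithm; it is a minimal, hypothetical illustration of the Laplace mechanism applied to a graph's degree histogram. The function names are my own, and the sensitivity bound of 4 follows from the observation that adding or removing one edge changes the degrees of two nodes, each of which moves one histogram bin down by one and another up by one.

```python
import random
from collections import Counter

def degree_histogram(edges):
    """Count how many nodes have each degree in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_degree_histogram(edges, epsilon, sensitivity=4.0, seed=0):
    """Hypothetical sketch: perturb each histogram bin with Laplace noise
    of scale sensitivity/epsilon, then clamp to non-negative integers.
    Sensitivity 4 assumes one edge change shifts at most four bins by 1."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {d: max(0, round(c + laplace_noise(scale, rng)))
            for d, c in degree_histogram(edges).items()}
```

Note the tradeoff the abstract describes: a smaller ε (stronger privacy) inflates the noise scale, which is why naive application of this mechanism to fine-grained graph statistics can disrupt structure, and why techniques that reduce sensitivity, such as the paper's partitioning approach, matter.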